For this tutorial we will use a simple filesystem crawler that stores the information it finds in the UDM. Before you head off and start working on the code, please check that you meet the following two requirements:

  1. MongoDB is installed on your system
  2. You have a tool such as RoboMongo to inspect the data in the database

Now create a new robot, paste in the following code, and change the value of "startFolder" to an existing path on your machine.

use System, File, Document, ContentType;

var startFolder = "/path/to/some/files";

initializeDecorator();
initializeContentType();
crawlDocuments(startFolder);

function initializeDecorator() {
	ContentType.decorator("file", {
		"size" : {
			"type" : "NUMBER",
			"required" : true
		},
		"uri" : {
			"type" : "STRING",
			"required": true
		}
	});	
}

function initializeContentType() {
	ContentType.save("document", ["file"]);
}

function crawlDocuments(folder) {
	foreach(file in File.iterateFiles(folder, true)) {
		System.print(file);
		var document = Document.new("document", {
			"file" : {
				"size" : File.getSize(file),
				"uri" : file
			}
		});
		Document.save(document);
	}	
}

Breakdown

Let's quickly break down what this robot is doing. As you can see, we use a number of plugins, the most relevant of which are ContentType and Document:

  • ContentType: This plugin lets you define (and store) content types
  • Document: This plugin lets you save (and retrieve) documents

The robot starts by running initializeDecorator(). In this function we define the fields "size" and "uri", and group them in a decorator called "file". Next, initializeContentType() is called, which defines the content type "document" as consisting of a single decorator, "file".

Finally, crawlDocuments() is called, which recursively loops over all files in the specified folder and uses Document.new() and Document.save() to create each document and then save it to the UDM.

Validation

One important aspect of the UDM is that it applies data validation. In this example we have told the UDM that the field "size" is supposed to be a number. Let's change the file size value we try to save from "size" : File.getSize(file) to "size" : "really BIG".

Now, run again! The robot will stop running and you should see the message: "Input is not a valid document. Validation failed: Expected field [size] to be of type NUMBER". Hurray! The validation worked as expected. Ok, revert to the working example because we have more to show you.

A second thing the validation does is check whether you have supplied all expected fields and all expected decorators. Try it out!
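
One way to try this is sketched below. It assumes the "file" decorator and the "document" content type from the robot above have already been initialized, and it uses a made-up size value. The required "uri" field is left out on purpose, so the robot is expected to stop with a validation error. The exact wording of that error is not shown here.

```xill
use Document;

// "uri" is marked as required in the "file" decorator but is deliberately
// omitted here, so validation is expected to reject this document.
var document = Document.new("document", {
	"file" : {
		"size" : 1000
	}
});
Document.save(document);
```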

Optional fields and decorators

The nice thing is that while decorators are enforced strictly, you still have a lot of freedom. For instance, you can add an extra field to a decorator. It will be treated as an optional field and validation will succeed no matter the value:

var document = Document.new("document", {
	"file" : {
		"size" : File.getSize(file),
		"uri" : file,
		"extension" : "pdf"
	}
});

Likewise you can also add your own custom decorators. This is particularly handy for adding processing information that is not available right at the start. In all cases: try to use validation as much as possible to keep your database clean!
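
As a rough sketch of what a custom decorator could look like: the decorator name "processing" and its field "status" are hypothetical and only serve as an illustration. The sketch assumes that an undeclared decorator is accepted as optional, in the same way as the extra field above.

```xill
use Document;

var document = Document.new("document", {
	"file" : {
		"size" : 1000,
		"uri" : "/folder/document.txt"
	},
	// Hypothetical custom decorator that is not part of the "document" content type.
	"processing" : {
		"status" : "crawled"
	}
});
Document.save(document);
```

If you want such a decorator to be validated as well, you would presumably declare it with ContentType.decorator() in the same way as "file".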

Document structure

While we have already saved a document, we have forgotten to take a look at what is actually stored in the database. The Document.new() function wraps the information you provide into an object that complies with the UDM. Let's extend the example with a new function:

function saveOneDocument() {
	var document = Document.new("document", {
		"file" : {
			"size" : 1000,
			"uri" : "/folder/document.txt"
		}
	});
	var id = Document.save(document);
	return(id);
}

Now put a breakpoint on the last line of the function and add a call to the function somewhere in your code so that it actually runs. Run the robot in the debugger and inspect the document variable. You will see the following structure:

{
	"contentType": "document",
	"source" : {
		"current" : {
			"file" : {
				"size" : 1000,
				"uri" : "/folder/document.txt"
			}
		},
		"versions" : []
	},
	"target" : {
		"current" : {
			"file" : {
				"size" : 1000,
				"uri" : "/folder/document.txt"
			}
		},
		"versions" : []	
	}
}

Let's start at the root level. There are three entries: contentType, source and target:

  • contentType states which content type the document has. It is the name you passed to ContentType.save()
  • source is the representation of the data the first time we saved it to the database
  • target is the representation of the data we are working on

The idea behind source and target is that you will always want to be able to view the original representation of the document as you extracted it, even after doing operations on it. That's the purpose of the source entry. It is filled the first time you save the document and should not be touched again after. You should always perform your transformations/modifications on the target entry. The Document.new() function will automatically create a copy of your data in both source and target.

At the second level we see two entries: current and versions. Inside current you see the data you actually entered, while versions is an empty list. You will mostly be using the "current" entry, but when you have a system that supports versioning of documents, the "versions" entry is used to store the versions of the document. You can directly initialize a document with versions using the optional third parameter of the Document.new() function.

Fetch a document

A data model is not much use if you can't fetch information that you have previously stored. The simplest way to get a document is by ID. Whenever you save a document, the ID of that document is returned to you:

var id = Document.save(document);

You can later on use that ID to fetch the document from the database:

var newDocument = Document.get(id);

Let's extend the previous example with a new function:

function editDocument(id) {
	var document = Document.get(id);
	document.target.current.file.size = 2000;
	Document.save(document);
}

Change your code so that it first creates a new document using the saveOneDocument() function and then passes the returned id to the editDocument() function. When you run your code now, a new document is created and then fetched again from the database by its id. The file.size property in the target entry is updated (remember: we don't edit the source entry!) and the document is saved again.
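
A minimal sketch of that flow, assuming saveOneDocument() and editDocument() are defined in the same robot and the content type has been initialized. The variable name "updated" is made up for this example.

```xill
use System, Document;

// Create a document, then edit it through its id.
var id = saveOneDocument();
editDocument(id);

// Fetch the document once more to compare both entries.
var updated = Document.get(id);
System.print(updated.source.current.file.size);
System.print(updated.target.current.file.size);
```

If everything works as described, the source entry should still hold the original size while the target entry holds the edited one.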

Fetching multiple documents

In a lot of situations you will not have the id of a document readily available, let alone of all your documents. So instead of fetching a single document by ID, you can actually fetch all documents using the Document.find() function:

function findAllDocuments() {
	foreach(document in Document.find()) {
		System.print(document.target.current.file.uri);
	}
}

You can also pass your own criteria to the find function so that it only returns documents that match them. The following example looks up all documents whose file size is greater than or equal to ($gte) 100,000.

function findLargeDocuments() {
	foreach(document in Document.find({"target.current.file.size": {"$gte":100000}})) {
		System.print(document.target.current.file.uri :: " - " :: document.target.current.file.size);
	}
}

See the MongoDB query documentation for a reference on how the queries work. Please note that the special "ObjectID()" function shown in the MongoDB documentation cannot be used in Xill queries.
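
As a further illustration, criteria can be combined. The sketch below assumes that Document.find() passes standard MongoDB operators such as $gte and $lt through unchanged. The function name and the size bounds are made up for this example.

```xill
use System, Document;

function findMediumDocuments() {
	// Match documents of content type "document" whose file size
	// is at least 1000 and less than 100000.
	var query = {
		"contentType" : "document",
		"target.current.file.size" : {"$gte" : 1000, "$lt" : 100000}
	};
	foreach(document in Document.find(query)) {
		System.print(document.target.current.file.uri);
	}
}
```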