Extracting data from Corsa can be split into two parts: database extraction and document extraction.

Database extraction

Because the document management system Corsa uses a relational database, extracting data from it involves joining tables on related columns. Many tables contain an object id and an object type. The object type can be poststuk, dossier, case, etc. An object id is only unique within its object type, so the same object id can occur for multiple object types. An object is therefore identified by the combination of object id and object type, not by the object id alone.
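
The effect of this on joins can be illustrated with a small sketch. The table and column names below (objects, notes, object_id, object_type) are hypothetical and only serve the illustration; they are not the actual Corsa schema, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3

# Hypothetical tables for illustration only; not the real Corsa schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (object_id TEXT, object_type TEXT, title TEXT,
                          PRIMARY KEY (object_id, object_type));
    CREATE TABLE notes (object_id TEXT, object_type TEXT, note TEXT);
    -- The same object id is used for two different object types.
    INSERT INTO objects VALUES ('100', 'poststuk', 'Incoming letter');
    INSERT INTO objects VALUES ('100', 'dossier', 'Building permit');
    INSERT INTO notes VALUES ('100', 'dossier', 'Reviewed');
""")

# Joining on the object id alone also matches the unrelated poststuk.
by_id_only = conn.execute(
    "SELECT o.title FROM notes n "
    "JOIN objects o ON o.object_id = n.object_id "
    "ORDER BY o.title"
).fetchall()

# Joining on object id and object type matches only the intended dossier.
by_id_and_type = conn.execute(
    "SELECT o.title FROM notes n "
    "JOIN objects o ON o.object_id = n.object_id "
    "AND o.object_type = n.object_type"
).fetchall()

print(by_id_only)
print(by_id_and_type)
```

In this sketch the first query returns two rows and the second returns one, which is why both columns belong in every join condition.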

Document extraction

Documents are stored on a fileshare by the Document Server. The directory structure is usually as follows (depends on configuration):

$share$/ds_files/<database name>/<parent type>/<file category>/<hashed folders>/<hashed filename>.<extension>

The parent type is either S (poststuk) or A (agenda). The file category is arc (pdf), nat (original) or ocr. The hashed folders and filename are derived from the object id by converting it to a hexadecimal value, where each character becomes its two-digit hexadecimal character code. Before converting, the object id has to be 10 characters long for a poststuk and 30 characters long for an agenda. Any object id with fewer characters has to be prefixed with spaces until it reaches that length. The version number of the document, minus the first 0, is appended to the hexadecimal value. After that the extension is appended, and the value is split into chunks of 8 hexadecimal characters to create the hashed folders and filename; the last chunk, together with the extension, forms the filename.

As an example, let's take a poststuk that has object id 10IN12345 and a TIF document with version 0001.

Step 1: prefix the object id with spaces until it has a length of 10 characters. Result: " 10IN12345" (one leading space)
Step 2: convert to a hexadecimal value. Result: 203130494e3132333435
Step 3: append the document version number without the first 0 and its extension. Result: 203130494e3132333435001.TIF
Step 4: split into chunks of 8 characters. Result: 20313049/4e313233/3435001.TIF
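
The four steps can be sketched in Python as follows. This is a minimal sketch, not the Document Server's own implementation: the function name corsa_hashed_path is made up for this example, the object id is assumed to contain only ASCII characters, and the version is assumed to be a string such as 0001 that starts with a 0.

```python
def corsa_hashed_path(object_id: str, version: str, extension: str,
                      parent_type: str = "S") -> str:
    """Return the hashed folders and filename for a document (illustrative sketch)."""
    # Required object id length per parent type: S (poststuk) or A (agenda).
    widths = {"S": 10, "A": 30}

    # Step 1: prefix spaces until the object id has the required length.
    padded = object_id.rjust(widths[parent_type])

    # Step 2: convert every character to its two-digit hexadecimal code.
    # Assumption: the object id is plain ASCII.
    hex_value = padded.encode("ascii").hex()

    # Step 3: append the version number without its first 0.
    short_version = version[1:] if version.startswith("0") else version
    value = hex_value + short_version

    # Step 4: split into chunks of 8 characters; the extension is added
    # to the last chunk, as in the worked example.
    chunks = [value[i:i + 8] for i in range(0, len(value), 8)]
    return "/".join(chunks) + "." + extension


print(corsa_hashed_path("10IN12345", "0001", "TIF"))
# prints 20313049/4e313233/3435001.TIF
```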

Now we can generate the paths for both the original and archive copies, namely:

$share$/ds_files/ds_prod/S/nat/20313049/4e313233/3435001.TIF
$share$/ds_files/ds_prod/S/arc/20313049/4e313233/3435001.PDF
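
Assembling these paths can be sketched as below. The function name document_paths and its parameters are hypothetical, forward slashes are used as in the paths above, and the archive copy is assumed to keep the same hashed name with the extension replaced by PDF, as in this example. Since the directory structure depends on configuration, treat this as a starting point only.

```python
from pathlib import PurePosixPath


def document_paths(share: str, database: str, parent_type: str,
                   hashed_path: str) -> tuple[str, str]:
    """Return the (original, archive) paths for one document (illustrative sketch)."""
    base = PurePosixPath(share) / "ds_files" / database / parent_type

    # The original copy lives under the nat file category.
    original = base / "nat" / hashed_path

    # Assumption: the archive copy has the same hashed name with a PDF extension.
    archive = (base / "arc" / hashed_path).with_suffix(".PDF")

    return str(original), str(archive)


original, archive = document_paths(
    "$share$", "ds_prod", "S", "20313049/4e313233/3435001.TIF"
)
print(original)
print(archive)
```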


A set of robots is attached to this article as a zip file.
At the moment these robots use the UDM, but not in accordance with the UDM design philosophy. We are in the process of rewriting them, but still believe they are a good set of robots to get started with.