Introduction

Before reading this document, it is recommended to read the Introduction to OpenText Content Server first.

This document describes the setup and the solution used for building the OpenText Extract Connector. The connector is capable of creating a generic data model to accommodate the storage of OpenText data into a MongoDB instance, recursively extracting all the objects from an OpenText hierarchical structure, storing associated object metadata, version extraction, exporting the permissions, the audit trails and the member information. As an addition to the OpenText Extract Connector, a set of Xill robots was created to generate Excel reports with valuable aggregates for analysis. The transformation of the extracted records to allow the loading into a target system is outside the scope of this document.

Prerequisites

In order to successfully run the OpenText connector scripts, the following requirements must be met:

  • The scripts code must be opened in Xill IDE 3.0
  • A MongoDB instance must be running locally on the standard port (127.0.0.1:27017)
  • Access to an OTCS (OpenText Content Server) user environment is required
  • Access to the Content Server API is required; the following REST API urls can be used in Windows: http://{servername}/OTCS/cs.exe/api/v1, http://{servername}/OTCS/llisapi.dll/api/v1/
  • Administrator privileges for the OTCS are required in order to have access to the user account information and object permissions

Before running any of the connector extract robots, the access credentials to the OpenText Content Server must be set in the loadSettings() routine of the"settings.xill" file:  

 "username" : server username, "password" : server password 

  The extraction phase starts by recursively crawling one or several OpenText root items. Therefore, these entry point(s) for the crawler must be specified beforehand in the loadSettings() routine of the"settings.xill" file.  

"roots" : ["152504", "2000", "135985"] 

  Unified Data Model Setup

Most of the decorators created for the OpenText Extract Connector are standard decorators. In addition to the standard decorators, a separate decorator is created for OpenText-specific objects. 

var opentext = {
"id" : {
"type" : "NUMBER",
"required" : true
},

"type" : {
"type" : "NUMBER",
"required" : true
},

"typeName" : {
"type" : "STRING",
"required" : true
}
}; 

The "opentext" decorator contains an "id", a "type" and a "typename" field. Every OpenText object has an "id" attached to it at its creation. Objects have a numeric id in OpenText. Moreover, every OpenText object is assigned to a "type". The type has a numeric id and a name. A complete list of OpenText object types is attached as appendix to this connector's documentation. The decorator definitions are used to define the following content types: "opentextfile", "opentextfolder", "opentextitem", "opentextcategory", "opentextuser" and "opentextgroup".

As discussed in the introduction to OpenText Content Server, the OpenText Content Server provides a series of OpenText-specific content types that are normally designed for use inside OpenText. The "opentextcategory" is such an example. It is set in UDM as a separate content type, "opentextcategory", since it facilitates the storage of the extracted metadata in MongoDB. Except for the "opentextcategory", all the special OpenText content types are stored in MongoDB by using the "opentextitem" UDM content type. The reason for that is twofold. First, it is difficult to create separate UDM content types for various special OpenText items that would be generic enough to be used without requiring changes when importing to different target systems. That is because what is being extracted as content of the special OpenText item is heavily dependent on the content type from the target system it is imported into. Secondly, a generic content type for all OpenText special items offers enough flexibility to be able to perform an extraction without being dependent on the target system specifications.

Solution

After setting up the Unified Data Model, the following solution steps are followed:

  • Setup of the local disk cache
  • Extraction of the OpenText object content to the local disk cache
  • Extraction of the users & groups information 
  • Load of the OpenText content to UDM
  • Generation of Excel reports for analysis purposes 

The "00b - initialize local filesystem.xill" robot creates a local cache folder to store all assets. A provision has been made in the code that requires one to re-run the disk cache setup in case the "settings.xill" file is changed.If present, old cache files will be deleted to avoid side-effects from mixing extractions performed using different settings files.This is an option that can be disabled by setting the settingsHashChangeCheck to false.

Once the local disk cache is setup, the OpenText content is being extracted and stored in the created local folder. The extraction is realized by means of REST API calls. Since the extraction is not possible without authentication, the first REST call authenticates a user with the provided credentials and returns a OTCS ticket if the user is authorized. The code below shows the OpenText authentication process. The code can be found in the "commonsOT.xill" robot.

function createNewTicket() {
var tries = 10;
var otheader = "a";
while (tries > 0) {
tries = tries - 1;
var body = "username=" ::settings().username :: "&password=":: settings().password;
var contentType = "application/x-www-form-urlencoded";
var response = REST.post(settings().apiUrl:: "auth", {"headers": {"timeout": "3000", "Content-Type": contentType}}, body);
otheader = response.body.ticket;
if (String.length(otheader) == 64){
tries = -1;
break;
}
}




if(String.length(otheader) != 64){
System.print("Getting opentext ticket failed after 10 tries! is your network connection up?", "error");
}
var ticket = {
"expires" : Date.change(Date.now(), settings().ticketTTL) ,
"otheader" : otheader
};
return(ticket);
}

The ticket returned by the authentication REST call is subsequently used in all the REST calls that are required to perform the extraction. The extraction is executed in two phases: first, the content from the OpenText environment is extracted to a local disk folder, subsequently the content extracted to the local disk is stored into UDM tables on a MongoDB instance. There is a big advantage of performing the extraction this way. The extraction of the content from the OpenText Content Server is an expensive operation. However, if extracting the content to the local disk, there is no need to run the extraction twice if a change to the data model is required.    

Robot 01 - store migration to files to filesystem.xill stores on a local disk folder the content of all the OpenText content types found under the OpenText root folder(s). For each OpenText content type, a number of json format and xml files are stored containing different kind of information regarding the OpenText content type. For example, a "nodeinfo" file contains in a json format the basic metadata of an OpenText item: the creation date, the author, the name, the type of the OpenText object, etc.

{
"create_date" : "2015-12-03T03:10:02",
"create_user_id" : 119809,
"description" : "",
"description_multilingual" : {
"en" : ""
},
"guid" : null,
"icon" : "/img/webdoc/folder.gif",
"icon_large" : "/img/webdoc/folder_large.gif",
"id" : 152504,
"modify_date" : "2016-01-28T02:14:28",
"modify_user_id" : 119809,
"name" : "FINANCE",
"name_multilingual" : {
"en" : "FINANCE"
},
"owner_group_id" : 1001,
"owner_user_id" : 119809,
"parent_id" : 135985,
"reserved" : false,
"reserved_date" : null,
"reserved_user_id" : 0,
"type" : 0,
"type_name" : "Folder",
"versions_control_advanced" : false,
"volume_id" : -2000
}

Except for the "nodeinfo" file, a OpenText item may also have a "nodeversions", "nodeaudit", "nodepermissions" and "categoryinfo" files extracted. That is, except for the basic metadata, the object-specific category metadata is being extracted into the "categoryinfo" file. The "nodeversions" file includes the list of all document versions together with version-related information: version id, version number, document name and others. The "nodeaudit" file enumerates all the audit events performed on the document, for example: version added, version opened, renamed, etc. The ACL (Access Control List) is also provided in the "nodepermissions" file. The ACL contains all the permissions attached to an object: see, create, delete, edit, modify etc. Regarding permissions, it has to be mentioned that they were not extracted with a REST API call, since the REST call does not provide a complete ACL list, but by using the Livelink's XML Export feature. Below a code snippet is provided showing the extraction of permissions with the XML Export.

var permissionSource = "";
var nodePermissionUri = nodeUri :: ".nodepermissions";
var permissionUrl = contentServerUrl :: "?func=ll&objAction=XMLExport&objId=" :: nodeID :: "&nodeinfo&permissions";




Web.download(permissionUrl, nodePermissionUri, loginPage);
permissionSource = File.getText(nodePermissionUri);

Further, all associated binary files are stored in a folder called "binaries". The content of the OpenText special items (content types that are designed especially for use in OpenText) is also stored in xml files in the "content" folder. As such, when creating a similar content type in a new system, the relevant content information can be easily retrieved from the "content" folder.

As a last remark for the extraction, the extraction robot is also endowed with resume functionality. The implemented crawler is capable of resuming an incomplete extraction operation by only crawling folders that have not yet been completed.

Robot 02 - store user data to MongoDB.xill extracts information regarding user accounts and user groups from the OpenText system. This information is directly stored in MongoDB using the "opentextuser" and "opentextgroup" content types.

Robot 03 - store migration data to MongoDB.xill stores the extracted OpenText content into the database. The robot traverses recursively the cache folder and processes the corresponding xmls and json format files. It reads the information from every file, selects the relevant data and stores it in MongoDB by applying the appropriate UDM definitions (see the code below).

if (versionInfo != null){
if (File.exists(hashLocation)){
hash = Hash.fileToMD5(hashLocation);
}
var currentversion = 
{
"version" : versionNumber,
"opentext": {
"id" : nodeInfo.id,
"type" : nodeInfo.type,
"typeName" : nodeInfo.type_name
},
"document": {
"title": nodeInfo.name,
"created" : Date.parse(nodeInfo.create_date, "yyyy-MM-dd'T'HH:mm:ss"),
"modified" : Date.parse(nodeInfo.modify_date, "yyyy-MM-dd'T'HH:mm:ss"),
"createdby" : nodeInfo.create_user_id, 
"modifiedby": nodeInfo.modify_user_id 
},
"file" : {
"parentid" : parentID,
"parentpath" : parentPath,
"name" : nodeInfo.name, 
"extension" : versionInfo[(versionInfoLength)-1].file_type,
"size" : versionInfo[(versionInfoLength)-1].file_size,
"hash" : hash
},
"migration" : {
"sourcesystem": settings().sourceSystem, 
"action": "bot02",
"timestamp": Date.now()
},
"author" : {
"userid": authorID,
"name": authorName,
"firstName": nodeInfo.first_name,
"lastName": nodeInfo.last_name,
"email": authorEmail
},
"permissions" : {
"users" : userPermissions,
"groups" : groupPermissions
},
"audit" : {
"events" : auditEvents
}
};




var ID = createDocument("opentextfile", currentversion, versions);
...
}

There is one exception when it comes to storing OpenText Content Server data into MongoDB. Unlike basic metadata, the category metadata is stored by defining category decorators on-the-fly. This is done because it is difficult to find a standard way of defining a category. The number of attributes in a category as well as the type of the category attributes vary with every new category. Therefore, when the category metadata of a document has to be stored in the database, a decorator is created on the spot.

The Xill IDE Reports folder contains a set of Xill robots used for analysis purposes. It is very convenient to use data stored in UDM to generate a meaningful overview on the extracted content. UDM is designed to be a generic data model and therefore, most of the code created in these robots can be used not only for analysis of the OpenText Content Server content, but also for analysis of the content of other systems ( for example, Fileshare analysis). The results of the analysis are mostly obtained by aggregating the data extracted from the Mongo database (see the example below).

var results = Mongo.aggregate("documents", 
[
{ "$match": { "contenttype": fileContentType } },
{ 
"$group": {
"_id": "$source.current.file.extension", 
"count": { "$sum": 1},
"size": {"$sum" : "$source.current.file.size"}
}
}
]);
var kb = 1024;
var mb = kb * kb;
foreach (i, result in results){
Excel.setCell(sheet, 1, i + 2, result._id);
Excel.setCell(sheet, 2, i + 2, result.count);
Excel.setCell(sheet, 3, i + 2, result.size/mb);
}

At the end, all the analysis results are stored in Excel sheets. Since it is very helpful to visualize the aggregate results by means of graphs, but the XILL IDE does not yet support the creation of Excel graphs from the code, an Excel sheet template is being used with empty graphs by default. The graphs are populated with data by running the Xill robots from the Reports folder.      

Closing remarks

The connector presented in this document provides an easily configurable and powerful solution for extracting content from OTCS (OpenText Content Server) environments. However, it contains (on purpose) restricted functionality when it comes to processing information contained within non-standard OpenText objects, such as Agendas, Channels, Discussions, Task Lists  etc. The processing of these objects has to be tailored accordingly for different target systems. It is only logical to add a complex processing functionality for these source objects when the target content types of the migration are known. If you are also interested in how content can be imported in OTCS, it is recommended to read further the Importing to OpenText Content Server Connector document.