This page describes a way to export page data and file metadata from Hippo CMS 7 into MySQL, the Unified Data Model (UDM) in MongoDB, and a few Excel reports. Besides the robots, the attached zip file contains example XML, so anyone can run a complete demonstration of the connector.
- Installed software:
  - Xill IDE 3.1 and the MySQL plugin
  - A MySQL server and a program to view the contents of your databases
  - MongoDB and a program such as RoboMongo
  - MS Excel or another program that can open .xlsx spreadsheets
- Access to the console of your project's Hippo installation, if you want to use the connector on it
- Knowledge of XML and XPath
Unified Data Model (UDM) setup
Content types and custom decorators are defined in the robot /transform/ContentTypes.xill; the StandardDecorators robot is in the com folder. You can look up the details there.
- 'parent'. That way, we are able to save the complete node hierarchy. However, parent.id is currently filled with the parent's Hippo ID, while according to the newest standard decorator specifications it should be the parent's UDM ID from MongoDB.
- 'revision'. NOTE: 'revision' has since been replaced by the new standard decorators 'created' and 'modified'.
- A 'hippo' decorator has been defined to store the id, name and path of every item in the source. This id is also what is filled into the parent.id field of the item's children. You should be able to use the hippo decorator for any Hippo source system.
- 'image'. These decorators are project-specific, so you will need to make your own if you want to export the content types from another Hippo installation.
- The 'web' decorator contains general fields that most web pages of the demo project have (or can have); most of these fields probably apply to your CMS as well. It holds the website name, introduction, related links and files, and the page body. The body is stored in the 'content' field. Hippo CMS builds page bodies from different types of components called content blocks, which is why the content field is of the LIST type. Scroll down to the Solution chapter to see how this content field is filled.
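One way to resolve the parent.id issue noted for the 'parent' decorator is a post-processing pass after all records have been inserted into MongoDB and their UDM ids are known. The sketch below is a hypothetical illustration, not part of the connector: records are plain dicts with a 'hippo' and a 'parent' decorator, and hippo_to_udm is an assumed lookup from Hippo id to the id MongoDB assigned.

```python
def remap_parent_ids(records, hippo_to_udm):
    """Replace parent.id (currently a Hippo id) with the parent's UDM id.

    records: UDM-style dicts, e.g. {"hippo": {"id": ...}, "parent": {"id": ...}}
             (hypothetical shape, modeled on the decorators described above).
    hippo_to_udm: dict mapping a Hippo id to the UDM id of that record.
    Returns the Hippo ids of records whose parent could not be resolved;
    those records keep their original parent.id.
    """
    unresolved = []
    for rec in records:
        parent = rec.get("parent")
        if not parent or parent.get("id") is None:
            continue  # root node: nothing to remap
        udm_id = hippo_to_udm.get(parent["id"])
        if udm_id is None:
            unresolved.append(rec["hippo"]["id"])
        else:
            parent["id"] = udm_id
    return unresolved
```

Returning the unresolved ids instead of raising makes it easy to report parents that were never exported, for instance because their node was left out of the Hippo export.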
- Export XML from the Hippo console to a local export folder
- Split the XML into individual page/file XML files and store basic information and the structure in MySQL (some reports can already be made after this step)
- Transform the individual XML files, with some help from the MySQL tables, into the Unified Data Model
Export XML from Hippo console
Exporting the complete content node at once is almost certainly impossible, but how deep do you have to go? Once you choose a node, the system starts building the XML file and, with a little luck, soon presents a 'save file' option. If it hits the limit, though, you will get an error page and will probably have to wait a while before you can try again.
Since you probably cannot export the assets, documents and gallery nodes at once, Xillio's export connector lets you run the Hippo exporter from any deeper level, as long as you structure your export files exactly as the nodes are structured in Hippo. This means that if you export the node 'ecer' (see picture), you have to save the XML file at the local path [project folder]/Export/assets/ecer.xml.
This way, the connector will be able to determine the correct parent-child relationships. This is also explained in the text file in the Export folder of the connector zip file.
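The folder convention can be sketched as two small path functions. This is a hypothetical illustration, not code from the connector; it assumes node paths are given relative to the content root (so 'ecer' under assets becomes "assets/ecer"), matching the example path above.

```python
from pathlib import PurePosixPath

def export_file_for(project_folder: str, node_path: str) -> PurePosixPath:
    """Local XML file for an exported node, mirroring the Hippo hierarchy.

    node_path is assumed to be relative to the content root, e.g. "assets/ecer".
    """
    return PurePosixPath(project_folder, "Export", node_path.strip("/") + ".xml")

def parent_of(export_file: str, export_root: str) -> str:
    """Recover the parent node path from where an export file was saved.

    Returns "" for files directly in the export root (top-level nodes).
    """
    rel = PurePosixPath(export_file).relative_to(export_root).with_suffix("")
    parent = str(rel.parent)
    return "" if parent == "." else parent
```

For example, export_file_for("/proj", "assets/ecer") gives /proj/Export/assets/ecer.xml, and parent_of on that file (with root /proj/Export) gives "assets": the folder structure alone is enough to rebuild the parent-child relationship.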
saveBinary. Now all items can be opened individually from the local file system, which has a couple of benefits, both for the working memory of Java software and for that of the human mind. While opening an item is now very straightforward and fast when you know its id, you still need an easy way to see which item is where and how the items are related. That is why we also used a relational database.
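The splitting step can be sketched with Python's standard xml.etree.ElementTree. This is not the connector's robot: it assumes a simplified, hypothetical XML layout in which every node is a <folder> or <document> element with 'name' and 'id' attributes, so the lookups would need to be adapted to the real Hippo export format.

```python
import xml.etree.ElementTree as ET

def split_export(xml_text, item_tag="document"):
    """Split one exported XML tree into individual items.

    Assumes a hypothetical layout: nested <folder>/<document> elements,
    each with 'name' and 'id' attributes. Returns one dict per item with
    its id, full path, parent path and its own serialized XML.
    """
    root = ET.fromstring(xml_text)
    items = []

    def walk(node, parent_path):
        path = f"{parent_path}/{node.get('name')}"
        if node.tag == item_tag:
            items.append({
                "id": node.get("id"),
                "path": path,
                "parent_path": parent_path,
                "xml": ET.tostring(node, encoding="unicode"),
            })
        # Only descend into structural nodes, not into an item's fields.
        for child in node:
            if child.tag in ("folder", item_tag):
                walk(child, path)

    walk(root, "")
    return items
```

Each item's "xml" string can then be written to its own file (for example named after its id), and the id, path and parent path are exactly the overview data that goes into the relational database.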
Store overview in MySQL
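A minimal sketch of such an overview table follows. It uses Python's built-in sqlite3 as a stand-in for MySQL so that it runs anywhere; the table and column names are hypothetical and not the connector's actual schema, and it expects items shaped like {"id", "path", "parent_path"}.

```python
import sqlite3

# Hypothetical overview schema; sqlite3 stands in for MySQL here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS item (
    id          TEXT PRIMARY KEY,  -- Hippo id of the item
    name        TEXT NOT NULL,     -- last path segment
    path        TEXT UNIQUE,       -- full node path
    parent_path TEXT               -- path of the parent node ('' for top level)
)"""

def store_overview(conn, items):
    """Insert (or overwrite) one overview row per split item."""
    conn.execute(SCHEMA)
    rows = [(i["id"], i["path"].rsplit("/", 1)[-1], i["path"], i["parent_path"])
            for i in items]
    conn.executemany(
        "INSERT OR REPLACE INTO item (id, name, path, parent_path) VALUES (?, ?, ?, ?)",
        rows)
    conn.commit()

def children(conn, parent_path):
    """Names of the direct children of a node, sorted alphabetically."""
    cur = conn.execute(
        "SELECT name FROM item WHERE parent_path = ? ORDER BY name", (parent_path,))
    return [row[0] for row in cur]
```

Against MySQL the same idea would use REPLACE INTO instead of INSERT OR REPLACE, and Python MySQL drivers typically use %s placeholders rather than ?. Queries like children() are what make the "what is where" question cheap to answer without opening any XML.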
Transformation to UDM
The function udmTemplate() is built to be reused in later extractions from other CMSs. This is possible because it takes any XML node and a matching JSON template. The JSON template is filled with the decorator/field structure exactly as it should be entered in MongoDB, including instructions on where each field should get its content. Usually, this instruction is an XPath that udmTemplate() can execute on the XML, but it can also be a 'special' field that is passed to the function as a parameter. Furthermore, the function supports sub fields, which we need for Hippo's content blocks.
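The core idea of udmTemplate() can be sketched in Python. This is not the connector's implementation: the instruction keys "xpath" and "special" are assumptions modeled on the description above, and Python's ElementTree supports only a subset of XPath (no attribute selection or functions), so real templates may need a full XPath engine such as lxml.

```python
import xml.etree.ElementTree as ET

def fill_template(node, template, specials=None):
    """Fill a {decorator: {field: instruction}} template from one XML node.

    Hypothetical instruction shapes:
      {"xpath": path}   -> text of the first match of path (ElementTree subset)
      {"special": key}  -> value supplied by the caller in `specials`
    """
    specials = specials or {}
    result = {}
    for decorator, fields in template.items():
        result[decorator] = {}
        for field, instr in fields.items():
            if "xpath" in instr:
                value = node.findtext(instr["xpath"])  # None if nothing matches
            elif "special" in instr:
                value = specials.get(instr["special"])
            else:
                raise ValueError(f"no instruction for {decorator}.{field}")
            result[decorator][field] = value
    return result
```

Because the template, not the code, knows the CMS-specific structure, the same function can be reused for another CMS by writing a new template, which is the reuse argument made above.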
Web.content and content lists
This part covers the content field of the web decorator. Since filling it is arguably the most complex UDM transformation step, and since it can easily be adapted for other Hippo installations and even completely different CMSs, it has been given its own sub chapter. This is a small part of the JSON template, included in the zip file in the folder bots/JSON templates:
Let's go through these lines, step by step.
- 'content' is the field name inside the web decorator.
- 'xpath' means that this XPath should be executed (by udmTemplate()) to get the value of this field. By default, this value is what gets filled into the field, unless there are other (sibling) instructions for further processing. In this case, the result could for instance be a list of three content block nodes.
- '_contentList' means that the value of the previous step is an intermediate result, which should be broken down into sub fields. What follows is a list of all possible blocks of sub fields. All XPaths below are executed on each result of the XPath above.
- 'type' holds the content element type, which in this case is "text".
- '_condition' contains a mechanism to check whether a content element is of this specific type:
  - 'xpath' contains the XPath to execute for the check.
  - 'result' contains the string that the result of the XPath is compared with. If they are equal, the current content element is indeed of this ("text") type.
- 'fields' contains all sub fields. They are processed like normal, elementary fields:
  - 'title' is one of the sub field names, with its XPath underneath.
  - 'body' is the other sub field relevant for the "text" content block, again with the XPath to get its content.
- The next content block type ("image") starts here. It has a different _condition result and different fields, but works the same way. That is why you should be able to insert your own content blocks too, even if their fields are very different.
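The walkthrough above can be mirrored in a small Python sketch. The spec below follows the described shape ('xpath', '_contentList', 'type', '_condition' with 'xpath' and 'result', 'fields'), but every XPath and the image sub field names ('src', 'alt') are hypothetical, and the code is an illustration of the mechanism, not udmTemplate() itself. It uses ElementTree's limited XPath subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical spec shaped like the template described above.
CONTENT_SPEC = {
    "xpath": "blocks/block",  # selects the content block nodes
    "_contentList": [
        {"type": "text",
         "_condition": {"xpath": "kind", "result": "text"},
         "fields": {"title": {"xpath": "title"}, "body": {"xpath": "body"}}},
        {"type": "image",
         "_condition": {"xpath": "kind", "result": "image"},
         "fields": {"src": {"xpath": "src"}, "alt": {"xpath": "alt"}}},
    ],
}

def fill_content_list(node, spec):
    """Apply 'xpath', then break each result down via '_contentList'."""
    blocks = node.findall(spec["xpath"])
    if "_contentList" not in spec:
        return [b.text for b in blocks]  # no further processing: plain values
    content = []
    for block in blocks:
        for option in spec["_contentList"]:
            cond = option["_condition"]
            # The block is of this type if its condition XPath yields 'result'.
            if block.findtext(cond["xpath"]) == cond["result"]:
                entry = {"type": option["type"]}
                for name, instr in option["fields"].items():
                    entry[name] = block.findtext(instr["xpath"])
                content.append(entry)
                break  # first matching type wins; unmatched blocks are skipped
    return content
```

Adding a content block type of your own then only means appending another entry to _contentList; the loop itself does not change.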