Extract

AEM is built on a JCR (Java Content Repository) and uses Apache Sling for its REST services. To extract data from AEM, we use the functions that AEM offers in its REST API.


A good starting point for an extraction is to find out which content types are available. We can request this through the query builder with the following URL (try it in your browser!):

 

http://localhost:4502/bin/querybuilder.json?path=%2Fcontent%2Fgeometrixx&type=cq:Page&orderby=path&p.limit=-1&p.hits=selective&p.properties=jcr:path%20jcr:content/sling:resourceType


 

This example returns every page under the default demo site, /content/geometrixx, together with its path and resource type. From that list we can derive the content types in use and the number of pages of each type.
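
As a minimal sketch, the query above can be built and its result tallied per content type in Python. The function names are made up for illustration, and the shape of a hit (resource type nested under "jcr:content", or stored under a flat "jcr:content/sling:resourceType" key) is an assumption, so both are handled. Note that urlencode escapes a few characters differently from the URL above (a space becomes '+', a colon becomes %3A), but the parameters are the same.

```python
from collections import Counter
from urllib.parse import urlencode

# Assumption: a local author instance on the default port, as in the URL above.
AEM_BASE = "http://localhost:4502"


def build_page_query_url(root_path, base=AEM_BASE):
    """Build the query builder URL that lists every cq:Page under root_path."""
    params = {
        "path": root_path,
        "type": "cq:Page",
        "orderby": "path",
        "p.limit": "-1",        # no paging, return all hits
        "p.hits": "selective",  # only return the properties listed below
        "p.properties": "jcr:path jcr:content/sling:resourceType",
    }
    return base + "/bin/querybuilder.json?" + urlencode(params)


def resource_type_of(hit):
    """Read the resource type from one hit (nested or flat; both shapes are assumed)."""
    nested = hit.get("jcr:content")
    if isinstance(nested, dict) and "sling:resourceType" in nested:
        return nested["sling:resourceType"]
    return hit.get("jcr:content/sling:resourceType", "(none)")


def count_content_types(response):
    """Count pages per sling:resourceType in a parsed query builder response."""
    return Counter(resource_type_of(hit) for hit in response.get("hits", []))
```

Calling count_content_types on the parsed JSON response gives a mapping from resource type to page count, which is usually enough to decide which types to extract first.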


If we actually want to get some page contents, we can use the following URL:

 

http://localhost:4502/content/geometrixx/en.3.json


 

Here we can adjust the number '3' to set how many levels deep the output goes. To get everything from that node downwards we can use 'infinity' instead, but be careful: the response can easily become too big for both server and client to handle.
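
A rough sketch of fetching a page at a chosen depth, and checking how large the result actually is, could look like this. The helper names are hypothetical, and the admin/admin credentials are an assumption about a default local instance; adjust them for your setup.

```python
import base64
import json
from urllib.request import Request, urlopen


def page_json_url(path, depth, base="http://localhost:4502"):
    """Build the URL for a JSON rendering of `path`, `depth` levels deep."""
    if depth != "infinity" and (not isinstance(depth, int) or depth < 0):
        raise ValueError("depth must be a non-negative integer or 'infinity'")
    return f"{base}{path}.{depth}.json"


def fetch_json(url, user="admin", password="admin"):
    """GET a URL with basic auth and parse the JSON body (credentials are assumed)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = Request(url, headers={"Authorization": "Basic " + token})
    with urlopen(request, timeout=30) as response:
        return json.load(response)


def count_nodes(node):
    """Count a node plus all its descendants; child nodes show up as nested objects."""
    return 1 + sum(count_nodes(v) for v in node.values() if isinstance(v, dict))
```

Running count_nodes(fetch_json(page_json_url("/content/geometrixx/en", 3))) against a small depth first gives a feel for how quickly the output grows before trying 'infinity'.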


Import

Creating items in AEM works through the same mechanisms as fetching items, except that POST and PUT requests are used instead of GET requests. As a payload, a JSON or XML file in exactly the same format as the ones produced by the extraction can be used to create the new items. Take special care here: AEM does not validate content and will accept any well-formed payload without checking whether it actually makes sense.
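
One way to send such a payload is the import operation of the Sling POST servlet, sketched below. The text above does not say which mechanism is used, so treat this as an assumption; build_import_request is a made-up helper name, and the request is only built here, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_import_request(parent_path, name, content, base="http://localhost:4502"):
    """Build (but do not send) a POST that imports `content` below parent_path."""
    form = {
        ":operation": "import",   # Sling POST servlet import operation
        ":contentType": "json",   # format of the payload in :content
        ":name": name,            # name of the node to create
        ":content": json.dumps(content),
    }
    body = urlencode(form).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(base + parent_path, data=body, headers=headers, method="POST")
```

The returned request can be passed to urllib.request.urlopen once an Authorization header has been added. Because nothing is validated on the server side, it is worth checking the payload in the client before sending it.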


Performance

The AEM REST API is lightweight and scales linearly with hardware. Single-threaded, we see a baseline extraction performance of 100 pages per second or more, and a baseline import performance of 70 pages per second or more.
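
To compare your own setup against these figures, a single-threaded throughput measurement can be sketched as follows. pages_per_second is a hypothetical helper; process_page stands for whatever extraction or import call you want to time.

```python
import time


def pages_per_second(process_page, pages):
    """Run process_page over pages sequentially and return the observed throughput."""
    start = time.perf_counter()
    count = 0
    for page in pages:
        process_page(page)
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very fast runs.
    return count / elapsed if elapsed > 0 else float("inf")
```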