This article collects information that has proven relevant in migration projects with Drupal 7 as either the source or the target system. More information on this subject may be added over time and can also be requested with a ticket on this website.

Description

Drupal is a widely used open source CMS whose basic setup requires no programming and whose functionality can be extended with many modules. It has an active developer community, and there is much to read about it on the Drupal website and in several other sources. For an introduction, you could also start at the Wikipedia page about Drupal.

Content Types

For a content migration, it is important to know the structure of the content you are exporting from or importing into. If you have a user account with the proper rights in a Drupal CMS, you can view (and change) the content types and all their fields on the Content types page, which you can access via the Structure page or the URL /admin/structure/types. The information you can find there includes:

  • The field names (the 'nice' names and their back-end names)
  • The type of content each field can hold, such as a single line of text, a date or a reference to a page of a certain type
  • The set of possible values if it is an enumeration type
  • The vocabulary of terms if it is a taxonomy field
  • The number of allowed values per field
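
When only database access is available instead of an admin account, part of this overview can be approximated from the database. The sketch below assumes the Drupal 7 core tables field_config and field_config_instance and reads the back-end field name, field type and cardinality per content type (a cardinality of -1 means unlimited values). The function name and the demo data are hypothetical, and an in-memory SQLite database stands in for the real MySQL database so that the example runs on its own. The 'nice' labels are stored in a serialized column and are not decoded here.

```python
import sqlite3

# Fields attached to node content types, with their type and cardinality.
FIELDS_QUERY = """
    SELECT i.bundle, i.field_name, c.type, c.cardinality
    FROM field_config_instance AS i
    JOIN field_config AS c ON c.id = i.field_id
    WHERE i.entity_type = 'node' AND i.deleted = 0 AND c.deleted = 0
    ORDER BY i.bundle, i.field_name
"""


def fields_per_content_type(conn):
    """Map each content type (bundle) to a list of (field name, type, cardinality)."""
    result = {}
    for bundle, field_name, field_type, cardinality in conn.execute(FIELDS_QUERY):
        result.setdefault(bundle, []).append((field_name, field_type, cardinality))
    return result


def demo_fields_connection():
    """Build a tiny stand-in database; only the columns used above are modelled."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE field_config (
            id INTEGER PRIMARY KEY, field_name TEXT, type TEXT,
            cardinality INTEGER, deleted INTEGER);
        CREATE TABLE field_config_instance (
            id INTEGER PRIMARY KEY, field_id INTEGER, field_name TEXT,
            entity_type TEXT, bundle TEXT, deleted INTEGER);
        INSERT INTO field_config VALUES
            (1, 'field_description', 'text_long', 1, 0),
            (2, 'field_tax_keywords', 'taxonomy_term_reference', -1, 0);
        INSERT INTO field_config_instance VALUES
            (1, 1, 'field_description', 'node', 'article', 0),
            (2, 2, 'field_tax_keywords', 'node', 'article', 0);
    """)
    return conn


if __name__ == "__main__":
    for bundle, fields in fields_per_content_type(demo_fields_connection()).items():
        print(bundle, fields)
```

Against a real site, the same query could be run through any MySQL client library instead of SQLite.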

Building connectors for migration

Connecting with Drupal's content can be done in different ways. Several Drupal APIs and modules are available, and it is also possible to extract content manually from a MySQL database dump. Xillio has imported content into Drupal with the Migrate module, which you can read about in the Drupal import connector. We have also extracted content directly from a Drupal database. That approach is explained briefly here, because there is no full export connector page on this website yet.

Extracting from MySQL

There are probably a couple of hundred tables in your Drupal database. The starting point is usually the 'node' table, because it has a row for every page, along with some core information like title, node id (nid), content type, language, status and creation date. Every page has these fields, so they do not depend on the content type.

The contents of the other fields are usually stored in tables that are named field_data_[technical field name]. For fields with a taxonomy type, a tid (taxonomy term id) is stored there, in a column named something like field_tax_keywords_tid. The tables that start with taxonomy_ hold the information about which terms of which type exist and what the hierarchy is. So with the nid and the tid you can determine which page has which taxonomy term for a specific field. Other referencing fields work similarly, while the content of a free text field like a description is found directly in a varchar/text type column of the relevant field_data table.
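
The lookup described above can be sketched as a join between the node table, one field table and the taxonomy term table. This is a minimal illustration, not a full export connector. It assumes the Drupal 7 convention that a field's values live in a table named field_data_<field name>, which points back to the node through its entity_id column, and that term names are in taxonomy_term_data. The field field_tax_keywords, the function names and the demo rows are hypothetical, and an in-memory SQLite database stands in for the MySQL dump.

```python
import re
import sqlite3


def terms_per_node(conn, field_name):
    """Return (nid, title, term name) rows for one taxonomy field."""
    # Table and column names cannot be bound as parameters, so validate first.
    if not re.fullmatch(r"[a-z0-9_]+", field_name):
        raise ValueError(f"unexpected field name: {field_name!r}")
    table = f"field_data_{field_name}"   # e.g. field_data_field_tax_keywords
    column = f"{field_name}_tid"         # e.g. field_tax_keywords_tid
    query = f"""
        SELECT n.nid, n.title, t.name
        FROM node AS n
        JOIN {table} AS f
          ON f.entity_id = n.nid AND f.entity_type = 'node' AND f.deleted = 0
        JOIN taxonomy_term_data AS t ON t.tid = f.{column}
        ORDER BY n.nid, f.delta
    """
    return conn.execute(query).fetchall()


def demo_connection():
    """Build a tiny stand-in database; only a subset of the real columns is modelled."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE node (
            nid INTEGER PRIMARY KEY, type TEXT, title TEXT,
            language TEXT, status INTEGER, created INTEGER);
        CREATE TABLE taxonomy_term_data (
            tid INTEGER PRIMARY KEY, vid INTEGER, name TEXT);
        CREATE TABLE field_data_field_tax_keywords (
            entity_type TEXT, bundle TEXT, deleted INTEGER,
            entity_id INTEGER, delta INTEGER, field_tax_keywords_tid INTEGER);
        INSERT INTO node VALUES
            (1, 'article', 'Hello', 'en', 1, 0),
            (2, 'article', 'World', 'en', 1, 0);
        INSERT INTO taxonomy_term_data VALUES
            (10, 1, 'migration'), (11, 1, 'drupal');
        INSERT INTO field_data_field_tax_keywords VALUES
            ('node', 'article', 0, 1, 0, 10),
            ('node', 'article', 0, 1, 1, 11),
            ('node', 'article', 0, 2, 0, 11);
    """)
    return conn


if __name__ == "__main__":
    for row in terms_per_node(demo_connection(), "field_tax_keywords"):
        print(row)
```

The delta column orders multiple values of the same field on one page, which is why the query sorts on it. For a free text field, the same pattern applies without the taxonomy join, reading the value column of the field table directly.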