Migration bliss

By Erik, Sun, 04/15/2012 - 20:39
I have recently been working on a few projects that involve migrating legacy data to Drupal. In the past, I have used node_import, node_export, user_import, feeds, and migrate (version 1.x) and each has its pros and cons. But this project had a little more complexity and some pieces that I wasn't sure of how to approach with any of these methods. So I decided it was time to dive into migrate version 2.x. Having used migrate-6.x-1.x for a couple of projects, I loved the way it worked. I was able to get everything done that I needed and rarely needed to extend it with any custom functions. That, plus the fact that I have not taken the full plunge into OO PHP programming. I understand the concepts and have dabbled, but I'm still not as facile at just getting things done. And since the Migrate UI is there for running imports, but not for defining them, you have to be willing to crack open a bit of code. But the allure proved too strong and I jumped in. Turns out that Migrate 2.x is sooooooo powerful that the extra brain cells to wrap my head around extending and classes was well worth it. As many have noted, even when Drupal itself isn't being really object oriented, many of the fundamental ideas of how it's built are the same. You build and then build on top instead of rewriting. To get started, I looked for some tutorials online. The migrate documentation is getting filled out and better all the time, but documentation almost always is a little daunting to start. I did find a good, concise tutorial at Gizra.com which in conjunction with the documentation led to a fairly quick start.

Step 1: Define a custom module

To get started with migrate v2, you will write your mappings and processes in a custom module, so I created a module in modules/custom/mymigration mymigration.info:
; $Id$
name = My Migration Tools
description = Migrate module helper functions
package = custom modules
core = 6.x
dependencies[] = migrate
mymigration.module: 2, ); return $api; } ?> Then with enabling this module and migrate (and its D6 dependencies), we are good to get started.
drush dl migrate autoload dbtng elements
drush en mymigration migrate
To understand a basic migration script, really reading through the Migration classes page of the documentation is helpful. For my particular migration, I had a couple of issues crop up:
  1. Source data was in a different database
  2. Query was more complex than a simple select
  3. CCK "link" didn't have a MigrateFieldHandler
  4. Source data needed to be massaged on way in
  5. Files already existed

Grabbing data from a different database

So, the source data was in a MySQL database on the same server, so I didn't feel the need to import it into the Drupal database. To grab this data, I created a new database connection in my settings.php file. Now, in the migration module, I can create my $query object using a new connection instead of the more common db_select() function: select('ITEM', 'old_items'); ?> In Drupal 6.x this is part of the required migrate dependency dbtng. One part that confused me for a bit is that the getConnection has a first parameter of 'default'. This is not the same as the $db_url['default']. In this case, it's the database "target" which is separate from the connection "key" and is mostly used in master/slave database instances. For basic usage, it should always be set to 'default'.

More complexitation in the query

The above $query is equivalent to the following sql statement for my database: SELECT * FROM `ITEM` `old_items`;, but we actually need to have the following statement built:
SELECT DISTINCT old_items.itemid, old_items.title, old_items.title, old_items.creator, old_items.identifier FROM `ITEM` `old_items` INNER JOIN `CATEGORY_ITEM` `c` ON old_items.itemid = c.itemid WHERE old_items.status <> 'deleted';
This translates to the following DBTNG code: select('ITEM', 'old_items'); $query->leftJoin('CATEGORY_ITEM', 'c', 'old_items.itemid=c.itemid'); $query->condition('status', 'deleted', '<>'); $query->fields('old_items' ,array( 'itemid', 'title', 'creator', 'identifier', )->distinct(); ?>

The missing link

One of the fields, 'identifier' is actually a CCK Link field in the new Drupal content-type. It turns out that link doesn't have a MigrateFieldHandler class yet. I found an issue in the link issue queue which had a patch for Drupal 7 link fields, but not for Drupal 6. So I wrote out a version for link-6.x-2.x-dev and posted it as a patch.
cd modules/contrib/link
wget http://drupal.org/files/issues/link_migrate-1010850-10-d6.patch
patch -p1 < link_migrate-1010850-10-d6.patch

Bliss with a good massage

The identifier field in the original database had links to the actual resource of that the node was describing. For us, this was either an external URL or an internal URL pointing at a file (pdf, ppt, etc.). We wanted to migrate the internal resources to be actual attachments as a filefield while maintaining the external URLs in a link field. The trick is to use Migrate's prepareRow() to affect the data. Typically to map a field, you add a field mapping: addFieldMapping('drupal_field_machine_name', 'source_data_field'); ?> For the two fields, we want to use pieces of the same source: addFieldMapping('field_lr_location', 'identifier'); $this->addFieldMapping('field_lr_file', 'identifier'); ?> But what we really want is to create new fields on the $row before the import happens and fill them with the correct data. identifier, 0, 31) == 'http://www.oldsite.org/library/'){ $row->loc_file = trim(substr($row->identifier, 31)); $row->loc_url = ''; } else { $row->loc_url = trim($row->identifier); $row->loc_file = ''; } } ?> Then in the mappings, we can use the new "fields". addFieldMapping('field_lr_location', 'loc_url'); $this->addFieldMapping('field_lr_file', 'loc_file'); ?>

Thanks, I've already got my files

Trying to sort all of this out, I had already set up my files/library folder to accommodate all of the PDFs etc. that would be attached to the nodes and set the filefield to store them there. Also, I had copied over all of the necessary PDFs and didn't need to move/copy the files again, just to create the files table records and include the new fids into the nodes. When creating the arguments for the filefield mapping handler, you specify two parameters, where the files should go and which "file function" you want to use. Most of the examples I read used either 'file_copy' or 'file_move' which basically do exactly what you'd expect. But there are a few more including 'file_link' which is described in MigrateFileFieldHandler()->prepare() in fields.inc as: So my full mapper for the file field is: addFieldMapping('field_lr_file', 'loc_file') ->arguments($arguments); ?>