Building a custom migration in Drupal 8, Part 6: Custom Source Plugins

In the last post, we went beyond simple node migrations and leveraged the power of the Drupal 8 migration system to break up and reorganize our content into Paragraphs. We created a separate migration for each source field we wanted to convert into a paragraph entity, and then an additional migration that created nodes using the paragraphs. We used pseudofields and some creative use of plugins to migrate things just the way we wanted. It sounds like we've handled everything, but we can still go further.

Refining our content model

At the beginning of the series, we did a content audit to uncover how we really used our site, not just how we thought we used it. Sometimes this means that older content types go away, others are merged together, and some are broken apart. For my site, I discovered that most of the images belonged to the Picture node type. Instead of being singular in purpose, however, I found that I was using it in three very different ways:

  • As a standalone photo gallery -- mostly pictures from when I traveled a lot for business.
  • As a piece of artwork.
  • As an attachment for another node -- which was a good idea back in Drupal 4.7!

Earlier in this series, we created a migration for Picture nodes that imported everything into new Gallery nodes. This worked, but it didn't solve the underlying problem. The Gallery node type is best suited to the first case as a standalone collection of photos. Almost half of the Picture nodes on my Drupal 7 site were of artwork. Furthermore, they were difficult to find on the site as there was no way to differentiate them from regular photos or attachments. What my content audit suggested instead was to create a new content type specifically for artwork -- Creations. Instead of a single image field, the Creation node type would rely on multi-media Paragraphs. 

This left me with the following migration targets:

  • Picture nodes that were photo galleries would be migrated as Gallery nodes.
  • Picture nodes that were artwork would be migrated as Creation nodes.
  • Picture nodes that were used as attachments would be made into image paragraphs in their parent content type.

Given the nature of how pictures were attached to content on my site, I decided to focus only on the first two cases, leaving the last for a manual, post-migration process. We already have a migration for the first case too, although we'll need to modify it as we break up the old Picture node type.

Dividing a source node type

It's a simple matter to create the separate destination types in Drupal 8. You only need to point and click. Separating those two types in a migration, however, is a more complex matter. During our audit it was easy for us to see and feel what each Picture node should be on the new site. The Migration system, however, needs something more concrete. There are many different ways to do this. You could create a static list of all the nodes to migrate to each type, but that creates a maintenance problem if new content is still being added to your old site. A better approach would be to augment your existing Drupal 7 content so each node can self-identify its target Drupal 8 content type.

How can you make a node self-identify? With a field, of course! The most exacting approach would be to create something like a text list field with a value for each target content type. This would be the most precise, but it would require a combination of sensible defaults, scripting, and manual processes to accomplish. Editing some 400 Picture nodes manually is an arduous task, but it could be done. If you have a lot of nodes to migrate, this might not be practical. In that case, it might be best to take advantage of something you already have done. On my site, it turned out the majority of Picture nodes that were artwork were tagged as...well, "art". I just needed to make sure the remaining Picture nodes that were pieces of art were tagged as such. This took a couple of hours, but it was ultimately less time than editing each and every one.

Once we have a concrete and machine-friendly way to identify whether a Picture node is to be a Gallery or a Creation, we need to (re)build our migrations. Your first thought might be to leverage the fact that the type in a migration is treated like any other field. In theory, you could use field_tags of the Picture node type to select the type field of our target node type through a combination of an iterator and a static_map plugin. That alone is tricky, but what if we need different field mappings for our target node types? The Gallery node type relies on base fields, but the Creation node type relies on Paragraphs. Now our migrations need to be vastly different for each target node type.

Custom source plugins

The problem is that the D7_node source plugin can only restrict the content it provides by type; it can't filter by additional criteria such as a field value. If we could filter by field value, our migrations would become much simpler. We could create a separate migration for each filtered type, the drush ms command would return an accurate number of items to migrate, and the migrations would process fewer items needlessly.

Fortunately, it's not difficult to create a custom source plugin in Drupal 8. We only need to subclass the D7_node source plugin we've been using all this time, adding the field-filtering functionality we need. First, let's open the D7_node source plugin class again to see how it works: <drupal_root>/core/modules/node/src/Plugin/migrate/source/d7/Node.php

There's a lot going on in this class, but the most important part for us is the query() method. This method uses the Database API to prepare a select query that is later used by Migrate to fetch the actual fields from the database. Notice that unlike a normal database query -- where we use db_select() or an injected service -- we use the select() method on the base class. Why? Remember all the way back to the beginning of the series. We set up a database for our Drupal 7 site that was parallel to our Drupal 8 one. By using the select() method on the base class, the migration always uses the correct source database. 

Now that we have a better idea of how the D7_node source plugin works, we can create our own:

  1. Under <drupal_root>/modules/custom/ create a new module named yoursite_migrate. Drupal Console can help make this step much easier.
  2. Under <drupal_root>/modules/custom/yoursite_migrate, create directories along the following path: src/Plugin/migrate/source.
  3. Create a new file: <drupal_root>/modules/custom/yoursite_migrate/src/Plugin/migrate/source/D7NodeWithTerm.php
  4. Open the new file and create a new PHP class in it, D7NodeWithTerm:
<?php

namespace Drupal\yoursite_migrate\Plugin\migrate\source;

use Drupal\node\Plugin\migrate\source\d7\Node as D7_node;

/**
 * Drupal 7 node source from database.
 */
class D7NodeWithTerm extends D7_node {
  /**
   * {@inheritdoc}
   */
  public function query() {
  }
}

Notice that we imported the D7_node source plugin with an alternate class name. This way we won't confuse ourselves about which class named "Node" we're using. Next, we need to use a unique annotation for Drupal 8 to pick up the new source plugin, so add that to the class docblock:

/**
 * Drupal 7 node source from database.
 *
 * @MigrateSource(
 *   id = "yoursite_node_with_term",
 *   source_provider = "node"
 * )
 */

Writing the query method

With the module created and the class filled out, we can focus on the core of our source plugin, the query() method. Let's first start by getting the required fields from the node and node_revision table:

$query = $this->select('node_revision', 'nr');
$query->innerJoin('node', 'n', static::JOIN);

$query->fields('n', [
    'nid',
    'type',
    'language',
    'status',
    'created',
    'changed',
    'comment',
    'promote',
    'sticky',
    'tnid',
    'translate',
  ])
  ->fields('nr', [
    'vid',
    'title',
    'log',
    'timestamp',
  ]);

Both the node and the node_revision table have a uid field. Since revisions may be authored later by different users, the values may not be the same. As such, we use column aliases to keep them unique:

$query->addField('n', 'uid', 'node_uid');
$query->addField('nr', 'uid', 'revision_uid');

The D7_node source plugin had the ability to filter results based on the node_type. The value was passed to the source plugin as a parameter in the migration *.yml. Source plugins are provided these parameters in the configuration class variable, keyed by the parameter name. To make our source plugin filter by the node type, we need to add a database query condition against the node table's type field:

if (isset($this->configuration['node_type'])) {
  $query->condition('n.type', $this->configuration['node_type']);
}

Finally, we need to return the query for Migrate to use when fetching rows:

return $query;

At this point, we could enable our new module in the Drupal UI. Provided there are no errors, we should now have a working, custom source plugin!

Adding the term filter

Setting up the module and creating the basic source plugin was monotonous, but now we get to the hardest part. We need to modify our database query so that it only returns rows with a given tag. This part gets a little complicated if you aren't familiar with the Drupal 7 database. 

In Drupal 7, the tags for a node are stored in a field. Each field for a node corresponds to a unique table named after the field's machine name. For a field with the name field_tags, there's a table named field_data_field_tags that contains the data we want. (There's also a field_revision_field_tags table, which we can ignore as we are only migrating the most recent revision.) The field's table contains two important columns for us, the entity_id, and the field_tags_tid. Combined, these tell us if a node is tagged with a particular term or not.

Waitaminute, what about the term name? Do we have to use the term ID (tid)? Fortunately no. The term name is stored in the taxonomy_term_data table under the name column, keyed by the tid. So, the chain goes node.nid to field_data_field_tags.entity_id. Then field_data_field_tags.field_tags_tid to taxonomy_term_data.tid. And finally, taxonomy_term_data.tid to taxonomy_term_data.name. Now that we know how to link the term name to a node, we can add that to our source plugin.
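To see that chain of tables outside of Drupal, here is a minimal standalone sketch using PDO with an in-memory SQLite database. It is not the source plugin itself: the table and column names follow the ones described above, but the tables are cut down to the columns we care about, the sample rows and the function names buildSampleDatabase() and picturesTaggedWith() are made up for illustration, and it assumes your PHP has the pdo_sqlite extension enabled.

```php
<?php
// Standalone sketch, not Drupal code. The tables mimic only the Drupal 7
// columns discussed in this post; all rows are invented sample data.

/** Builds a throwaway in-memory database with hypothetical sample rows. */
function buildSampleDatabase(): PDO {
  $db = new PDO('sqlite::memory:');
  $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

  $db->exec('CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT, title TEXT)');
  $db->exec('CREATE TABLE field_data_field_tags (entity_id INTEGER, field_tags_tid INTEGER)');
  $db->exec('CREATE TABLE taxonomy_term_data (tid INTEGER PRIMARY KEY, name TEXT)');

  $db->exec("INSERT INTO node VALUES (1, 'picture', 'Sketch')");
  $db->exec("INSERT INTO node VALUES (2, 'picture', 'Airport')");
  $db->exec("INSERT INTO node VALUES (3, 'picture', 'Untagged')");
  $db->exec("INSERT INTO taxonomy_term_data VALUES (25, 'art')");
  $db->exec("INSERT INTO taxonomy_term_data VALUES (26, 'travel')");
  $db->exec('INSERT INTO field_data_field_tags VALUES (1, 25)');
  $db->exec('INSERT INTO field_data_field_tags VALUES (2, 26)');

  return $db;
}

/** Follows node.nid -> entity_id, field_tags_tid -> tid, then filters on name. */
function picturesTaggedWith(PDO $db, string $term): array {
  $sql = 'SELECT n.nid, n.title, td.name
    FROM node n
    INNER JOIN field_data_field_tags tn ON n.nid = tn.entity_id
    INNER JOIN taxonomy_term_data td ON tn.field_tags_tid = td.tid
    WHERE n.type = :type AND td.name = :term
    ORDER BY n.nid';
  $stmt = $db->prepare($sql);
  $stmt->execute([':type' => 'picture', ':term' => $term]);
  return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

$db = buildSampleDatabase();
foreach (picturesTaggedWith($db, 'art') as $row) {
  echo $row['nid'], ': ', $row['title'], "\n";
}
```

With this sample data only the "Sketch" node carries the art term, so it is the only row that survives both joins and the name filter.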

Right now we're already doing an INNER JOIN on the node_revision and the node table in our source plugin:

$query = $this->select('node_revision', 'nr');
$query->innerJoin('node', 'n', static::JOIN);

To bring in terms, we add another INNER JOIN with the tag field data table, using the first link in our chain of tables:

$query->innerJoin('field_data_field_tags', 'tn', 'n.nid = tn.entity_id');

To keep things short, we alias the field_data_field_tags table as tn. To bring in the term name, we do another INNER JOIN, this time on the taxonomy_term_data table:

$query->innerJoin('taxonomy_term_data', 'td', 'tn.field_tags_tid = td.tid AND nr.vid = tn.revision_id');

Neither of these JOINs actually lets us specify which term to filter on. We've only brought tags into our result set. To specify which tag to filter on, we need to add another condition just before the return statement of our query() method:

if (isset($this->configuration['term'])) {
  $query->condition('td.name', $this->configuration['term']);
}

That's it!

Using the custom source plugin

Now that we have our custom source plugin, we can put it to use. We know that anything that's tagged with the art tag needs to be migrated as a Creation node, and not a Gallery node. So, we create a new migration, migrate_plus.migration.yoursite_creation.yml. I won't go over the Creation migration in detail since you've seen similar things already. The important part is where we use our new source plugin:

source:
  plugin: yoursite_node_with_term
  node_type: picture
  term: "art"

Notice the additional parameter, term. This is passed to our custom source plugin and stored in the class's configuration class variable. Do a drush cim to import your changes. If we check drush ms, we should not only see our new Creation migration, but it should also have the exact number of nodes that are tagged as art:

$ drush cim -y
...
$ drush ms

Group: YourSite group (yoursite)    Status  Total  Imported  Unprocessed  Last imported       
yoursite_role                      Idle    4      0         4            N/A
yoursite_user                      Idle    18     0         18           N/A
yoursite_file                      Idle    507    0         507          N/A
yoursite_gallery                   Idle    134    0         134          N/A
yoursite_creation                  Idle    60     0         60           N/A
yoursite_blog_text_paragraph       Idle    1050   0         1050         N/A
yoursite_blog_image_paragraph      Idle    1050   0         1050         N/A
yoursite_blog                      Idle    1050   0         1050         N/A

Handling the inverse case

So now we have a new migration for Picture nodes tagged as art that are migrated as Creation nodes. This works, but now we have a problem with our Gallery migration. Since it used the D7_node source plugin for Picture nodes, it would import both art and non-art tagged Pictures as Galleries, resulting in duplicated content! This is definitely not what we want. While we could tag the remaining Pictures in our Drupal 7 site as..."not-art"...there's always the possibility we'll miss a node somewhere. What we need instead is another custom source plugin, only this time it provides rows for content that is NOT tagged with a given tag.

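Instead of starting from scratch like we did with our D7NodeWithTerm class, let's just copy that class into a new file, and rename it to D7NodeWithoutTerm. Make sure to rename the file accordingly, and give the new class its own id in the @MigrateSource annotation so the two plugins don't collide.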

When trying to write the query for the inverse case, I had a lot of difficulty. There didn't seem to be an explicit way to say "JOIN these tables only if it doesn't have this value". Instead, what I found that worked was to use a different sort of JOIN, a LEFT JOIN. An INNER JOIN provides a result set where the table on the "left" and the table on the "right" share a common, linking value. A LEFT JOIN provides *all* of the left table's rows, only appending the right table's columns to rows where it has a value in common with the left table. Still confused? This question on Stack Overflow has an excellent image that demonstrates different database joins.

First, modify the start of our query() method to do a LEFT JOIN on the field_data_field_tags table:

$query = $this->select('node_revision', 'nr');
$query->innerJoin('node', 'n', static::JOIN);
// Pass the tid as a query placeholder instead of concatenating it into the SQL.
$query->leftJoin('field_data_field_tags', 'tn', 'n.nid = tn.entity_id AND tn.field_tags_tid = :tid', [':tid' => $this->configuration['tid']]);

Note the condition of our LEFT JOIN. We link the node table's nid to the field_data_field_tags table's entity_id like we did before. We also specify that the tid needs to correspond to something passed to the source plugin by configuration. For some reason, I couldn't sort out how to get the join with taxonomy_term_data to work, so we're stuck with using the term's ID rather than its name. We continue by selecting all the fields like we did before:

$query->fields('n', [
    'nid',
    'type',
    'language',
    'status',
    'created',
    'changed',
    'comment',
    'promote',
    'sticky',
    'tnid',
    'translate',
  ])
  ->fields('nr', [
    'vid',
    'title',
    'log',
    'timestamp',
  ]);
$query->addField('n', 'uid', 'node_uid');
$query->addField('nr', 'uid', 'revision_uid');

But the above doesn't completely work. If the left table is our node table, and the right table is the field_data_field_tags table, a LEFT JOIN just brings in all of the node table's rows. We're no better off! To fix this problem we need to use a bit of a trick. When you do a JOIN between two tables, the result is effectively another table that has all of the left and right tables' columns. For LEFT JOINs, the columns from the left table will always be populated. The columns from the right table, however, will be NULL when there isn't a matching value. We can exploit this to create our source plugin.

What we do is add a condition that only allows rows where the tid column is NULL. This discards all of the results of our LEFT JOIN where the right table *does* have a value. This leaves only node table entries that don't have our tag.

$query->isNull('tn.field_tags_tid');
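If the LEFT JOIN plus IS NULL trick still feels abstract, the following standalone sketch reproduces it with PDO and an in-memory SQLite database. It is only an illustration under stated assumptions: the tables are reduced to the columns discussed in this post, the rows are invented, the function names buildTagFixture() and nidsWithoutTerm() are hypothetical, and it needs the pdo_sqlite extension. Note that the tid test lives in the JOIN condition, not in the WHERE clause, just like in our source plugin.

```php
<?php
// Standalone sketch, not Drupal code. Sample rows are invented:
// node 1 is tagged 25 and 26, node 2 only 26, node 3 has no tags.

/** Builds a throwaway in-memory database with hypothetical sample rows. */
function buildTagFixture(): PDO {
  $db = new PDO('sqlite::memory:');
  $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

  $db->exec('CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT)');
  $db->exec('CREATE TABLE field_data_field_tags (entity_id INTEGER, field_tags_tid INTEGER)');

  $db->exec("INSERT INTO node VALUES (1, 'picture')");
  $db->exec("INSERT INTO node VALUES (2, 'picture')");
  $db->exec("INSERT INTO node VALUES (3, 'picture')");
  $db->exec('INSERT INTO field_data_field_tags VALUES (1, 25)');
  $db->exec('INSERT INTO field_data_field_tags VALUES (1, 26)');
  $db->exec('INSERT INTO field_data_field_tags VALUES (2, 26)');

  return $db;
}

/** Returns the nids of nodes that do NOT carry the given term ID. */
function nidsWithoutTerm(PDO $db, int $tid): array {
  // The LEFT JOIN keeps every node row. Only nodes tagged with :tid get a
  // non-NULL field_tags_tid, so keeping the NULL rows keeps the untagged ones.
  $sql = 'SELECT n.nid
    FROM node n
    LEFT JOIN field_data_field_tags tn
      ON n.nid = tn.entity_id AND tn.field_tags_tid = :tid
    WHERE tn.field_tags_tid IS NULL
    ORDER BY n.nid';
  $stmt = $db->prepare($sql);
  $stmt->bindValue(':tid', $tid, PDO::PARAM_INT);
  $stmt->execute();
  return array_map('intval', $stmt->fetchAll(PDO::FETCH_COLUMN, 0));
}

$db = buildTagFixture();
echo implode(', ', nidsWithoutTerm($db, 25)), "\n";
```

With this sample data, node 1 is dropped because it matches tid 25, while node 2 (tagged with something else) and node 3 (not tagged at all) both remain.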

We finish our query() method with a node_type condition and a return like we did before:

if (isset($this->configuration['node_type'])) {
  $query->condition('n.type', $this->configuration['node_type']);
}

return $query;

Using the inverse case source plugin

Using our without term source plugin is pretty easy. We just need to modify our migrate_plus.migration.yoursite_gallery.yml file to use the new plugin:

source:
  plugin: yoursite_node_without_term
  node_type: picture
  tid: '25'

Here, the tid of 25 corresponds to our "art" tag. You know the drill at this point:

$ drush cim -y
...
$ drush ms

Group: YourSite group (yoursite)    Status  Total  Imported  Unprocessed  Last imported       
yoursite_role                      Idle    4      0         4            N/A
yoursite_user                      Idle    18     0         18           N/A
yoursite_file                      Idle    507    0         507          N/A
yoursite_gallery                   Idle    74     0         74           N/A
yoursite_creation                  Idle    60     0         60           N/A
yoursite_blog_text_paragraph       Idle    1050   0         1050         N/A
yoursite_blog_image_paragraph      Idle    1050   0         1050         N/A
yoursite_blog                      Idle    1050   0         1050         N/A

Yay! Our Gallery and Creation migrations now only import the content they should, and the combined total is the same as the total number of Picture nodes to migrate. Congratulations! You have now separated your source content into separate content types!

Conclusion

In this part, we've discussed methods to separate your Drupal 7 site content so that we can migrate it to different Drupal 8 content types. We created two custom source plugins that filter the nodes to migrate by the presence or lack of a tag assigned to the content. This opens up a lot of possibilities for us to further divide our content and make it work better for us. So where do we go from here? There's tons of stuff you can do with migrations that I didn't cover: importing from flat files, recurring web service imports, and so much more.

My goal throughout this series was to help others with the most common and needed migrations -- Drupal to Drupal migrations. The advanced topics I touched on (Paragraphs, custom source plugins) enable you to build more modern websites beyond the tools provided out of the box by Drupal core. Thank you everyone for reading!

Thanks to our sponsors!

This post was created with the support of my wonderful supporters on Patreon:

  • Alina Mackenzie
  • Karoly Negyesi
  • Chris Weber

If you like this post, consider becoming a supporter at patreon.com/socketwench.

Thank you!!!