Improving WP_Query performance for multiple taxonomies

I have a solution, but it’s a really ugly one. I’d love to hear a better one, but I’m not sure it’s possible.

WP_Query::get_posts() calls parse_tax_query() twice: first near the start, then again just before getting the SQL from it. There’s no single hook that lets me intercept and adjust the value of $tax_query in time to adjust the SQL, so instead I had to do it in two parts.

  • An action on pre_get_posts, near the start of get_posts() recognises taxonomy queries with 'relation' => 'OR' and simplifies them so that they’ll only generate one join on wp_term_relationships. At the same time it stores the resolved list of IDs from all the taxonomies inside the ‘WP_Query’ object to use later.

  • A filter on posts_where_paged, much later in query_posts(), checks for a saved list of IDs and replaces the condition on the join.

Here’s the code:

add_action('pre_get_posts', 'wp_query__pre');
function wp_query__pre ($wp_query) {
  if (!isset($wp_query->query['tax_query'])) return;
  if ($wp_query->query['tax_query']['relation'] != 'OR') return;

  $allterms = array();
  foreach ($wp_query->tax_query->queries as $query) {
    $tax = $query['taxonomy'];
    $terms = $query['terms'];
    $wp_query->tax_query->_transform_terms($terms, $query['taxonomy'], $query['field'], 'term_taxonomy_id');
    $allterms = array_merge($allterms, $terms);
  }

  $tax_query = array(array(
      'taxonomy' => $tax,
      'terms' => $terms,
      'operator' => 'IN',
      'include_children' => 0,
      'field' => 'term_taxonomy_id',
    ));

  $wp_query->query['tax_query'] = $tax_query;
  $wp_query->query_vars['tax_query'] = $tax_query;
  $wp_query->tax_query = new WP_Tax_Query($tax_query);
  $wp_query->saved_tax_terms = $allterms;
}

add_filter('posts_where_paged', 'wp_query__where', 10, 2);
function wp_query__where ($where, $wp_query) {
  if (!empty($wp_query->saved_tax_terms)) {
    $terms = implode(", ", $wp_query->dft_tax_terms);
    $where = preg_replace("!term_taxonomy_id IN \([^)]*\)!", "term_taxonomy_id IN ($terms)", $where);
  }
  return $where;
}

Note that the code hasn’t been tested thoroughly yet, and no doubt contains all manner of bugs. I probably won’t cope with any more complicated queries.

The upside of this approach is that it doesn’t require the rest of your code to know anything about it. Just code a taxonomy query with OR and it’ll get picked up and optimised. It didn’t provide quite as big a speed boost as I was hoping, but it was a definite improvement.

I’m wondering if the WordPress team shouldn’t include something like this in the core code.


Update: As befits such an ugly hack, this is broken on WordPress 3.2. I’m looking into a fix.