How to uniquely identify queries?

Question

tl;dr: building reliable unique identifiers for queries is possible, but pretty useless.

If you’re looking for unique identifiers for query objects, you can use spl_object_hash, but IMHO it is pretty useless: it is not predictable, and you get a different id for 2 different query objects even when their query vars are identical.
The only reason to use it would be to obtain a statistic like: “During this request XX different WP_Query objects were instantiated.”
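
As a minimal sketch of that behaviour, the snippet below uses a made-up Fake_Query class as a stand-in for WP_Query (an assumption, so that it runs without WordPress) and shows that 2 objects holding identical query vars still get different hashes:

```php
<?php
// Hypothetical stand-in for WP_Query, so the snippet runs without WordPress.
class Fake_Query {
    public $query_vars;

    public function __construct( array $vars ) {
        $this->query_vars = $vars;
    }
}

$query1 = new Fake_Query( array( 'post_type' => 'post' ) );
$query2 = new Fake_Query( array( 'post_type' => 'post' ) );

$id1 = spl_object_hash( $query1 );
$id2 = spl_object_hash( $query2 );

// The query vars are identical...
var_dump( $query1->query_vars === $query2->query_vars ); // bool(true)
// ...but 2 objects that are alive at the same time never share a hash.
var_dump( $id1 === $id2 );                               // bool(false)
```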

If you turned your attention to query variables, it is probably because you are looking for an id that identifies query results, i.e. when you obtain the same id from 2 different query objects, the results should be the same for both queries.

In that case I have bad news for you: that will never work, because no matter which technique you use, an id built from query vars does not take into account the 'posts_*' filters ('posts_where', 'posts_join' and so on; there are 19 of them, IIRC).

Proof of concept:

$query1 = new WP_Query( 'post_type=post' );

add_filter( 'posts_where', function() { return ' AND 1 = 0'; } );

$query2 = new WP_Query( 'post_type=post' );

$query1 and $query2 have identical query vars, so a hypothetical callback that returns a unique id based on query vars will return the same id for both queries, yet the first query returns all published posts and the second returns nothing.

This is why query vars are not a good starting point for building a unique query id.

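If what matters is the query results, you should concentrate on the SQL request of the query: when the request is the same, the results will be the same (as long as the data in the database did not change in the meantime). In addition, the request is a string, so it is easily hashable.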
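
To make the difference concrete, here is a small sketch that runs without WordPress. The query_vars_id() helper and the 2 SQL strings are assumptions made up for illustration; they are not what WordPress really generates:

```php
<?php
// Hypothetical helper: builds an id from query vars only.
function query_vars_id( array $query_vars ) {
    ksort( $query_vars ); // make the id independent of the order of the keys
    return md5( serialize( $query_vars ) );
}

// Same query vars for both queries, like in the proof of concept.
$vars1 = array( 'post_type' => 'post' );
$vars2 = array( 'post_type' => 'post' );

// Simplified, made-up requests: the second one was changed by a filter.
$request1 = "SELECT ID FROM posts WHERE 1=1 AND post_type = 'post'";
$request2 = "SELECT ID FROM posts WHERE 1=1 AND 1 = 0";

// An id based on query vars can't tell the 2 queries apart...
var_dump( query_vars_id( $vars1 ) === query_vars_id( $vars2 ) ); // bool(true)
// ...while an id based on the request can.
var_dump( md5( $request1 ) === md5( $request2 ) );               // bool(false)
```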

A good candidate hook for this purpose is one that fires when the request is completely built but not yet performed; the 'split_the_query' filter hook should be fine.

// prepare our ids storage
add_action( 'init', function() {
  global $query_ids;
  $query_ids = array();
});

// use the filter to build and store the id
add_filter( 'split_the_query', function( $split_the_query, $query ) {
  global $query_ids;
  $hash = md5( $query->request );
  $hash .= $split_the_query ? '_split' : '';
  if ( ! in_array( $hash, $query_ids, true ) ) {
    $query_ids[] = $hash; // store hash if not already stored
  }
  return $split_the_query; // return $split_the_query as is
}, PHP_INT_MAX, 2 );

Now we have a way to build unique query identifiers: when we obtain the same identifier from 2 queries, the results of the 2 queries are the same.

But what can we do with that? Maybe a statistic like:

add_action('shutdown', function() {
   global $query_ids;
   echo count( $query_ids ) . " different WP_Query requests were performed.";
});

I can’t see the usefulness of a statistic like that, but maybe someone can.

I know you are thinking about using the query identifier to cache the query and skip the database request when the id is the same. It’s a good idea, but… you can’t.

The problem is that once WordPress has built the request, it immediately runs it, giving you no chance to short-circuit it!

Or better: since WordPress uses $wpdb->get_results to perform the query, and since $wpdb is a global variable, one option is to replace that object with an instance of a modified wpdb class capable of caching. It’s a good idea, but in that case it makes no sense to build a unique id for the queries, because all the caching happens inside $wpdb, based on the SQL requests that run there.
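
A rough sketch of that caching idea, runnable without WordPress: Fake_Db is a made-up stand-in for wpdb and Caching_Db is a hypothetical subclass, so neither is real WordPress code. The real wpdb::get_results() also accepts an output type argument, and a real cache would need invalidation on writes; both are ignored here.

```php
<?php
// Hypothetical stand-in for wpdb, so the sketch runs without WordPress.
class Fake_Db {
    public $queries_run = 0;

    public function get_results( $sql ) {
        $this->queries_run++; // pretend we hit the database here
        return array( 'rows for: ' . $sql );
    }
}

// Hypothetical caching subclass: results are cached by the hash of the SQL string.
class Caching_Db extends Fake_Db {
    private $cache = array();

    public function get_results( $sql ) {
        $key = md5( $sql );
        if ( ! array_key_exists( $key, $this->cache ) ) {
            // first time we see this request: really run it and remember the result
            $this->cache[ $key ] = parent::get_results( $sql );
        }
        return $this->cache[ $key ];
    }
}

$db     = new Caching_Db();
$first  = $db->get_results( 'SELECT 1' );
$second = $db->get_results( 'SELECT 1' ); // served from the cache
$db->get_results( 'SELECT 2' );

var_dump( $first === $second ); // bool(true)
var_dump( $db->queries_run );   // int(2)
```

Note that the cache key is derived from the SQL string inside the database object itself, which is why a separate query id built on the WP_Query side adds nothing here.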

In substance, WordPress suffers from 2 big problems regarding the WP_Query logic (among other problems with the messy WP_Query code):

1. the process of building the query and the process of running the query are not separable
2. there is no way to short-circuit the query: once the WP_Query::get_posts() method has been called, there is no way to prevent the SQL query from being run

As long as those 2 issues are there, IMHO creating a unique id for queries is something that can only be used for eccentric statistics…
