Can’t get custom rewrite tag, query var, permastruct (permalink structure), and rewrite rule to work properly together

After a lot of inspection of rewrites happening under the hood by hooking into filters and logging variable values, I managed to solve the problem!

Queries and query rewriting

When a query occurs, WordPress serves the content using the correct template as soon as it has enough information to determine unambiguously which post and which template to use. For a regular (non-custom) post, WordPress only needs the post slug. For a custom post type, it needs both the post slug and the post type; so for a podcast, the query needs to specify post_type=podcast and e.g. name=news-for-august. This is because post slugs are unique within a given post type but needn’t be unique across post types, so the slug alone doesn’t identify the post. The post type must also be known in order to select the right template. Thus, a request like /?post_type=podcast&name=news-for-august can be resolved and renders the post correctly.

Additionally, when a post type is registered, a rewrite tag and a query variable are registered that allow this querying to be condensed. For example, for my podcast post type, the rewrite tag is %podcast% (not %postname% like it is for non-custom posts), and the query var is podcast=; this acts as an abbreviated form of post_type plus name. For example, the request /?podcast=news-for-august is internally rewritten to /?podcast=news-for-august&post_type=podcast&name=news-for-august, and thus results in that post being served.
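To make that expansion concrete, here is a minimal standalone sketch of the step, loosely modelled on what WordPress does while parsing a request. The function name wpse373987_expand_post_type_query_var() and the $post_type_query_vars map are hypothetical and exist only for illustration; WordPress performs this internally, so nothing like this needs to be added to your site.

```php
<?php
/**
 * Illustrative only: mimics how a post-type query var such as `podcast`
 * is expanded into `post_type` and `name`.
 *
 * @param array $query_vars           Parsed query vars, e.g. [ 'podcast' => 'news-for-august' ].
 * @param array $post_type_query_vars Map of query var => post type (assumed input).
 * @return array Query vars with `post_type` and `name` filled in where applicable.
 */
function wpse373987_expand_post_type_query_var( array $query_vars, array $post_type_query_vars ) {
    foreach ( $post_type_query_vars as $query_var => $post_type ) {
        if ( isset( $query_vars[ $query_var ] ) ) {
            $query_vars['post_type'] = $post_type;
            $query_vars['name']      = $query_vars[ $query_var ];
        }
    }

    return $query_vars;
}

// The request /?podcast=news-for-august, with `podcast` registered as the
// query var of the `podcast` post type.
$expanded = wpse373987_expand_post_type_query_var(
    [ 'podcast' => 'news-for-august' ],
    [ 'podcast' => 'podcast' ]
);
// $expanded now contains the original `podcast` var plus `post_type` and `name`.
```

The original podcast variable is kept alongside the two derived ones, which matches the rewritten query shown above.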

This explains the following problem:

Strangely, in the context of podcast permalinks, the %postname% tag isn’t being populated like it is for regular blog posts.

Also, regarding the following…

When CPT UI registers the custom post type podcast, it also adds a permastruct with the name podcast. Since my permastruct for posts (set in [Settings > Permalinks > Custom Structure]) is /articles/%post_id%/%postname%, the podcast permastruct is /articles/podcast/%postname%.

… the default permalink structure is actually /articles/podcast/%podcast%.

When a post ID is specified in a query (via p=), it takes precedence over any post_type and/or name variable; if those variables don’t agree with the specified ID, a redirect occurs. Indeed, it seems that a redirect always occurs if an ID is specified, e.g. if the ID of the podcast post News for August is 50, then /?p=50 is also internally rewritten to /?post_type=podcast&name=news-for-august, which results in a redirect to the permalink for that post.

We can take advantage of this behaviour to ensure that a redirect to the permalink occurs for the other URL formats that we want to implement.

Adjusting the permastruct and tag substitution

We’ll adjust the permastruct to use %podcast% rather than %postname%:

function wpse373987_add_tag_and_permastruct() {
    /** Define the tag */
    add_rewrite_tag( '%podcast_episode_number%', '([0-9]+)' );

    /** Override the default permastruct for the podcast post type */
    add_permastruct(
        'podcast',
        'podcasts/%podcast_episode_number%/%podcast%',   // This line changed
        [ 'with_front' => false ]
    );

    /** Define podcast shortlinks */
    add_rewrite_rule( '^([0-9]+)/?$', [ 'podcast_episode_number' => '$matches[1]' ], 'top' );
}
add_action( 'init', 'wpse373987_add_tag_and_permastruct' );
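As a rough illustration of which paths this permastruct is meant to match, the sketch below replaces each tag with its regex and tests a sample path. It is standalone PHP, not WordPress's actual rule generator: the helper name is hypothetical, and ([^/]+) is assumed to be the regex behind %podcast% (the default for a non-hierarchical post type). WordPress also generates further rules from the same permastruct (for paging and feeds, for instance), which this sketch ignores.

```php
<?php
/**
 * Illustrative only: turn a permastruct into a single regex by swapping
 * each rewrite tag for its pattern.
 *
 * @param string $permastruct Permastruct, e.g. 'podcasts/%podcast_episode_number%/%podcast%'.
 * @param array  $tags        Map of rewrite tag => regex fragment.
 * @return string A delimited, anchored regex.
 */
function wpse373987_permastruct_to_regex( $permastruct, array $tags ) {
    $pattern = str_replace( array_keys( $tags ), array_values( $tags ), $permastruct );

    // Allow an optional trailing slash, as WordPress rules usually do.
    return '#^' . $pattern . '/?$#';
}

$regex = wpse373987_permastruct_to_regex(
    'podcasts/%podcast_episode_number%/%podcast%',
    [
        '%podcast_episode_number%' => '([0-9]+)',
        '%podcast%'                => '([^/]+)', // Assumed default for the post type's tag.
    ]
);

// $matches[1] is the episode number, $matches[2] the slug.
preg_match( $regex, 'podcasts/12/news-for-august', $matches );
```

Remember that changes to rewrite rules only take effect once the rules are flushed, e.g. by re-saving [Settings > Permalinks].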

Since we are no longer using the %postname% tag in our permastruct, we also no longer need to replace %postname% with the slug ourselves; WordPress substitutes the slug for the %podcast% tag automatically. Filtering on post_link is also unnecessary, since post_type_link is the filter used for custom post types:

function wpse373987_handle_tag_substitution( $permalink, $post ) {
    // Do nothing if the tag isn't present
    if ( strpos( $permalink, '%podcast_episode_number%' ) === false ) {
        return $permalink;
    }
    
    $fallback = '_';
    
    $episode_number = '';
    if ( function_exists( 'get_field' ) && $post->post_type === 'podcast' ) {
        $episode_number = get_field( 'episode_number', $post->ID, true );
    }
    if ( ! $episode_number ) {
        $episode_number = $fallback;
    }

    $permalink = str_replace( '%podcast_episode_number%', $episode_number, $permalink );
    
    // The following line is now not needed.
    // $permalink = str_replace( '%postname%', $post->post_name, $permalink );

    return $permalink;
}
add_filter( 'post_type_link', 'wpse373987_handle_tag_substitution', 100, 2 );
// The following line is not needed.
// add_filter( 'post_link', 'wpse373987_handle_tag_substitution', 100, 2 );
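For a quick feel of what the filter produces, the following standalone sketch isolates just the substitution step so that it can be run without WordPress or ACF. The function name, the example.com URL and the episode number 12 are assumptions made for the example.

```php
<?php
/**
 * Illustrative only: the substitution step of the filter above, with the
 * episode number passed in directly instead of being read via get_field().
 *
 * @param string     $permalink      Permalink that may contain the tag.
 * @param string|int $episode_number Episode number, or an empty value if unknown.
 * @param string     $fallback       Placeholder used when there is no episode number.
 * @return string Permalink with the tag replaced.
 */
function wpse373987_fill_episode_number( $permalink, $episode_number, $fallback = '_' ) {
    // Do nothing if the tag isn't present.
    if ( strpos( $permalink, '%podcast_episode_number%' ) === false ) {
        return $permalink;
    }

    // As in the filter, any falsy value (including 0) counts as "missing".
    if ( ! $episode_number ) {
        $episode_number = $fallback;
    }

    return str_replace( '%podcast_episode_number%', (string) $episode_number, $permalink );
}

$template       = 'https://example.com/podcasts/%podcast_episode_number%/news-for-august/';
$with_number    = wpse373987_fill_episode_number( $template, 12 );
$without_number = wpse373987_fill_episode_number( $template, '' );
```

Here $with_number ends in /podcasts/12/news-for-august/ and $without_number in /podcasts/_/news-for-august/. Note that an episode numbered 0 would also fall back to the placeholder.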

Adjusting our query rewriting

After making the above two adjustments, the permalink of podcasts is of the form /podcasts/<episode_number>/<episode_title>, and content is served correctly from that URL, because it internally resolves to the query /?post_type=podcast&name=<episode_title>&podcast_episode_number=<episode_number>, which contains the post_type and name variables needed to determine which post to serve and which template to use.

However, for the other URL formats, i.e.:

  • /podcasts/<episode_number>/<incorrect_title>;
  • /podcasts/<episode_number>; and
  • /<episode_number>;

we still need to define how to resolve <episode_number> to a podcast, which we do by hooking into the request filter. Previously, we were rewriting all queries for podcasts to the form /?p=<podcast_post_id>, including when we visit the permalink URL, which is what was causing the 404 errors. This is because WordPress does not issue a redirect for a query of that form if the client is visiting the permalink URL — instead, the query processing continues, and WordPress just gives up with a 404 once it realises that the query doesn’t contain post_type and name (since our query rewrite removed those) and it therefore can’t determine which post to serve, nor which template to use.

Therefore, we should only rewrite queries to the form /?p=<podcast_post_id> when the URL being visited is not the permalink. The content is already served correctly at the permalink URL; we just want to redirect the other URLs to the permalink, which we can do by rewriting the query to contain only the post ID, as before, except when the client visits the permalink URL itself.

Also, rather than returning [ 'p' => '-1' ] to cause a 404 response when we need to, the proper way to do this is to return [ 'error' => 404 ].

Here is the modified filter:

function wpse373987_handle_query_var( $query_vars ) {
    /** Ignore requests that don't concern us. */
    if ( ! isset( $query_vars['podcast_episode_number'] ) ) {
        return $query_vars;
    }

    /** Validate the episode number; it must be an unsigned integer. */
    if ( preg_match( '/^[0-9]+$/', $query_vars['podcast_episode_number'] ) !== 1 ) {
        /** The episode number is invalid; respond with a 404 Not Found. */
        return [ 'error' => 404 ];
    }

    /**
     * Episode number, with any leading zeroes stripped;
     * they must be stripped for the SQL query to work.
     */
    $episode_number = (int)( $query_vars['podcast_episode_number'] );

    global $wpdb;
    
    /** Array of IDs of posts that have the given episode number */
    $post_ids = $wpdb->get_col(
        $wpdb->prepare(
            "SELECT post_id FROM {$wpdb->postmeta} WHERE
                    meta_key = 'episode_number'
                AND meta_value = %d
            ORDER BY post_id ASC",
            
            $episode_number
        )
    );

    /** String representing `$post_ids` in SQL syntax */
    $sql_post_ids = "('" . implode( "','", $post_ids ) . "')";

    // The logic after this point has been adjusted.

    /**
     * Determine the ID and name of the published podcast with the given episode
     * number (and lowest ID, if multiple such podcasts exist).
     */
    $podcast = $wpdb->get_row(
        "SELECT id, post_name AS name FROM {$wpdb->posts} WHERE
                id IN {$sql_post_ids}
            AND post_type = 'podcast'
            AND post_status = 'publish'
        ORDER BY id ASC"
    );

    /**
     * If there are no published podcasts with the given episode number,
     * respond with 404.
     */
    if ( $podcast === null ) {
        return [ 'error' => 404 ];
    }

    /**
     * If the podcast name specified in the query doesn't correspond to the
     * episode number specified in the query, we need to redirect to the right
     * page, based on the episode number (ignoring the specified name). We do
     * this by issuing a query for the post ID; that query will then redirect
     * to the podcast's permalink, where we won't take action.
     * 
     * Else, the specified name matches the specified episode number,
     * so we are already at the podcast's permalink, and thus do nothing.
     */
    if (    ! isset( $query_vars['name'] )
        ||  $query_vars['name'] !== $podcast->name
    ) {
        return [ 'p' => $podcast->id ];
    }

    return $query_vars;
}
add_filter( 'request', 'wpse373987_handle_query_var', 100 );
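The final decision in that filter can be exercised without a database. The sketch below pulls it out into a hypothetical helper, wpse373987_resolve_request(), and feeds it a hand-built $podcast object in place of the $wpdb->get_row() result; the episode number 12 is an assumption, while the ID 50 and slug news-for-august follow the earlier example.

```php
<?php
/**
 * Illustrative only: the last three branches of the request filter.
 *
 * @param array       $query_vars Query vars for the current request.
 * @param object|null $podcast    Row with `id` and `name`, or null if no podcast was found.
 * @return array Query vars that WordPress should continue with.
 */
function wpse373987_resolve_request( array $query_vars, $podcast ) {
    // No published podcast has this episode number: 404.
    if ( $podcast === null ) {
        return [ 'error' => 404 ];
    }

    // Missing or wrong slug: query by ID so that WordPress redirects to the permalink.
    if ( ! isset( $query_vars['name'] ) || $query_vars['name'] !== $podcast->name ) {
        return [ 'p' => $podcast->id ];
    }

    // Already at the permalink: leave the query untouched.
    return $query_vars;
}

// Stand-in for the database row; $wpdb returns column values as strings.
$podcast = (object) [ 'id' => '50', 'name' => 'news-for-august' ];

$at_permalink = [ 'podcast_episode_number' => '12', 'post_type' => 'podcast', 'name' => 'news-for-august' ];
$wrong_slug   = [ 'podcast_episode_number' => '12', 'post_type' => 'podcast', 'name' => 'wrong-title' ];
$shortlink    = [ 'podcast_episode_number' => '12' ];

$served     = wpse373987_resolve_request( $at_permalink, $podcast ); // Unchanged.
$redirect_a = wpse373987_resolve_request( $wrong_slug, $podcast );   // [ 'p' => '50' ]
$redirect_b = wpse373987_resolve_request( $shortlink, $podcast );    // [ 'p' => '50' ]
$not_found  = wpse373987_resolve_request( $shortlink, null );        // [ 'error' => 404 ]
```

Keeping this branch logic separate from the SQL makes it easy to check each URL format by hand before testing in the browser.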

Result

Great, it works!

URLs of the form /podcasts/<episode_number>, followed by an incorrect slug or no slug, now redirect to the permalink of the podcast with that episode number. Shortlinks are also handled correctly by the rewrite rule that we added in wpse373987_add_tag_and_permastruct(); it resolves URLs of the form /<episode_number> to a query of the form /?podcast_episode_number=<episode_number>. Queries of this form are handled by our request filter, wpse373987_handle_query_var(), which rewrites them to the form /?p=<post_id>, which WordPress then redirects to the corresponding podcast permalink. All sorted!