Cyrillic characters in rewrite rules cause 404 Not Found errors

I ended up following @Kolya Korobochkin's advice and added both lowercase and uppercase versions of the rewrite rules that include escaped octets.

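// URI as WordPress generates it (percent-encoded octets in lowercase).
$regular_page_uri = get_page_uri( $page->ID );

// Same URI with every percent-encoded octet uppercased, e.g. %d0%bf becomes %D0%BF.
// A closure replaces create_function(), which was removed in PHP 8.
$uppercase_page_uri = preg_replace_callback(
    '/%[0-9a-fA-F]{2}/',
    function ( $matches ) {
        return strtoupper( $matches[0] );
    },
    $regular_page_uri
);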
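
To make the "both versions" idea concrete, here is a minimal sketch of how the two URI variants could be collected before building the rules. The helper name myplugin_page_uri_variants is hypothetical (it is not a WordPress function), and the sample slug is only an illustration. The early return skips the regex for URIs that contain no encoded octets.

```php
<?php
// Hypothetical helper, not part of WordPress: returns the URI as given,
// plus an uppercase-octet variant when the URI contains encoded octets.
function myplugin_page_uri_variants( $page_uri ) {
    $variants = array( $page_uri );

    // No '%' means no percent-encoded octets, so skip the regex entirely.
    if ( strpos( $page_uri, '%' ) === false ) {
        return $variants;
    }

    $uppercase_page_uri = preg_replace_callback(
        '/%[0-9a-fA-F]{2}/',
        function ( $matches ) {
            return strtoupper( $matches[0] );
        },
        $page_uri
    );

    // Only add the second variant if uppercasing actually changed something.
    if ( $uppercase_page_uri !== $page_uri ) {
        $variants[] = $uppercase_page_uri;
    }

    return $variants;
}

// Example: a Cyrillic slug encoded with lowercase octets.
$variants = myplugin_page_uri_variants( 'parent/%d1%82%d0%b5%d1%81%d1%82' );
```

Each entry in $variants would then be used to build its own rewrite rule, so a request matches whichever case the browser sends.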

The Percent Encode Capital Letter plugin uses a similar approach to convert the octets in every URL to their uppercase version. However, the plugin is outdated and may not perform the conversion in all the necessary places. Also, I believe most users of the plugin I’m working on won’t have URLs with encoded octets, so running preg_replace_callback on every URL is unnecessary work.

Using uppercase hexadecimal digits for percent-encoded octets is the recommended form (see RFC 3986, Section 2.1). A better solution would therefore be to get WordPress to update its utf8_uri_encode function to emit uppercase octets. Most browsers I tested keep the original case, while others, like Safari, convert it to uppercase when they get a chance.
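
As a small illustration of why the case matters for matching, the sketch below encodes a sample Cyrillic slug with PHP's rawurlencode, which emits uppercase hex digits, and compares it with a lowercased copy. The slug and variable names are only for illustration. Both forms decode to the same text, but they are different strings, so a rule written for one case will not match a request that arrives in the other.

```php
<?php
// "тест" written as raw UTF-8 bytes so the example does not depend on file encoding.
$slug = "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82";

// rawurlencode() produces uppercase hex digits.
$uppercase = rawurlencode( $slug );

// Lowercased copy, standing in for what a lowercase encoder would produce.
$lowercase = strtolower( $uppercase );

// Both forms decode to the same slug...
$same_when_decoded = ( rawurldecode( $uppercase ) === rawurldecode( $lowercase ) );

// ...but a plain string comparison treats them as different.
$same_as_strings = ( $uppercase === $lowercase );
```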
