root-relative links for multiple parked domains

I tracked the function calls in my theme (a child theme of Oenology by Chip Bennett) and wp-includes that generate the links, found the home_url() function, and then wrote this filter for my site:

function gregory_make_relative($url="") {
    return preg_replace( '#^https?://[^/]+/#iu', '/', $url, 1 );
}
add_filter( 'home_url', 'gregory_make_relative', 11, 1 );

I intentionally wrote the pattern to include the / after the domain so that home_url('') would still return the blog’s configured domain. That lets me specify true canonical URLs on that domain in the <head> via WordPress SEO using the following function (i.e., the canonical links are the same regardless of which domain was used to load the page, so there is no ‘duplicate content’):

function gregory_wpseo_canonical_add_domain( $canonical ) {
    return home_url('').$canonical;
}
add_filter( 'wpseo_canonical', 'gregory_wpseo_canonical_add_domain', 10, 1 );

So far it works really well, but I wonder if it’ll adversely affect feeds or the e-commerce solution I eventually implement. Comments, notes, tips, and warnings are welcome 🙂

Update

A plain home_url() (i.e., no path specified) is used throughout the system, so the trailing / in the pattern couldn’t stay. I had to remove it and find another way to specify the domain in the canonical URLs. So, after a little more research through wp-includes, the functions now look like this:

function gregory_make_relative($url="") {
    return preg_replace( '#^https?://[^/]+#iu', '', $url, 1 );
}
add_filter( 'home_url', 'gregory_make_relative', 11, 1 );

function gregory_wpseo_canonical_add_domain( $canonical ) {
    // get_option() is defined in wp-includes/functions.php and is used by get_home_url()
    return get_option('home').$canonical;
}
add_filter( 'wpseo_canonical', 'gregory_wpseo_canonical_add_domain', 10, 1 );
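
To see why the trailing / had to go, here is a minimal standalone sketch (no WordPress required) that runs both patterns over a URL with and without a path. The names demo_strip_with_slash() and demo_strip_domain() are made up for this illustration; the patterns are the ones from the two versions of gregory_make_relative(), and my.domain.hk simply stands in for whatever home_url() would return.

```php
<?php
// First version of the pattern: the host must be followed by "/".
function demo_strip_with_slash( $url = '' ) {
    return preg_replace( '#^https?://[^/]+/#iu', '/', $url, 1 );
}

// Updated pattern: the host alone is enough to match.
function demo_strip_domain( $url = '' ) {
    return preg_replace( '#^https?://[^/]+#iu', '', $url, 1 );
}

// Hypothetical inputs: a URL with a path, and a bare home URL.
$with_path    = 'http://my.domain.hk/about/';
$without_path = 'http://my.domain.hk';

var_dump( demo_strip_with_slash( $with_path ) );    // "/about/"
var_dump( demo_strip_with_slash( $without_path ) ); // unchanged, there is no "/" to match
var_dump( demo_strip_domain( $with_path ) );        // "/about/"
var_dump( demo_strip_domain( $without_path ) );     // empty string
```

The second call is the case that bit me: a bare home URL passes through the first pattern untouched, so the links built from it keep the domain.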

Update 2

It’s harder than it first appeared 😉 My code now looks like this. Feeds are affected: somewhere in the chain of functions that produces the links, feeds seem to be using $_SERVER['HTTP_HOST']. I’ll have to examine my options there.

/* FILTERS TO PRODUCE ROOT-RELATIVE URLs */
// define WP_SITEURL because the formula in wp-includes/functions.php::wp_guess_url() makes a
// false assumption and appends $_SERVER['REQUEST_URI'] to the base_url.
// (the third, case-insensitive argument to define() is deprecated since PHP 7.3, so it is omitted.)
define( 'WP_SITEURL', 'http://my.domain.hk/' );

// strip the domain
function gregory_make_relative( $url="" ) {
    return preg_replace( '#^https?://[^/]+#iu', '', $url, 1 );
}
add_filter( 'site_url', 'gregory_make_relative', 11, 1 );
add_filter( 'home_url', 'gregory_make_relative', 11, 1 );
add_filter( 'template_directory_uri', 'gregory_make_relative', 11, 1 );
add_filter( 'stylesheet_directory_uri', 'gregory_make_relative', 11, 1 );
add_filter( 'script_loader_src', 'gregory_make_relative', 11, 1 );

function gregory_make_stylehref_relative( $tag='' ) {
    // $wp_styles->do_item() passes this along to the filter:
    // "<link rel='$rel' id='$handle-rtl-css' $title href='$rtl_href' type='text/css' media='$media' />\n"
    $matches = array();
    // preg_match() already takes $matches by reference; call-time &$matches is a fatal error since PHP 5.4.
    if( !preg_match( '#^(.+ +href=\')(.+)(\' +type=.+)$#iu', $tag, $matches ))
        return $tag;
    $matches[2] = gregory_make_relative($matches[2]);
    return $matches[1].$matches[2].$matches[3];
}
add_filter( 'style_loader_tag', 'gregory_make_stylehref_relative', 11, 1 );
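
As a quick check of the href extraction, this standalone sketch applies the same preg_match() pattern to a sample <link> tag. The sample tags and the demo_ function names are invented for the example; in WordPress the real tag arrives through the style_loader_tag filter.

```php
<?php
// Strip scheme and host, as gregory_make_relative() does.
function demo_make_relative( $url = '' ) {
    return preg_replace( '#^https?://[^/]+#iu', '', $url, 1 );
}

// Split the tag around the single-quoted href value and rebuild it.
function demo_make_stylehref_relative( $tag = '' ) {
    $matches = array();
    if ( ! preg_match( '#^(.+ +href=\')(.+)(\' +type=.+)$#iu', $tag, $matches ) ) {
        return $tag; // pattern did not match: leave the tag untouched
    }
    return $matches[1] . demo_make_relative( $matches[2] ) . $matches[3];
}

// Hypothetical tag in the single-quoted shape the pattern expects.
$sample = "<link rel='stylesheet' id='demo-css' href='http://my.domain.hk/wp-content/themes/demo/style.css' type='text/css' media='all' />\n";
echo demo_make_stylehref_relative( $sample ), "\n";

// A double-quoted tag does not match href=' and is returned as-is.
$other = '<link rel="stylesheet" href="http://my.domain.hk/a.css" type="text/css" />';
echo demo_make_stylehref_relative( $other ), "\n";
```

One side effect worth knowing: the trailing newline of the tag is not captured by the pattern, so the rebuilt tag comes back without it.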

function gregory_wpseo_canonical_add_domain( $canonical="" ) {
    // get_option is defined in wp-includes/functions.php and is used by get_home_url() to get the home url.
    return get_option('home').$canonical;
}
add_filter( 'wpseo_canonical', 'gregory_wpseo_canonical_add_domain', 10, 1 );

Update 3 — WPSEO

There’s an option in WPSEO to add text/links before and after RSS posts, including the option to use placeholders. One of those placeholders is %%BLOGLINK%%. Unfortunately, with the root-relative filters in place, %%BLOGLINK%% produced an empty string, which was not useful in the feeds. This code fixes that problem (noting that my other choice was to simply hardcode the link in WPSEO’s RSS settings, probably the smarter thing to do 🙂):

// a change for WPSEO's %%BLOGLINK%% code.
// I changed get_bloginfo() to get_bloginfo_rss() in wpseo/frontend/class-frontend.php to allow this.
// without these changes, %%BLOGLINK%% is printed into the rss feeds as an empty string.
function gregory_set_domain_in_rss_urls( $info, $show ) {
    // copied from wp-includes/general-template.php::get_bloginfo()
    $url = true;
    if (strpos($show, 'url') === false &&
        strpos($show, 'directory') === false &&
        strpos($show, 'home') === false)
        $url = false;

    return ( !$url || !empty($info) ? $info : "/" );
}
add_filter( 'get_bloginfo_rss', 'gregory_set_domain_in_rss_urls', 11, 2 );

(I have since decided to hardcode my RSS after-post message, so the above filter has been disabled in my theme.)

Update 4 — preg_replace() becomes preg_match()

I’ve updated the gregory_make_stylehref_relative() function shown in Update 2 to use preg_match() instead of preg_replace(). This is the old code:

    $href = preg_replace( '#^(.+ +href=\')(.+)(\' +type=.+)$#iu', '$2', $tag );
    $href = gregory_make_relative($href);
    return preg_replace( '#^(.+ +href=\')(.+)(\' +type=.+)$#iu', '$1'.$href.'$3', $tag );

Update 5 — get_avatar()

Another filter, this time for get_avatar(), so that the scheme and current domain are included in the path to WP’s blank.gif. Without these, the gif won’t be found and loaded. Without the scheme, Gravatar will load its own corporate avatar instead.

function gregory_avatar_add_domain( $avatar ) {
    // look for urlencode( includes_url('images/blank.gif')) in the $avatar string.
    // if found, encode the schema and domain, insert it into the $avatar string.
    $gif = includes_url('images/blank.gif'); // from get_avatar()
    if( preg_match( '|^https?://|i', $gif ))
        // the url already includes the schema (and domain).
        return $avatar;
    $gif = urlencode($gif);
    $schema = is_ssl() ? 'https://' : 'http://'; // from wp_guess_url()
    $domain = urlencode( $schema . $_SERVER['HTTP_HOST'] );
    return str_replace( $gif, $domain.$gif, $avatar );
}
add_filter( 'get_avatar', 'gregory_avatar_add_domain', 11, 1 );
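
The string surgery in that filter can be tried outside WordPress with a sketch like the one below. Everything named demo_ is hypothetical, the scheme and host are passed in as a plain argument instead of being read from is_ssl() and $_SERVER, and the <img> markup is an invented stand-in for whatever get_avatar() really produces.

```php
<?php
// $avatar: avatar HTML; $gif: path to blank.gif; $origin: scheme plus host.
function demo_avatar_add_domain( $avatar, $gif, $origin ) {
    if ( preg_match( '|^https?://|i', $gif ) ) {
        return $avatar; // already fully qualified, nothing to do
    }
    // The default image is URL-encoded inside the avatar string,
    // so the prefix has to be encoded the same way before inserting it.
    $encoded_gif = urlencode( $gif );
    return str_replace( $encoded_gif, urlencode( $origin ) . $encoded_gif, $avatar );
}

// Made-up example data.
$gif    = '/wp-includes/images/blank.gif';
$avatar = "<img src='http://www.gravatar.com/avatar/HASH?s=32&amp;d=" . urlencode( $gif ) . "' />";

echo demo_avatar_add_domain( $avatar, $gif, 'http://my.domain.hk' ), "\n";
```

The point of the sketch is only that both halves must be encoded: prefixing a raw http://my.domain.hk to the encoded path would leave a mixed, half-encoded value in the d= parameter.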

Update 6 — redirect_canonical()

The filters were causing problems with incoming URLs that involved query strings: the URL’s scheme://domain would get chopped to just ://domain. It took a few hours to pinpoint the problem, but it was in wp-includes/canonical.php::redirect_canonical(), and a filter hook there made it possible to correct it. Here’s the filter:

function gregory_redirect_canonical_addScheme( $redirect_url ) {
    // redirect_canonical() requires (but doesn't need) a fully qualified url.
    // get_permalink() in the second iteration of redirect_canonical()
    // (redirect_canonical() calls itself) usually switches to the domain specified in the
    // WP Settings. but if we do the same, redirect_canonical() doesn't recognise the
    // permalink as a redirect, and WP doesn't update the url in the User Agent's Address bar.
    if( preg_match( '|^https?://|i', $redirect_url ))
        // fully qualified url. leave it alone.
        return $redirect_url;
    if( substr( $redirect_url, 0, 3 ) == '://' )
        // no scheme specified in $redirect during the
        // second pass through redirect_canonical().
        return (is_ssl() ? 'https' : 'http') . $redirect_url;
    if( substr( $redirect_url, 0, 1 ) == "/" )
        // root-relative url.
        return (is_ssl() ? 'https://' : 'http://').$_SERVER['HTTP_HOST'].$redirect_url;
    // relative url. not root-relative.
    return (is_ssl() ? 'https://' : 'http://').$_SERVER['HTTP_HOST']."/".$redirect_url;
}
add_filter( 'redirect_canonical', 'gregory_redirect_canonical_addScheme', 11, 1 );
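
The four branches of that filter can be exercised outside WordPress with a small sketch. The function demo_add_scheme() is hypothetical: it takes the host and the SSL flag as arguments instead of reading $_SERVER['HTTP_HOST'] and is_ssl(), which is an assumption made purely so the logic can run on its own.

```php
<?php
function demo_add_scheme( $redirect_url, $host, $is_ssl = false ) {
    $scheme = $is_ssl ? 'https' : 'http';
    if ( preg_match( '|^https?://|i', $redirect_url ) ) {
        return $redirect_url;                              // fully qualified: leave alone
    }
    if ( substr( $redirect_url, 0, 3 ) == '://' ) {
        return $scheme . $redirect_url;                    // scheme was chopped off
    }
    if ( substr( $redirect_url, 0, 1 ) == '/' ) {
        return $scheme . '://' . $host . $redirect_url;    // root-relative
    }
    return $scheme . '://' . $host . '/' . $redirect_url;  // relative, not root-relative
}

// Made-up inputs, one per branch.
$cases = array( 'http://my.domain.hk/x', '://my.domain.hk/?p=1', '/about/', 'about/' );
foreach ( $cases as $case ) {
    echo demo_add_scheme( $case, 'my.domain.hk' ), "\n";
}
```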

Please note that I have not marked my question as answered just yet, because I don’t know what consequences these filters will have further down the road. More time and testing are required.

cheers,
Gregory
(WordPress 3.3.2)
