Extract subdomain and relative address from a url

I had a similar requirement and couldn't find a ready-made solution, so I wrote a function based on PHP's built-in parse_url() and extended it over time to extract every component I could think of.
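
For reference, parse_url() on its own returns only the components that are actually present in the URL, and everything else in the function below is derived from those keys. A minimal sketch, using the fictitious URL from Example 2:

```php
<?php
// Plain parse_url(): only components present in the URL get a key.
$parts = parse_url('http://dev.yoursite.com/some/other/directory/index.php?pg=7');
var_export($parts);
// array (
//   'scheme' => 'http',
//   'host' => 'dev.yoursite.com',
//   'path' => '/some/other/directory/index.php',
//   'query' => 'pg=7',
// )
```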

Below is my code, followed by two example outputs. It extracts the subdomain, root domain, TLD, extension, path, absolute address (path plus query string) and more:

/**
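 * Parse and check a URL. Returns (or stores) an array with these keys:
 * scheme, host, port, user, pass, path, query, fragment (whichever parse_url() finds),
 * plus domain, tld, root, subdomain, basename, filename, extension,
 * base (scheme://host) and abs (path plus query string).
 * @param string $url the URL of the site; http:// is assumed if no scheme is given
 * @param bool $retdata if true, return the parsed URL data; otherwise store it in the
 *                      $urldata property (only works when this is a class method)
 * @return array|bool the parsed data, true after storing it, or false for an invalid URL
 */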
function parseURL($url,$retdata=true){
    $url = substr($url,0,4)=='http'? $url: 'http://'.$url; //assume http if not supplied
    if (($urldata = parse_url(str_replace('&amp;','&',$url))) && isset($urldata['host'])){ //decode HTML-escaped ampersands; a host is required
        $path_parts = pathinfo($urldata['host']); //note: basename, filename and extension below come from the host, not the path
        $tmp = explode('.',$urldata['host']); $n = count($tmp);
        if ($n>=2){
            if ($n==4 || ($n==3 && strlen($tmp[($n-2)])<=3)){
                $urldata['domain'] = $tmp[($n-3)].".".$tmp[($n-2)].".".$tmp[($n-1)];
                $urldata['tld'] = $tmp[($n-2)].".".$tmp[($n-1)]; //top-level domain
                $urldata['root'] = $tmp[($n-3)]; //second-level domain
                $urldata['subdomain'] = $n==4? $tmp[0]: ''; //e.g. www.bbc.co.uk -> www; bbc.co.uk has no subdomain
            } else {
                $urldata['domain'] = $tmp[($n-2)].".".$tmp[($n-1)];
                $urldata['tld'] = $tmp[($n-1)];
                $urldata['root'] = $tmp[($n-2)];
                $urldata['subdomain'] = $n==3? $tmp[0]: '';
            }
        }
        //$urldata['dirname'] = $path_parts['dirname'];
        $urldata['basename'] = $path_parts['basename'];
        $urldata['filename'] = $path_parts['filename'];
        $urldata['extension'] = $path_parts['extension'] ?? ''; //pathinfo() omits 'extension' when the host has no dot
        $urldata['base'] = $urldata['scheme']."://".$urldata['host'];
        $urldata['abs'] = (isset($urldata['path']) && strlen($urldata['path']))? $urldata['path']: "/";
        $urldata['abs'] .= (isset($urldata['query']) && strlen($urldata['query']))? '?'.$urldata['query']: '';
        //Set data
        if ($retdata){
            return $urldata;
        } else {
            $this->urldata = $urldata; //only valid when parseURL() is a method of a class with a $urldata property
            return true;
        }
    } else {
        //invalid URL
        return false;
    }
}
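
Note that the domain/subdomain split is a heuristic, not a lookup against a list of public suffixes: a four-label host, or a three-label host whose middle label is three characters or fewer (such as bbc.co.uk), is treated as having a two-part TLD. A minimal standalone sketch of just that rule (splitHost() is a hypothetical helper, not part of the function above) makes the trade-off easy to see:

```php
<?php
/**
 * Hypothetical helper isolating parseURL()'s host-splitting heuristic.
 * Returns [subdomain, root, tld].
 */
function splitHost(string $host): array {
    $labels = explode('.', $host);
    $n = count($labels);
    if ($n < 2) {
        return ['', $host, '']; // e.g. "localhost": nothing to split
    }
    // 4 labels, or 3 labels with a short middle label => two-part TLD such as "co.uk"
    if ($n == 4 || ($n == 3 && strlen($labels[$n - 2]) <= 3)) {
        return [$n == 4 ? $labels[0] : '', $labels[$n - 3], $labels[$n - 2] . '.' . $labels[$n - 1]];
    }
    // Otherwise assume a single-label TLD such as "org" or "com"
    return [$n == 3 ? $labels[0] : '', $labels[$n - 2], $labels[$n - 1]];
}

foreach (['developer.wordpress.org', 'www.bbc.co.uk', 'www.bbc.com'] as $host) {
    [$sub, $root, $tld] = splitHost($host);
    printf("%s: subdomain='%s' root='%s' tld='%s'\n", $host, $sub, $root, $tld);
}
```

This prints `subdomain='developer' root='wordpress' tld='org'` for the first host and `subdomain='www' root='bbc' tld='co.uk'` for the second, but misfires on www.bbc.com (`subdomain='' root='www' tld='bbc.com'`); hosts like a.b.example.com are split incorrectly as well. If that matters for your URLs, a public-suffix-list lookup is more reliable than any length rule.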

Example 1: for the URL in your question (https://developer.wordpress.org/reference/functions/wp_parse_url/), the output is:

  array (
    'scheme' => 'https',
    'host' => 'developer.wordpress.org',
    'path' => '/reference/functions/wp_parse_url/',
    'domain' => 'wordpress.org',
    'tld' => 'org',
    'root' => 'wordpress',
    'subdomain' => 'developer',
    'basename' => 'developer.wordpress.org',
    'filename' => 'developer.wordpress',
    'extension' => 'org',
    'base' => 'https://developer.wordpress.org',
    'abs' => '/reference/functions/wp_parse_url/',
  )

Example 2: for a fictitious URL with a path and query string, http://dev.yoursite.com/some/other/directory/index.php?pg=7, the output is:

  array (
    'scheme' => 'http',
    'host' => 'dev.yoursite.com',
    'path' => '/some/other/directory/index.php',
    'query' => 'pg=7',
    'domain' => 'yoursite.com',
    'tld' => 'com',
    'root' => 'yoursite',
    'subdomain' => 'dev',
    'basename' => 'dev.yoursite.com',
    'filename' => 'dev.yoursite',
    'extension' => 'com',
    'base' => 'http://dev.yoursite.com',
    'abs' => '/some/other/directory/index.php?pg=7',
  )

This is probably more information than you need, and some of it is redundant (basename, filename and extension, for example, are derived from the host), but you can either trim the function to return exactly what you need or use it as-is and pick out the array elements you want.
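
For example, if all you need is what the question asks for (the subdomain and the relative address), a trimmed-down variant could look like the sketch below. The name subdomainAndPath() is hypothetical, and it makes two assumptions that differ from parseURL(): it checks for any scheme prefix rather than just "http", and it treats everything left of the last two labels as the subdomain, which is only right for single-label TLDs such as .com or .org.

```php
<?php
/**
 * Hypothetical trimmed-down variant: returns only the subdomain and the
 * relative address (path plus query string). Assumes a single-label TLD,
 * so "a.b.example.com" yields the subdomain "a.b".
 */
function subdomainAndPath(string $url): ?array {
    // Assume http:// when the URL has no scheme prefix at all
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return null; // malformed URL or no host
    }
    $labels = explode('.', $parts['host']);
    // Everything left of the last two labels is treated as the subdomain
    $subdomain = implode('.', array_slice($labels, 0, max(0, count($labels) - 2)));
    // Fall back to "/" when the URL has no path, as parseURL() does for 'abs'
    $relative = $parts['path'] ?? '/';
    if (isset($parts['query']) && $parts['query'] !== '') {
        $relative .= '?' . $parts['query'];
    }
    return ['subdomain' => $subdomain, 'relative' => $relative];
}

var_export(subdomainAndPath('http://dev.yoursite.com/some/other/directory/index.php?pg=7'));
```

For the Example 2 URL this returns `'subdomain' => 'dev'` and `'relative' => '/some/other/directory/index.php?pg=7'`.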

Note: if you pass https://developer.wordpress.org to WordPress's wp_parse_url() or PHP's built-in parse_url(), ‘path’ is not set in the output. parseURL() does not add ‘path’ either, but it sets ‘abs’ to “/” in that case.
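
A minimal sketch of that behaviour with plain parse_url(), showing the missing key and the same "/" fallback used for ‘abs’:

```php
<?php
// parse_url() omits keys for components that are absent from the URL
$parts = parse_url('https://developer.wordpress.org');
var_dump(array_key_exists('path', $parts)); // bool(false)

// Falling back to "/" mirrors what parseURL() stores in 'abs'
$abs = $parts['path'] ?? '/';
echo $abs, "\n"; // /
```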
