Adding current user’s ID to the end of PDF hyperlinks in post content

Here’s a solution that appends a user query argument, set to the current user’s ID, to PDF links, e.g.:

http://mysite/wp-content/uploads/2018/12/My_PDF_File.pdf?user=54

This code works by hooking the the_content filter and parsing the post’s content with DOMDocument. I’ve run into various gotchas with DOMDocument, and much of this code deals with those edge cases. Also note that we bail immediately if no user is logged in, since there’s nothing to do in that case.

The real meat and potatoes comes towards the end of wpse_append_current_user_id_to_pdf_links(), where links are extracted from the content and checked to ensure they point to PDFs, then add_query_arg() appends the user query argument and value.
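The check-then-append step can be tried outside WordPress with plain PHP. This is only a rough sketch: sketch_url_extension() and sketch_append_arg() are hypothetical names, and sketch_append_arg() is a simplified stand-in for add_query_arg() (it does not replace an existing key or handle #fragments). The user ID 54 is assumed, as in the example URL.

```php
<?php
// Hypothetical helper: lowercase extension of a URL's path, ignoring any
// query string or fragment.
function sketch_url_extension( string $url ): string {
    $path = parse_url( $url, PHP_URL_PATH );
    return is_string( $path ) ? strtolower( pathinfo( $path, PATHINFO_EXTENSION ) ) : '';
}

// Hypothetical, simplified stand-in for add_query_arg(): appends a single
// key/value pair. It does not replace an existing key or handle #fragments.
function sketch_append_arg( string $url, string $key, string $value ): string {
    $separator = ( false === strpos( $url, '?' ) ) ? '?' : '&';
    return $url . $separator . rawurlencode( $key ) . '=' . rawurlencode( $value );
}

$user_id = 54; // Assumed user ID, as in the example URL.

$hrefs = array(
    'http://mysite/wp-content/uploads/2018/12/My_PDF_File.pdf',
    'http://mysite/about/',
);

foreach ( $hrefs as $href ) {
    // Only PDF links get the extra query argument; others pass through.
    if ( 'pdf' === sketch_url_extension( $href ) ) {
        $href = sketch_append_arg( $href, 'user', (string) $user_id );
    }
    echo $href, PHP_EOL;
}
```

Checking the extension on the URL's path, rather than on the whole href, means a link such as My_PDF_File.pdf?v=2 is still recognized as a PDF.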

add_filter( 'the_content', 'wpse_append_current_user_id_to_pdf_links' );

/**
 * Filters post content and appends the user query argument and user ID value
 * to PDF links.
 *
 * @param string $content Post content.
 * @return string Post content with updated PDF links.
 */
function wpse_append_current_user_id_to_pdf_links( $content ) {
    // Bail if there is no content to work with.
    if ( ! $content ) {
        return $content;
    }

    // Get the current user.
    $current_user = wp_get_current_user();

    // Bail if there is nobody logged in.
    if ( ! $current_user->exists() ) {
        return $content;
    }

    // Create an instance of DOMDocument.
    $dom = new \DOMDocument();

    // Suppress errors due to malformed HTML.
    // See http://stackoverflow.com/a/17559716/3059883
    $libxml_previous_state = libxml_use_internal_errors( true );

    // Populate $dom with $content. Non-ASCII characters are converted to
    // numeric entities first, otherwise problems will occur with UTF-8
    // characters. (mb_convert_encoding() with 'HTML-ENTITIES' also does this,
    // but that usage is deprecated as of PHP 8.2.)
    // Also, make sure that the doctype and HTML tags are not added to our
    // HTML fragment. See http://stackoverflow.com/a/22490902/3059883
    $dom->loadHTML(
        mb_encode_numericentity( $content, array( 0x80, 0x10FFFF, 0, 0x1FFFFF ), 'UTF-8' ),
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );

    // Restore previous state of libxml_use_internal_errors() now that we're done.
    libxml_use_internal_errors( $libxml_previous_state );

    // Create an instance of DOMXPath.
    $xpath = new \DOMXPath( $dom );

    // Get all links.
    $links = $xpath->query( "//a" );

    // Process the links.
    foreach ( $links as $link ) {
        $link_href = $link->getAttribute( 'href' );

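        // Get the extension from the URL's path only, so that links which
        // already carry a query string or fragment are still recognized.
        $extension = pathinfo( (string) wp_parse_url( $link_href, PHP_URL_PATH ), PATHINFO_EXTENSION );

        // Only process PDF links.
        if ( 'pdf' === strtolower( $extension ) ) {
            $link->setAttribute( 'href', add_query_arg( 'user', $current_user->ID, $link_href ) );
        }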
    }

    // Save and return updated HTML.
    $new_content = $dom->saveHTML();
    return $new_content;
}
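
One more DOMDocument gotcha worth checking on your setup: with LIBXML_HTML_NOIMPLIED, the parser can treat the first element as the document root. Content with several top-level elements (several paragraphs, say) may then come back with the later elements nested inside the first one, depending on the libxml version. A common workaround is to wrap the fragment in a temporary element and serialize only its children. Here is a minimal sketch of that idea in plain PHP. It assumes the dom and mbstring extensions are available, and sketch_transform_fragment() is a hypothetical helper name, not part of WordPress.

```php
<?php
// Hypothetical helper: parses an HTML fragment inside a temporary <div>,
// lets $modify change the DOM, then returns the wrapper's inner HTML.
function sketch_transform_fragment( string $html, callable $modify ): string {
    $dom      = new DOMDocument();
    $previous = libxml_use_internal_errors( true );

    // Convert non-ASCII characters to numeric entities so UTF-8 text survives.
    $encoded = mb_encode_numericentity( $html, array( 0x80, 0x10FFFF, 0, 0x1FFFFF ), 'UTF-8' );

    // The wrapper <div> gives the parser a single root element.
    $dom->loadHTML( '<div>' . $encoded . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );

    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    $modify( $dom );

    // Serialize each child of the wrapper, leaving the <div> itself out.
    $output = '';
    foreach ( $dom->documentElement->childNodes as $child ) {
        $output .= $dom->saveHTML( $child );
    }

    return $output;
}

$html = '<p>Intro</p><p><a href="/files/report.pdf">Report</a></p>';

// Assumed user ID 54, appended naively here just to show the round trip.
echo sketch_transform_fragment( $html, function ( DOMDocument $dom ) {
    foreach ( $dom->getElementsByTagName( 'a' ) as $link ) {
        $link->setAttribute( 'href', $link->getAttribute( 'href' ) . '?user=54' );
    }
} ), PHP_EOL;
```

In the filter above, the same idea would mean wrapping $content before loadHTML() and building the return value from the wrapper's child nodes instead of calling saveHTML() on the whole document. Note that a stray closing </div> in the content would end the wrapper early, so this is a sketch rather than a complete fix.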