Shortcode parsed incorrectly because of heredoc

A workaround might be:

[code]
$foo = &lt;&lt;&lt;EOT
  ....
EOT;
[/code]

where we change <<< to &lt;&lt;&lt;. We could do this automatically before do_shortcode filters the content and then replace it again afterwards.
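A minimal sketch of that automation could look like the following. The helper names wpse_encode_heredoc and wpse_decode_heredoc are hypothetical, and the priorities assume that do_shortcode runs on the_content at its default priority of 11, so 9 and 12 fall before and after it. Note that the restore step reverts every &lt;&lt;&lt; in the content, not only those inside [code] blocks, so this probably needs refinement:

```php
<?php
// Hypothetical helper: hide the heredoc opener from the shortcode parser.
function wpse_encode_heredoc( $content )
{
    return str_replace( '<<<', '&lt;&lt;&lt;', $content );
}

// Hypothetical helper: restore the heredoc opener after shortcodes have run.
function wpse_decode_heredoc( $content )
{
    return str_replace( '&lt;&lt;&lt;', '<<<', $content );
}

// Only hook in when WordPress is loaded.
if ( function_exists( 'add_filter' ) )
{
    add_filter( 'the_content', 'wpse_encode_heredoc', 9 );   // before do_shortcode
    add_filter( 'the_content', 'wpse_decode_heredoc', 12 );  // after do_shortcode
}
```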

I tested this version:

[code]
$foo = <<<EOT
....
EOT;<!---->
[/code]

and it seems to parse the shortcode’s content correctly. But then we would need to remove the extra <!----> part after do_shortcode has filtered the content. With the HTML comment left in place, the HTML source would look correct, but the rendering would be problematic, except within a <textarea>.

Peeking into the /wp-includes/shortcodes.php file

The problem with something that looks like an unclosed HTML tag (here, the <<<) within the shortcode’s content seems to be that the do_shortcodes_in_html_tags() parser thinks [/code] is part of that tag.

This part of do_shortcodes_in_html_tags() seems to influence this behavior:

// Looks like we found some crazy unfiltered HTML.  Skipping it for sanity.
$element = strtr( $element, $trans );

where [/code] is replaced with &#91;/code&#93;.

This happens just before the shortcode regex-replacements.
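To illustrate the effect of that line in isolation, here is a small stand-alone sketch. The $trans map is assumed to match the bracket-to-entity mapping described above, and the $element value is a made-up example of what the parser might treat as a single HTML element. Neither is copied verbatim from core:

```php
<?php
// Assumed mapping: square brackets become numeric HTML entities.
$trans = [ '[' => '&#91;', ']' => '&#93;' ];

// Illustrative "element": everything from "<" onwards is treated as
// one HTML tag, so the closing shortcode tag ends up inside it.
$element = "<<<EOT\n....\nEOT;\n[/code]";

$escaped = strtr( $element, $trans );

echo $escaped; // the closing tag is now written with entities
```

Once the brackets are gone, the shortcode regex can no longer find a closing [/code] tag for the opening [code].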

We would see &#91;/code&#93; in the output of the_content() if this part:

$content = unescape_invalid_shortcodes( $content );

didn’t run at the end of the do_shortcode() function.

So it’s like we didn’t close the shortcode.

HTML Encoding The Code Blocks

We could HTML encode the content of the code blocks with:

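// Priority 1, so this runs before do_shortcode (priority 11 on the_content)
add_filter( 'the_content', function( $content )
{
    // has_shortcode() only detects shortcodes that are registered
    if( has_shortcode( $content, 'code' ) ) 
    {
        $content = preg_replace_callback( 
            '#\[code\](.*?)\[/code\]#s', 
            'wpse_code_filter', 
            $content 
        );
    }

    return $content;

}, 1 );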

where our custom callback is defined as:

function wpse_code_filter( $matches )
{
    return esc_html( $matches[0] );
}
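
To see what this does to a heredoc, the following stand-alone sketch runs the same regular expression outside WordPress. It assumes htmlspecialchars() as a rough stand-in for esc_html(), and wpse_code_filter_demo is a hypothetical name used only here:

```php
<?php
// Hypothetical stand-in for the callback above, using htmlspecialchars()
// instead of esc_html() so that it runs without WordPress.
function wpse_code_filter_demo( $matches )
{
    return htmlspecialchars( $matches[0], ENT_QUOTES, 'UTF-8' );
}

$content = "Intro\n[code]\n\$foo = <<<EOT\n....\nEOT;\n[/code]\nOutro";

// Encode each [code]...[/code] block, including its tags.
$encoded = preg_replace_callback(
    '#\[code\](.*?)\[/code\]#s',
    'wpse_code_filter_demo',
    $content
);

echo $encoded;
```

The [code] and [/code] tags themselves survive, since square brackets are not HTML special characters, while <<< becomes &lt;&lt;&lt; and no longer looks like the start of an HTML tag.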

Note that there are still things to be sorted out, like:

  • content filters within the core, e.g. the wpautop filter,
  • content filters from 3rd party plugins,
  • shortcodes inside the code block,
  • etc.

It’s informative to look at plugins like SyntaxHighlighter Evolved to see what kind of workarounds would be needed. (I’m not affiliated with that plugin.)

Alternatives

Another approach is to move the code outside of the content editor and store it in e.g.

  • custom fields,
  • custom tables,
  • a custom post type (e.g. stored as excerpts).

For the latter case, this kind of shortcode comes to mind (PHP 7+):

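// No type declarations on the parameters: WordPress passes an empty string
// for $atts when the shortcode is used without attributes
add_shortcode( 'codeblock', function( $atts, $content = '' )
{
    // Setup default attributes
    $atts = shortcode_atts( [ 'id' => 0 ], $atts, 'codeblock_shortcode' );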

    // We only want to target the 'codeblock' post type
    if( empty( $atts['id'] ) || 'codeblock' !== get_post_type( $atts['id'] ) ) 
        return $content;

    // Pre wrapped and HTML escaped post-excerpt from the 'codeblock' post type
    return sprintf( 
            '<pre>%s</pre>',
            esc_html( 
                get_post_field(
                    'post_excerpt', 
                    $atts['id'], 
                    'raw' 
                )
            )
        );
} );

This might need further testing and adjustments!

Then we could refer to each codeblock with

[codeblock id="123"]

in the content editor.