How to stop the HTML editor from adding <p> tags to shortcodes, images, etc.

1. Filtering the content

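Here’s a one-function content filter that meets the four requirements from your question. It removes <p> tags that are empty, contain only whitespace, or wrap nothing but an image or a shortcode: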

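// Priority 20 runs after wpautop(), which is hooked to the_content at the
// default priority 10 and is what adds the <p> tags in the first place.
add_filter( 'the_content', 'strip_some_paragraphs', 20 );
function strip_some_paragraphs( $content ) {

    // Remove <p>...</p> wrappers that contain only whitespace, or only a
    // single <img> tag or shortcode (with optional surrounding whitespace).
    // '$3' puts the image or shortcode back; it is empty for blank paragraphs.
    $content = preg_replace(
        '/<p>(([\s]*)|[\s]*(<img[^>]*>|\[[^\]]*\])[\s]*)<\/p>/',
        '$3',
        $content
    );

    return $content;
}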
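If you want to try the filter before putting it into a theme, here is a minimal standalone sketch that runs with the php command-line binary, outside WordPress. It assumes you only want to see the regex at work: the add_filter() call is left out because that function exists only inside WordPress, and the sample content is made up to resemble what wpautop() produces.

```php
<?php
// Same function body as above, run outside WordPress (no add_filter()).
function strip_some_paragraphs( $content ) {
    return preg_replace(
        '/<p>(([\s]*)|[\s]*(<img[^>]*>|\[[^\]]*\])[\s]*)<\/p>/',
        '$3',
        $content
    );
}

// Hypothetical sample content: a shortcode, a normal paragraph,
// an empty one, a whitespace-only one, and an image.
$sample = "<p>[gallery]</p>\n<p>Some text.</p>\n<p></p>\n<p> </p>\n<p><img src=\"a.jpg\" /></p>";

echo strip_some_paragraphs( $sample ), "\n";
```

Only the paragraph with real text keeps its <p> tags; the shortcode and image lose their wrappers and the two blank paragraphs disappear.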

2. Resources for Regular Expressions

3. The RegEx in 1. explained

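The actual regex at hand is '/<p>(([\s]*)|[\s]*(<img[^>]*>|\[[^\]]*\])[\s]*)<\/p>/'. The single quotes ' mark the beginning and end of the PHP string, as usual, and / is the pattern delimiter. Because / delimits the pattern, the slash in the closing </p> tag has to be escaped as <\/p>.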
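To illustrate the delimiter point, here is a small sketch comparing the pattern as written with a variant that uses # as the delimiter. The # variant is not part of the original answer; it is just one common alternative that lets the slash in </p> stay unescaped.

```php
<?php
// With "/" as the delimiter, the "/" in </p> must be escaped as "\/".
$slash_delimited = '/<p>(([\s]*)|[\s]*(<img[^>]*>|\[[^\]]*\])[\s]*)<\/p>/';

// Illustrative alternative: with "#" as the delimiter, </p> needs no escaping.
$hash_delimited = '#<p>(([\s]*)|[\s]*(<img[^>]*>|\[[^\]]*\])[\s]*)</p>#';

$html = "<p>[gallery]</p>\n<p>Text</p>";

$a = preg_replace( $slash_delimited, '$3', $html );
$b = preg_replace( $hash_delimited, '$3', $html );

var_dump( $a === $b ); // bool(true)
```

Both patterns describe the same regex; only the delimiter and the escaping it forces differ.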

You’ve mentioned 4 cases in which you want to remove <p> tags. So first off, our pattern must start with an opening <p> tag and end with its closing companion </p>. That goes for all four cases. Inside, we want to allow four different options to be valid matches. We group those options in parentheses and separate them with the pipe character |. The pipe matches either the expression on its left or the one on its right, and several alternatives can be strung together. You can think of it as “OR”.
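Why the parentheses matter: | has the lowest precedence in a regex, so without a group it splits the whole pattern rather than just the inside of the paragraph. The two simplified patterns below are made up for illustration only; they are not the pattern from the filter.

```php
<?php
// Without parentheses, "|" splits the whole pattern: this matches either an
// empty paragraph OR any <img> tag anywhere, even one not wrapped in <p>.
$ungrouped = '/<p>[\s]*<\/p>|<img[^>]*>/';

// With parentheses, the alternatives are confined to the inside of <p>...</p>.
$grouped = '/<p>([\s]*|<img[^>]*>)<\/p>/';

$html = 'Text <img src="a.jpg"> more text';

var_dump( preg_match( $ungrouped, $html ) ); // int(1): the bare <img> matches
var_dump( preg_match( $grouped, $html ) );   // int(0): there is no wrapping <p>
```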

Now for the options:

Let’s begin with the whitespace. \s denotes the whitespace character class (spaces, tabs, and line breaks). The star quantifier in [\s]* matches zero or more of the preceding character class.
So now we match all empty and whitespace-only paragraphs. As a bonus, this reduces the cases to match to three: “zero or more” takes care of both <p></p> and <p> </p>. Nice.
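A quick check of the whitespace-only option on its own. The ^ and $ anchors are added here only so that each test string has to consist entirely of <p>, optional whitespace, and </p>; the filter itself does not use them.

```php
<?php
// Whitespace-only option, anchored for testing whole strings.
$empty_paragraph = '/^<p>[\s]*<\/p>$/';

$cases = array( '<p></p>', '<p> </p>', "<p>\n\t</p>", '<p>x</p>' );

$results = array_map(
    function ( $case ) use ( $empty_paragraph ) {
        return preg_match( $empty_paragraph, $case );
    },
    $cases
);

// The first three match (1); '<p>x</p>' does not (0).
print_r( $results );
```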

As for the other two cases, we wrap both in additional [\s]*, so that not only <p>[shortcode]</p> but also, for instance,

<p>
    [shortcode] </p>

is matched.
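The same kind of check for the shortcode option with whitespace on both sides, again anchored with ^ and $ purely for testing. The sample strings are made up; the last one shows that a shortcode followed by other text is left alone.

```php
<?php
// Shortcode option with optional whitespace before and after it.
$shortcode_paragraph = '/^<p>[\s]*\[[^\]]*\][\s]*<\/p>$/';

$cases = array(
    '<p>[shortcode]</p>',
    "<p>\n    [shortcode] </p>",
    '<p>[shortcode] and some text</p>',
);

foreach ( $cases as $case ) {
    echo preg_match( $shortcode_paragraph, $case ) ? "match\n" : "no match\n";
}
// Prints: match, match, no match
```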
What is left to do now is to come up with patterns that match shortcodes and img tags. Here we make use of character class negation: a caret ^ at the beginning of a character class negates it. Hence, [^>] matches any character that is not >.
We start the pattern for the images with the beginning of the tag, <img, and the pattern for the shortcode with an opening square bracket \[. The bracket must be escaped with a backslash, since it is a regex special character.
Now we use the above-mentioned negated character class with the star quantifier: [^>]* for the img and [^\]]* for the shortcode, each matching anything but the respective closing character. (Inside the class, ] is escaped as well, so it is not read as the end of the class.) Then we match that very closing character once and are done.
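To see the negated classes in isolation, this sketch pulls the image tag and the shortcode out of a made-up paragraph with preg_match_all(), using just the two sub-patterns.

```php
<?php
// [^>]* consumes everything up to the next ">", [^\]]* everything up to the
// next "]"; the closing character itself is then matched explicitly.
$html = '<p>Intro [button url="x"] text <img src="a.jpg" alt="A"> end</p>';

preg_match_all( '/<img[^>]*>/', $html, $images );
preg_match_all( '/\[[^\]]*\]/', $html, $shortcodes );

echo $images[0][0], "\n";     // <img src="a.jpg" alt="A">
echo $shortcodes[0][0], "\n"; // [button url="x"]
```

One consequence of this approach: [^>]* stops at the first >, so an attribute value that itself contains > (or a shortcode containing ]) would not be recognized as a whole, and such a paragraph is simply left untouched.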

So we get <img[^>]*> for the images and \[[^\]]*\] for the shortcode.
We wrap those in optional whitespace: [\s]*<img[^>]*>[\s]* and [\s]*\[[^\]]*\][\s]*
Since both share the surrounding [\s]*, we factor it out and group only the two tag patterns: [\s]*(<img[^>]*>|\[[^\]]*\])[\s]*. Adding whitespace only, ([\s]*), as the first option yields the contents of the outer parentheses, which we finally wrap in the paragraph tags.
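As a sanity check of that assembly, the sketch below builds the pattern from the individual pieces and compares it with the string used in the filter. The variable names are made up for readability.

```php
<?php
// The building blocks described above.
$ws        = '[\s]*';
$img       = '<img[^>]*>';
$shortcode = '\[[^\]]*\]';

// Whitespace-only option, then the shared whitespace factored out around
// the img|shortcode group.
$inner   = '(' . $ws . ')' . '|' . $ws . '(' . $img . '|' . $shortcode . ')' . $ws;
$pattern = '/<p>(' . $inner . ')<\/p>/';

var_dump( $pattern === '/<p>(([\s]*)|[\s]*(<img[^>]*>|\[[^\]]*\])[\s]*)<\/p>/' ); // bool(true)
```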

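For the replacement we use the backreference $3, which refers to the third capturing group, counting opening parentheses from the left: (<img[^>]*>|\[[^\]]*\]). It keeps the actual image and shortcode tags from disappearing. So that the surrounding whitespace does not remain, the options are split into two levels of groups: the whitespace sits outside group 3, and only the img or shortcode is captured by the backreference. For an empty paragraph, group 3 does not take part in the match, so $3 is replaced with an empty string and the paragraph is removed entirely.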
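The group numbering can be inspected directly with preg_match(). This sketch uses made-up input and also shows, for contrast, what a replacement with $1 would leave behind.

```php
<?php
$pattern = '/<p>(([\s]*)|[\s]*(<img[^>]*>|\[[^\]]*\])[\s]*)<\/p>/';

// Groups are numbered by their opening parenthesis, left to right:
// 1 = the whole inner alternation (surrounding whitespace included),
// 2 = the whitespace-only option, 3 = the <img> tag or shortcode itself.
preg_match( $pattern, '<p> [gallery] </p>', $m );
var_dump( $m[1] ); // string(11) " [gallery] "
var_dump( $m[3] ); // string(9) "[gallery]"

// Replacing with $1 would keep the whitespace; $3 drops it.
var_dump( preg_replace( $pattern, '$1', '<p> [gallery] </p>' ) ); // string(11) " [gallery] "
var_dump( preg_replace( $pattern, '$3', '<p> [gallery] </p>' ) ); // string(9) "[gallery]"

// For a blank paragraph group 3 never participates, so $3 is empty.
var_dump( preg_replace( $pattern, '$3', '<p> </p>' ) ); // string(0) ""
```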

4. Sidenote

This question is on the borderline of WPSE’s scope, as it’s mostly about PHP and regular expressions. It might have been better asked on Stack Overflow.
Anyhow, now it’s answered.
