Sanitize and data validation with apply_filters() function

There’s some confusion here, because not all of these are validation. There are two other concepts you need to understand alongside it to know what’s appropriate:

validation
sanitisation
escaping

Sanitisation

Sanitisation makes things clean and well-formed.

This cleans up the data, e.g. trimming leading and trailing spaces, removing letters from a number field, forcing a lowercase-only field to lowercase, etc.

E.g. the user entered " Banana ", so turn it into "Banana".

Sanitisation is the first thing that should happen to input that comes from somewhere else. E.g. when processing a form, sanitise the data before doing anything else with it. The same goes for any data that comes from a remote connection, etc.

Popular sanitising methods include:

stripping out HTML or particular tags via wp_kses, wp_strip_all_tags, etc.
removing ranges of characters, such as non-numeric characters or punctuation
trimming leading and trailing characters, such as spaces
applying bounds, such as restricting values to within a range (this could instead be implemented as a validation check)
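To make the list above concrete, here’s a minimal sketch in plain PHP so it runs without WordPress loaded. The example_* helper names are hypothetical, and PHP’s own strip_tags(), trim() and preg_replace() stand in for WordPress functions such as wp_strip_all_tags() or sanitize_text_field(). Note that every helper changes the value; none of them decides whether the value is acceptable.

```php
<?php
// Plain-PHP sanitisation sketch; helper names are hypothetical.
// In WordPress you'd more likely use sanitize_text_field(), wp_strip_all_tags() or wp_kses().

// Strip HTML tags, then trim surrounding whitespace.
function example_sanitise_text( string $raw ): string {
    return trim( strip_tags( $raw ) );
}

// Remove every character that isn't a digit, e.g. from a phone number field.
function example_sanitise_digits( string $raw ): string {
    return preg_replace( '/[^0-9]/', '', $raw );
}

// Force a lowercase-only field to lowercase.
function example_sanitise_lowercase( string $raw ): string {
    return strtolower( trim( $raw ) );
}

// Clamp a number into a range (this could instead be a validation check).
function example_clamp( int $value, int $min, int $max ): int {
    return max( $min, min( $max, $value ) );
}

echo example_sanitise_text( ' <b>Banana</b> ' ), "\n";    // Banana
echo example_sanitise_digits( '(555) 123-4567' ), "\n";  // 5551234567
echo example_sanitise_lowercase( ' ApPle ' ), "\n";      // apple
echo example_clamp( 250, 1, 100 ), "\n";                 // 100
```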

Validation

Validation checks if things are valid

Validation checks that the phone number the user entered is actually a phone number. It’s a true or false check.

E.g. Is the fruit the user chose actually a fruit?

This should be done on input, after sanitising. If validation fails, abort. Popular methods of validation include:

functions like is_numeric etc
regular expressions, useful for things like phone numbers or URLs, things that have an expected format
checking roles and capabilities to verify the user actually has the ability to do the intended action
nonce checks
whitelists of predefined allowed values
checking that values sit within valid ranges, e.g. nobody has a 200-character telephone number, nor does anybody live at house number -2000000
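By contrast, validation only answers yes or no. Here’s a rough plain-PHP sketch of three of the checks above: an allowed-values whitelist, a regular expression, and a range check. The function names, the fruit list and the phone pattern (7 to 15 digits with an optional leading +) are assumptions for illustration. It uses ctype_digit() rather than is_numeric(), since is_numeric() also accepts values like "1e3" and "-5". Nonce and capability checks need WordPress itself (wp_verify_nonce(), current_user_can()), so they’re left out.

```php
<?php
// Plain-PHP validation sketch: each check returns true/false and never modifies the value.
// Function names, the fruit list and the phone pattern are hypothetical examples.

const EXAMPLE_ALLOWED_FRUIT = [ 'apple', 'banana', 'cherry' ];

// Whitelist check; strict comparison so loosely-equal values can't sneak through.
function example_is_valid_fruit( string $fruit ): bool {
    return in_array( $fruit, EXAMPLE_ALLOWED_FRUIT, true );
}

// Format check via regex: optional leading "+", then 7 to 15 digits.
function example_is_valid_phone( string $phone ): bool {
    return 1 === preg_match( '/^\+?[0-9]{7,15}$/', $phone );
}

// Range check: a house number must be a whole number between 1 and 100000.
function example_is_valid_house_number( string $value ): bool {
    if ( ! ctype_digit( $value ) ) {
        return false; // rejects '', '-2000000', '12a', '1.5'
    }
    $number = (int) $value;
    return $number >= 1 && $number <= 100000;
}

var_dump( example_is_valid_fruit( 'banana' ) );                 // bool(true)
var_dump( example_is_valid_phone( str_repeat( '1', 200 ) ) );   // bool(false)
var_dump( example_is_valid_house_number( '-2000000' ) );        // bool(false)
```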

Escaping

Escaping makes values safe for output, and enforces assumptions

Escape late, escape often.

Escaping isn’t talked about much, but imagine it using the fruit example from above. Escaping is like a conveyor belt with a fruit-shaped cut-out: you always get something fruit-shaped at the end. If a fruit passes through, it’s untouched, but if something malicious goes through, a mangled but safe fruit-shaped version pops out the end.

Escaping is therefore all about enforcing assumptions. E.g. in an <a> tag, the href attribute should contain a URL, but this might not be the case. Escaping lets us swap the “should contain” for a “will always contain”, giving us a guarantee. This prevents somebody starting their URL with "/> and inserting arbitrary HTML.

Escaping should always be done on output, at the latest possible point, so that nothing can modify the value afterwards. Escaping is also context-sensitive: you would use esc_attr to escape HTML attributes, but for an href or src attribute you would use esc_url, to indicate it’s a URL we intend to output.
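A rough illustration of context-sensitive escaping, again in plain PHP. The two example_escape_* helpers are hypothetical stand-ins for esc_attr() and esc_url(); the URL one only allows http and https and is far simpler than what esc_url() actually does, so treat it as a sketch, not a replacement.

```php
<?php
// Hypothetical stand-ins for esc_attr() and esc_url(), for illustration only.

// Attribute context: encode quotes and angle brackets so the value can't close the attribute.
function example_escape_attr( string $value ): string {
    return htmlspecialchars( $value, ENT_QUOTES, 'UTF-8' );
}

// URL context: only allow http(s) URLs, output '' for anything else,
// then encode for the attribute. Much simpler (and stricter) than esc_url().
function example_escape_url( string $url ): string {
    $scheme = strtolower( (string) parse_url( $url, PHP_URL_SCHEME ) );
    if ( ! in_array( $scheme, [ 'http', 'https' ], true ) ) {
        return '';
    }
    return htmlspecialchars( $url, ENT_QUOTES, 'UTF-8' );
}

$malicious = '"/><script>alert(1)</script>';

echo '<a href="' . example_escape_url( 'https://example.com/?a=1&b=2' ) . '">ok</a>', "\n";
// <a href="https://example.com/?a=1&amp;b=2">ok</a>
echo '<a href="' . example_escape_url( $malicious ) . '">bad</a>', "\n";
// <a href="">bad</a>
echo '<a title="' . example_escape_attr( $malicious ) . '">t</a>', "\n";
// <a title="&quot;/&gt;&lt;script&gt;alert(1)&lt;/script&gt;">t</a>
```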

A Note on Double Escaping and `wp_kses`

You can sanitise and validate multiple times, but you should only escape a value once. Double escaping can double-encode values, and in some circumstances can allow content to break out of escaping.

wp_kses_post and wp_kses are unusual in that they can be used both to escape and to sanitise, and can be applied multiple times to a value.
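Here’s what double encoding looks like with PHP’s htmlspecialchars(), which re-encodes existing entities by default. WordPress’s own escaping functions may differ in detail, so this only illustrates the general problem.

```php
<?php
// Escaping the same value twice with plain htmlspecialchars().
$title = 'Fish & Chips';

$once  = htmlspecialchars( $title, ENT_QUOTES, 'UTF-8' ); // escaped once
$twice = htmlspecialchars( $once, ENT_QUOTES, 'UTF-8' );  // escaped again

echo $once, "\n";  // Fish &amp; Chips      (browser shows: Fish & Chips)
echo $twice, "\n"; // Fish &amp;amp; Chips  (browser shows: Fish &amp; Chips)
```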

A Note on Early Escaping

This is a near-mortal sin that can almost undo everything escaping gives you. Once something has been escaped, we know it is safe to output, but if we then assign it to a variable, who knows what might happen to it between escaping and output? If that variable gets modified, passed to a function, or piped through a filter, it’s no longer known to be safe; its status is a mystery. We can escape it again, but now we’ve double escaped, so we might have made safe data dangerous, or mangled good data.
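A small sketch of why early escaping is risky. example_later_tweak() is a hypothetical stand-in for any code that touches the variable between escaping and output; here it decodes HTML entities, which quietly undoes the escaping.

```php
<?php
// Early vs late escaping, in plain PHP. Helper names are hypothetical.

function example_escape_html( string $value ): string {
    return htmlspecialchars( $value, ENT_QUOTES, 'UTF-8' );
}

// Unrelated code that "tidies" text by decoding entities.
function example_later_tweak( string $value ): string {
    return html_entity_decode( $value, ENT_QUOTES, 'UTF-8' );
}

$input = '<script>alert(1)</script>';

// Early escaping: safe on this line...
$early = example_escape_html( $input );
// ...but not once something else has modified the variable.
$early = example_later_tweak( $early );
echo $early, "\n"; // <script>alert(1)</script>  (unsafe!)

// Late escaping: let other code run first, escape at the moment of output.
$late = example_later_tweak( $input );
echo example_escape_html( $late ), "\n"; // &lt;script&gt;alert(1)&lt;/script&gt;
```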

So About Those Filters

Should we sanitize and validate the apply_filters() like the examples below?

It depends on the context

On input:

Sanitise
then validate
if it’s valid then proceed, else reject
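Those three input steps could look roughly like this in plain PHP. The 'fruit' field, the allowed values and the choice to return null on rejection are all assumptions; in a real form handler you’d also check a nonce and the user’s capabilities first.

```php
<?php
// Sanitise, then validate, then proceed or reject. Names and values are hypothetical.
function example_handle_fruit( array $post ): ?string {
    // 1. Sanitise: clean the raw value up.
    $fruit = strtolower( trim( strip_tags( (string) ( $post['fruit'] ?? '' ) ) ) );

    // 2. Validate: a true/false check against known good values.
    $valid = in_array( $fruit, [ 'apple', 'banana', 'cherry' ], true );

    // 3. Proceed if valid, otherwise reject (signalled here by returning null).
    return $valid ? $fruit : null;
}

var_dump( example_handle_fruit( [ 'fruit' => ' Banana ' ] ) ); // string(6) "banana"
var_dump( example_handle_fruit( [ 'fruit' => 'Brick' ] ) );    // NULL
```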

On output to the browser/requests/etc:

You can sanitise and validate values once they’re fetched from the database if you like, but the #1 priority is to escape, escape only once, and do it at the moment of output. Don’t store the escaped value in a variable; that’s early escaping, and it’s dangerous.
Escape after filtering, not before. Who knows what the filter did to a known-safe value? Once the value is returned from the filter, its safety and status are a mystery.

absint( apply_filters( 'slug_excerpt_length', 35 ) );

Great, we now know this value is definitely an integer, and a non-negative one too. If we prefix this statement with echo, that’s a safely escaped value. Otherwise it’s just sanitisation that has cleaned up the value.

wp_kses_post( apply_filters( 'slug_excerpt_more', '…' ) );

Great, this is both sanitising and escaping if we immediately output it, but if we save it to a variable, it’s just sanitising.

esc_url( apply_filters( 'slug_login_url', home_url( '/' ) ) );

This is escaping, and needs an echo statement. If we assign this to a variable then the escaping was for nought and we introduce a precarious situation.
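To see why the order matters, here’s a sketch with a tiny hypothetical filter registry (ExampleFilters) that mimics the idea behind add_filter()/apply_filters() without needing WordPress; it is not WordPress’s implementation. example_escape() uses htmlspecialchars() as a stand-in for esc_url(), so it only stops the value breaking out of the attribute and doesn’t check the URL scheme.

```php
<?php
// Hypothetical mini filter registry, mimicking the idea of add_filter()/apply_filters().
final class ExampleFilters {
    private static array $callbacks = [];

    public static function add( string $hook, callable $callback ): void {
        self::$callbacks[ $hook ][] = $callback;
    }

    public static function apply( string $hook, $value ) {
        foreach ( self::$callbacks[ $hook ] ?? [] as $callback ) {
            $value = $callback( $value );
        }
        return $value;
    }
}

// Stand-in for esc_url(): only prevents breaking out of the attribute.
function example_escape( string $value ): string {
    return htmlspecialchars( $value, ENT_QUOTES, 'UTF-8' );
}

// Another plugin's badly behaved callback returns markup instead of a URL.
ExampleFilters::add( 'slug_login_url', function ( $url ) {
    return '"><script>alert(1)</script>';
} );

// Wrong: escape, then filter. The filtered value reaches the page unescaped.
echo '<a href="' . ExampleFilters::apply( 'slug_login_url', example_escape( 'https://example.com/' ) ) . '">Log in</a>', "\n";

// Right: filter, then escape at the moment of output.
echo '<a href="' . example_escape( ExampleFilters::apply( 'slug_login_url', 'https://example.com/' ) ) . '">Log in</a>', "\n";
```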

If, on the other hand, the question is “should we double-check the return values of filters?”, then yes, that would be wise, albeit overcautious. In that scenario I would expect the check to be testing for filters that aren’t implemented properly, e.g. returning text where a number is expected. There, validation is the only option; escaping and sanitisation would be inappropriate.
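If you do want to double-check a filter’s return value, a validation-only sketch might look like this. The filter is simulated with a plain callable so it runs outside WordPress, and example_excerpt_length() plus the fallback to the default of 35 are assumptions; you could just as well abort or log the problem instead.

```php
<?php
// Validate (don't sanitise or escape) a filter's return value.
// example_excerpt_length() is hypothetical; $filter simulates apply_filters().
function example_excerpt_length( callable $filter, int $default = 35 ): int {
    $filtered = $filter( $default );

    // Validation: is it what we expected? If not, fall back to the default.
    if ( ! is_int( $filtered ) || $filtered < 0 ) {
        return $default;
    }
    return $filtered;
}

$good = function ( $length ) { return 20; };
$bad  = function ( $length ) { return 'twenty'; };

var_dump( example_excerpt_length( $good ) ); // int(20)
var_dump( example_excerpt_length( $bad ) );  // int(35)
```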

Exceptions

When using the the_content filter, pass the value through wp_kses_post, then pass it into the filter and immediately echo the result, e.g. echo apply_filters( 'the_content', wp_kses_post( $dangerous ) );
In shortcodes, escape as you build the markup, and use output buffering if you must, so that you can return a string rather than echo it
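For the shortcode case, a minimal sketch of the output-buffering pattern. example_fruit_shortcode() is a hypothetical callback you would register with add_shortcode(), and htmlspecialchars() stands in for esc_html(). Buffering only really earns its keep when you call code that echoes, such as a template part; a plain string concatenation would do here.

```php
<?php
// Hypothetical shortcode callback. WordPress passes an empty string as $atts
// when no attributes are given, so normalise it to an array first.
function example_fruit_shortcode( $atts ): string {
    $atts  = is_array( $atts ) ? $atts : [];
    $fruit = (string) ( $atts['fruit'] ?? 'banana' );

    ob_start();
    // Escape as the markup is produced (htmlspecialchars() stands in for esc_html()).
    printf( '<span class="fruit">%s</span>', htmlspecialchars( $fruit, ENT_QUOTES, 'UTF-8' ) );
    // Return the buffered markup instead of echoing it.
    return (string) ob_get_clean();
}

echo example_fruit_shortcode( [ 'fruit' => '<b>Kiwi</b>' ] ), "\n";
// <span class="fruit">&lt;b&gt;Kiwi&lt;/b&gt;</span>
```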