There’s some confusion here, because not all of these are validation. There are three distinct concepts, and you need to understand all of them to know what’s appropriate:
- validation
- sanitisation
- escaping
Sanitisation
Sanitisation makes things clean and well formed
This cleans up the data, e.g. trimming trailing spaces, removing letters in a number field, lowercasing a field that must be all lowercase, etc.
E.g. the user entered " Banana ", so turn it into "Banana".
Sanitisation is the first thing that should happen to input that comes from somewhere else, e.g. when processing a form, sanitise the data before doing anything with it. The same goes for any data that comes from a remote connection, etc.
Popular sanitising methods include:
- Stripping out HTML or particular tags via wp_kses, wp_strip_all_tags, etc
- Removing ranges of characters, such as non-numeric characters, or punctuation
- Trimming trailing characters such as spaces etc
- Applying bounds such as restricting values to within a range ( this could instead be implemented as a validation check )
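As a rough sketch of the methods listed above, here is what sanitising can look like in plain PHP, outside WordPress. The helper names and the bounds are made up for illustration; they are not WordPress functions.

```php
<?php
// Hypothetical helpers for illustration; not part of WordPress.

// Trim surrounding whitespace: " Banana " becomes "Banana".
function example_sanitise_text( string $raw ): string {
    return trim( $raw );
}

// Keep only the digits in a value meant for a number field.
function example_sanitise_digits( string $raw ): string {
    // preg_replace() returns null on error, so fall back to an empty string.
    return preg_replace( '/[^0-9]/', '', $raw ) ?? '';
}

// Force a field that must be lowercase into lowercase.
function example_sanitise_lowercase( string $raw ): string {
    return strtolower( trim( $raw ) );
}

// Clamp a number into a range (the bounds are arbitrary example values).
function example_sanitise_range( int $value, int $min, int $max ): int {
    return max( $min, min( $max, $value ) );
}
```

None of these helpers reject anything. They only reshape the value, which is the difference between sanitising and validating.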
Validation
Validation checks if things are valid
Validation checks that the phone number the user entered is actually a phone number. It’s a true or false check.
E.g. Is the fruit the user chose actually a fruit?
This should be done on input after sanitising, and if validation fails, abort. Popular methods of validation include:
- functions like is_numeric etc
- regular expressions, useful for things like phone numbers or URLs, things that have an expected format
- checking roles and capabilities to verify the user actually has the ability to do the intended action
- nonce checks
- whitelists of predefined allowed values
- checking values sit within valid ranges, e.g. nobody has a 200 character long telephone number, nor would somebody live in house number -2000000
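A few of the checks above can be sketched in plain PHP as follows. The function names, the list of fruit, the phone format, and the house number limits are all assumptions chosen for illustration. Each function only answers true or false.

```php
<?php
// Hypothetical validators for illustration; not part of WordPress.

// Predefined allowed values: is the chosen fruit one we actually offer?
function example_is_valid_fruit( string $fruit ): bool {
    $allowed = array( 'apple', 'banana', 'cherry' ); // example values
    return in_array( $fruit, $allowed, true );
}

// Format check, assuming a phone number is 7 to 15 digits with an optional leading +.
function example_is_valid_phone( string $phone ): bool {
    return 1 === preg_match( '/^\+?[0-9]{7,15}$/D', $phone );
}

// Range check, assuming house numbers run from 1 to 99999.
function example_is_valid_house_number( string $number ): bool {
    $options = array( 'options' => array( 'min_range' => 1, 'max_range' => 99999 ) );
    return false !== filter_var( $number, FILTER_VALIDATE_INT, $options );
}
```

If any of these return false, stop processing and reject the input instead of trying to repair it.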
Escaping
Escaping makes values safe for output, and enforces assumptions
Escape late, escape often.
Escaping isn’t talked about much, but imagine it using the fruit example from above. Escaping is like a conveyor belt with a fruit shaped cut out. You always get something fruit shaped at the end. If a fruit passes through, it’s untouched, but if a malicious actor goes through, a mangled but safe fruit shaped version pops out the end.
Escaping is therefore all about enforcing assumptions. E.g. in an <a> tag, the href attribute should contain a URL. But this might not be the case. Escaping allows us to swap the “should contain” for a “will always contain”, giving us a guarantee. This prevents somebody starting their URL with "/> and inserting arbitrary HTML.
Escaping should always be done on output, at the very latest point possible to ensure nothing was modified. Escaping is also context sensitive. You would use esc_attr to escape HTML attributes, but if it’s an href or src attribute, you would use esc_url to indicate it’s a URL we’re intending to output.
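To see the attribute breakout being neutralised, here is a minimal sketch in plain PHP. It uses htmlspecialchars as a stand-in for an attribute escaper so that it runs without WordPress; inside WordPress you would call esc_attr or esc_url instead. The function name and the hostile value are made up.

```php
<?php
// Hypothetical stand-in for an attribute escaper; in WordPress use esc_attr() or esc_url().
function example_escape_attribute( string $value ): string {
    return htmlspecialchars( $value, ENT_QUOTES, 'UTF-8' );
}

// A hostile "URL" that tries to close the attribute and the tag.
$hostile = '"/><script>alert(1)</script>';

// Escaped at the moment of output, never stored for later.
echo '<a href="' . example_escape_attribute( $hostile ) . '">link</a>';
```

The quote and angle brackets come out as entities, so the value stays inside the attribute. This stand-in only stops the breakout; it does not check that the value is a sensible URL, which is the extra job a URL-specific escaper takes on.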
A Note on Double Escaping and wp_kses
You can sanitize and validate multiple times, but you should only escape a value once. This is because double escaping can double encode values, and in some circumstances can allow content to break out of escaping.
wp_kses_post and wp_kses are also unusual in that they can be used both to escape and to sanitise, and can be used multiple times on a value.
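The double encoding problem is easy to reproduce in plain PHP. This sketch uses htmlspecialchars with its default settings as a generic escaper; WordPress’s own escaping functions may treat existing entities differently, so take it as an illustration of the failure mode and not a statement about any particular WordPress function.

```php
<?php
// htmlspecialchars() stands in here for a generic HTML escaper.
$raw   = 'Fish & Chips';
$once  = htmlspecialchars( $raw, ENT_QUOTES, 'UTF-8' );  // Fish &amp; Chips
$twice = htmlspecialchars( $once, ENT_QUOTES, 'UTF-8' ); // Fish &amp;amp; Chips

echo $twice;
```

A browser renders the once-escaped string as "Fish & Chips", but renders the twice-escaped string as the literal text "Fish &amp; Chips", which is the mangled good data described above.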
A Note on Early Escaping
This is a near mortal sin that can undo almost everything escaping gives you. Once something has been escaped, we know it is safe to output. But if we then assign it to a variable, who knows what might happen to it between escaping and output? If that variable gets modified, passed to a function, or piped through a filter, it’s no longer known to be safe; its status is a mystery. We can escape it again, but now we’ve double escaped, so we might have made safe data dangerous, or mangled good data.
So About Those Filters
Should we sanitize and validate the apply_filters() like the examples below?
It depends on the context
On input:
- Sanitise
- then validate
- if it’s valid then proceed, else reject
On output to the browser/requests/etc:
- You can sanitise and validate once fetched from the database if you like, but the #1 priority is to escape, escape only once, and do it at the moment of output. Don’t store the escaped value in a variable; that’s early escaping and it’s dangerous
- Escape after filtering, not before. Who knows what the filter did to a known safe value? Once the value is returned from the filter, its safety and status are a mystery
absint( apply_filters( 'slug_excerpt_length', 35 ) );
Great, we now know this value is definitely an integer, and a non-negative one too. If we prefix this statement with echo then that’s a safe escaped value. Otherwise it’s just sanitisation that’s cleaned up the value.
wp_kses_post( apply_filters( 'slug_excerpt_more', '…' ) );
Great, this is both sanitising and escaping if we immediately output it, but if we save it to a variable, it’s just sanitising.
esc_url( apply_filters( 'slug_login_url', home_url( '/' ) ) );
This is escaping, and needs an echo statement. If we assign this to a variable then the escaping was for nought and we introduce a precarious situation.
If, on the other hand, the question is whether we should double check the return values of filters: yes, that would be wise, if overcautious. I would expect this to be testing for filters that aren’t implemented properly, e.g. returning text where a number is expected. In that scenario, validation is the only option; escaping and sanitisation would be inappropriate.
Exceptions
- When using the_content filter, pass the value through wp_kses_post, then pass it into the filter and immediately echo, e.g. echo apply_filters( 'the_content', wp_kses_post( $dangerous ) );
- In shortcodes, escape and use output buffers if you must, so that you can return a string