What Do They Do?
wp_filter_nohtml_kses strips all HTML from a string, and that’s it. It does this via the wp_kses function, and it expects slashed data. Here’s its implementation:
function wp_filter_nohtml_kses( $data ) {
    return addslashes( wp_kses( stripslashes( $data ), 'strip' ) );
}
sanitize_text_field, on the other hand, does more than that. The documentation says:
- Checks for invalid UTF-8,
- Converts single < characters to entities
- Strips all tags
- Removes line breaks, tabs, and extra whitespace
- Strips octets
Specifically, the point of sanitize_text_field is to sanitize text fields. Since you want to sanitize a text field, you should use sanitize_text_field.
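As a rough illustration of the difference, the sketch below runs both functions over the same string. It assumes the code runs inside a loaded WordPress environment (these functions do not exist in plain PHP), and the input value and the example_compare_sanitizers name are made up for the demonstration.

```php
<?php
// Assumption: WordPress is loaded, so wp_filter_nohtml_kses() and
// sanitize_text_field() exist. example_compare_sanitizers() is a
// hypothetical helper written for this illustration only.
function example_compare_sanitizers( $raw ) {
    return array(
        // Removes the tags but leaves whitespace and %-encoded octets alone.
        // wp_slash()/wp_unslash() account for the slashed data it expects and returns.
        'nohtml_kses' => wp_unslash( wp_filter_nohtml_kses( wp_slash( $raw ) ) ),
        // Removes the tags, collapses whitespace and strips %-encoded octets.
        'text_field'  => sanitize_text_field( $raw ),
    );
}

$result = example_compare_sanitizers( "  Hello <b>world</b>\n\t50%2F50  " );
```

With that input, sanitize_text_field should return Hello world 5050, while the wp_filter_nohtml_kses result should lose only the tags and keep the line break, the tab and the %2F.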
So Why Does One Article Recommend wp_filter_nohtml_kses?
It doesn’t.
The author correctly recognised that the variable may contain HTML, searched for a function to strip it, and found wp_filter_nohtml_kses.
However, this is neither the best function for the job nor the appropriate one. sanitize_text_field targets the actual problem, that the field needs sanitising, and does more than strip out tags.
The author could easily have chosen wp_strip_all_tags, but should have gone for sanitize_text_field.
Which brings us to the main points:
- Not all guides are correct
- When faced with a choice between 2 functions, pick the one that literally says what you want to do, not the obscure one
- wp_filter_nohtml_kses would not have protected against percent-encoded octets, or the other things that sanitize_text_field handles, but perhaps the author was unaware of those attack vectors?
- Some of the things sanitize_text_field does are less about security and more about cleanliness, e.g. stripping extra whitespace
Escaping vs Sanitising
“URL’s have esc_url_raw() and so on.”
Not quite. esc_url_raw is an escaping function, and escaping is not sanitisation! It is, though, a rare exception that can also be used as a sanitising function.
- Sanitising data cleans it
- Validating data confirms it
- Escaping data secures it
Sanitise and validate on input, escape on output.
Another way to think of it is as a gameshow:
- Sanitising is the cleanup they do to get you ready for TV
- Validation is the judge at the end of the round who checks that you did what you were asked
- Escaping is the giant wall with the shaped hole you have to fit through or it’ll push you off the ledge
For example, consider this value: " [email protected] <b>hello!</b> "
- we sanitize with sanitize_email, eliminating trailing spaces etc., giving us [email protected]
- we can validate it with is_email to check that it is indeed an email address
- at this point we can process the input
Some time passes…
- now we need to output the data, so we escape it
echo esc_url( 'mailto:'.$email );
Now we get a mailto link with an email, and it will always be a link. There is no dithering about whether it should be a link, could be a link, or is supposed to be a link. It is guaranteed to be a link, so there is certainty and no assumptions need to be made.
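The steps above can be sketched end to end as follows. This is a minimal sketch that assumes a loaded WordPress environment. The two example_ helpers and the user@example.com address are placeholders invented for illustration, not part of WordPress or of the original value.

```php
<?php
// Assumption: WordPress is loaded. example_prepare_email() and
// example_mailto_href() are hypothetical helpers, not WordPress APIs.

// Input side: sanitise first, then validate.
// Returns an empty string when the value is not a usable email address.
function example_prepare_email( $raw ) {
    $email = sanitize_email( $raw );
    return is_email( $email ) ? $email : '';
}

// Output side: escape for the context being printed, here a URL.
function example_mailto_href( $email ) {
    return esc_url( 'mailto:' . $email );
}

$email = example_prepare_email( ' user@example.com ' ); // placeholder address
echo example_mailto_href( $email );
```

Keeping the input-side and output-side steps in separate functions mirrors the rule of thumb: the stored value is clean and checked, and the escaping is chosen only when the output context is known.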
Let’s say that $email was sanitised, validated, and saved, yet the site’s database was mangled or modified during a hack. Now $email contains a JS bitcoin miner! Or it did, until esc_url mangled it into the shape of a mailto URL. The address isn’t usable, but it has the format of one.
For this reason, escaping is only done on output, using an escaping function appropriate for the expected output, not the data (esc_attr for HTML attributes, esc_html for text, esc_url for URLs, and so on). Escaping is also done at the latest possible moment, closest to output. This way escaped data can’t be modified after it’s been escaped, and there’s no confusion about when something was escaped that might cause double escaping.
For this reason, avoid adding HTML to a variable and then echoing it at the end, or passing around complex HTML fragments in variables. You have no way to know whether they’re safe or not.
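One way to follow this advice is to print the markup directly and escape each value at the point of output. The sketch below assumes a loaded WordPress environment. example_render_link and its arguments are hypothetical names used only for illustration.

```php
<?php
// Assumption: WordPress is loaded. example_render_link() and its
// arguments are hypothetical names used only for this illustration.
function example_render_link( $url, $title, $label ) {
    // Each value is escaped for the spot it lands in, at the moment it is printed.
    printf(
        '<a href="%s" title="%s">%s</a>',
        esc_url( $url ),     // URL context
        esc_attr( $title ),  // HTML attribute context
        esc_html( $label )   // text context
    );
}

example_render_link( 'https://example.com/', 'Say "hi"', 'Tom & <b>Jerry</b>' );
```

Because nothing is stored in an intermediate HTML string, there is no later point at which an unescaped value could slip in or an escaped one could be escaped twice.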