What is the difference between sanitize_text_field() and wp_filter_nohtml_kses()?

What Do They Do?

wp_filter_nohtml_kses strips all HTML from a string, and that’s it. It does so via the wp_kses function, and it expects slashed data. Here’s its implementation:

function wp_filter_nohtml_kses( $data ) {
    return addslashes( wp_kses( stripslashes( $data ), 'strip' ) );
}

sanitize_text_field, on the other hand, does more than that. The docs say it:

  • Checks for invalid UTF-8,
  • Converts single < characters to entities
  • Strips all tags
  • Removes line breaks, tabs, and extra whitespace
  • Strips octets

Specifically, the point of sanitize_text_field is to sanitize text fields. Since you want to sanitize a text field, you should use sanitize_text_field.
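As a rough illustration of that advice, here is a minimal sketch of reading a single-line text field from a form submission. The helper name myplugin_get_submitted_nickname and the field name nickname are made up for this example. It assumes a WordPress environment, where sanitize_text_field and wp_unslash are available and where request data arrives slashed.

```php
<?php
// Hypothetical helper: the function name and the 'nickname' field are
// assumptions for illustration, not part of WordPress.
function myplugin_get_submitted_nickname( array $post_data ) {
    if ( ! isset( $post_data['nickname'] ) ) {
        return '';
    }

    // WordPress adds slashes to request data, so remove them first,
    // then let sanitize_text_field do the cleaning.
    return sanitize_text_field( wp_unslash( $post_data['nickname'] ) );
}
```

You would call it with the request data, e.g. myplugin_get_submitted_nickname( $_POST ), after your usual nonce and capability checks.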

So Why Does One Article Recommend wp_filter_nohtml_kses?

It doesn’t.

The author correctly recognised the variable may contain HTML, and searched for a function to do just that, and found wp_filter_nohtml_kses.

However, this is neither the best function for this job nor the appropriate one. sanitize_text_field targets the actual problem, that the field needs sanitising, and does more than strip out tags.

The author could easily have chosen wp_strip_all_tags, but should have gone for sanitize_text_field.

Which brings us to the main points:

  1. Not all guides are correct
  2. When faced with a choice between 2 functions, pick the one that literally says what you want to do, not the obscure one
  3. wp_filter_nohtml_kses would not have protected against percent-encoded octets, and other things that sanitize_text_field handles, but perhaps the author was unaware of those attack vectors?
  4. Some of the things sanitize_text_field does are less about security and more about cleanliness, e.g. stripping extra whitespace

Escaping vs Sanitising

URL’s have esc_url_raw() and so on.

Not quite: esc_url_raw is an escaping function, and escaping is not sanitisation! This is, though, a rare exception where an escaping function can be used as a sanitising function.

  • Sanitising data cleans it
  • Validating data confirms it
  • Escaping data secures it

Sanitise and validate on input, escape on output.

Another way to think of it is as a gameshow:

  • Sanitising is the cleanup they do to get you ready for TV
  • Validation is the judge at the end of the round who checks that you did what you were asked
  • Escaping is the giant wall with the shaped hole you have to fit through or it’ll push you off the ledge

For example, consider this value: " [email protected] <b>hello!</b> "

  • we sanitize with sanitize_email, eliminating trailing spaces, etc., giving us [email protected]
  • we can validate it with is_email to check if it is indeed an email
  • At this point we can process the input
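The input steps above could be sketched roughly like this. The helper name myplugin_clean_email is hypothetical. sanitize_email and is_email are the WordPress functions for cleaning and checking an email address, so this assumes the code runs inside WordPress.

```php
<?php
// Hypothetical helper: returns a cleaned email address, or null when the
// input does not hold a valid email.
function myplugin_clean_email( $raw ) {
    // Sanitise: clean the value up.
    $email = sanitize_email( $raw );

    // Validate: confirm that what is left really is an email address.
    if ( ! is_email( $email ) ) {
        return null;
    }

    return $email;
}
```

Only a non-null result should be processed or saved. Escaping is deliberately absent here, because nothing is being output yet.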

Some time passes…

  • now we need to output the data, so we escape it
  • echo esc_url( 'mailto:'.$email );

Now we get a mailto link with an email, and it will always be a link. There is no dithering about whether it should be a link, can be a link, or is supposed to be a link. It is guaranteed to be a link, so there is now certainty. No assumptions need to be made.

Let’s say that $email was sanitised, validated, and saved, yet the site’s database was mangled or modified during a hack. Now $email contains a JS bitcoin miner! Or it did, until esc_url mangled it into an email. The email isn’t usable, but it has the format of an email.

For this reason, escaping is only done on output, using an escaping function appropriate for the expected output, not the data (esc_attr for HTML attributes, esc_html for text, esc_url for URLs, and so on). Escaping is also done at the latest possible moment, closest to output. This way escaped data can’t be modified after it’s been escaped, and there’s no confusion about when something was escaped that might cause double escaping.
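To make the "escape for the output context, as late as possible" idea concrete, here is a small sketch. The function name myplugin_render_contact_link and its parameters are invented for illustration. esc_url, esc_attr and esc_html are the WordPress escaping functions named above, so this assumes a WordPress environment.

```php
<?php
// Hypothetical template helper: $name and $email are assumed to come from
// storage and are treated as untrusted right up to the point of output.
function myplugin_render_contact_link( $name, $email ) {
    printf(
        '<a href="%s" title="%s">%s</a>',
        esc_url( 'mailto:' . $email ), // URL context
        esc_attr( $name ),             // HTML attribute context
        esc_html( $name )              // HTML text context
    );
}
```

The same $name value is escaped twice with different functions, because it is printed in two different contexts. Each value is escaped in the same statement that prints it, so there is no window in which it could be changed afterwards.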

For the same reason, avoid adding HTML to a variable then echoing it at the end, or passing around complex HTML fragments in variables. You have no way to know if they’re safe or not.