Escaping / encoding data before inserting it into a database?

You escape on output. I suspect the confusion here is between escaping, sanitising and validating:

  • Sanitise when data arrives. This strips out stuff that shouldn’t be there, e.g. upper case letters in a lower case string, words and letters in a phone number, trailing spaces, etc. Sanitising cleans data.
    • Common sanitising functions include trim, sanitize_key, wp_strip_all_tags, intval, wp_kses_post, etc.
  • Validate data once it’s been sanitised. Is that phone number really a phone number? Did somebody say they were -20 years old? Why does that man’s address describe the recipe for cheese pasties? If it doesn’t validate, REJECT the data. If it does, process then store the value.
    • Common validation includes regular expressions, is_numeric, checking string length, is_email, etc. Validation usually requires context-specific checks, e.g. enforcing that values sit within a range, or that they appear in a whitelist array.
  • Escape when you’re outputting that saved data to the frontend. Do this once and only once, at the moment of output. Don’t return complex HTML fragments from functions; echo them, and escape when you echo. Escaping might mangle your output if unsafe stuff makes it through, but it guarantees that the URL is always a URL, even if it’s a broken URL, that the text is always text, even if it contained a dangerous tag, etc.
    • Escaping functions include, but aren’t limited to, esc_html, esc_attr, esc_url, wp_kses_post, etc.
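
The first two steps can be sketched in plain PHP. This is only an illustration: the helper names (sanitise_phone, is_valid_phone, is_valid_age) are hypothetical, and the 7 to 15 digit rule and the 0 to 130 age range are assumptions picked for the example, not rules from WordPress.

```php
<?php
// Hypothetical helpers for illustration; none of these names come from WordPress.

// Sanitise: clean the raw value. Nothing is rejected at this stage.
function sanitise_phone( string $raw ): string {
    $trimmed = trim( $raw );
    // Keep a leading "+" if present, then drop everything that is not a digit.
    $prefix = ( '' !== $trimmed && '+' === $trimmed[0] ) ? '+' : '';
    return $prefix . preg_replace( '/\D+/', '', $trimmed );
}

// Validate: decide whether the cleaned value is acceptable.
// The 7 to 15 digit range is an assumption made for this sketch.
function is_valid_phone( string $phone ): bool {
    return 1 === preg_match( '/^\+?\d{7,15}$/', $phone );
}

// Validate: the age must be a whole number inside an assumed plausible range.
function is_valid_age( $raw ): bool {
    $options = array( 'options' => array( 'min_range' => 0, 'max_range' => 130 ) );
    return false !== filter_var( $raw, FILTER_VALIDATE_INT, $options );
}

// Usage: sanitise first, then validate, and reject anything that fails.
$phone = sanitise_phone( '  +44 (0)20 7946 0958 ext ' ); // "+4402079460958"
if ( ! is_valid_phone( $phone ) || ! is_valid_age( '-20' ) ) {
    // Reject the submission here; do not store it.
}
```

Sanitising never rejects anything; it only cleans. Rejection is the job of the validation step, which is why the two are kept as separate functions.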

Note that sanitising and validation happen when data is incoming, i.e. data that came from an external source such as the browser.

Escaping happens when data is being sent to an API/user/browser. Escaping is like the cookie cutter that enforces expectations. If the variable contains an HTML attribute, esc_attr will ensure it’s an attribute, and doesn’t contain anything that’s invalid for an attribute, even if that means mangling the value to make it fit, removing bits. Of course, if you’ve escaped correctly, valid values will never be mangled.
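
As a rough sketch of what that looks like in a template, assuming the code runs inside WordPress (so the esc_* functions are loaded) and that $link, $title and $label are hypothetical values fetched earlier:

```php
<?php
// Assumes a WordPress template context; the three values below are made up.
$link  = 'https://example.com/?a=1&b=2';
$title = 'Say "hi"';
$label = '<script>alert(1)</script> Read more';

// Each value is escaped for the context it lands in, at the moment it is echoed.
printf(
    '<a href="%s" title="%s">%s</a>',
    esc_url( $link ),   // URL context
    esc_attr( $title ), // attribute context
    esc_html( $label )  // text context
);
```

The variables stay raw right up until the printf call, so there is never any doubt about whether a given value has already been escaped.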

Note that if you escape a value more than once, it can allow carefully crafted data to break out and render malicious output in some circumstances. This is why you only escape on output, and always as late as possible. Pre-escaped data introduces the headache of having to track what has and hasn’t been escaped, and ain’t nobody got time for that.
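
A small sketch of the tracking problem, using plain PHP's htmlspecialchars as a stand-in so it runs without WordPress. It only shows the milder failure (mangled output), not a break-out, and WordPress's own escaping functions may treat already-encoded entities differently:

```php
<?php
// Plain PHP stand-in for an escaping function; not WordPress-specific.
$name = 'Tom & Jerry';

// Anti-pattern: escape before storing, then escape again when printing.
$stored        = htmlspecialchars( $name, ENT_QUOTES, 'UTF-8' );   // "Tom &amp; Jerry"
$double_output = htmlspecialchars( $stored, ENT_QUOTES, 'UTF-8' ); // "Tom &amp;amp; Jerry"

// Preferred: keep the raw value and escape once, at output.
$single_output = htmlspecialchars( $name, ENT_QUOTES, 'UTF-8' );   // "Tom &amp; Jerry"
```

Nothing about $stored tells the next developer that it has already been escaped, which is exactly the bookkeeping that late escaping avoids.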

With escaping, trust is unnecessary: you should never have to trust that something is safe when you can force it to be safe.