If I understand correctly, WordPress automatically applies a wp_slash() on the global $_POST variable: should this means that for any $_POST variable, prior to saving to the DB, we should first unslash it? Which one of the following solution is the correct one?
Is the $_GET variable also slashed? Do we need to apply the same process?
I searched the WordPress Github repository and found no evidence for it:
https://github.com/search?q=repo%3AWordPress%2FWordPress+wp_slash&type=code&p=1
In truth sanitisation is quite straight forward, especially if we rephrase the question in terms of food. Normally we clean food before we cook it, so it makes no sense to ask if you should clean each bite of food before putting it in your mouth. Sanitising is cleaning data.
Likewise sanitisation is only part of the story. Sanitising is cleaning, but washing a car does not make it edible. You need to sanitise, validate, and escape.
- sanitise at the moment the data is ingested/retrieved
- we do this to clean the data and make it suitable for processing
- e.g. removing trailing spaces, remove letters from values that are numbers, enforcing character case etc
- validate immediately afterwards
- validation tests to see if the value has the appropriate format. E.g. “banana” is a string, and would pass sanitisation, but it is not a valid date. Likewise we can sanitise
" 5 "
to the number5
but5
is too short to be a phone number. - validation returns a true or false, it either is or is not valid
- validation tests to see if the value has the appropriate format. E.g. “banana” is a string, and would pass sanitisation, but it is not a valid date. Likewise we can sanitise
- escape, this is very important but unfortunately forgotten by most. Escaping is like a cookie cutter, it guarantees the output of data. For example
echo esc_url( $url )
always outputs a URL, even if$url
is not a URL. The result may be garbage and of no use, but it will be a URL and thus will be safe to use in places such as<a href="
as we know it can’t break out of that attribute.
Which functions you use to perform these tasks are super specific to the situation and context. There is no catch all sanitize_everything
function, just like there is no universal validate_variable
function, because doing so would require a human being.
At which point should we sanitize data? In the two following examples, I am applying the sanitize_text_field() function before, or after an in_array() check. Which one of them is the more appropriate?
The goal of sanitisation is to clean data before processing, it’s not just about security. Sanitising something does not automatically make it safe.
For example, consider this code:
$input = $_GET['search'];
if ( $input == 'test' ) {
echo 'you searched for test';
}
It would fail if search
was given a value that started with a space, or if the user searched for TEST
, etc. Sanitisation normalises the value before processing, not just to make it secure, but to prevent bugs.
WordPress provides some helper functions to assist with sanitising inputs, but it only does a few things, and you can and should do additional checks based on the type of data you are ingesting.
https://developer.wordpress.org/reference/functions/sanitize_text_field/
- Checks for invalid UTF-8,
- Converts single < characters to entities
- Strips all tags
- Removes line breaks, tabs, and extra whitespace
- Strips percent-encoded characters
As you can see sanitize_text_field
would be inappropriate for a HTML fragment, and doesn’t do enough sanitation for more specific fields such as numerical fields or date fields. It also won’t validate the text field contains the type of value you expect it to contain.
we should first unslash it?
Even if WordPress did this, which I can find no evidence that it does, you would only do this if you needed to do this, which would become very obvious very quickly. If you need to ask the question then either you haven’t attempted to use POST
variables or the answer is no.
On a final note, these nested if statements are unnecessary, and would be much better if they were guards that returned early:
global $theme_options;
if (!empty($query->is_search)) {
if (!empty($theme_options['allowed_search_post_types'])) {
if (isset($_GET['post_type']) && $_GET['post_type'] !== '') {
// code...
vs
function mytheme__customsearch_pre_get_posts($query) {
global $theme_options;
if ( empty($query->is_search ) ) {
return;
}
if ( empty($theme_options['allowed_search_post_types']) ) {
return;
}
if ( empty($_GET['post_type'])) {
return;
}
// code...
The resulting code has a lower cyclomatic complexity, is prone to fewer mistakes and bugs, and is easier to read and extend despite doing the same thing.
Also note the use of tabs for indentation. WP coding standards use a tab shown via 4 spaces, unlike the 6 space configuration used in the code.
You could also have filtered the post type registration to disable search.