Slug Formatting: Acceptable Characters?

No, forward slashes are not allowed in slugs. In fact, if you try to include slashes in a slug on the post editing screen, they are stripped out automatically. Slugs are sanitized with sanitize_title().

Unfortunately, there is no easy way to explain what is and isn’t allowed in a slug, let alone to give a simple set of rules that every valid slug must follow. However, we can summarise the most important characteristics of a slug quite compactly: a slug consists solely of lowercase alphanumeric characters, hyphens and underscores, and never contains two or more hyphens in a row (runs of underscores are allowed). Furthermore, a slug cannot start or end with a hyphen. Non-ASCII characters that survive sanitization are kept as lowercase percent-encoded octets (such as %e4%b8%ad).
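The summary above can be turned into a quick validity check. This is a minimal sketch, assuming only the ASCII rules just listed: the helper name is_valid_slug is hypothetical (WordPress has no such function), and it would reject slugs containing percent-encoded octets such as %e4%b8%ad, which WordPress can produce for non-ASCII titles.

```python
import re

# Hypothetical helper, not part of WordPress: one or more runs of
# [a-z0-9_] joined by single hyphens. This rules out leading/trailing
# hyphens and runs of two or more hyphens, but allows runs of underscores.
_SLUG_RE = re.compile(r"[a-z0-9_]+(?:-[a-z0-9_]+)*")


def is_valid_slug(slug: str) -> bool:
    """Check a string against the simplified ASCII slug rules."""
    # fullmatch() requires the whole string to match, so a trailing
    # newline or any stray character makes the check fail.
    return _SLUG_RE.fullmatch(slug) is not None
```

For example, is_valid_slug("hello_world__2") is True, while "hello--world", "-hello" and "Hello" are all rejected.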

sanitize_title()

Parameters:

  • $title: Title to be sanitized
  • $fallback_title (optional, defaults to empty string): Slug to use if the sanitized $title turns out empty
  • $context (optional, defaults to ‘save’): The operation for which the string is sanitized

By default (that is, with the default $fallback_title and $context arguments and no extra filters applied), sanitize_title() behaves as follows:

  1. Accented characters such as “é” and “Ö” are replaced by their corresponding “unaccented” characters (in this example, “e” and “O”, respectively)
  2. sanitize_title_with_dashes() is called on the resulting string
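The two steps, together with the $fallback_title behaviour from the parameter list, can be sketched roughly as follows. This is not WordPress's code. The accent removal is an approximation (Unicode NFD decomposition followed by dropping combining marks), whereas WordPress's remove_accents() uses its own lookup table. simple_dashes() is a hypothetical, heavily simplified stand-in for sanitize_title_with_dashes().

```python
import re
import unicodedata


def remove_accents(text: str) -> str:
    # Approximation (assumption): split "é" into "e" + combining accent,
    # then drop the combining marks. WordPress uses a lookup table instead.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def simple_dashes(title: str) -> str:
    # Hypothetical, simplified stand-in for sanitize_title_with_dashes():
    # lowercase, turn each run of other characters into one hyphen, trim.
    title = re.sub(r"[^a-z0-9_]+", "-", title.lower())
    return title.strip("-")


def sanitize_title(title: str, fallback_title: str = "", context: str = "save") -> str:
    if context == "save":
        title = remove_accents(title)   # step 1: "é" -> "e", "Ö" -> "O"
    title = simple_dashes(title)        # step 2
    # If the sanitized title turns out empty, use the fallback instead.
    return title or fallback_title
```

With this sketch, sanitize_title("Crème Brûlée") returns "creme-brulee", and sanitize_title("!!!", "untitled") falls back to "untitled".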

A longer, more complete explanation of sanitize_title_with_dashes() follows.

sanitize_title_with_dashes()

Besides the $title parameter, sanitize_title_with_dashes() takes two more arguments: the first ($raw_title) isn’t used, and the second is the context, $context, which is passed on from sanitize_title() (and is thus ‘save’). Steps 6 to 8 below only run when $context is ‘save’. sanitize_title_with_dashes() does a lot of things, so I’ll walk through the sequence of steps it performs to sanitize a string.

  1. All HTML tags are stripped
  2. Percent signs are removed, except for those that are part of an escaped octet (such as %20 for a space)
  3. The string is converted to lowercase, and any non-ASCII characters are percent-encoded (for example, “×” becomes %c3%97)
  4. All HTML entities, such as &amp;, are stripped
  5. All dots (.) are replaced by hyphens (-)
  6. Non-breaking spaces, en dashes and em dashes are converted to hyphens
  7. Some special characters, such as curly quotes and certain accents (e.g. the combining grave accent), are stripped
  8. The “times” character (×) is replaced by “x”
  9. All characters except for alphanumeric characters, percent signs, spaces, hyphens and underscores are stripped, and the remaining whitespace is converted to hyphens
  10. Sequences of hyphens (2 or more hyphens in a row) are replaced by a single hyphen
  11. Leading and trailing hyphens are stripped
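To make the order of operations concrete, here is a rough Python port of the steps above. It is a sketch, not WordPress’s PHP code, and rests on a few assumptions. Tags are stripped with a crude regex instead of strip_tags(). Non-ASCII characters are percent-encoded the way WordPress’s utf8_uri_encode() does, so steps 6 to 8 can match encoded sequences such as %e2%80%93. Only a small subset of the characters WordPress strips in step 7 is listed. The spaces kept by step 9 are turned into hyphens before step 10. The constant names are made up for this example.

```python
import re
from urllib.parse import quote

# Made-up names; lowercase percent-encoded UTF-8 sequences used in 'save' context.
TO_HYPHEN = ["%c2%a0", "%e2%80%93", "%e2%80%94"]  # nbsp, en dash, em dash
TO_STRIP = [                                      # small subset (assumption)
    "%c2%a1", "%c2%bf",                           # ¡ ¿
    "%e2%80%98", "%e2%80%99", "%e2%80%9c", "%e2%80%9d",  # curly quotes
    "%c2%a9", "%c2%ae", "%e2%84%a2",              # © ® ™
    "%cc%80", "%cc%81",                           # combining grave/acute accents
]
TIMES = "%c3%97"                                  # ×


def sanitize_title_with_dashes(title: str, context: str = "save") -> str:
    # 1. Strip HTML tags (crude regex, not a real HTML parser).
    title = re.sub(r"<[^>]*>", "", title)
    # 2. Drop percent signs that are not followed by two hex digits (an octet).
    title = re.sub(r"%(?![0-9a-fA-F]{2})", "", title)
    # 3. Lowercase, then percent-encode non-ASCII characters ("×" -> "%c3%97").
    title = title.lower()
    title = "".join(c if ord(c) < 128 else quote(c).lower() for c in title)
    # 4. Kill HTML entities such as "&amp;".
    title = re.sub(r"&.+?;", "", title)
    # 5. Dots become hyphens.
    title = title.replace(".", "-")
    if context == "save":
        # 6. Non-breaking spaces, en dashes and em dashes become hyphens.
        for seq in TO_HYPHEN:
            title = title.replace(seq, "-")
        # 7. Strip some special characters entirely.
        for seq in TO_STRIP:
            title = title.replace(seq, "")
        # 8. The times sign becomes "x".
        title = title.replace(TIMES, "x")
    # 9. Keep only %, a-z, 0-9, spaces, underscores and hyphens ("/" is removed) ...
    title = re.sub(r"[^%a-z0-9 _-]", "", title)
    # ... then turn the remaining whitespace into hyphens.
    title = re.sub(r"\s+", "-", title)
    # 10. Collapse runs of hyphens.
    title = re.sub(r"-+", "-", title)
    # 11. Trim leading and trailing hyphens.
    return title.strip("-")
```

For example, "2 × 3 – 4" becomes "2-x-3-4", "Fish &amp; Chips" becomes "fish-chips", and "Café" becomes "caf%c3%a9", which illustrates why sanitize_title() removes accents before calling this function.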

And there you have it, the full title-sanitization process!
