encoding - Read For Learn

How to stop WordPress using utf8mb4_unicode_520_ci collation?

Switching from a utf8mb4_unicode…. collation to a utf8 one is a security problem. The right answer is to upgrade your sites and databases so that they all use utf8mb4.

Truncating custom fields

See the discussion for Taxonomy Short Description for a better way to shorten a string. I’m not aware of a WP function that gets truncation right. Here is my code based on the linked discussion:

    /**
     * Shortens a UTF-8 encoded string without breaking words.
     *
     * @param string $string string to shorten
     * … Read more
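The PHP code in the excerpt is cut off, so the following is only a minimal Python sketch of the same idea, not the linked implementation: shorten a string to a character limit and back up to the last space so that no word is split. The name `shorten_text` and its parameters are hypothetical. Python strings are indexed by code point, so a multi-byte UTF-8 character is never cut in half here.

```python
def shorten_text(text: str, max_chars: int, suffix: str = "…") -> str:
    """Shorten text to at most max_chars characters (plus suffix) at a word boundary."""
    text = " ".join(text.split())  # collapse runs of whitespace
    if len(text) <= max_chars:
        return text
    # Look one character past the limit so a word ending exactly at the limit survives.
    cut = text[:max_chars + 1]
    head, sep, _ = cut.rpartition(" ")
    if not sep:
        # A single word longer than the limit: fall back to a hard cut.
        head = text[:max_chars]
    return head.rstrip() + suffix
```

For example, `shorten_text("hello wonderful world", 10)` returns `"hello…"` rather than cutting "wonderful" in the middle.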

Encode text string being appended as query to URL [closed]

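To encode text appended to a URL as a query value, you could use the PHP urlencode( $str ) function or the WordPress urlencode_deep( $value ) function, which also accepts arrays. Encode only the value, not the whole URL; otherwise the scheme and slashes are escaped too and the link stops working:

    add_shortcode( 'dynamic_contact_button', 'button_product_page' );

    function button_product_page() {
        global $product;

        // Encode only the product text; the rest of the URL stays as written.
        return 'https://wordpress.stackexchange.com/contact-form/?products='
            . urlencode( 'Product: ' . $product->get_title() )
            . '&#contact_form';
    }

Links: WordPress urlencode_deep, PHP urlencode.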
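To illustrate percent-encoding a query value while leaving the rest of the URL readable, here is a small Python sketch. The helper name `contact_url` and the example.com address are hypothetical; only `urllib.parse.quote` is a real library call.

```python
from urllib.parse import quote


def contact_url(base: str, product_title: str) -> str:
    # Encode only the query value; the scheme, host, and path are left untouched.
    # safe="" makes quote() escape every reserved character, including ":" and "&".
    return base + "?products=" + quote("Product: " + product_title, safe="")
```

Calling `contact_url("https://example.com/contact-form/", "Tea & Cake")` gives `https://example.com/contact-form/?products=Product%3A%20Tea%20%26%20Cake`, so an ampersand in the title cannot be mistaken for a parameter separator.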

Strange characters – despite everything being UTF-8

This is typically caused by copying and pasting MS Word text into the WordPress content editor. WordPress uses something called “Smart Quotes”, via a function named wptexturize(). The ideal solution would be to go back through your content and retype all single and double quotes from the keyboard. However, if you’re working with massive copy/pastes, … Read more

If a hacker changed the blog_charset to UTF-7 does that make WordPress vulnerable to further attacks?

< and > are encoded as +ADw- and +AD4- in UTF-7. Now imagine the following: someone sends +ADw-script+AD4-alert(+ACI-Hello+ACI-)+ADw-/script+AD4- as comment text. It will pass all sanitization unescaped. The database expects and treats all incoming data as UTF-8. Since all UTF-7 streams are valid UTF-8 too, this will never result in a SQL error, and mysql_real_escape … Read more
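The point about the same bytes being harmless as UTF-8 but active markup as UTF-7 can be checked with Python's built-in codecs. This is only a demonstration of the encoding behaviour, using the comment text from the excerpt; it says nothing about how WordPress itself processes the data.

```python
payload = b"+ADw-script+AD4-alert(+ACI-Hello+ACI-)+ADw-/script+AD4-"

# The bytes are plain ASCII, so they decode as UTF-8 without error and
# contain no angle brackets for an HTML escaper to catch.
as_utf8 = payload.decode("utf-8")

# A consumer that interprets the very same bytes as UTF-7 sees a script tag.
as_utf7 = payload.decode("utf-7")

print(as_utf8)
print(as_utf7)
```

The second line printed is `<script>alert("Hello")</script>`, which is why a page served with a UTF-7 charset turns the "clean" comment into executable markup.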

Random Question Mark Icons In WordPress Text

I am re-designing a WordPress site, and I imported all of the many thousand articles and noticed all of these little icons showing up randomly in the text. Sometimes it takes up 1 character, sometimes a whole sentence. They also show up in the source the same way: �� Tried different themes, as well as … Read more

C# Encoding a text string with line breaks

Yes – it means you’re using \n as the line break instead of \r\n. Notepad only understands the latter. (Note that Environment.NewLine suggested by others is fine if you want the platform default – but if you’re serving from Mono and definitely want \r\n, you should specify it explicitly.)
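The answer concerns C#, but the fix is language-independent: normalise the line breaks to \r\n explicitly before serving the text. A minimal Python sketch, with the hypothetical helper name `to_crlf`:

```python
def to_crlf(text: str) -> str:
    # First collapse CRLF and lone CR to LF, then expand every LF to CRLF.
    # Doing it in two steps avoids turning an existing \r\n into \r\r\n.
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    return unified.replace("\n", "\r\n")
```

Because existing \r\n pairs are collapsed first, running the function twice on the same text does not change the result.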

What is “=C2=A0” in MIME encoded, quoted-printable text?

=C2=A0 represents the bytes C2 A0. Since this is UTF-8, those two bytes decode to U+00A0, the Unicode code point for a non-breaking space. See UTF-8 (Wikipedia).
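The two decoding steps (quoted-printable to bytes, then UTF-8 bytes to text) can be reproduced with Python's standard `quopri` module. The sample string "Hello=C2=A0world" is made up for illustration.

```python
import quopri

# Step 1: quoted-printable turns each =XX escape back into one raw byte.
raw = quopri.decodestring(b"Hello=C2=A0world")

# Step 2: interpreting the bytes as UTF-8 yields the actual characters.
text = raw.decode("utf-8")

print(raw)         # the bytes C2 A0 sit between the two words
print(repr(text))  # the same position now holds U+00A0
```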

Byte and char conversion in Java

A character in Java is a Unicode code-unit which is treated as an unsigned number. So if you perform c = (char)b the value you get is 2^16 – 56 or 65536 – 56. Or more precisely, the byte is first converted to a signed integer with the value 0xFFFFFFC8 using sign extension in a widening conversion. This in turn is … Read more
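The arithmetic can be replayed with plain integer masks. This Python sketch only emulates the Java conversion; the starting value -56 (the byte 0xC8) is inferred from the numbers quoted in the excerpt, not stated in it.

```python
b = -56                   # assumed Java byte value, i.e. the bit pattern 0xC8

# Widening to int keeps the sign: the 32-bit pattern is 0xFFFFFFC8.
as_int = b & 0xFFFFFFFF

# Narrowing to char keeps the low 16 bits and reads them as unsigned.
as_char = b & 0xFFFF

print(hex(as_int))   # 0xffffffc8
print(as_char)       # 65480, which is 65536 - 56
```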

Python decoding Unicode is not supported

Looks like google.searchGoogle(param) already returns unicode: So what you want is: As a side note, your code expects it to return a utf-8 encoded string so what was the point in decoding it (using unicode()) and encoding back (using .encode()) using the same encoding?
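The original question is about Python 2's unicode(), and its code is not shown in this excerpt. As a rough Python 3 analogue (an assumption on my part, not the asker's code), the same mistake is asking str() to decode something that is already text:

```python
text = "already text"
message = None

try:
    # Passing an encoding asks str() to decode, but there are no bytes to decode.
    str(text, "utf-8")
except TypeError as exc:
    message = str(exc)

# Decoding is only meaningful for bytes.
data = "café".encode("utf-8")
decoded = str(data, "utf-8")

print(message)
print(decoded)
```

The first call raises a TypeError saying that decoding str is not supported, while decoding real bytes works as expected.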
