One for the gurus: upgrade to 3.x messed up only filenames with accented chars

Maybe you should consider option C): convert all accented characters to their unaccented ASCII equivalents, so EXPRESSÃO.jpg -> EXPRESSAO.jpg. I think this would help you a lot, not only when it comes to coding and file systems, but also when storing names / references in databases. Update: This is a function I use for removing accents. I … Read more
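The author's own function is cut off in this excerpt, so the following is only a minimal Python sketch of the same idea, not the original code. It assumes that decomposing each character (Unicode NFD) and dropping the combining marks is an acceptable definition of "removing accents"; the name strip_accents is hypothetical.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Return text with combining accent marks removed (illustrative helper)."""
    # NFD splits an accented letter into its base letter plus combining marks,
    # e.g. "Ã" becomes "A" followed by U+0303 (combining tilde).
    decomposed = unicodedata.normalize("NFD", text)
    # Keep every character that is not a combining mark.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("EXPRESSÃO.jpg"))  # EXPRESSAO.jpg
```

Letters that have no decomposition (for example "ø" or "ß") pass through unchanged, so a real migration script would probably need an extra mapping table for those.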

Strange characters – despite everything being UTF-8

This is typically caused by copying and pasting MS Word content into the WordPress content editor. WordPress uses something called “Smart Quotes”, via a function named wptexturize(). Ideal Solution: The ideal solution would be to go back through your content and retype every single and double quote from the keyboard. However, if you’re working with massive copy/pastes, … Read more
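The rest of the post is truncated here, so this is not the author's fix. As a rough sketch of what "replacing the quotes" could look like when done in bulk, the Python snippet below maps the four common curly quote characters to their plain keyboard equivalents. The function name straighten_quotes is hypothetical, and it assumes the content has already been exported as text.

```python
# Map the four common "smart" quote characters to plain ASCII quotes.
SMART_TO_PLAIN = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark / apostrophe
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def straighten_quotes(text: str) -> str:
    """Replace curly quotes with straight ones (illustrative helper)."""
    return text.translate(SMART_TO_PLAIN)


print(straighten_quotes("\u201cIt\u2019s fine\u201d"))  # "It's fine"
```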

Byte and char conversion in Java

A char in Java is a 16-bit UTF-16 code unit, which is treated as an unsigned number. So if you perform c = (char)b, the value you get is 2^16 – 56, i.e. 65536 – 56 = 65480. Or more precisely, the byte is first converted to a signed integer with the value 0xFFFFFFC8 using sign extension in a widening conversion. This in turn is … Read more
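The arithmetic can be checked step by step. The sketch below is Python, not Java: it only mirrors the bit manipulation with masks, and it assumes the byte b in question holds -56 (bit pattern 0xC8), which is what the value 0xFFFFFFC8 implies.

```python
b = -56  # assumed value of the Java byte (bit pattern 0xC8)

# Widening conversion: sign extension to a 32-bit int.
# Masking with 0xFFFFFFFF shows the 32-bit two's-complement bit pattern.
widened = b & 0xFFFFFFFF
print(hex(widened))  # 0xffffffc8

# Narrowing conversion to a 16-bit unsigned char: keep the low 16 bits.
c = widened & 0xFFFF
print(c, hex(c))  # 65480 0xffc8

assert c == 2**16 - 56
```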

What is the difference between UTF-8 and Unicode?

To expand on the answers others have given: We’ve got lots of languages with lots of characters that computers should ideally display. Unicode assigns each character a unique number, or code point. Computers deal with such numbers as bytes… skipping a bit of history here and ignoring memory addressing issues, 8-bit computers would treat an … Read more
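To make the distinction concrete, here is a small Python sketch that prints, for a few sample characters, the Unicode code point (the number) next to the UTF-8 bytes (one possible encoding of that number). The helper name describe is hypothetical, and bytes.hex with a separator assumes Python 3.8 or later.

```python
def describe(ch: str) -> tuple[str, str]:
    """Return (code point, UTF-8 bytes in hex) for a single character."""
    code_point = f"U+{ord(ch):04X}"           # the number Unicode assigns
    utf8_bytes = ch.encode("utf-8").hex(" ")  # how UTF-8 stores that number
    return code_point, utf8_bytes


for ch in ["A", "é", "€"]:
    print(ch, *describe(ch))
# A U+0041 41
# é U+00E9 c3 a9
# € U+20AC e2 82 ac
```

The same code point takes one, two, or three bytes depending on its value, which is the sense in which UTF-8 is an encoding and Unicode is the character set.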

Is '# -*- coding: utf-8 -*-' also a comment in Python?

Yes, it is also a comment. And the contents of that comment carry special meaning if located at the top of the file, in the first two lines. From the Encoding declarations documentation: If a comment in the first or second line of the Python script matches the regular expression coding[=:]\s*([-\w.]+), this comment is processed as an encoding declaration; the … Read more
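The regular expression quoted above can be tried directly. This is a rough Python sketch that applies it to candidate lines; it is not how the interpreter itself is implemented, and the helper name declared_encoding is hypothetical.

```python
import re

# The pattern as quoted from the Encoding declarations documentation.
CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")


def declared_encoding(line: str):
    """Return the encoding named in a comment line, or None (illustrative)."""
    # Only comments count, so ignore lines that do not start with "#".
    if not line.lstrip().startswith("#"):
        return None
    match = CODING_RE.search(line)
    return match.group(1) if match else None


print(declared_encoding("# -*- coding: utf-8 -*-"))            # utf-8
print(declared_encoding("# vim: set fileencoding=latin-1 :"))  # latin-1
print(declared_encoding("x = 1"))                              # None
```

The sketch checks a single line only; the rule that the comment must sit on the first or second line of the file would have to be enforced by the caller.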