What is the difference between UTF-8 and Unicode?

To expand on the answers others have given: We’ve got lots of languages with lots of characters that computers should ideally display. Unicode assigns each character a unique number, or code point. Computers deal with such numbers as bytes… skipping a bit of history here and ignoring memory addressing issues, 8-bit computers would treat an …
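
A minimal Python 3 sketch of the distinction: ord() gives the Unicode code point of a character, while str.encode("utf-8") gives the bytes UTF-8 uses to store it. The sample string is arbitrary and written with escapes so the source stays ASCII.

```python
# Unicode assigns every character a code point (an integer);
# UTF-8 is one encoding that turns those integers into bytes.
text = "a\u00e9\u20ac\U0001F600"  # "a", e-acute, the euro sign and an emoji

# Code points: one integer per character, independent of any encoding.
code_points = [ord(ch) for ch in text]
print(code_points)  # [97, 233, 8364, 128512]

# UTF-8 uses 1 to 4 bytes per character, depending on the code point.
byte_lengths = [len(ch.encode("utf-8")) for ch in text]
print(byte_lengths)  # [1, 2, 3, 4]

encoded = text.encode("utf-8")
print(len(text), len(encoded))  # 4 10

# Decoding with the same encoding recovers the original code points.
assert encoded.decode("utf-8") == text
```

Other encodings such as UTF-16 and UTF-32 map the same code points to different byte sequences, which is why "Unicode" and "UTF-8" are not interchangeable terms.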

python encoding utf-8

You don’t need to encode data that is already encoded. When you try to do that, Python will first try to decode it to unicode before it can encode it back to UTF-8, and that implicit decode is what is failing here. Just write your data directly to the file; there is no need to encode already-encoded data. If you build up unicode values instead, you would …
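
The implicit decode described above is Python 2 behaviour. In Python 3, bytes objects have no encode() method at all, so the same mistake raises AttributeError instead. A minimal Python 3 sketch of the two correct options; the sample data and the temporary file name are hypothetical:

```python
import os
import tempfile

# Hypothetical input that is already UTF-8 encoded (bytes, not str).
data = "na\u00efve caf\u00e9".encode("utf-8")

# Re-encoding is the mistake; in Python 3 bytes has no encode() method.
try:
    data.encode("utf-8")
except AttributeError as exc:
    print("cannot re-encode bytes:", exc)

# Hypothetical output file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "out.txt")

# Option 1: write the already-encoded bytes directly, in binary mode.
with open(path, "wb") as f:
    f.write(data)

# Option 2: keep the data as str and let the file object encode it once.
text = data.decode("utf-8")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Both options leave the same UTF-8 bytes on disk.
with open(path, "rb") as f:
    round_trip = f.read()
print(round_trip == data)  # True
```

Either way the encoding happens exactly once, at the boundary where text leaves the program.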

What does the ‘b’ character do in front of a string literal?

To quote the Python 2.x documentation: A prefix of ‘b’ or ‘B’ is ignored in Python 2; it indicates that the literal should become a bytes literal in Python 3 (e.g. when code is automatically converted with 2to3). A ‘u’ or ‘b’ prefix may be followed by an ‘r’ prefix. The Python 3 documentation states: Bytes literals are …
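
A short Python 3 sketch of what the b prefix changes; the variable names are illustrative only:

```python
text = "abc"    # str: a sequence of Unicode code points
data = b"abc"   # bytes: a sequence of integers in the range 0-255

print(type(text).__name__, type(data).__name__)  # str bytes
print(text[0], data[0])  # a 97  (indexing bytes yields an int)
print(text == data)      # False: str and bytes never compare equal

# Converting between the two always goes through an explicit encoding.
assert data.decode("ascii") == text
assert text.encode("ascii") == data

# Only ASCII characters may appear literally in a bytes literal;
# anything else has to be written as an escape.
raw = b"caf\xc3\xa9"  # the UTF-8 bytes of "cafe" with an acute e
assert raw.decode("utf-8") == "caf\u00e9"

# 'r' and 'b' can be combined: backslashes are then kept literally.
pattern = rb"\d+"
print(len(pattern))  # 3
```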

What’s the difference between ASCII and Unicode?

ASCII defines 128 characters, which map to the numbers 0–127. Unicode defines (fewer than) 2^21 characters, which, similarly, map to numbers below 2^21 (though not all numbers are currently assigned, and some are reserved). Unicode is a superset of ASCII, and the numbers 0–127 have the same meaning in ASCII as they have in Unicode. For example, the …
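
The superset relationship can be checked directly in Python 3. This sketch assumes Python 3.3 or later, where sys.maxunicode is always 0x10FFFF:

```python
import sys

# Code points 0-127 mean the same thing in ASCII and Unicode.
assert all(chr(i).encode("ascii") == bytes([i]) for i in range(128))

# So pure-ASCII text encodes identically as ASCII and as UTF-8.
print("Hello".encode("ascii") == "Hello".encode("utf-8"))  # True

# Characters above 127 exist in Unicode but have no ASCII encoding.
euro = "\u20ac"  # EURO SIGN
print(ord(euro))  # 8364
try:
    euro.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)  # False

# The highest code point, U+10FFFF, is below 2**21.
print(hex(sys.maxunicode))       # 0x10ffff
print(sys.maxunicode < 2 ** 21)  # True
```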