Python – Reading and writing csv files with utf-8 encoding

You report three separate problems. This is a bit of a shot in the dark, because there isn't enough information to be sure, but you should try the following:

  1. Input encoding: As suggested in the comments, try “utf-8-sig”. This codec removes the Byte Order Mark (BOM) from the start of your input if there is one, and otherwise behaves like plain UTF-8.
  2. Double quotes: Among the csv parameters, you specify quoting=csv.QUOTE_NONE. This tells the csv library that the CSV table was written without quotes (which are used to escape characters that could otherwise be mistaken for field or row separators). That is apparently not the case, since the input has quotes around each field. Try csv.QUOTE_MINIMAL (the default) or csv.QUOTE_ALL instead.
  3. Output encoding: You say the output contains “weird symbols”. I suspect the output is actually fine, but you are viewing it with a tool that doesn't display UTF-8 text properly by default: many Windows applications (such as Excel) still prefer UTF-16 or localised 8-bit encodings like CP-1255. As with problem 1, try the codec “utf-8-sig”: many viewers and editors take the BOM as a hint that the file is UTF-8.
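
A minimal sketch of all three suggestions together, using only the standard csv module. The helper names read_rows/write_rows, the file names and the sample data are made up for illustration. Your real file's delimiter and other dialect settings are unknown, so they are left at the csv defaults here. The demo also shows what QUOTE_NONE does to quoted input, for comparison.

```python
import csv
import os
import tempfile


def read_rows(path, quoting=csv.QUOTE_MINIMAL):
    # "utf-8-sig" drops a leading BOM if there is one and otherwise
    # behaves like plain "utf-8". newline="" is what the csv docs recommend.
    with open(path, encoding="utf-8-sig", newline="") as f:
        return list(csv.reader(f, quoting=quoting))


def write_rows(path, rows):
    # Writing with "utf-8-sig" puts a BOM at the start of the file, which
    # many Windows tools (e.g. Excel) treat as a hint that the file is UTF-8.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f, quoting=csv.QUOTE_MINIMAL).writerows(rows)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "input.csv")   # hypothetical file names
        dst = os.path.join(tmp, "output.csv")

        # Hypothetical sample input: a BOM, and every field wrapped in quotes.
        with open(src, "wb") as f:
            f.write('"id","name"\r\n"1","שלום"\r\n'.encode("utf-8-sig"))

        # QUOTE_NONE leaves the quote characters inside the field values.
        print(read_rows(src, quoting=csv.QUOTE_NONE)[0])  # ['"id"', '"name"']
        # QUOTE_MINIMAL (the default) strips them, and no BOM leaks into "id".
        print(read_rows(src)[0])  # ['id', 'name']

        # Round-trip the parsed rows; the output file starts with a BOM.
        write_rows(dst, read_rows(src))
        with open(dst, "rb") as f:
            print(f.read().startswith(b"\xef\xbb\xbf"))  # True
```

If you don't want the BOM in the output (for example because another program would read it back as part of the first field), write with plain “utf-8” instead.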
