Split function add: \xef\xbb\xbf…\n to my list

Your file starts with a UTF-8 BOM (byte order mark); that is the \xef\xbb\xbf you are seeing.

To get rid of it, first decode your file contents to unicode using the utf-8-sig codec, which drops a leading BOM.

# Read raw bytes, decode with utf-8-sig to drop the BOM, then re-encode.
with open("file.txt", "rb") as fp:
    data = fp.read().decode("utf-8-sig").encode("utf-8")

It is better, though, not to encode it back to UTF-8 and to work with unicode text instead. There is a good rule: decode all your input text data to unicode as soon as possible, work only with unicode, and encode the output data to the required encoding as late as possible. This will save you from many headaches.
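A minimal sketch of that rule, written for Python 3 (where decoded text is str and raw data is bytes). The helper name normalize_lines and the sample bytes are made up for illustration:

```python
def normalize_lines(raw_bytes):
    # Decode at the input boundary; utf-8-sig drops a leading BOM if present.
    text = raw_bytes.decode("utf-8-sig")
    # Work only with decoded text in the middle of the program.
    lines = [line.strip() for line in text.splitlines()]
    # Encode at the output boundary, as late as possible.
    return "\n".join(lines).encode("utf-8")

# Hypothetical input: a BOM followed by two lines.
raw = b"\xef\xbb\xbffirst \nsecond\n"
result = normalize_lines(raw)
```

Everything between the decode and the encode deals with text only, so the BOM and the byte encoding never leak into the string handling.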

To read bigger files in a certain encoding, use io.open or codecs.open.
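As a rough sketch of the io.open approach: passing encoding="utf-8-sig" makes the file object decode while reading and skip a leading BOM, and iterating it yields one line at a time instead of loading the whole file. The generator name iter_lines and the temporary demo file are assumptions for illustration only:

```python
import io
import os
import tempfile

def iter_lines(path):
    # io.open decodes on the fly; utf-8-sig skips a leading BOM if there is one.
    with io.open(path, encoding="utf-8-sig") as fp:
        # Iterating the file object reads one line at a time.
        for line in fp:
            yield line

# Demo with a throwaway file that starts with a BOM.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"\xef\xbb\xbfalpha\nbeta\n")

lines = list(iter_lines(demo_path))
os.remove(demo_path)
```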


Use str.strip() or str.rstrip() to get rid of the newline character \n.
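A short illustration of the difference, using made-up sample lines: rstrip("\n") removes only trailing newlines, while strip() with no argument removes all leading and trailing whitespace.

```python
raw_lines = ["alpha\n", "  beta  \n"]

# Remove only the trailing newline; other whitespace is kept.
right_trimmed = [line.rstrip("\n") for line in raw_lines]

# Remove all leading and trailing whitespace, including the newline.
fully_trimmed = [line.strip() for line in raw_lines]
```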
