What’s the difference between UTF-8 and UTF-8 without BOM?

The UTF-8 BOM is a sequence of bytes at the start of a text stream (0xEF, 0xBB, 0xBF) that allows the reader to more reliably guess a file as being encoded in UTF-8. Normally, the BOM is used to signal the endianness of an encoding, but since endianness is irrelevant to UTF-8, the BOM is unnecessary. According to the Unicode standard, the BOM … Read more

Using awk to remove the Byte-order mark

Try this: On the first record (line), remove the BOM characters. Print every record. Or slightly shorter, using the knowledge that the default action in awk is to print the record: 1 is the shortest condition that always evaluates to true, so each record is printed. Enjoy! — ADDENDUM — Unicode Byte Order Mark (BOM) FAQ includes … Read more