What’s the difference between UTF-8 and UTF-8 without BOM?

The UTF-8 BOM is a three-byte sequence (0xEF, 0xBB, 0xBF) at the start of a text stream that lets a reader identify the file as UTF-8 more reliably. In other encodings the BOM signals byte order (endianness), but UTF-8 has no byte-order ambiguity, so the BOM is unnecessary there. According to the Unicode standard, the BOM …
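To make the byte sequence concrete, here is a minimal Java sketch of how a reader might detect and skip a UTF-8 BOM before decoding. The class name Utf8BomDemo and the helpers hasUtf8Bom and decodeUtf8 are made up for this illustration and are not part of any library. Without the skip, the three bytes would be decoded as the character U+FEFF at the start of the string.

```java
import java.nio.charset.StandardCharsets;

class Utf8BomDemo {
    // The three bytes of the UTF-8 BOM (U+FEFF encoded as UTF-8).
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Hypothetical helper: true if the data starts with 0xEF 0xBB 0xBF.
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= UTF8_BOM.length
                && data[0] == UTF8_BOM[0]
                && data[1] == UTF8_BOM[1]
                && data[2] == UTF8_BOM[2];
    }

    // Hypothetical helper: decode as UTF-8, skipping a leading BOM if present.
    static String decodeUtf8(byte[] data) {
        int offset = hasUtf8Bom(data) ? UTF8_BOM.length : 0;
        return new String(data, offset, data.length - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, (byte) 'h', (byte) 'i'};
        byte[] withoutBom = {(byte) 'h', (byte) 'i'};

        System.out.println(hasUtf8Bom(withBom));    // true
        System.out.println(hasUtf8Bom(withoutBom)); // false
        // Both inputs decode to the same text once the BOM is skipped.
        System.out.println(decodeUtf8(withBom).equals(decodeUtf8(withoutBom))); // true
    }
}
```

Both byte arrays carry the same text; the only difference between "UTF-8" and "UTF-8 without BOM" is the three leading bytes.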

Get char value in Java

char is actually a numeric type: an unsigned 16-bit integer holding a UTF-16 code unit of the character (characters outside the BMP need two chars, a surrogate pair). You can use it in arithmetic and comparisons much as you would an int. Character.getNumericValue() does something different: it tries to interpret the character as a digit.
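A short sketch of the distinction, using only standard java.lang calls. The class name CharValueDemo and the helpers codeUnit and firstCodePoint are made up for this illustration.

```java
class CharValueDemo {
    // Hypothetical helper: the numeric value of a char is just a widening conversion.
    static int codeUnit(char c) {
        return c;
    }

    // Hypothetical helper: the full code point at the start of a string,
    // combining a surrogate pair if there is one.
    static int firstCodePoint(String s) {
        return s.codePointAt(0);
    }

    public static void main(String[] args) {
        char c = 'A';
        System.out.println(codeUnit(c));    // 65
        System.out.println((char) (c + 1)); // B (char arithmetic yields an int, so cast back)

        // U+1F600 lies outside the BMP, so it takes two chars (a surrogate pair).
        String smiley = "\uD83D\uDE00";
        System.out.println(smiley.length());                           // 2
        System.out.println(Integer.toHexString(firstCodePoint(smiley))); // 1f600

        // getNumericValue interprets the character as a digit instead.
        System.out.println(codeUnit('7'));                  // 55
        System.out.println(Character.getNumericValue('7')); // 7
    }
}
```

In short, cast or assign to int when you want the code unit, use codePointAt when the text may contain characters outside the BMP, and use Character.getNumericValue() only when you want the digit a character represents.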