
Open A File In The Proper Encoding Automatically

I'm having encoding problems with a few files. We receive files from another company and have to read them (the files are in CSV format). Strangely, the files appea

Solution 1:

chardet can help you.

Character encoding auto-detection in Python 2 and 3. As smart as your browser. Open source.
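A minimal sketch of how that might look, assuming chardet is installed (pip install chardet). The helper name read_text_guessing_encoding and the utf-8 fallback are made up for illustration; chardet.detect is the library's detection call and returns a dict with "encoding" and "confidence" keys.

```python
# Sketch: guess a file's encoding with chardet, then decode it.
try:
    import chardet  # third-party package
except ImportError:  # assumption: fall back quietly if chardet is missing
    chardet = None


def read_text_guessing_encoding(path, fallback="utf-8"):
    """Return (text, encoding) for the file at `path` (hypothetical helper)."""
    with open(path, "rb") as f:
        raw = f.read()

    encoding = None
    if chardet is not None:
        # "encoding" may be None when chardet cannot make a guess.
        encoding = chardet.detect(raw)["encoding"]

    encoding = encoding or fallback
    # errors="replace" keeps a wrong guess from raising; bad bytes
    # show up as U+FFFD instead.
    return raw.decode(encoding, errors="replace"), encoding
```

The result is still a guess, so for files from a known sender it may be worth logging the detected encoding and the confidence value and checking them by eye a few times.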

Solution 2:

It won't be "fixed" in Python 3, as it's not a fixable problem. Many documents are valid in several encodings, so the only way to determine the proper encoding is to know something about the document. Fortunately, in most cases we do know something about the document. For instance, most characters will be clustered in distinct Unicode blocks: a document in English will mostly contain characters within the first 128 code points, and a document in Russian will contain mostly Cyrillic code points. Most documents will also contain spaces and newlines. These clues can be used to make educated guesses about which encoding is being used. Better yet, use a library written by someone who has already done the work (like chardet, mentioned in another answer by Desintegr).
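To make the idea concrete, here is a rough sketch of such an educated guess using only the standard library. It is not how chardet works internally; the function names, the candidate list, and the two code point ranges are assumptions chosen for the example. Each candidate encoding is tried in turn, and the one whose decoded text falls most completely inside the expected Unicode blocks wins.

```python
def score(text, expected_ranges):
    """Fraction of characters whose code point lies in an expected range."""
    if not text:
        return 0.0
    hits = sum(
        1 for ch in text
        if any(lo <= ord(ch) <= hi for lo, hi in expected_ranges)
    )
    return hits / len(text)


def guess_encoding(raw, candidates, expected_ranges):
    """Return the candidate whose decoding best matches the expected ranges."""
    best, best_score = None, -1.0
    for enc in candidates:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # the bytes are not valid in this encoding at all
        s = score(text, expected_ranges)
        if s > best_score:  # ties go to the earlier candidate
            best, best_score = enc, s
    return best


ASCII = (0x0000, 0x007F)      # first 128 code points
CYRILLIC = (0x0400, 0x04FF)   # Cyrillic block

raw = "Привет, мир\n".encode("cp1251")
print(guess_encoding(raw, ["utf-8", "cp1251", "latin-1"], [ASCII, CYRILLIC]))
# prints: cp1251
```

This only separates encodings that decode into different blocks. Two Cyrillic code pages would both score well here, and telling them apart needs finer statistics such as letter frequencies, which is one more reason to reach for an existing library.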

Solution 3:

csv.reader cannot handle Unicode strings in Python 2.x. See the bottom of the csv documentation and this question for ways to handle it.
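For comparison, in Python 3 the csv module works on text, so the decoding happens when the file is opened. A small sketch, assuming the encoding is already known or has been guessed; read_csv_rows is a hypothetical helper name.

```python
import csv


def read_csv_rows(path, encoding):
    """Yield rows from a CSV file decoded with the given encoding (Python 3)."""
    # newline="" leaves line-ending handling to the csv module,
    # as the csv documentation recommends.
    with open(path, encoding=encoding, newline="") as f:
        for row in csv.reader(f):
            yield row
```

Calling list(read_csv_rows("data.csv", "latin-1")) would give a list of rows, each a list of str values; the file name here is only a placeholder.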

Solution 4:

If it is fixed in Python 3, it should also be fixed by using:

from __future__ import unicode_literals
