python-chardet – universal character encoding detector for Python2
Chardet takes a sequence of bytes in an unknown character encoding and attempts to determine that encoding.
Supported encodings:
- ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants)
- Big5, GB2312, EUC-TW, HZ-GB-2312, ISO-2022-CN (Traditional and Simplified Chinese)
- EUC-JP, SHIFT_JIS, ISO-2022-JP (Japanese)
- EUC-KR, ISO-2022-KR (Korean)
- KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, windows-1251 (Cyrillic)
- ISO-8859-2, windows-1250 (Hungarian)
- ISO-8859-5, windows-1251 (Bulgarian)
- windows-1252 (English)
- ISO-8859-7, windows-1253 (Greek)
- ISO-8859-8, windows-1255 (Visual and Logical Hebrew)
- TIS-620 (Thai)
This library is a port of the auto-detection code in Mozilla.
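As a minimal sketch of how detection might be used in practice, the helper below passes raw bytes to `chardet.detect`, which returns a dictionary containing an `encoding` guess and a `confidence` score, and then decodes with that guess. The helper name `decode_unknown`, the `min_confidence` threshold of 0.5, and the UTF-8 fallback are illustrative assumptions, not part of chardet itself.

```python
def decode_unknown(raw, detector=None, min_confidence=0.5, fallback="utf-8"):
    """Decode bytes of unknown encoding using a detector's best guess.

    `decode_unknown`, `min_confidence` and `fallback` are hypothetical names
    chosen for this sketch; only `chardet.detect` comes from the library.
    Returns a (text, encoding_used) tuple.
    """
    if detector is None:
        # Imported lazily so the helper can also be exercised with a stub.
        import chardet  # third-party package, assumed to be installed
        detector = chardet.detect

    # chardet.detect returns a dict with 'encoding' and 'confidence' keys;
    # 'encoding' may be None when no guess could be made.
    guess = detector(raw)
    encoding = guess.get("encoding")
    confidence = guess.get("confidence") or 0.0

    # Fall back when there is no guess or the detector is unsure.
    if not encoding or confidence < min_confidence:
        encoding = fallback

    # Replace undecodable bytes instead of raising on a wrong guess.
    return raw.decode(encoding, "replace"), encoding
```

Calling `decode_unknown(data)` on bytes read from a file returns the decoded text together with the encoding that was actually used. Detection is statistical, so short or ambiguous inputs may yield a low confidence or a wrong guess; the threshold and fallback here are one possible policy, not a recommendation from the library.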
Distribution | Base version | Our version | Architectures |
---|---|---|---|
Debian GNU/Linux 10.0 (buster) | 3.0.4-3 | 3.0.4-1~nd100+1 | i386, amd64, sparc, armel, ppc64el |
Debian GNU/Linux 9.0 (stretch) | 2.3.0-2 | 3.0.4-1~nd90+1 | i386, amd64, sparc, armel |
Debian testing (bullseye) | 4.0.0-1 | | |
Debian unstable (sid) | 4.0.0-1 | 3.0.4-1~nd+1 | i386, amd64, sparc, armel |
Ubuntu 16.04 “Xenial Xerus” (xenial) | 2.3.0-2 | 3.0.4-1~nd16.04+1 | i386, amd64, sparc, armel |
Ubuntu 18.04 “Bionic Beaver” (bionic) | 3.0.4-1 | | |
Ubuntu 20.04 “Focal Fossa” (focal) | 3.0.4-4build1 | | |
Ubuntu 20.10 “Groovy Gorilla” (groovy) | 3.0.4-7 | | |