python-chardet – universal character encoding detector for Python 2

Chardet takes a sequence of bytes in an unknown character encoding and attempts to determine that encoding.
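
As a rough illustration of the sentence above, the following minimal sketch passes raw bytes to chardet.detect(), which returns a dictionary containing at least an "encoding" and a "confidence" entry. The helper names guess_encoding and decode_unknown, and the latin-1 fallback, are assumptions made for this example and are not part of the library.

```python
from __future__ import print_function

import chardet  # third-party module shipped by the python-chardet package


def guess_encoding(data):
    """Return (encoding, confidence) as reported by chardet.detect()."""
    result = chardet.detect(data)
    return result["encoding"], result["confidence"]


def decode_unknown(data, fallback="latin-1"):
    """Hypothetical helper: decode bytes using chardet's best guess.

    Falls back to `fallback` (an assumption of this sketch) when chardet
    returns no encoding or names one that Python has no codec for.
    """
    encoding, _ = guess_encoding(data)
    try:
        return data.decode(encoding or fallback, "replace")
    except LookupError:
        return data.decode(fallback, "replace")


if __name__ == "__main__":
    sample = b"Hello, world"
    print(guess_encoding(sample))
    print(decode_unknown(sample))
```

The reported confidence is a number between 0 and 1, so callers may want to treat low-confidence guesses with caution, particularly for short inputs.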

Supported encodings:

  • ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants)

  • Big5, GB2312, EUC-TW, HZ-GB-2312, ISO-2022-CN (Traditional and Simplified Chinese)

  • EUC-JP, SHIFT_JIS, ISO-2022-JP (Japanese)

  • EUC-KR, ISO-2022-KR (Korean)

  • KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, windows-1251 (Cyrillic)

  • ISO-8859-2, windows-1250 (Hungarian)

  • ISO-8859-5, windows-1251 (Bulgarian)

  • windows-1252 (English)

  • ISO-8859-7, windows-1253 (Greek)

  • ISO-8859-8, windows-1255 (Visual and Logical Hebrew)

  • TIS-620 (Thai)
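
The UTF-16 and UTF-32 entries in the list above each come in little-endian and big-endian byte orders, which are commonly announced by a byte order mark (BOM) at the start of the data. The sketch below is not chardet's code; it is a simplified, standard-library-only illustration of that single cue, and the names sniff_bom and _BOMS are hypothetical.

```python
import codecs

# Longest signatures first: the UTF-32 LE BOM (FF FE 00 00) begins with
# the UTF-16 LE BOM (FF FE), so it has to be tested before it.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def sniff_bom(data):
    """Return the encoding implied by a leading BOM, or None if absent."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

Data without a BOM, which includes every legacy encoding in the list, gives no such hint, and that is the case a statistical detector like chardet is meant to handle.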

This library is a port of the auto-detection code in Mozilla.

Package availability chart

Distribution                            Base version   Our version        Architectures
Debian GNU/Linux 10.0 (buster)          3.0.4-3        3.0.4-1~nd100+1    i386, amd64, sparc, armel, ppc64el
Debian GNU/Linux 11.0 (bullseye)        4.0.0-1        -                  -
Debian GNU/Linux 12.0 (bookworm)        5.1.0+dfsg-2   -                  -
Debian GNU/Linux 9.0 (stretch)          2.3.0-2        3.0.4-1~nd90+1     i386, amd64, sparc, armel
Debian testing (trixie)                 5.2.0+dfsg-1   -                  -
Debian unstable (sid)                   5.2.0+dfsg-1   3.0.4-1~nd+1       i386, amd64, sparc, armel
Ubuntu 16.04 “Xenial Xerus” (xenial)    2.3.0-2        3.0.4-1~nd16.04+1  i386, amd64, sparc, armel
Ubuntu 18.04 “Bionic Beaver” (bionic)   3.0.4-1        -                  -
Ubuntu 20.04 “Focal Fossa” (focal)      3.0.4-4build1  -                  -
Ubuntu 22.04 “Jammy Jellyfish” (jammy)  4.0.0-1        -                  -
