python-chardet – universal character encoding detector for Python 2

Chardet takes a sequence of bytes in an unknown character encoding, and attempts to determine the encoding.

Supported encodings:

  • ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants)
  • Big5, GB2312, EUC-TW, HZ-GB-2312, ISO-2022-CN (Traditional and Simplified Chinese)
  • EUC-JP, SHIFT_JIS, ISO-2022-JP (Japanese)
  • EUC-KR, ISO-2022-KR (Korean)
  • KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, windows-1251 (Cyrillic)
  • ISO-8859-2, windows-1250 (Hungarian)
  • ISO-8859-5, windows-1251 (Bulgarian)
  • windows-1252 (English)
  • ISO-8859-7, windows-1253 (Greek)
  • ISO-8859-8, windows-1255 (Visual and Logical Hebrew)
  • TIS-620 (Thai)

This library is a port of the auto-detection code in Mozilla.
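
As a rough illustration of the detection described above, the sketch below passes a byte string to `chardet.detect()` and reads the guessed encoding and its confidence from the returned dictionary. The helper name `guess_encoding` and the sample text are made up for this example, and the import guard is only there so the snippet still loads on a machine without the package. The guess depends on the installed chardet version and on how much input it sees, so no particular output is promised here.

```python
# -*- coding: utf-8 -*-
# Minimal sketch; requires the chardet package (python-chardet on Debian/Ubuntu).
try:
    import chardet
except ImportError:  # keep the sketch loadable when chardet is not installed
    chardet = None


def guess_encoding(raw):
    """Hypothetical helper: return (encoding, confidence) for a byte string.

    Returns None when chardet is not available.
    """
    if chardet is None:
        return None
    result = chardet.detect(raw)  # dict with 'encoding' and 'confidence' keys
    return result["encoding"], result["confidence"]


# Illustrative input: Russian text encoded as KOI8-R, one of the listed encodings.
sample = u"Привет, мир! Это пример текста на русском языке.".encode("koi8-r")

guess = guess_encoding(sample)
if guess is not None:
    encoding, confidence = guess
    print("guessed %s with confidence %.2f" % (encoding, confidence))
```

Short inputs give the detector little evidence, so treat the reported confidence as a hint and fall back to a known default when it is low.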

Package availability chart
Distribution                           Base version   Our version        Architectures
Debian GNU/Linux 10.0 (buster)         3.0.4-3        3.0.4-1~nd100+1    i386, amd64, sparc, armel
Debian GNU/Linux 8.0 (jessie)          2.3.0-1        3.0.4-1~nd80+1     i386, amd64, sparc, armel
Debian GNU/Linux 9.0 (stretch)         2.3.0-2        3.0.4-1~nd90+1     i386, amd64, sparc, armel
Debian testing (bullseye)              3.0.4-4
Debian unstable (sid)                  3.0.4-4        3.0.4-1~nd+1       i386, amd64, sparc, armel
Ubuntu 14.04 “Trusty Tahr” (trusty)    2.0.1-2build2  3.0.4-1~nd14.04+1  i386, amd64, sparc, armel
Ubuntu 16.04 “Xenial Xerus” (xenial)   2.3.0-2        3.0.4-1~nd16.04+1  i386, amd64, sparc, armel
Ubuntu 18.04 “Bionic Beaver” (bionic)  3.0.4-1
Ubuntu 19.04 “Disco Dingo” (disco)     3.0.4-3
