python-indexed-gzip – fast random access of gzip files in Python

IndexedGzipFile is a drop-in replacement for the built-in Python gzip.GzipFile class that does not need to start decompressing from the beginning of the file on every seek(). It gets around this performance limitation by building an index of seek points, which are mappings between corresponding locations in the compressed and uncompressed data streams. Each seek point is accompanied by a chunk (32KB) of uncompressed data which is used to initialise the decompression algorithm, so that reading can start from any seek point. If the index is built with a seek point spacing of 1MB, only 512KB (on average) of data has to be decompressed to read from any location in the file.
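
The 512KB figure follows from the spacing: a read starts at the nearest seek point at or before the requested offset, so the amount decompressed and discarded is the distance to that seek point, which averages about half the spacing. The sketch below illustrates only this arithmetic and is not the library's implementation. It assumes idealised seek points at exact multiples of the spacing (a real index would place them wherever the compressed stream allows), and the function names are hypothetical.

```python
from bisect import bisect_right

SPACING = 1024 * 1024  # 1MB seek point spacing, as in the description above


def build_seek_points(uncompressed_size, spacing=SPACING):
    """Hypothetical helper: uncompressed offsets of evenly spaced seek points."""
    return list(range(0, uncompressed_size, spacing))


def bytes_to_decompress(seek_points, offset):
    """Bytes that must be decompressed and discarded to reach `offset`."""
    # Find the nearest seek point at or before the requested offset.
    i = bisect_right(seek_points, offset) - 1
    return offset - seek_points[i]


# An assumed 8MB uncompressed stream, sampled every 4KB.
size = 8 * SPACING
step = 4096
points = build_seek_points(size)
offsets = range(0, size, step)
average = sum(bytes_to_decompress(points, o) for o in offsets) / len(offsets)

print("average bytes decompressed per seek:", average)
```

With a 1MB spacing the printed average comes out close to half the spacing, in line with the 512KB estimate. A smaller spacing lowers the cost of each seek but means more seek points, each carrying its own 32KB chunk.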

This package provides the Python 2 module.

Package availability chart

Distribution                            Base version    Our version                  Architectures
Debian GNU/Linux 8.0 (jessie)           -               0.6.1-1~nd80+1               amd64
Debian GNU/Linux 9.0 (stretch)          -               0.6.1-1~nd90+1               amd64
Debian testing (buster)                 0.6.1-1         -                            -
Debian unstable (sid)                   0.6.1-1         0.6.1-1~nd+1                 amd64
Ubuntu 16.04 “Xenial Xerus” (xenial)    -               0.3.1.1-1~nd16.04+1          amd64
Ubuntu 17.04 “Zesty Zapus” (zesty)      -               0.3.1.1-1~nd90+1+nd17.04+1   amd64
Ubuntu 17.10 “Artful Aardvark” (artful) 0.3.1-1build2   -                            -
