
Text LZSS

Reference options

Reference length:
Static (fixed number of bits)
Dynamic (variable length with unary encoded log prefix)
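As a rough illustration of the "Dynamic" option, the sketch below encodes a positive integer as a unary-coded log prefix followed by the remaining binary digits (an Elias-gamma-style code). The exact bit layout used by this page is not documented here, so the bit polarity, the function names `encode_dynamic` and `decode_dynamic`, and the choice to start at 1 are all assumptions.

```python
from typing import Tuple


def encode_dynamic(n: int) -> str:
    """Encode n >= 1 as a unary log prefix plus the remaining binary digits.

    Assumed layout (not taken from the page): k ones, a terminating zero,
    then the k low-order bits of n, where k = floor(log2(n)).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    k = n.bit_length() - 1        # floor(log2(n))
    prefix = "1" * k + "0"        # unary encoding of k
    suffix = format(n, "b")[1:]   # binary digits of n without the leading 1
    return prefix + suffix


def decode_dynamic(bits: str) -> Tuple[int, int]:
    """Decode one value from the front of bits; return (value, bits consumed)."""
    k = bits.index("0")           # number of leading ones
    suffix = bits[k + 1:2 * k + 1]
    if len(suffix) != k:
        raise ValueError("truncated input")
    value = (1 << k) | (int(suffix, 2) if k else 0)
    return value, 2 * k + 1


print(encode_dynamic(1))   # 0
print(encode_dynamic(5))   # 11001
```

Small values cost few bits and large values cost more, which is why a variable-length scheme of this kind can be more flexible than a fixed bit count.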
Defines how many bits are used to store the match offset. This effectively controls the dictionary size. Larger values allow references from further away.
Controls how many bits are used to store the length of a match. Larger values allow for referencing larger chunks.
The minimum match length ensures that only sufficiently large matches will be used. Referencing a match that is too short would take more space than including it as literals would.
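The break-even point for the minimum match length can be estimated with a little arithmetic. The sketch below assumes static (fixed-width) encoding, 8-bit literals, and a 1-bit flag in front of every element to tell literals from references. Those costs and the helper name `min_useful_match` are assumptions for illustration, not the page's actual bit format.

```python
def min_useful_match(offset_bits: int, length_bits: int,
                     literal_bits: int = 8, flag_bits: int = 1) -> int:
    """Smallest match length for which a reference is strictly cheaper
    than emitting the same characters as literals (assumed cost model)."""
    ref_cost = flag_bits + offset_bits + length_bits   # cost of one reference
    lit_cost = flag_bits + literal_bits                # cost of one literal
    # Smallest n with n * lit_cost > ref_cost.
    return ref_cost // lit_cost + 1


print(min_useful_match(12, 4))   # 2: 17 bits for a reference vs. 18 bits for two literals
print(min_useful_match(16, 8))   # 3: 25 bits for a reference vs. 27 bits for three literals
```

Under this cost model, widening the offset or length field raises the cost of every reference, so the minimum worthwhile match length grows with it.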

Encoding options


Examples

Aaca
Dovahkiin
Alice in Wonderland
Digits of Pi
Random
Unicode and surrogates



What is this?

The text is compressed using a (crude) variant of LZSS. This basically means that any long, repeated character sequence gets replaced with a short reference to its previous location, thus saving space.

Each element in LZSS output is either a literal or a reference. (In comparison, LZ77 exclusively uses references and literals combined in pairs.) Red boxes represent literals (uncompressed parts). Blue boxes contain references. A reference consists of an offset denoting how far to look back, and a length showing how many characters to copy. The minimum length of a reference should be chosen so as not to be wasteful. The best value depends on the encoding, with dynamic encoding offering more flexibility. Note that a reference may overlap its own output: it can extend past the current position into characters that the same reference is still producing. For instance, 'aaaaa' can be stored as 'literal a' followed by 'reference offset 1, length 4'.
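The literal/reference scheme can be sketched in a few lines. The code below is a minimal, unoptimized illustration and not the page's own implementation: it uses a brute-force match search, keeps tokens as Python objects instead of packing bits, and the names `compress` and `decompress` and the parameters `window`, `min_len`, and `max_len` are hypothetical.

```python
def compress(text, window=4096, min_len=3, max_len=18):
    """Return a list of tokens: a 1-character string is a literal,
    an (offset, length) tuple is a reference."""
    tokens = []
    i = 0
    while i < len(text):
        best_len, best_off = 0, 0
        # Brute-force search of the window for the longest match.
        for start in range(max(0, i - window), i):
            length = 0
            # start + length may run past i: the match overlaps itself.
            while (length < max_len and i + length < len(text)
                   and text[start + length] == text[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - start
        if best_len >= min_len:
            tokens.append((best_off, best_len))
            i += best_len
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def decompress(tokens):
    """Rebuild the text by copying one character at a time,
    which makes self-overlapping references work."""
    out = []
    for tok in tokens:
        if isinstance(tok, str):
            out.append(tok)
        else:
            offset, length = tok
            for _ in range(length):
                out.append(out[-offset])
    return "".join(out)


print(compress("aaaaa"))               # ['a', (1, 4)]
print(decompress(["a", (1, 4)]))       # aaaaa
```

Copying character by character in `decompress` is what lets the 'aaaaa' reference read characters that it wrote itself a moment earlier.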

This variant of LZSS does not make use of Huffman coding, which would represent frequently emitted symbols in just a few bits instead of whole bytes.