Text LZSS

Reference options

Reference length:

Static (fixed number of bits)

Dynamic (variable length with unary encoded log prefix)
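One common way to build a variable-length code with a unary-encoded log prefix is an Elias-gamma-style scheme. The sketch below is an illustration only: the exact bit layout used by this tool is not documented here, so the bit order and the names encode_length and decode_length are assumptions.

```python
from typing import Tuple


def encode_length(n: int) -> str:
    """Encode a positive integer as a unary log prefix plus binary remainder."""
    if n < 1:
        raise ValueError("n must be >= 1")
    binary = bin(n)[2:]            # e.g. 5 -> '101'
    extra_bits = len(binary) - 1   # floor(log2(n))
    # Unary prefix (assumed layout): extra_bits ones, terminated by a zero.
    # The leading 1 of the binary form is implied, so only the rest is stored.
    return "1" * extra_bits + "0" + binary[1:]


def decode_length(bits: str) -> Tuple[int, int]:
    """Decode one value; return (value, number of bits consumed)."""
    extra_bits = 0
    while bits[extra_bits] == "1":
        extra_bits += 1
    start = extra_bits + 1         # skip the terminating zero
    payload = bits[start:start + extra_bits]
    value = int("1" + payload, 2)  # restore the implied leading 1
    return value, start + extra_bits
```

Under this layout a value n costs 2 * floor(log2(n)) + 1 bits, so short lengths are cheap and long lengths remain representable without a fixed upper limit.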

Offset bits		Defines how many bits are used to store the match offset. This effectively controls the dictionary size: n offset bits can represent 2^n distinct offsets. Larger values allow references to text farther back.
Length bits		Controls how many bits are used to store the length of a match. Larger values allow longer matches to be referenced.
Min length		The minimum match length ensures that only sufficiently long matches are used. Referencing a match that is too short would take more space than storing it as literals.
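The break-even point for the minimum length can be estimated with a little arithmetic. The sketch below assumes static encoding, one flag bit per element to tell literals from references, and a fixed literal width (16 bits for a UTF-16 code unit, 8 for a UTF-8 byte). The flag bit and the function name are assumptions for illustration, not a description of this tool's actual format.

```python
def break_even_min_length(offset_bits: int, length_bits: int,
                          literal_bits: int = 16, flag_bits: int = 1) -> int:
    """Smallest match length for which a reference is cheaper than literals."""
    reference_cost = flag_bits + offset_bits + length_bits
    literal_cost = flag_bits + literal_bits
    # We need n * literal_cost > reference_cost, so take the next integer
    # above reference_cost / literal_cost.
    return reference_cost // literal_cost + 1


# 12 offset bits + 4 length bits + 1 flag = 17 bits per reference.
print(break_even_min_length(12, 4, literal_bits=8))   # 8-bit literals
print(break_even_min_length(16, 8, literal_bits=8))   # larger reference
```

Under these assumptions, a 17-bit reference already pays off for a match of two 8-bit literals (18 bits), while a 25-bit reference needs a match of at least three.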

Encoding options

Display required bits
Encode literals as UTF-8 (instead of UTF-16)

Examples

Aaca

Dovahkiin

Alice in Wonderland

Digits of Pi

Random

Unicode and surrogates

What is this?

The text is compressed using a (crude) variant of LZSS. In essence, any long, repeated character sequence is replaced with a short reference to its earlier occurrence, which saves space.

Each element in LZSS output is either a literal or a reference. (In comparison, LZ77 always emits a reference and a literal together as a pair.) Red boxes represent literals (uncompressed parts). Blue boxes contain references. A reference consists of an offset denoting how far to look back, and a length showing how many characters to copy. The minimum length of a reference should be chosen so that references are never wasteful. The right value depends on the encoding, with dynamic encoding offering more flexibility. Note that a reference may overlap its own output, that is, its length may exceed its offset. For instance, 'aaaaa' can be stored as 'literal a' followed by 'reference offset 1, length 4'.
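The literal/reference scheme can be sketched in a few lines of Python. This is a minimal brute-force illustration, not the implementation behind this page: the names compress and decompress, the default window, min_length and max_length values, and the token representation are all assumptions, and it works on Python characters rather than UTF-16 code units.

```python
from typing import List, Tuple, Union

# A token is either a literal character or an (offset, length) reference.
Token = Union[str, Tuple[int, int]]


def compress(text: str, window: int = 4096,
             min_length: int = 3, max_length: int = 18) -> List[Token]:
    tokens: List[Token] = []
    i = 0
    while i < len(text):
        best_len, best_off = 0, 0
        # Try every earlier start position inside the window.
        for j in range(max(0, i - window), i):
            length = 0
            # The match may run past position i, so it can overlap itself.
            while (length < max_length and i + length < len(text)
                   and text[j + length] == text[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_length:
            tokens.append((best_off, best_len))
            i += best_len
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def decompress(tokens: List[Token]) -> str:
    out: List[str] = []
    for token in tokens:
        if isinstance(token, tuple):
            offset, length = token
            # Copy one character at a time so overlapping references work.
            for _ in range(length):
                out.append(out[-offset])
        else:
            out.append(token)
    return "".join(out)


print(compress("aaaaa"))  # ['a', (1, 4)]
```

The decoder copies character by character on purpose: with offset 1 and length 4, each copied character becomes the source for the next one, which is what makes the 'aaaaa' example decode correctly.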

This variant of LZSS does not apply Huffman coding to its output, which would represent frequently emitted symbols in just a few bits instead of whole bytes.