aboutsummaryrefslogtreecommitdiff
path: root/circuitpython/tools/huffman/README.rst
blob: 507bb01589a80c537d4c1d97cc1f293080b34b4b (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
huffman
=======

Generate Huffman codebooks! `Huffman codes`_ are the optimal way to compress individual symbols into a binary sequence that can be unambiguously decoded without inter-symbol separators (it is "`prefix-free`_").

Provided an iterable of 2-tuples in ``(symbol, weight)`` format, generate a Huffman codebook, returned as a dictionary in ``{symbol: code, ...}`` format.

::

    >>> huffman.codebook([('A', 2), ('B', 4), ('C', 1), ('D', 1)])
    {'A': '10', 'B': '0', 'C': '110', 'D': '111'}

If you have an iterable of symbols, the ``collections.Counter`` is a handy way to tally them up.

::

    >>> huffman.codebook(collections.Counter('man the stand banana man').items())
    {' ': '111',
     'a': '10',
     'b': '0100',
     'd': '0110',
     'e': '11010',
     'h': '0101',
     'm': '1100',
     'n': '00',
     's': '11011',
     't': '0111'}


.. _Huffman codes: https://en.wikipedia.org/wiki/Huffman_coding
.. _prefix-free: https://en.wikipedia.org/wiki/Prefix_code