cbloom rants: 01/2016

1/29/2016

Oodle Network Usage Notes

Two things I thought to write down.

1. Oodle Network speed is very cache sensitive.

Oodle Network uses a shared trained model. This is typically 4 - 8 MB. As it compresses or decompresses, it needs to access random bits of that memory.

If you compress/decompress a packet when that model is cold (not in cache), every access will be a cache miss and performance can be quite poor.

In synthetic test, coding packets over and over, the model is as hot as possible (in caches). So performance can seem better in synthetic test loops than in the real world.

In real use, it's best to batch up all encoding/decoding operations as much as possible. Rather than do :


decode one packet
apply packet to world
do some other stuff

decode one packet
apply packet to world
do some other stuff

...

try to group all the Oodle Network encoding & decoding together :


gather up all my packets to send

receive all packets from network stack

encode all my outbound packets
decode all my inbound packets

now act on inbound packets

this puts all the usage of the shared model together as close as possible to try to maximize the amount that the model is found in cache.

2. Oodle Network should not be used on already compressed data. Oodle Network should not be used on large packets.

Most games send pre-compressed data of various forms. Some send media files such as JPEGs that are already compressed. Some send big blobs that have been packed with zlib. Some send audio data that's already been compressed.

This data should be excluded from the Oodle Network path and send without going through the compressor. It won't get any compression on them and will just take CPU time. (you could send them as a packet with complen == rawlen, which is a flag for "raw data" in Oodle Network).

More importantly, these packets should NOT be included in the training set for building the model. They are essentially random bytes and will just crud up the model. It's a bit like if you're trying to memorize the digits of Pi and someone keeps yelling random numbers in your ear. (Well, actually it's not like that at all, but those kind of totally bullshit analogies seem very popular, so there you are.)

On large packets that are not precompressed, Oodle Network will work, but it's just not the best choice. It's almost always better to use an Oodle LZ data compressor (BitKnit, LZNIB, whatever, depending on your space-speed tradeoff desired).

The vast majority of games have a kind of bipolar packet distribution :


A. normal frame update packets < 1024 bytes

B. occasional very large packets > 4096 bytes

it will work better to only use Oodle Network on the type A packets (smaller, standard updates) and to use Oodle LZ on the type B packets (rarer, large data transfers).

For example some games send the entire state of the level in the first few packets, and then afterward send only deltas from that state. In that style, the initial big level dump should be sent through Oodle LZ, and then only the smaller deltas go through Oodle Network.

Not only will Oodle LZ do better on the big packets, but by excluding them from the training set for Oodle Network, the smaller packets will be compressed better because the data will all have similar structure.

1/16/2016

Oodle 2.1.2

Oodle 2.1.2 is out. Oodle - now with more BitKnit!


Oodle 2.1.2 example_lz_chart [file] [repeats]
got arg : input=r:\testsets\big\lzt99
got arg : num_repeats=5
lz test loading: r:\testsets\big\lzt99
uncompressed size : 24700820
---------------------------------------------------------------
chart cell contains : raw/comp ratio : encode mb/s : decode mb/s
LZB16: LZ-bytewise: super fast to encode & decode, least compression
LZNIB: LZ-nibbled : still fast, but more compression; between LZB & LZH
LZHLW: LZ-Huffman : like zip/zlib, but much more compression & faster
LZNA : LZ-nib-ANS : very high compression with faster decodes than LZMA
All compressors can be run at different encoder effort levels
---------------------------------------------------------------
       |   VeryFast  |   Fast      |   Normal    |   Optimal1  |
LZB16  |1.51:517:2988|1.57:236:2971|1.62:109:2964|1.65: 37:3003|
LZBLW  |1.64:249:2732|1.74: 80:2682|1.77: 24:2679|1.85:1.6:2708|
LZNIB  |1.80:264:1627|1.92: 70:1557|1.94: 23:1504|2.04: 12:1401|
LZHLW  |2.16: 67: 424|2.30: 20: 447|2.33:7.2: 445|2.35:5.4: 445|
BitKnit|2.43: 28: 243|2.47: 20: 245|2.50: 13: 249|2.54:6.4: 249|
LZNA   |2.36: 24: 115|2.54: 18: 119|2.58: 13: 120|2.69:4.9: 120|
---------------------------------------------------------------
compression ratio:
       |   VeryFast  |   Fast      |   Normal    |   Optimal1  |
LZB16  |    1.510    |    1.569    |    1.615    |    1.654    |
LZBLW  |    1.636    |    1.739    |    1.775    |    1.850    |
LZNIB  |    1.802    |    1.921    |    1.941    |    2.044    |
LZHLW  |    2.161    |    2.299    |    2.330    |    2.355    |
BitKnit|    2.431    |    2.471    |    2.499    |    2.536    |
LZNA   |    2.363    |    2.542    |    2.584    |    2.686    |
---------------------------------------------------------------
encode speed (mb/s):
       |   VeryFast  |   Fast      |   Normal    |   Optimal1  |
LZB16  |    517.317  |    236.094  |    108.555  |     36.578  |
LZBLW  |    248.537  |     80.299  |     23.663  |      1.610  |
LZNIB  |    263.950  |     69.930  |     22.617  |     11.735  |
LZHLW  |     67.154  |     20.019  |      7.200  |      5.425  |
BitKnit|     28.203  |     20.223  |     12.672  |      6.371  |
LZNA   |     24.192  |     18.423  |     12.883  |      4.907  |
---------------------------------------------------------------
decode speed (mb/s):
       |   VeryFast  |   Fast      |   Normal    |   Optimal1  |
LZB16  |   2988.429  |   2971.339  |   2963.616  |   3003.187  |
LZBLW  |   2731.951  |   2681.796  |   2678.558  |   2707.534  |
LZNIB  |   1626.806  |   1557.309  |   1504.097  |   1400.654  |
LZHLW  |    423.936  |    446.990  |    444.832  |    445.040  |
BitKnit|    242.916  |    245.409  |    248.812  |    248.972  |
LZNA   |    114.791  |    119.369  |    119.994  |    120.362  |
---------------------------------------------------------------

Another test :


Oodle 2.1.2 example_lz_chart [file] [repeats]
got arg : input=r:\game_testset_m0.7z
got arg : num_repeats=5
lz test loading: r:\game_testset_m0.7z
uncompressed size : 79290970
---------------------------------------------------------------
chart cell contains : raw/comp ratio : encode mb/s : decode mb/s
LZB16: LZ-bytewise: super fast to encode & decode, least compression
LZNIB: LZ-nibbled : still fast, but more compression; between LZB & LZH
LZHLW: LZ-Huffman : like zip/zlib, but much more compression & faster
LZNA : LZ-nib-ANS : very high compression with faster decodes than LZMA
All compressors can be run at different encoder effort levels
---------------------------------------------------------------
       |   VeryFast  |   Fast      |   Normal    |   Optimal1  |   Optimal2  |
LZB16  |1.4:1039:4304|1.41:438:4176|1.42:184:4202|1.44: 52:4293|1.44:4.5:4407|
LZBLW  |1.51:380:3855|1.55:124:3778|1.56: 26:3774|1.62:1.0:3862|1.62:1.0:3862|
LZNIB  |1.56:346:2406|1.59: 84:2398|1.62: 24:2054|1.67: 15:2048|1.67: 10:2053|
LZHLW  |1.67: 85: 647|1.74: 25: 679|1.75:6.5: 635|1.77:3.3: 613|1.79:1.5: 618|
BitKnit|1.83: 24: 395|1.90: 18: 409|1.90: 12: 408|1.91:7.1: 402|1.91:6.5: 401|
LZNA   |1.78: 22: 171|1.84: 18: 178|1.88: 12: 185|1.93:5.6: 167|1.93:1.5: 167|
---------------------------------------------------------------
compression ratio:
       |   VeryFast  |   Fast      |   Normal    |   Optimal1  |   Optimal2  |
LZB16  |    1.390    |    1.408    |    1.424    |    1.436    |    1.442    |
LZBLW  |    1.509    |    1.548    |    1.558    |    1.615    |    1.615    |
LZNIB  |    1.557    |    1.593    |    1.622    |    1.669    |    1.668    |
LZHLW  |    1.669    |    1.745    |    1.754    |    1.767    |    1.790    |
BitKnit|    1.825    |    1.897    |    1.905    |    1.913    |    1.915    |
LZNA   |    1.781    |    1.838    |    1.878    |    1.927    |    1.932    |
---------------------------------------------------------------
encode speed (mb/s):
       |   VeryFast  |   Fast      |   Normal    |   Optimal1  |   Optimal2  |
LZB16  |   1038.910  |    437.928  |    184.457  |     52.008  |      4.465  |
LZBLW  |    380.030  |    123.621  |     26.028  |      0.973  |      0.973  |
LZNIB  |    345.905  |     83.577  |     24.299  |     14.544  |     10.444  |
LZHLW  |     84.519  |     25.218  |      6.542  |      3.256  |      1.547  |
BitKnit|     24.116  |     17.944  |     12.476  |      7.052  |      6.464  |
LZNA   |     21.859  |     18.034  |     11.767  |      5.602  |      1.465  |
---------------------------------------------------------------
decode speed (mb/s):
       |   VeryFast  |   Fast      |   Normal    |   Optimal1  |   Optimal2  |
LZB16  |   4304.144  |   4175.854  |   4202.491  |   4292.925  |   4406.853  |
LZBLW  |   3855.255  |   3777.826  |   3774.093  |   3861.922  |   3861.582  |
LZNIB  |   2406.379  |   2397.753  |   2054.429  |   2048.329  |   2053.340  |
LZHLW  |    646.796  |    679.173  |    635.035  |    613.051  |    617.994  |
BitKnit|    394.599  |    408.539  |    408.044  |    402.239  |    401.352  |
LZNA   |    171.111  |    177.565  |    184.677  |    167.439  |    166.904  |
---------------------------------------------------------------


vs LZMA :
ratio: 1.901
enc  : 2.70 mb/s
dec  : 30.27 mb/s

On this file, BitKnit is 13X faster to decode than LZMA, and gets more compression. (or at "Normal" level, the ratio is similar and BitKnit is 4.6X faster to encode).

cbloom rants

1/29/2016

Oodle Network Usage Notes

1/16/2016

Oodle 2.1.2

old rants