12-31-13 - Statically Linked DLL

Oodle for Windows is shipped as a DLL.

I have to do this because the multiplicity of incompatible CRT libs on Windows has made shipping libs for Windows an impossible disaster.

(seriously, jesus christ, stop adding features to your products and make it so that we can share C libs. Computers are becoming an increasingly broken disaster of band-aided together non-functioning bits.)

The problem is that clients (reasonably so) hate DLLs. Because DLLs are also an annoying disaster on Windows (having to distribute multiple files, accidentally loading from an unexpected place, and what if you have multiple products that rely on different versions of the same DLL, etc.).

Anyway, it seems to me that the best solution is actually a "statically linked DLL".

The DLL is the only way on Windows to combine code packages without mashing their CRT together, and being able to have some functions publicly linked and others resolved privately. So you want that. But you don't want an extra file dangling around that causes problems, you just want it linked into your EXE.

You can build your DLL as DelayLoad, and do the LoadLibrary for it yourself, so the client still sees it like a normal import lib, but you actually get the DLL from inside the EXE. The goal is to act like a static lib, but avoid all the link conflict problems.

The most straightforward way to do it would be to link the DLL in to your EXE as bytes, and at startup write it out to a temp dir, then LoadLibrary that file.

The better way is to write your own "LoadLibraryFromMem". A quick search finds some leads on that :

Loading Win3264 DLLs manually without LoadLibrary() - CodeProject
Loading a DLL from memory � ~magogpublic
LoadLibrary replacement - Source Codes - rohitab.com - Forums

Crazy or wise?


12-12-13 - Call for Game Textures

... since I've had SOooo much luck with these calls for data in the past. Anyway, optimism...

I need game textures! They must be from a recent game (eg. modern resolutions), and I need the original RGB (not the shipped DXTC or whatever).

If you can provide some, please contact me.


11-25-13 - Oodle and the real problems in games

When I started work on Oodle, I specifically didn't want to do a compression library. For one thing, I had done a lot of compression and don't like to repeat the same work, I need new topics and things to learning. For another, I doubted that we could sell a compression library; selling compression is notoriously hard, because there's so much good free stuff out there, and even if you do something amazing it will only be 10% better than the free stuff due to the diminishing returns asymptote; customers also don't understand things like space-speed tradeoffs. But most of all I just didn't think that a compression library solved important problems in games. Any time I do work I don't like to go off into esoteric perfectionism that isn't really helping much, I like to actually attack the important problems.

That's why Oodle was originally a paging / packaging / streaming / data management product. I thought that we had some good work on that at Oddworld and it seemed natural to take those concepts and clean them up and sell that to the masses. It also attacks what I consider to be very important problems in games.

Unfortunately it became pretty clear that nobody would buy a paging product. Game companies are convinced that they "already have" that, or that they can roll it themselves easily (in fact they don't have that, and it's not easy). On the other hand we increasingly saw interest in a compression library, so that's the direction we went.

(I don't mean to imply that the clients are entirely at fault for wanting the wrong thing; it's sort of just inherently problematic to sell a paging library, because it's too fundamental to the game engine. It's something you want to write yourself and have full control over. Really the conception of Oodle was problematic from the beginning, because the ideal RAD product is a very narrow API that can be added at the last second and does something that is not too tied to the fundamental operation of the game, and also that game coders don't want to deal with writing themselves)

The two big problems that I wanted to address with Original Oodle was -

1. Ridiculous game level load times.

2. Ridiculous artist process ; long bake times ; no real previews, etc.

These are very different problems - one is the game runtime, one is in the dev build and tools, but they can actually be solved at the same time by the same system.

Oodle was designed to offer per-resource paging; async IO and loading, background decompression. Resources could be grouped into bundles; the same resource might go into several bundles to minimize loads and seeks. Resources could be stored in various levels of "optimization" and the system would try to load the most-optimized. Oodle would store hashes and mod times so that old/incompatible data wouldn't be loaded. By checking times and hashes you can do a minimal incremental rebuild of only the bundles that need to be changed.

The same paging system can be used for hot-loading, you just page out the old version and page in the new one - boom, hot loaded content. The same system can provide fast in-game previews. You just load an existing compiled/optimized level, and then page in a replacement of the individual resource that the artist is working on.

The standard way to use such a system is that you still have a nightly content build that makes the super-optimized bundles of the latest content, but then throughout the day you can make instant changes to any of the content, and the newer versions are automatically loaded instead of the nightly version. It means that you're still loading optimized bakes for 90% of the content (thus load times and such aren't badly affected) but you get the latest all day long. And if the nightly bake ever fails, you don't stop the studio, people just keep working and still see all the latest, it just isn't all fully baked.

These are important problems, and I still get passionate about them (aka enraged) when I see how awful the resource pipeline is at most game companies.

(I kept trying to add features to the paging product to make it something that people would buy; I would talk to devs and say "what does it need to do for you to license it", and everybody would say something different (and even if it had that feature they wouldn't actually license it). That was a bad road to go down; it would have led to huge feature bloat, been impossible to maintain, and made a product that wasn't as lean and focused as it should be; customers don't know what they want, don't listen to them!)

Unfortunately, while compression is very interesting theoretically, make a compressor that's 5% better than an alternative is just not that compelling in terms of the end result that it has.


11-14-13 - Oodle Packet Compression for UDP

Oodle now has compressors for UDP (unordered / stateless) packets. Some previous posts on this topic :

cbloom rants 05-20-13 - Thoughts on Data Compression for MMOs
cbloom rants 08-08-13 - Oodle Static LZP for MMO network compression
cbloom rants 08-19-13 - Sketch of multi-Huffman Encoder

What I'm doing for UDP packet is static model compression. That is, you pre-train some model based on a capture of typical network data. That model is then const and can be just written out to a file for use in your game. At runtime, you read the model from disk, then it is const and shared by all network channels. This is particularly desirable for large scale servers because there is no per-channel overhead, either in channel startup time or memory use.

(ASIDE : Note that there is an alternative for UDP, which is to build up a consistent history between the encoder and decoder by having the decoder send back "acks", and then making sure the encoder uses only ack'ed packets as history, etc. etc. An alternative is to have the encoder mark packets with a description of the history used to encode them, and then when the decoder gets them if it doesn't have the necessary history it drops the packet and requests it be resent or something. I consider these a very bad idea and Oodle won't do them; I'm only looking at UDP compression that uses no transmission history.)

Call for test data! I currently only have a large network capture from one game, which obviously skews my results. If you make a networked game and can provide real-world sample data, please contact me.

Now for a mess of numbers comparing the options.

UDP compression of packets (packet_test.bin)

order-0 static huffman :  371.1 -> 234.5 average
(model takes 4k bytes)

order-0 static multi-huffman (32 huffs) : 371.1 -> 209.1 average
(model takes 128k bytes)

order-2 static arithmetic model : 371.0 -> 171.1 average
(model takes 549444 bytes)

OodleStaticLZP for UDP : 371.0 -> 93.4 average
(model takes 13068456 bytes)

In all cases there is no per-channel memory use. OodleStaticLZP is the recommended solution.

For comparison, the TCP compressors get :

LZB16 models take : 131072 bytes per channel
LZB16 [sw16|ht14] : 371.0 -> 122.6 average

LZNib models take : 1572864 bytes per channel
LZnib [sw19|ht18] : 371.0 -> 90.8 average

LZP models take : 104584 bytes per channel, 12582944 bytes shared
LZP [8|19] : 371.0 -> 76.7 average

zlib uses around 400k per channel
zlib -z3 : 371.0 -> 121.8 average
zlib -z6 : 371.0 -> 111.8 average

For MMO type scenarios (large number of connections, bandwidth is important), LZP is a huge win. It gets great compression with low per-channel memory use. The other compelling use case is LZNib when you are sending large packets (so per-byte speed is important) and have few connections (so per-channel memory use is not important); the advantage of LZNib is that it's quite fast to encode (faster than zlib-3 for example) and gets pretty good compression.

To wrap up, logging the variation of compression under some options.

LZPUDP can use whatever size of static dictionary you want. More dictionary = more compression.

LZPUDP [dic mb | hashtable log2]

LZPUDP [4|18] : 595654217 -> 165589750 = 3.597:1
1605378 packets; 371.0 -> 103.1 average
LZPUDP [8|19] : 595654217 -> 154353229 = 3.859:1
1605378 packets; 371.0 -> 96.1 average
LZPUDP [16|20] : 595654217 -> 139562083 = 4.268:1
1605378 packets; 371.0 -> 86.9 average
LZPUDP [32|21] : 595654217 -> 113670899 = 5.240:1
1605378 packets; 371.0 -> 70.8 average

And MultiHuffman can of course use any number of huffmans.

MultiHuffman [number of huffs | number of random trials]

MultiHuffman [1|8] : 66187074 -> 41830922 = 1.582:1
178376 packets; 371.1 -> 234.5 average, H = 5.056
MultiHuffman [2|8] : 66187074 -> 39869575 = 1.660:1
178376 packets; 371.1 -> 223.5 average, H = 4.819
MultiHuffman [4|8] : 66187074 -> 38570016 = 1.716:1
178376 packets; 371.1 -> 216.2 average, H = 4.662
MultiHuffman [8|8] : 66187074 -> 38190760 = 1.733:1
178376 packets; 371.1 -> 214.1 average, H = 4.616
MultiHuffman [16|8] : 66187074 -> 37617159 = 1.759:1
178376 packets; 371.1 -> 210.9 average, H = 4.547
MultiHuffman [32|8] : 66187074 -> 37293713 = 1.775:1
178376 packets; 371.1 -> 209.1 average, H = 4.508

On the test data that I have, the packets are pretty homogenous, so more huffmans is not a huge win. If you had something like N very different types of packets, you would expect to see big wins as you go up to N and then pretty flat after that.

Public note to self : it would amuse me to try ACB for UDP compression. ACB with dynamic dictionaries is not Pareto because it's just too slow to update that data structure. But with a static precomputed suffix sort, and optionally dynamic per-channel coding state, it might be good. It would be slower & higher memory use than LZP, but more compression.


10-14-13 - Oodle's Fast LZ4

Oodle now has a compressor called "LZB" (LZ-Bytewise) which is basically a variant of Yann's LZ4 . A few customers were testing LZNib against LZ4 and snappy and were bothered that it was slower to decode than those (though it offers much more compression and I believe it is usually a better choice of tradeoffs). In any case, OodleLZB is now there for people who care more about speed than ratio.

Oodle LZB has some implementation tricks that could also be applied to LZ4. I thought I'd go through them as an illustration of how you make compressors fast, and to give back to the community since LZ4 is nicely open source.

OodleLZB is 10-20% faster to decode than LZ4. (1900 mb/s vs. 1700 mb/s on lzt99, and 1550 mb/s vs. 1290 mb/s on all_dds). The base LZ4 implementation is not bad, it's in fact very fast and has the obvious optimizations like 8-byte word copies and so on. I'm not gonna talk about fiddly junk like do you do ptr++ or ++ptr , though that stuff can make a difference on some platforms. I want to talk about how you structure a decode loop.

The LZ4 decoder is like this :

U8 * ip; // = compressed input
U8 * op; // = decompressed output

    int control = *ip++;

    int lrl = control>>4;
    int ml_control = control&0xF;

    // get excess lrl
    if ( lrl == 0xF ) AddExcess(lrl,ip);

    // copy literals :
    copy_literals(op, ip, lrl);

    ip += lrl;
    op += lrl;

    if ( EOF ) break;

    // get offset
    int offset = *((U16 *)ip);

    int ml = ml_control + 4;
    if ( ml_control == 0xF ) AddExcess(ml,ip);

    // copy match :
    if ( overlap )
        copy_match_overlap(op, op - offset, ml );
        copy_match_nooverlap(op, op - offset, ml );

    op += ml;

    if ( EOF ) break;

and AddExcess is :

#define AddExcess(val,cp)   do { int b = *cp++; val += b; if ( b != 0xFF ) break; } while(1)

So, how do we speed this up?

The main thing we want to focus on is branches. We want to reduce the number of branches on the most common paths, and we want to make maximum use of the branches that we can't avoid.

There are four branches we want to pay attention to :

1. the checks for control nibbles = 0xF
2. the check for match overlap
3. the check of LRL and match len inside the match copiers
4. the EOF checks

So last one first; the EOF check is the easiest to eliminate and also the least important. On modern chips with good branch prediction, the highly predictable branches like that don't cost much. If you know that your input stream is not corrupted (because you've already checked a CRC of some kind), then you can put the EOF check under one of the rare code cases, like perhaps LRL control = 0xF. Just make the encoder emit that rare code when it hits EOF. On Intel chips that makes almost no speed difference. (if you need to handle corrupted data without crashing, don't do that).

On to the substantial ones. Note that branch #3 is hidden inside "copy_literals" and "copy_match". copy_literals is something like :

for(int i=0;i<lrl;i+=8)
    *((U64 *)(dest+i)) = *((U64 *)(src+i));

(the exact way to do copy_literals quickly depends on your platform; in particular are offset-addresses free? and are unaligned loads fast? Depending on those, you would write the loop in different ways.)

We should notice a couple of things about this. One is that the first branch on lrl is the most important. Later branches are rare, and we're also covering a lot of bytes per branch in that case. When lrl is low, we're getting a high number of branches per byte. Another issue about that is that the probability of taking the branch is very different the first time, so you can help the branch-prediction in the chip by doing some explicit branching, like :

if ( lrl > 0 )
    *((U64 *)(dest)) = *((U64 *)(src));
    if ( lrl > 8 )
        *((U64 *)(dest+8)) = *((U64 *)(src+8));
        if ( lrl > 16 )
            // .. okay now do a loop here ...

We'll also see later that the branch on ( lrl > 0 ) is optional, and in fact it's best to not do it - just always unconditionally/speculatively copy the first 8 bytes.

The next issue we should see is that the branch for lrl > 16 there for the copier is redundant with the check for the control value (lrl == 0xF). So we should merge them :

change this :

    // get excess lrl
    if ( lrl == 0xF ) AddExcess(lrl,ip);

    // copy literals :
    copy_literals(op, ip, lrl);

to :

    // get excess lrl
    if ( lrl == 0xF )

        // lrl >= 15
        copy_literals_long(op, ip, lrl);
        // lrl < 15
        copy_literals_short(op, ip, lrl);

this is the principle that we can't avoid the branch on the LRL escape code, so we should make maximum use of it. That is, it caries extra information - the branch tells us something about the value of LRL, and any time we take a branch we should try to make use of all the information it gives us.

But we can do more here. If we gather some stats about LRL we see something like :

% of LRL >= 0xF : 8%
% of LRL > 8 : 15%
% of LRL <= 8 : 85%
the vast majority of LRL's are short. So we should first detect and handle that case :

    if ( lrl <= 8 )
        if ( lrl > 0 )
            *((U64 *)(op)) = *((U64 *)(ip));
        // lrl > 8
        if ( lrl == 0xF )

            // lrl >= 15
            copy_literals_long(op, ip, lrl);
            // lrl > 8 && < 15
            *((U64 *)(op)) = *((U64 *)(ip));
            *((U64 *)(op+8)) = *((U64 *)(ip+8));

which we should then rearrange a bit :

    // always immediately speculatively copy 8 :
    *((U64 *)(op)) = *((U64 *)(ip));

    if_unlikely( lrl > 8 )
        // lrl > 8
        // speculatively copy next 8 :
        *((U64 *)(op+8)) = *((U64 *)(ip+8));

        if_unlikely ( lrl == 0xF )

            // lrl >= 15
            // can copy first 16 without checking lrl
            copy_literals_long(op, ip, lrl);
    op += lrl;
    ip += lrl;

and we have a faster literal-copy path.

Astute readers may notice that we can now be stomping past the end of the buffer, because we always do the 8 byte copy regardless of LRL. There are various ways to prevent this. One is to make your EOF check test for (end - 8), and when you break out of that loop, then you have a slower/safer loop to finish up the end. (ADD : not exactly right, see notes in comments)

Obviously we should do the exact same procedure with the ml_control. Again the check for spilling the 0xF nibble tells us something about the match length, and we should use that information to our advantage. And again the short matches are by far the most common, so we should detect that case and make it as quick as possible.

The next branch we'll look at is the overlap check. Again some statistics should be our guide : less than 1% of all matches are overlap matches. Overlap matches are important (well, offset=1 overlaps are important) because they are occasionally very long, but they are rare. One option is just to forbid overlap matches entirely, and really that doesn't hurt compression much. We won't do that. Instead we want to hide the overlap case in another rare case : the excess ML 0xF nibble case.

The way to make compressors fast is to look at the code you have to write, and then change the format to make that code fast. That is, write the code the way you want it and the format follows - don't think of a format and then write code to handle it.

We want our match decoder to be like this :

if ( ml_control < 0xF )
    // 4-18 byte short match
    // no overlap allowed
    // ...
    // long match OR overlap match
    if ( offset < 8 )
        ml = 4; // back up to MML

        // do offset < 8 possible overlap match
        // ...

        // do offset >= 8 long match
        // ...

so we change our encoder to make data like that.

Astute readers may note that overlap matches now take an extra byte to encode, which does cost a little compression ratio. If we like we can fix that by reorganizing the code stream a little (one option is to send one ml excess byte before sending offset and put a flag value in there; since this is all in the very rare pathway it can be more complex in its encoding), or we can just ignore it, it's around a 1% hit.

That's it.

One final note, if you want to skip all that, there is a short cut to get much of the win.

The simplest case is the most important - no excess lrl, no excess ml - and it occurs around 40% of all control bytes. So we can just detect it and fastpath that case :

U8 * ip; // = compressed input
U8 * op; // = decompressed output

    int control = *ip++;

    int lrl = control>>4;
    int ml_control = control&0xF;

    if ( (control & 0x88) == 0 )
        // lrl < 8 and ml_control < 8

        // copy literals :
        *((U64 *)(op)) = *((U64 *)(ip));
        ip += lrl;
        op += lrl;

        // get offset
        int offset = *((U16 *)ip);
        int ml = ml_control + 4;    

        // copy match; 4-11 bytes :
        U8 * from = op - offset;
        *((U64 *)(op)) = *((U64 *)(from));
        *((U32 *)(op+8)) = *((U32 *)(from+8));

        op += ml;


    // ... normal decoder here ...

This fastpath doesn't help much with all the other improvements we've done here, but you can just drop it on the original decoder and get much of the win. Of course beware the EOF handling. Also this fastpath assumes that you have forbidden overlaps from the normal codes and are sending the escape match code (0xF) for overlap matches.

ADDENDUM : A few notes :

In case it's not clear, one of the keys to this type of fast decoder is that there's a "hot region" in front of the output pointer that contains undefined data, which we keep stomping on over and over. eg. when we do the 8-byte literal blasts regardless of lrl. This has a few consequences you must be aware of.

One is if you're trying to decode in a minimum size circular window (eg. 64k in this case). Offsets to matches that are near window size (like 65534) are actually wrapping around to be just *ahead* of the current output pointer. You cannot allow those matches in the hot region because those bytes are undefined. There are a few fixes - don't use this decoder for 64k circular windows, or don't allow your encoder to output offsets that cause that problem.

A similar problem arises with parallelizing the decode. If you're decoding chunks of the stream in parallel, you cannot allow the hot region of one thread to be stomping past the end of its chunk, which another thread might be reading from. To handle this Oodle defines "seek points" which are places that you are allowed to parallelize the decode, and the coders ensure that the hot regions do not cross seek points. That is, within a chunk up to the seek point, the decoder is allowed to do these wild blast-aheads, but as it gets close to a seek point it must break out and fall into a safer mode (this can be done with a modified end condition check).


10-09-13 - Urgh ; Threads and Memory

This is a problem I've been trying to avoid really facing, so I keep hacking around it, but it keeps coming back to bite me every few months.

Threads/Jobs and memory allocation is a nasty problem.

Say you're trying to process some 8 GB file on a 32-bit system. You'd like to fire up a bunch of threads and let them all crank on chunks of the file simultaneously. But how big of a chunk can each thread work on? And how many threads can you run?

The problem is those threads may need to do allocations to do their processing. With free-form allocations you don't necessarily know in advance how much they need to allocate (it might depend on processing options or the data they see or whatever). With a multi-process OS you also don't know in advance how much memory you have available (it may reduce while you're running). So you can't just say "I have the memory to run 4 threads at a time". You don't know. You can run out of memory, and you have to abort the whole process and try again with less threads.

In case it's not obvious, you can't just try running 4 threads, and when one of them runs out of memory you pause that thread and run others, or kill that thread, because the thread may do work and allocations incrementally, like work,alloc,work,alloc,etc. so that by the time an alloc fails, you're alread holding a bunch of other allocs and no other thread may be able to run.

To be really clear, imagine you have 2 MB free and your threads do { alloc 1 MB, work A, alloc 1 MB, work B }. You try to run 2 threads, and they both get up to work A. Now neither thread can continue because you're out of memory.

The real solution is for each Job to pre-declare its resource requirements. Like "I need 80 MB" to run. Then it becomes the responsibility of the Job Manager to do the allocation, so when the Job is started, it is handed the memory and it knows it can run; all allocations within the Job then come from the reserved pool, not from the system.

(there are of course other solutions; for example you could make all your jobs rewindable, so if one fails an allocation it is aborted (and any changes to global state undone), or similarly all your jobs could work in two stages, a "gather" stage where allocs are allowed, but no changes to the global state are allowed, and a "commit" phase where the changes are applied; the job can be aborted during "gather" but must not fail during "commit").

So the Job Manager might try to allocate memory for a job, fail, and run some other jobs that need less memory. eg. if you have jobs that take { 10, 1, 10, 1 } of memories, and you have only 12 memories free, you can't run the two 10's at the same time, but you can run the 1's while a 10 is running.

While you're at it, you may as well put some load-balancing in your Jobs as well. You could have each Job mark up to what extend it needs CPU, GPU, or IO resources (in addition to memory use). Then the Job Manager can try to run jobs that are of different types (eg. don't run two IO-heavy jobs at the same time).

If you want to go even more extreme, you could have Jobs pre-declare the shared system resources that they need locks on, and the Job Manager can try to schedule jobs to avoid lock contention. (the even super extreme version of this is to pre-declare *all* your locks and have the Job Manager take them for you, so that you are gauranteed to get them; at this point you're essentially making Jobs into snippets that you know cannot ever fail and cannot even ever *block*, that is they won't even start unless they can run straight to completion).

I haven't wanted to go down this route because it violates one of my Fundamental Theorems of Jobs, which is that job code should be the same as main-thread code, not some weird meta-language that requires lots of markup and is totally different code from what you would write in the non-threaded case.

Anyway, because I haven't properly addressed this, it means that in low-memory scenarios (eg. any 32-bit platform), the Oodle compressors (at the optimal parse level) can run out of memory if you use too many worker threads, and it's hard to really know that's going to happen in advance (since the exact memory use depends on a bunch of options and is hard to measure). Bleh.

(and obviously what I need to do for Oodle, rather than solving this problem correctly and generally, is just to special case my LZ string matchers and have them allocate their memory before starting the parallel compress, so I know how many threads I can run)


10-04-13 - The Reality of Coding

How you actually spend time.

4 hours - think about a problem, come up with a new algorithm or variation of algorithm, or read papers to find an algorithm that will solve your problem. This is SO FUCKING FUN and what keeps pulling me back in.

8 hours - do initial test implementation to prove concept. It works! Yay!

And now the fun part is over.

50 hours - do more careful implementation that handles all the annoying corner cases; integrate with the rest of your library; handle failures and so on. Provide lots of options for slightly different use cases that massively increase the complexity.

50 hours - adding features and fixing rare bugs; spread out over the next year

20 hours - have to install new SDKs to test it; inevitably they've broken a bunch of APIs and changed how you package builds so waste a bunch of time on that

10 hours - some stupid problem with Win8 loading the wrong drivers; or the linux dir my test is writing to is no longer writeble by my user account; whatever stupid computer problem; chase that around for a while

10 hours - the p4 server is down / vpn is down / MSVC has an internal compiler error / my laptop is overheating / my hard disk is full, whatever stupid problem always holds you up.

10 hours - someone checks in a breakage to the shared library; it would take a minute just to fix it, but you can't do that because it would break them so you have to do meetings to agree on how it should be fixed

10 hours - some OS API you were using doesn't actually behave the way you expected, or has a bug; some weird corner case or undocumented interaction in the OS API that you have to chase down

40 hours - writing docs and marketing materials, teaching other members of your team, internal or external evangelizing

30 hours - some customer sees a bug on some specific OS or SDK version that I no longer have installed; try to remote debug it, that doesn't work, try to find a repro, that doesn't work, give up and install their case; in the end it turns out they had bad RAM or something silly.

The reality is that as a working coder, the amount of time you actually get to spend working on the good stuff (new algorithms, hacking, clever implementations) is vanishingly small.


10-03-13 - SetLastError(0)

Public reminder to myself about something I discovered a while ago.

If you want to do IO really robustly in Windows, you can't just assume that your ReadFile / WriteFile will succeed under normal usage. There are lots of nasty cases where you need to retry (perhaps with smaller IO sizes, or just after waiting a bit).

In particular you can see these errors in normal runs :

ERROR_NOT_ENOUGH_MEMORY = too many AsyncIO 's pending

ERROR_NOT_ENOUGH_QUOTA  = single IO call too large
    not enough process space pages available
    -> SetProcessWorkingSetSize

    failure to alloc pages in the kernel address space for the IO
    try again with smaller IOs  

    normal async IO return value

    sometimes normal EOF return value

anyway, this post is not about the specifics of IO errors. (random aside : I believe that some of these annoying errors were much more common in 32-bit windows; the failure to get address space to map IO pages was a bigger problem in 32-bit (I saw it most often when running with the /3GB option which makes the kernel page space a scarce commodity), I don't think I've seen it in the field in 64-bit windows)

I discovered a while ago that ReadFile and WriteFile can fail (return false) but not set last error to anything. That is, you have something like :

SetLastError(77); // something bogus

if ( ! ReadFile(....) )
    // failure, get code :
    DWORD new_error = GetLastError();

    // new_error should be the error info about ReadFile failing
    // but sometimes it's still 77

and *sometimes* new_error is still 77 (or whatever; that is, it wasn't actually set when ReadFile failed).

I have no idea exactly what situations make the error get set or not. I have no idea how many other Win32 APIs are affected by this flaw, I only have empirical proof of ReadFile and WriteFile.

Anyhoo, the conclusion is that best practice on Win32 is to call SetLastError(0) before invoking any API where you need to know for sure that the error code you get was in fact set by that call. eg.

if ( ! SomeWin32API(...) )
    DWORD hey_I_know_this_error_is_legit = GetLastError();


That is, Win32 APIs returning failure does *not* guarantee that they set LastError.

ADD : while I'm at it :

in the VC watch window is pretty sweet.

GetLastError is :

*(DWORD *)($tib+0x34)

or *(DWORD *)(FS:[0x34]) on x86


09-27-13 - Playlist for Rainy Seattle

Who the fuck turned the lights out in the world? Hello? I'm still here, can you turn the lights back on please?

Playlist for the gray :

(Actually now that I think about it, "playlist for the gray" is really just the kind of music I listened to when I was young (and, apparently, depressed). It reminds me of sitting in the passenger seat of a car, silent, looking out the window, it's raining, the world passing.)

Music request :

Something I've been seeking for a while and am having trouble finding : really long repetitive tracks. Preferably analog, not techno or dance music. Like some guitar strumming and such. Not pure drone, not just like one phrase repeated over and over, but a song, a proper song, just a really long version of that song. And not some awful Phish crap either; I don't mean "jazz" or anything with long improvs that get away from the basic song structure. I don't want big Mogwai walls of sound or screeching anything either; nothing experimental; I hate songs that build to a big noise crescendo, no no, just keep the steady groove going. Just regular songs, But that go on and on.

Any pointers appreciated. Particularly playlists or "best of" lists on music sites. There must be some DJ kid doing super-long remixes of classic rock songs, right? Where is he?

Some sort of in the right vein examples :

Neu! - Hallogallo (10 min)
Traffic - Mr Fantasy Live (11 min) (maybe not a great example, too solo-y)
YLT - Night Falls on Hoboken (17 min)


09-20-13 - House Design

Blah blah blah cuz it's on my mind.

First a general rant. There's nothing worse than "designers". The self-important sophomoric egotism of them is just mind boggling. Here's this product (or in this case, house plan) that has been refined over 1000's of years by people actually using it. Oh no, I know better. I'm so fucking great that I can throw all that out and come up with something off the top of my head and it will be an improvement. I don't have to bother with prototyping or getting evaluations from the people who actual use this stuff every day because my half-baked ideas are so damn good. I don't need to bother learning about how this product or layout has been fiddled with and refined in the past because my idea *must* be brand new, no one could have possibly done the exact same thing before and proved that it was a bad idea.

And onto the random points -

X. Of course the big fallacy is that a house is going to make your life better or fix anything. One of the most hillarious variants of this is people who put in a specific room for a gym or a bar or a disco, because of course in their wonderful new house they'll be working out and having parties and lots of friends. Not sitting in front of the TV browsing donkey porn like they do in their old house.

X. Never use new technology. It won't work long term, or it will be obsolete. You don't want a house that's like a computer and needs to be replaced in a few years. Use old technology that works. That goes for both the construction itself and for any integrated gadgets. Like if you get some computer-controlled power and lighting system; okay, good luck with that, I hope it's not too hard to rip out of the walls in five years when it breaks and the replacement has gone out of production. Haven't you people ever used electronics in your life? How do you not know this?

X. Living roof? WTF are you thinking? What a nightmare of maintenance. And it's just a huge ceiling leak inevitably. Yeah, I'm sure that rubber membrane that had several leaks during construction is gonna be totally fine for the next 50 years.

X. Assume that all caulks, all rubbers, all glues, all plastics will fail at some point. Make them easy to service and don't rely on them for crucial waterproofing.

X. Waterproofing should be primarily done with the "shingle principle". That is, mechanical overlaps - not glues, caulks, gaskets, coatings. Water should have to run upwards in order to get somewhere you don't want it.

X. Lots of storage. People these days want to eliminate closets to make rooms bigger. WTF do you need those giant rooms for? Small rooms are cosy. Storage everywhere makes it easy to keep the rooms uncluttered. So much nicer to have neat clean small rooms. The storage needs to be in every single room, not centralized, so you aren't walking all over the house every time you want something.

X. Rooms are good. Small rooms. I feel like there are two extreme trends going on these days that are both way off the ideal - you have the McMansion types that are making 5000 sqft houses, and then the "minimalist" types trying to live in 500 sqft to prove some stupid point. Both are retarded. I think the ideal amount of space for two people is around 1200-1500 sqft. For a family of three it's 1500-2000 or so.

X. Doors are good. Lofts are fucking retarded. Giant open single spaces are awful. Yeah it's okay if you're single, but if there are two people in the house you might just want to do different things and not hear each other. Haven't you ever lived in a place like that? It's terrible. Doors and separated spaces are wonderful things. (I like traditional Japanese interiors with the sliding screens so that you can rearrange spaces; maybe an open living-dining room, but with a sliding door through the middle to make it into two rooms when you want that? Not sure.)

X. Shadow gaps, weird kitchen islands, architectural staircases, sunken areas, multiple levels. Bleh. You've got to think about the cleaning. These things might look okay when it's first built, but they're a nightmare for maintenace.

X. Use trim. The popular thing these days seems to be trim-less walls. (part of the sterile awful "I live in a museum" look). In classic construction trim is partly used to hide the boundary between two surfaces that might not have a perfect joint, or that are different materials and thus might move against each other over time. With fancy modern construction the idea is that you don't have a bad joint that you have to hide, so you can do away with the baseboards and crown molding for a cleaner look. Blah, no, wrong. Baseboards are not just for hiding the joint, they serve a purpose. You can clean them, you can kick them, and they protect the bottom of your walls. They also provide a visual break from the floor to the wall, though that's a matter of taste.

X. I don't see anybody do the things that are so fucking obviously necessary these days. One is that all your wiring should be accessible for redoing, because we're going to continue to get new internet and cabling needs. Really you should run PVC tubes through your walls with fish-wires in them so that you can just easily pull new wiring through your house. (or of course if you make one of those god-awful modern places you should just do exposed pipes and wires; it's one of the few advantages of modern/industrial style, wtf. don't take the style and then reject the advantage of it). You should have charging stations that are just cupboards with outlets inside the cupboard so that you can hide all your charger cords and devices. There should be tons of outlets and perhaps they should be hidden; you could make them styled architectural features in some cases, or hide them in some wood trim or something. Another option would be to have retractable power outlets that coil up inside the wall and you can pull out as needed. Another idea is your baseboards could have a hollow space behind them, so you could snap them off and run cords around the room hidden behind the baseboards.

It would have been fun to be an architect. I have a lot of ideas about design, and I appreciate being in physical spaces that do something special to your experience. I love making physical objects that you can see and touch and show other people; it's so frustrating working in code and never making anything real.

But I'm sure I would have failed. For one things being an architect requires a lot of salesmanship and bullshitting. Particularly at the top levels, it's more about being a celebrity persona than about your work (just like art, cooking, etc.). To make any money or get the good commisions as an architect you have to have a bit of renown; you need to get paid extra because you're a name that's getting magazine attention.

But it's really more about following trends than about doing what's right. I suppose that's just like cooking too. Magazines (and blogs now) have a preconceived idea of what is "current" or "cutting edge" (which is not actually cutting edge at all, because it's a widespread cliche by that time), and they look for people that fit their preconceived idea. If you're a cook that does fucking boring ass generic "farm to table" and "sustainably raised" shit, then you're current and people will feature you in the news. If you just stick to what you know is delicious and ignore the stupid fads, then you're ignored.


09-18-13 - Per-Thread Global State Overrides

I wrote about this before ( cbloom rants 11-23-12 - Global State Considered Harmful ) but I'm doing it again because I think almost nobody does it right, so I'm gonna be really pedantic.

For concreteness, let's talk about a Log system that is controlled by bit-flags. So you have a "state" variable that is an or of bit flags. The flags are things like where do you output to (LOG_TO_FILE, LOG_TO_OUTPUTDEBUGSTRING, etc.) and maybe things like subsection enablements (LOG_SYSTEM_IO, LOG_SYSTEM_RENDERER, ...) or verbosity (LOG_V0, LOG_V1, ...). Maybe some bits of the state are an indent level. etc.

So clearly you have a global state where the user/programmer have set the options they want for the log.

But you also need a TLS state. You want to be able to do things like disable the log in scopes :


U32 oldState = Log_SetState(0);




(and in practice it's nice to use a scoper-class to do that for you). If you do that on the global variable, your thread is fucking up the state of other threads, so clearly it needs to be per-thread, eg. in the TLS. (similarly, you might want to inc the indent level for a scope, or change the verbosity level, etc.).

(note of course this is the "system has a stack of states which is implemented in the program stack").

So clearly, those need to be Log_SetLocalState. Then the functions that are used to set the overall options should be something like Log_SetGlobalState.

Now some notes on how the implementation works.

The global state has to be thread safe. It should just be an atomic var :

static U32 s_log_global_state;

U32 Log_SetGlobalState( U32 state )
    // set the new state and return the old; this must be an exchange

    U32 ret = Atomic_Exchange(&s_log_global_state, state , mo_acq_rel);

    return ret;

U32 Log_GetGlobalState( )
    // probably could be relaxed but WTF let's just acquire

    U32 ret = Atomic_Load(&s_log_global_state, mo_acquire);

    return ret;

(note that I sort of implicitly assume that there's only one thread (a "main" thread) that is setting the global state; generally it's set by command line or .ini options, and maybe from user keys in a HUD; the global state is not being fiddled by lots of threads at program time, because that creates races. eg. if you wanted to do something like turn on the LOG_TO_FILE bit, it should be done with a CAS loop or an Atomic OR, not by doing a _Get and then _Set).

Now the Local functions need to set the state in the TLS and *also* which bits are set in the local state. So the actual function is like :

per_thread U32_pair tls_log_local_state;

U32_pair Log_SetLocalState( U32 state , U32 state_set_mask )
    // read TLS :

    U32_pair ret = tls_log_local_state;

    // write TLS :

    tls_log_local_state = U32_pair( state, state_set_mask );

    return ret;

U32_pair Log_GetLocalState( )
    // read TLS :

    U32_pair ret = tls_log_local_state;

    return ret;

Note obviously no atomics or mutexes are need in per-thread functions.

So now we can get the effective combined state :

U32 Log_GetState( )
    U32_pair local = Log_GetLocalState();
    U32 global = Log_GetGlobalState();

    // take local state bits where they are set, else global state bits :

    U32 state = (local.first & local.second) | (global & (~local.second) );

    return state;

So internally to the log's operation you start every function with something like :

static bool NoState( U32 state )
    // if all outputs or all systems are turned off, no output is possible
    return ((state & LOG_TO_MASK) == 0) ||
        ((state & LOG_SYSTEM_MASK) == 0);

void Log_Printf( const char * fmt, ... )
    U32 state = Log_GetState();

    if ( NoState(state) )

    ... more here ...


So note that up to the "... more here ..." we have not taken any mutexes or in any way synchronized the threads against each other. So when the log is disabled we just exit there before doing anything painful.

Now the point of this post is not about a log system. It's that you have to do this any time you have global state that can be changed by code (and you want that change to only affect the current thread).

In the more general case you don't just have bit flags, you have arbitrary variables that you want to be per-thread and global. Here's a helper struct to do a global atomic with thread-overridable value :

struct tls_intptr_t
    int m_index;
        m_index = TlsAlloc();
        ASSERT( get() == 0 );
    intptr_t get() const { return (intptr_t) TlsGetValue(m_index); }

    void set(intptr_t v) { TlsSetValue(m_index,(LPVOID)v); }

struct intptr_t_and_set
    intptr_t val;
    intptr_t set; // bool ; is "val" set
    intptr_t_and_set(intptr_t v,intptr_t s) : val(v), set(s) { }
struct overridable_intptr_t
    atomic<intptr_t>    m_global;
    tls_intptr_t    m_local;    
    tls_intptr_t    m_localset;
    overridable_intptr_t(intptr_t val = 0) : m_global(val)
        ASSERT( m_localset.get() == 0 );
    intptr_t set_global(intptr_t val)
        return m_global.exchange(val,mo_acq_rel);
    intptr_t get_global() const
        return m_global.load(mo_acquire);
    intptr_t_and_set get_local() const
        return intptr_t_and_set( m_local.get(), m_localset.get() );
    intptr_t_and_set set_local(intptr_t val, intptr_t set = 1)
        intptr_t_and_set old = get_local();
        if ( set )
        return old;
    intptr_t_and_set set_local(intptr_t_and_set val_and_set)
        intptr_t_and_set old = get_local();
        if ( val_and_set.set )
        return old;
    intptr_t_and_set clear_local()
        intptr_t_and_set old = get_local();
        return old;
    intptr_t get_combined() const
        intptr_t_and_set local = get_local();
        if ( local.set )
            return local.val;
            return get_global();


// test code :  

static overridable_intptr_t s_thingy;

int main(int argc,char * argv[])
    argc; argv;
    ASSERT( s_thingy.get_combined() == 1 );
    intptr_t_and_set prev = s_thingy.set_local(3,1);
    ASSERT( s_thingy.get_combined() == 3 );

    ASSERT( s_thingy.get_combined() == 3 );
    ASSERT( s_thingy.get_combined() == 2 );
    return 0;

Or something.

Of course this whole post is implicitly assuming that you are using the "several threads that stay alive for the length of the app" model. An alternative is to use micro-threads that you spin up and down, and rather than inheriting from a global state, you would want them to inherit from the spawning thread's current combined state.

09-18-13 - Fast TLS on Windows

For the record; don't use this blah blah unsafe unnecessary blah blah.

extern "C" DWORD __cdecl FastTlsGetValue_x86(int index)
    mov     eax,dword ptr fs:[00000018h]
    mov     ecx,index

    cmp     ecx,40h // 40h = 64
    jae     over64  // Jump if above or equal 

    // return Teb->TlsSlots[ dwTlsIndex ]
    // +0xe10 TlsSlots         : [64] Ptr32 Void
    mov     eax,dword ptr [eax+ecx*4+0E10h]
    jmp     done

    mov     eax,dword ptr [eax+0F94h]
    mov     eax,dword ptr [eax+ecx*4-100h]


DWORD64 FastTlsGetValue_x64(int index)
    if ( index < 64 )
        return __readgsqword( 0x1480 + index*8 );
        DWORD64 * table = (DWORD64 *)  __readgsqword( 0x1780 );
        return table[ index - 64 ];

the ASM one is from nynaeve originally. ( 1 2 ). I'd rather rewrite it in C using __readfsdword but haven't bothered.

Note that these may cause a bogus failure in MS App Verifier.

Also, as noted many times in the past, you should just use the compiler __declspec thread under Windows when that's possible for you. (eg. you're not in a DLL pre-Vista).


09-17-13 - Travel with Baby

We did our first flight with baby over the weekend.

It went totally fine. All the things that people worry about (getting through security, baby ears popping, whatever) are no problem. Sure she cried on the plane some, but mostly she was really good, and anybody on the plane who was bothered can go to hell.

(hey dumbass who sat next to us - when you see a couple with a baby and they say "would you like to move to another seat" you should fucking move. And if you don't move, then you need to be cool about it. Jesus christ you all suck so bad, I'm fucking holding your hand helping you be a decent person, you don't even need to take any initiative, I know you all suck too bad to speak up and take the lead at being decent, but even when I open the door for you, you still can't manage to take the easy step. Amazing.)

Despite it being totally fine, it made me feel like I don't really need to do that again.

You just wind up spending the whole trip staring at baby anyway. You wind up spending a lot of time stuck in the hotel room, because she needs to nap, or you have to go back to feed her and get more toys, or get her out of the sun, or whatever. Hotel rooms are god awful. There's this weird romanticism about hotels, but the reality is they're almost uniformly dreary, in the standard shoebox design with light at only one end. My house is so much fucking better.

Like I'm in another part of the world, but I'm still just doing "goo-goos" and shaking the rattle, why do I need to bother with the travel if this is all I'm doing? And of course it's much harder because I don't have all my handy baby accessories and her comfy bed and all that. It made me think of those cheesy photo series with the baby always in the foreground of the photo and all kinds of different world locations in the background.

09-17-13 - A Day in the Life

Wake up at 730 with the baby.

Show her some toys, shake them around, let her chew on them. Practice rolling over, she gets frustrated, help her out. She wants to walk around a bit, so pick her up and hold her while she walks. Ugh this is hard on my back. She swats at things, I move away the dangerous knives and bring close some jars for her to play with. She's getting frustrated. Show her the mirror baby, she flirts for a minute. She gets fussy; check diaper, yep it's dirty; go change it. Lay her down and make faces and do motorboats and stuff. Try to read her a book; she just wants to eat the pages and gets frustrated. Walk her around outside and show her some plants, let her grab some leaves. She wants to walk, help her walk in the grass. Ugh this is hard on my back. Getting tired now. Take her in and put her in the walker; put some toys on the walker, she swats them off, pick them up and put them back. She's getting bored of that. Show her some running water, let her suck my finger. She's getting fussy again.

God I'm tired. Check the clock.

It's 800.

Twelve more hours until she sleeps. ZOMG.


09-12-13 - Health Insurance

We've got a bunch of health insurance bills from baby and related care, and several of them have fraudulent up-coding or double-billing. But it's sort of a damned-if-you-whatever situation. Either you :

1. Fight it, spend lots of time on hold on the phone, filling out forms, talking to assholes who won't help you. Probably get nowhere and be stressed and angry the whole time.

2. Just pay it and try to "let it go". The money's not that huge and peace of mind is worth it. But in fact feel defeated and like a pussy for letting them get away with robbing you. Become less of a man day by day as you are crushed by the indefeatable awfulness of the world.

Though I suppose those are generally your only two options when fighting beaurocracy. It's just that health care is more important to our lives, our wallets, and generally the health of the entire economy as it becomes an increasing tax on all of us.

We walked through some local street fair a few weeks ago, and saw one of the doctors who's fraudulently billed us; he was being all smiley, oo look at me I'm part of the community, I'm your nice neighborhood doctor man. I wanted to just punch him in the face.

Also : Republicans are retarded pieces of shit. How could you possibly be seriously opposed to tearing apart our entire health care sector and rebuilding from scratch with cost controls and a public option? They're either morons (unaware of their evil), or assholes (aware and intentionally evil). Oh yes, it's wonderful that we have choice and competition in this robust and functioning health care economy. Oh no, wait, actually we don't have that at all. We have a corrupt government-private conspiracy, which you have intentionally created, which is screwing over everyone in America except for the heads of the health care industry (and the politicians that take their money). Sigh. Time to go back to pretending that nothing exists outside my home, because it's all just too depressing.


09-10-13 - Grand Designs

I've been watching a lot of "Grand Designs" lately; it's my new go-to soothing background zone-out TV. It's mostly inoffensive (*) and slightly interesting.

(* = the incredibly predictably saccharine sum-up by Kevin at the end is pretty offensive; all throughout he's raising very legitimate concerns, and then at the end every time he just loves it. There are a few episodes where you can see the builder/client is just crushed and miserable at the end, but they never get into anything remotely honest like that, it's all very superficial, ooh isn't excess consumerism wonderful!)

I'm not even going to talk about the house designs really. I think most of them are completely retarded and abysmal generic pseudo-modern crap. (IMO in general modernism just doesn't work for homes. Moderism is beautiful when it's pure, unforgiving, really strictly adhered to. But nobody can live in a home like that. So they water it down and add some natural materials and normal cosy furniture and storage and so on, and that completely ruins it and turns it into "condo pseudo-modernism" which is just the tackiest of all types of housing. Modernism should be reserved for places where it's realistic to keep the severe empty minimalism that makes it beautiful, like museums).

Thoughts in sections :

What makes a house.

One of the most amusing things is noticing what people actually say at the sum-up at the end. Every time Kevin sits down and talks with the couple when the house is done and asks what they really love about it, the comments are things like :

"We're just glad it's done / we're just glad to have a roof over our head".
"It's so nice to just be together as a family"
"The views are amazing."
"The location is so special."

etc. it always strikes me that none of the pros has anything to do with the expensive house they just made. They never say anything like : "the architecture is incredibly beautiful and it's a joy to just appreciate the light and the spaces". Or "we're really glad we made a ridiculous 4000 sq ft living room because we often have 100 friends over for huge orgies". Because they don't actually care about the house at all (and it sucks).

Quite often I see the little cramped bungalow that people start out with and think "that looks just charming, why are you building?". It's all cosy and full of nick-nacks. It's got colorful paint schemes and is appropriately small and cheap and homey. Then they build some awful huge cavernous cold white box.

The children in particular always suffer. Often they're in a room together before the build, and the family is building to get more space so the kids can all have their own room. But the kids in the shared room are all laughing and wrestling, piled up on each other and happy. Kids are meant to be with other kids. In fact, humans are meant to be with other humans. We spend all this money to get big empty lonely spaces, and are worse off for it. Don't listen to what people say they want, it's not good for them.

In quite a few of the episodes, the couple at the beginning is full of energy, really loving to each other, bright eyed. At the end of the episode, they look crushed, exhausted, stressed out. Their lips are thin and they're all tense and dead inside.

Even in the disasters they're still saying how wonderful it is and how they'd do it all over (because people are so awfully boringly cowardlyly dishonest and never admit regret about major life decisions until after they've unwound them (like every marriage is "wonderful" right up until the day of the divorce)), but you can see it in their eyes. Actually the final interviews of Grand Designs are an interesting study in non-verbal communication, because the shlock nonsense that they say with their mouths has absolutely zero information content (100% predictable = zero information), so you have to get all your information from what they're not saying.

It's so weird the way some people get some ridiculous idea in their head and stick to it no matter how inconvenient and costly and more and more obviously foolish it is. Like I absolutely have to build on this sloping lot that has only mud for soil, or I have to build in this old water tower that's totally impractical. They incur huge costs, for what? You could have just bought a normal practical site, and you would have been just as happy, probably much happier.

In my old age I am appreciating that all opinions are coincidences and taste is an illusion. Why in the world would you butt your head against some obviously difficult and costly and impractical problem. You didn't actually want that thing anyway. You just thought you wanted it because you are brainwashed by the media and your upbringing. Just get something else that's easier. All things are equal.

Eco Houses.

The "eco houses" are some of the most offensive episodes to me. The purported primary focus of these "eco houses" is reducing their long-term carbon footprint, with the primary focus being on insulation to reduce heating costs. That's all well and good, but it's a big lie built on fallacies.

They completely ignore the initial eco cost of the build. Often they tear down some perfectly good house to start the build, which is absurd. Then they build some giant over-sized monstrosity out of concrete. They ignore all that massive energy waste and pollution because "long term the carbon savings will make up for it". Maybe, maybe not. Correctly doing long term costs is very hard.

Obviously anyone serious about an eco house should build it as small as reasonable. Not only does a small house use less material to make and use less energy to heat and light, there's less maintenance over its whole life, you fill it with less furniture, it's smaller in the landfill when inevitably torn down, etc.

Building a ridiculously huge concrete "eco house" is just absurd on the face of it; it's so hypocritical that it kind of blows your sensor out and you can't even measure it any more. It's sort of like making a high performance electric sports car and pretending that's "eco", it's just an absurd transparently illogical combination; oh wait...

One of the big fallacies of the eco house is the "long term payoff". There might be new building technology in 5 years that makes your house totally obsolete. Over-engineering with anything technical is almost always a mistake, because the cost (and environment cost in this case) is going down so fast.

Your house might go out of style. Houses now are not permanent, venerated monuments. They are fashion accessories. You can see by the way the eco people so happily tear down the houses from the 50's and 70's as if they were garbage. In 20 years if your house isn't fashionable, someone will buy it and tear it down. You're using lots of experimental new technology which greatly reduces the chance of your house actually lasting for the long term. Things like burying a house in the ground with a neoprene waterproofing layer makes the probability of the house actually lasting for the long term very small.

Perhaps the biggest issue is that they assume that the carbon cost of energy is constant. In fact it's going down rapidly. The whole idea of the carbon savings (for things like using massive amounts of concrete) is that the alternative (a timber house with normal insulation and more energy use) is polluting a lot through its energy use. But if its heat energy comes from solar or other clean sources, then your passive house is not a win. The technology of energy production will improve massively in the next 20-50 years, so saying your house is a win over the long (50-100 years) term is insane.

As usual in these phony arguments, they use a straw man alternative. They compare building a new $500k eco house vs. just leaving the old house alone. That's absurd. Of course if you want to say that the tear-down-and-build is more "eco" you should compare your new house vs. spending $500k on the old house or other eco purposes. What if you just left the old house and spent $100k to add insulation and solar panels and better windows? Then you could spend $400k on preserving forest land or something. A fair comparison has to be along an iso-line of constant cost, and doing the best you can per dollar in each case.

I'm sure the reality in most cases is just that people *want* a new house and are rationalizing and making up excuses why it's okay to do it. I'd like it so much better if they just said "yeah, we fucking want a new house that uses tons of concrete and we don't give a shit about the eco, but we're going to make it passive so that we can feel smug and show off to our friends".

European building vs. American.

Holy crap European building quality is ridiculously good.

In one of the episodes somebody puts softwood cladding on the house and the host is like "but that will rot in 10 years!" and the builder feels all guilty about it. (it's almost vanishingly rare to have anything but softwood cladding in America (*), and yes in fact it does rot almost immediately). (* = you might get plastic or aluminum, or these days we have various fake wood cement fiber-board options, but you would never ever use hardwood, yegads).

Granted the houses on the show are on the high end; I'm sure low end European houses are shittier. Still.

Almost every house is post-and-beam, either timber or steel frames. The timer frames are fucking oak which just blew my mind the first time I saw it. Actual fucking carpenters cutting joints. And real fucking wood. We have nothing like that. "skilled tradesman" isn't even an occupation in America anymore. All we have is "day laborer who is using a nail gun for the first time ever".

An American-style asphalt shingle roof is looked down upon as ridiculously crappy. Everything is slate or tile or metal. Their rooves last longer than our entire houses.

One funny thing I noticed is that the cost of stonemasons and proper carpenters seems to be incredibly low in the UK. There are some houses with amazing hand-done stonework and carpentry, and they cost about the same as the fucking awful modern boxes that are all prefab glass and drywall. Why in the world would you get a horrible cold cheapo-condo-looking modern box when you could have something made of natural materials cut by hand? The stone work in particular shocked me how cheap it was.

Another odd one is the widespread use of "blockwork" (concrete block walls). That's something we almost never do for homes in America, and I'm not sure why not. It's very quick and cheap, and makes a very solid wall. We associate it with prisons and prison-like schools and such, but if you put plaster over it, it's a perfectly nice wall and feels just like a stone house. I guess even blockwork is expensive compared to the ridiculously cheap American stick-framing method.

Another difference that shocked me is the "fixed price contract". Apparently in the UK you can get a builder to bid a price, and then if there are any overruns *they* have to cover it. OMG how fucking awesome is that, I would totally consider building a house if you could do that.

Oh yeah, and of course the planning regulations are insane. Necessary evil I suppose. It's why Europe is beautiful (despite heavy human modification absolutely everything) and America looks like a fucking pile of vomit anywhere it's been touched by the hand of man. (though a lot of the stuff that gets allowed on the show in protected areas is pretty awful modern crap that doesn't fit in or hide well at all, so I'm not sure the planners are really doing a great job. It seems like if you spend enough time and money they will eventually let you build an eyesore).

It's interesting to watch how people (the clients) handle the building process.

A few people get completely run over by their builder or architect, pushed into building something they don't want, and forced to eat delays and overruns and shitty quality, and that's sad to see. But it's also unforgivably pathetic of them to let it happen. YOU HAVE A FUCKING CH4 CAMERA CREW! It's the easiest time ever to make your builder be honest and hard-working. Just go confront them when the cameras are there and make them explain themselves on camera. WTF how can you be such a pussy that you don't stand up for yourself even when you have this amazing backup. But no, they'll say "oh, I don't know, what can I do about it?". You can bloody well do more than you are doing.

A few people are asshole micro-managers totally hovering over the crew all the time. The crew hate them and complain about them. But they also do seem to work harder. In this sad awful life of ours, being an annoying nag really does work great, because most people just don't want to deal with it and so will do what they have to in order to not get nagged.

Building a house is one of those situations where you can really see the difference between people who just suck it up and go with the flow "oh, I guess that's just what it costs", vs. people who are always scrapping and fighting and getting more for themselves. You can see some rich old pussy fart who doesn't fight might spend $1M on a build, and some other poor immigrant guy who knows how to deal and cajole and hustle might spend $200k on the exact same build. You can be bigger than your money or your intellect if you just fight for it.

The ones that are most impressive to me are the self-builds. It just astounds me how hard they work. And how wonderful to put 2 years or so of your life into just building one thing, that afterward you can go "I made this". Amazing, I'd love to do that. It's also the only time that I really see the people enjoying the process, and being happy afterward. (I particularly like the couple in scotland that does the gut-rehab of an old stone house all by themselves with no experience).

There are a few episodes with the classic manipulative architect. The architect-client relationship is usually semi-adversarial. Architects don't just want to make you the nice house you want, that suits you and is cheap and easy to build. They want to build something that will get them featured in a magazine; they want to build something that is cutting edge, or they have some bee in their bonnet that they want to try out. They want to use expensive and experimental methods and make you take all the risk for it. In order to get you to do that, they will lie to you about how risky and expensive it really it is. I don't necessarily begrudge the architects for that, it's what they have to do in order to get something interesting built. But it's amazing how naive and trusting some of the clients are. And it's a sort of inherently shady situation. Any time one person gets the upside (eg. the architect benefits if it goes well) and someone else gets the downside (the client has to eat the cost overrun and delays and live in the shitty house if it doesn't go well), that's a big problem for morality. You're relying solely on their ethics to treat you well, and that is an iffy thing to rely on.


09-02-13 - DEET

About to go camping for a few days. Discovered that the DEET has eaten its way through the original tube it came in, through a few layers of ziplocs, and out into my tub of camping stuff where it gladly ate a hole in my thermarest. Fucking DEET !

I guess after every trip I need to take the deet out and put it in a glass jar by itself. Or in a toxic waste containment facility or some shit. It's nasty stuff. Still better than mosquitos.


08-28-13 - How to Crunch

Baby is like the worst crunch ever. Anyway it's got me thinking about things I've learned about how to cope with crunch.

1. There is no end date. Never push yourself at an unsustainable level, assuming it's going to be over soon. Oh, the milestone is in two weeks, I'll just go really hard and then recover after. No no no, the end date is always moving, there's always another crunch looming, never rely on that. The proper way to crunch is to find a way to lift your output to the maximum level that you can hold for an indeterminate amount of time. Never listen to anyone telling you "it will be over on day X, let's just go all-out for that", just smile and nod and quietly know that you will have the energy to keep going if necessary.

2. Don't stop taking care of yourself. Whatever you need to do to feel okay, you need to keep doing. Don't cut it because of crunch. It really doesn't take that much time, you do have 1-2 hours to spare. I think a lot of people impose a kind of martyrdom on themselves as part of the crunch. It's not just "let's work a lot" it's "let's feel really bad". If you need to go to the gym, have a swim, have sex, do yoga, whatever it is, keep doing it. Your producers and coworkers who are all fucking stupid assholes will give you shit about it with passive aggressive digs; "ooh I'm glad our crunch hasn't cut into your workout time, none of the rest of us are doing that". Fuck you you retarded inefficient less-productive martyr pieces of crap. Don't let them peer pressure you into being stupid.

3. Resist the peer pressure. Just decided this is worth it's own point. There's a lot of fucking retarded peer pressure in crunches. Because others are suffering, you have to also. Because others are putting in stupid long hours at very low productivity, you have to also. A classic stupid one is the next point -

4. Go home. One of the stupidest ideas that teams get in crunches is "if someone on the team is working, we should all stay for moral support". Don't be an idiot. You're going to burn out your whole team because one person was browsing the internet a month ago when they should have been working and is therefore way behind schedule? No. Everyone else GO HOME. If you aren't on the critical path, go sleep, you might be critical tomorrow. Yes the moral support is nice, and in rare cases I do advocate it (perhaps for the final push of the game if the people on the critical path are really hitting the wall), but almost never. Unfortunately as a lead you do often need to stick around if anyone on your team is there, that's the curse of the lead.

5. Sleep. As crunch goes on, lack of sleep will become a critical issue. You've got to anticipate this and start actively working on it from the beginning. That doesn't just mean making the time to lie in bed, it means preparing and thinking about how you're going to ensure you are able to get the sleep you need. Make rules for yourself and then be really diligent about it. For me a major issue is always that the stress of crunch leads to insomnia and the inability to fall asleep. For me the important rules are things like - always stop working by 10 PM in order to sleep by 12 (that means no computer at all, no emails, no nothing), no coffee after 4 PM, get some exercise in the afternoon, take a hot shower or bath at night, no watching TV in bed, etc. Really be strict about it; your sleep rules are part of your todo list, they are tasks that have to be done every day and are not something to be cut. I have occasionally fallen into the habit of using alcohol to help me fall asleep in these insomnia periods; that's a very bad idea, don't do that.

6. Be smart about what you cut out of your life. In order to make time for the work crunch you will have to sacrifice other things you do with your life. But it's easy to cut the wrong things. I already noted don't cut self care. (also don't cut showering and teeth brushing, for the love of god, you still have time for those). Do cut non-productive computer and other electronics time. Do cut anything that's similar to work but not work, anything where you are sitting inside, thinking hard, on a computer, not exercising. Do cut "todos" that are not work or urgent; stuff like house maintenace or balancing your checkbook, all that kind of stuff you just keep piling up until crunch is over. Do cut ways that you waste time that aren't really rewarding in any way (TV, shopping, whatever). Try not to cut really rewarding pleasure time, like hanging out with friends or lovers, you need to keep doing that a little bit (for me that is almost impossible in practice because I get so stressed I can't stop thinking about working for a minute, but in theory it sounds like a good idea).

7. Be healthy. A lot of people in crunch fall into the trap of loading up on sugar and caffeine, stopping exercising, generally eating badly. This might work for a few days or even weeks, but as we noted before crunch is always indeterminate, and this will fuck you badly long term. In fact crunch is the most critical time to be careful with your body. You need it to be healthy so you can push hard, in fact you should be *more* careful about healthy living than you were before crunch. It's a great time to cut out all sugary snacks, fast food, and alcohol.

08-28-13 - Crying Baby Algorithm

Diagnosis and solution of crying baby problem.

1. Check diaper. This is always the first check because it's so easy to do and is such a conclusive result if positive. Diaper is changed, proceed to next step.

2. Check for boredom. Change what you're doing, take baby outside and play a new game. If this works, hunger or fatigue are still likely, you're just temporarily averting them.

3. Check hunger. My wife disagrees about this being so early in the order of operations, but testing it early seems like a clear win to me. After diaper this is the test with the most reliable feedback - if baby eats, that's a positive result and you're done. You can do an initial test for hungry-sucking with a finger in the mouth (hungry-sucking is much harder and more aggressive than comfort-sucking); hungry-sucking does not 100% of the time indicate actual hunger, nor does lack of hungry-sucking mean no hunger, but there is strong correlation. Doing a test feed is much easier with breastfed babies than with bottles (in which case you have to warm them up and find a clean nipple and so on). If baby is extremely upset, then failure to respond to a test-feed does not mean it is not hungry, you may have to repeat with stronger measures (see below).

4. Check for obvious stomach pain. Crying from stomach pain (gas, burps, acid, whatever) can be hard to diagnose once it becomes constant or severe (we'll get to that case below). But in the early/mild stages it's pretty easy to detect by testing body positions. If baby cries like crazy on its back but calms down upright or on its stomach (in a football hold), stomach pain is likely. If baby quiets down with abdomen pressure and back-patting, stomach pain is likely.

5. Look for obvious fatigue. The signs of this are sometimes clear - droopy eyes, yawning, etc. Proceed to trying to make baby sleep.

This is the end of the easy quick checks. If none of these work you're getting into the problem zone, where there may be multiple confounding factors, or the factors may have gone on so long that baby no longer responds well to them (eg. hungry but won't eat, gassy and patting doesn't stop the crying), so you will have to do harder checks.

Before proceeding, go back to #1 and do the easy checks again. Often it's the case that baby was just hungry, but after you did #1 it pooped and so wouldn't eat. It also helps to have a different person do the re-check; baby may not have responded to the first person but will respond to the next person doing exactly the same check the second time.

This is worth repeating beacuse it's something that continues to be a stumbling block for me to this day - just becaues you checked something earlier in the crying process doesn't mean it's not the issue. You tested hunger, she wouldn't eat. That doesn't mean you never test hunger again! You have to keep working on issues and keep re-checking. You have not ruled out anything! (except dirty diaper, stick your finger in there again to make sure, nope that's not it).

If that still doesn't work, proceed to the longer checks :

6. Check for overstimulation. This is pretty straightforward but can take a while. Take baby to a dark room, pat and rock, make shushes or sing to baby, perhaps swaddle. Just general calming. Crying may continue for quite a while even if this is the right solution so it takes some patience. (and then do 1-5 again)

7. Check for fatigue but won't go to sleep. The usual technique here is a stroller walk. (and then do 1-5 again)

8. Check for hungry but won't eat. You will have to calm baby and make it feel okay in order to eat. Proceed as per #6 (overstimulation) but stick baby on a nipple, in its most secure eating position. For us it helps to do this in the bedroom where baby sleeps and eats at night, that seems to be the most comforting place. Swaddling also helps here. Baby may even turn away and reject eating several times before it works. (and then do 1-5 again)

9. Assume unsignalled stomach pain. Failing all else, assume the problem is gastrointestinal, despite the lack of a clear GI relief test result (#4). So just walk baby around and pat its back, there's nothing else to do. (and keep repeating 1-5 again)

10. If you reach this point, your baby might just be a piece of shit. Throw it out and get a better one.


08-27-13 - Email Woes

I'm having some problem with emails. Somewhere between gmail and my verio POP host, some emails are just not getting through.

If you send me an important email and I don't reply, you should assume I didn't get it. Maybe send it again with return-receipt to make sure. (if you send me a "funny" link and I don't reply, GTFO)

Fucking hell, I swear everyone is fired. How is this so hard. Sometimes I think that the project I should really be working on (something that would actually be important and make a difference in the world to a lot of people) is making a new internet from scratch. Fully encrypted and peer-to-peer so governments can never monitor it, and no corporate overlord controls the backbone, and text-only so fucking flash or whatever can never ruin it. Maybe no advertising either, like the good old days. But even if I built it, no one would come because it wouldn't have porn or lolcats or videos of nut-shots.


08-26-13 - OMG E46 M3's have the best ads

Everyone knows that E46 M3's attract an amazing demographic . I'm still occasionally keeping my eye out for them (I can't face the minivan!) , and you come across the most incredible ads for these things.

Check this one out :

2004 E46 Bmw Laguna Blue M3

OMG it starts so good and then just gets better and better. Like you can't believe that he can top his last sentence, and then he just blows it away. Amazing. (if it's fake, it's fucking brilliant)

In white for when that expires (fucking useless broken ass piece of shit internet can't even hold some text permanently, what a fakakta load of crap everything in this life is, oy) :

Hello i got my baby blue BMW m3. Finally decided to sell. You will never find a cleaner BMW i promise. Best deal for best car. -black on black -6 speed manual -66,435 miles (all freeway) (half city)(1/2country) never racetracked. -Rebuilt title( -fixed by my brother Yuri with quality shop, he did good job). -19 inch wheels perfect for race, burnout, drift. I never do.. I promise. -AC blows hard like Alaska. -333hp but i put intake and its now 360hp at least, trust me. Powerfull 3,2 engine almost as fast as Ferarri 360. You dont believe? Look it up, youtube i have video! I can show you in the test drive.. You will feel like king on the roads... Trust me. This car puts you in extasy feeling. The pistons push hard. sometimes i say, "Big mistake- big payment, sucks job". But for you this the best because i believe ur serious. I keep it very clean, oil change every 20,000 miles. Ladies like it. My wife is mad sell, because she drives to work everyday, in seattle, and likes the power. I say we buy toyota and go vacation to Hawai. CASH ONLY( no credit, debit , fake checks, paypal, ebay, trade only for mercedes AMG, i like power. No lowball, serious buyers only, im tired of low balls offers, so please serious. I have pictures here. Thank you. I love the car so i could keep it if you dont want it. And go to casino to make payment. Im good with blackjack.

You dont believe? Look it up, youtube i have video!


08-22-13 - Sketch of Suffix Trie for Last Occurance

I don't usually like to write about algorithms that I haven't actually implemented yet, but it seems in my old age that I will not actually get around to doing lots of things that I think about, so here goes one.

I use a Suffix Trie for string matching for my LZ compressors when optimal parsing.

reminder: Suffix Tries are really the super-awesome solution, but only for the case that you are optimal parsing not greedy parsing, so you are visiting every byte, and for large windows (sliding window Suffix Tries are not awesome). see : LZ String Matcher Decision Tree (w/ links to Suffix Trie posts)

Something has always bothered me about it. Almost the entire algorithm is this sweet gem of computer science perfection with no hackiness and a perfect O(N) running time. But there's one problem.

A Suffix Trie really just gives you the longest matching substring in the window. It's not really about the *location* of that substring. In particular, the standard construction using pointers to the string that was inserted will give you the *first* occurance of each substring. For LZ compression what you want is the *last* occurance of each substring.

(I'm assuming throughout that you use path compression and your nodes have pointers into the original window. This means that each step along the original window adds one node, and that node has the pointer to the insertion location.)

In order to get the right answer, whenever you do a suffix query and find the deepest node that you match, you should then visit all children and see if any of them have a more recent pointer. Say you're at depth D, all children at depth > D are also substring matches of the same first D bytes, so those pointers are equally valid string matches, and for LZ you want the latest one.

An equivalent alternative is instead of searching all children on query, you update all parents on insertion. Any time you insert a new node, go back to all your parents and change their pointers to your current pointer, because your pointer must match them all up to their depth, and it's a larger value.

Of course this ruins the speed of the suffix trie so you can't do that.

In Oodle I use limitted parent updates to address this issue. Every time I do a query/insert (they always go together in an optimal parse, and the insert is always directly under the deepest match found), I take the current pointer and update N steps up the parent links. I tested various values of N against doing full updates and found that N=32 gave me indistinguishable compression ratios and very little speed hit.

(any fixed value of N preserves the O(N) of the suffix trie, it's just a constant multiplier). (you need to walk up to parents anyway if you want to find shorter matches at lower offsets; the normal suffix lookup just gives you the single longest match).

So anyway, that heuristic seems to work okay, but it just bothers me because everything else about the Suffix Trie is so pure with no tweak constants in it, and then there's this one hack. So, can we solve this problem exactly?

I believe so, but I don't quite see the details yet. The idea goes like this :

I want to use the "push pointer up to parents method". But I don't actually want to update all parents for each insertion. The key to being fast is that many of the nodes of the suffix trie will never be touched again, so we want to kind of virtually mark those nodes as dirty, and they can update themselves if they are ever visited, but we don't do any work if they aren't visited. (BTW by "fast" here I mean the entire parse should still be O(N) or O(NlogN) but not fall to O(N^2) which is what you get if you do full updates).

In particular in the degenerate match cases, you spend all your time way out at the leaves of the suffix trie chasing the "follows" pointer, you never go back to the root, and many of the updates overwrite each other in a trivial way. That is, you might do substring "xxaaaa" at "ptr", and then "xxaaaaa" at "ptr+1" ; the update of "ptr" back up the tree will be entirely overwrittten by the update from "ptr+1" (since it also serves as an "xxaa" match and is later), so if you just delay the update it doesn't need to be done at all.

(in the end this whole problem boils down to a very simple tree question : how do you mark a walk from a leaf back to the root with some value, such that any query along that walk will get the value, but without actually doing O(depth) work if those nodes are not touched? Though it's not really that question in general, because in order to be fast you need to use the special properties of the Suffix Trie traversal.)

My idea is to use "sentries". (this is a bit like skip-lists or something). In addition to the "parent" pointer, each node has a pointer to the preceding "sentry". Sentry steps take you >= 1 step toward root, and the step distance increases. So stepping up the sentry links might take you 1,1,2,4,.. steps towards root. eg. you reach root in log(depth) steps.

When you insert a new node, instead of walking all parents and changing them to your pointer, you walk all sentries and store your pointer as a pending update.

When you query a node, you walk to all sentries and see if any of them has a lower pointer. This effectively finds if any of your children did an update that you need to know about.

The pointer that you place in the sentry is really a "pending update" marker. It means that update needs to be applied from that node up the tree to the next sentry (ADD: I think you also need to store the range that it applies to, since a large-step range can get broken down to smaller ranges by updates). You know what branch of the tree it applies to because the pointer is the string and the string tells you what branch of the tree to follow.

The tricky bit happens when you set the pointer in the sentry node, there may be another pointer there from a previous insertion that is still pending update. You need to apply the previous pending update before you store your new pointer in the pending update slot.

Say a node contains a pending update with the pointer "a", and you come in and want to mark it with "b". You need to push the "a" update into the range that it applies to, so that you can set that node to be pending with a "b".

The key to speed is that you only need to push the "a" update where it diverges from "b". For example if the substring of "a" and "b" is the same up to a deeper sentry that contains "b" then you can just throw away the "a" pending update, the "b" update completely replaces it for that range.

Saying it all again :

You have one pointer update "a" that goes down a branch of the tree. You don't want to actually touch all those nodes, so you store it as applying to the whole range. You do a later pointer update "b" that goes down a branch that partially overlaps with the "a" branch. The part that is "a" only you want to leave as a whole range marking, and you do a range-marking for "b". You have to find the intersection of the two branches, and then the area where they overlap is again range-marked with "b" because it's newer and replaces "a". The key to speed is that you're marking big ranges of nodes, not individual nodes. My proposal for marking the ranges quickly is to use power-of-2 sentries, to mark a range of length 21 you would mark spans of length 16+4+1 kind of a thing.

Maybe some drawings are clearer. Here we insert pointer "a", and then later do a query with pointer "b" that shares some prefix with "a", and then insert "b".

The "b" update to the first sentry has to push the "a" update that was there up until the substrings diverge. The update back to the root sees that "a" and "b" are the same substring for that entire span and so simply replaces the pending update of "a" with a pending update of "b".

Let's see, finishing up.

One thing that is maybe not clear is that within the larger sentry steps the smaller steps are also there. That is, if you're at a deep leaf you walk back to the root with steps that go 1,1,2,4,8,16,32. But in that last big 32 step, that doesn't mean that's one region of 32 nodes with no other sentries. Within there are still 1,2,4 type steps. If you have to disambiguate an update within that range, it doesn't mean you have to push up all 32 nodes one by one. You look and see hey I have a divergence in this 32-long gap, so can I just step up 16 with "a" and "b" being the same? etc.

I have no idea what the actual O() of this scheme is. It feels like O(NlogN) but I certainly don't claim that it is without doing the analysis.

I haven't actually implemented this so there may be some major error in it, or it might be no speed win at all vs. always doing full updates.

Maybe there's a better way to mark tree branches lazily? Some kind of hashing of the node id? Probabilistic methods?


08-19-13 - Sketch of multi-Huffman Encoder

Simple way to do small-packet network packet compression.

Train N different huffman code sets. Encoder and decoder must have a copy of the static N code sets.

For each packet, take a histogram. Use the histogram to measure the transmitted length with each code set, and choose the smallest. Transmit the selection and then the encoding under that selection.

All the effort is in the offline training. Sketch of training :

Given a large number of training packets, do this :

for - many trials - 

select 1000 or so seed packets at random
(you want a number 4X N or so)

each of the seeds is a current hypothesis of a huffman code set
each seed has a current total histogram and codelens

for each packet in the training set -
add it to one of the seeds
the one which has the most similar histogram
one way to measure that is by counting the huffman code length

after all packets have been accumulated onto all the seeds,
start merging seeds

each seed has a cost to transmit; the size of the huffman tree
plus the data in the seed, huffman coded

merge seeds with the lowest cost to merge
(it can be negative when merging makes the total cost go down)

keep merging the best pairs until you are down to N seeds

once you have the N seeds, reclassify all the packets by lowest-cost-to-code
and rebuild the histograms for the N trees using only the new classification

those are your N huffman trees

measure the score of this trial by encoding some other training data with those N trees.

It's just k-means with random seeds and bottom-up cluster merging. Very heuristic and non-optimal but provides a starting point anyway.

The compression ratio will not be great on most data. The advantage of this scheme is that there is zero memory use per channel. The huffman trees are const and shared by all channels. For N reasonable (4-16 would be my suggestion) the total shared memory use is quite small as well (less than 64k or so).

Obviously there are many possible ways to get more compresion at the cost of more complexity and more memory use. For packets that have dword-aligned data, you might do a huffman per mod-4 byte position. For text-like stationary sources you can do order-1 huffman (that is, 256 static huffman trees, select by the previous byte), but this takes rather more const shared memory. Of course you can do multi-symbol huffman, and there are lots of ways to do that. If your data tends to be runny, an RLE transform would help. etc. I don't think any of those are worth pursuing in general, if you want more compression then just use a totally different scheme.

Oh yeah this also reminds me of something -

Any static huffman encoder in the vernacular style (eg. does periodic retransmission of the table, more or less optimized in the encoder (like Zip)) can be improved by keeping the last N huffman tables. That is, rather than just throwing away the history when you send a new one, keep them. Then when you do retransmission of a new table, you can just send "select table 3" or "send new table as delta from table 5".

This lets you use locally specialized tables far more often, because the cost to send a table is drastically reduced. That is, in the standard vernacular style it costs something like 150 bytes to send the new table. That means you can only get a win from sending new tables every 4k or 16k or whatever, not too often because there's big overhead. But there might be little chunks of very different data within those ranges.

For example you might have one Huffman table that only has {00,FF,7F,80} as literals (or whatever, specific to your data). Any time you encounter a run where those are the only characters, you send a "select table X" for that range, then when that range is over you go back to using the previous table for the wider range.

old rants