All stick and no carrot, since ought-three.
Recent search referers:
Fri, 19 Dec 2003
So we've been talking a little about this at work and I just found a good page describing it here: http://dogma.net/markn/articles/bwt/bwt.htm
It's like magic. Basically, the BWT is a completely reversible O (n log n) transformation that increases the compressibility of text. It's pretty mind-blowing to follow through the algorithm because it's really not clear, at least on the first pass, how it works, much less why it works. But the basic idea is that it permutes characters by sorting on their (left or right) contexts, which tends to group similar characters together, as they typically have similar contexts. Of course, doing this in a reversible manner is the brilliant part.
So this will work on any data that exhibits local contexts, which is the vast majority of data that we, as humans, care about (though not all—there are apparently certainly types of image data for which this doesn't hold true, though I'm not clear on the specifics). It would be interesting to compare the effect this has on, say, English vs. Chinese at the character level vs. Chinese at the byte level in different encodings.
I think bzip2 uses the BWT as a first pass, so they've apparently decided it is useful for arbitrary binary data.
When in doubt, tell the truth. -- Mark Twain