The All-Thing

All stick and no carrot, since ought-three.


山居秋暝 (王維)


| web page

Other views:
RSS 1.0
RSS 0.91
Plain (good for lynx)

Past posts:

Sun Mon Tue Wed Thu Fri Sat
  4 5 6
7 8 9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31      

Recent comments:

Recent search referers:
last samurai mp3 torrent
korean bit torrent movies
pr0n torrent
gay torrent (x2)
"chinese movies" torrent
"xiii crack" (x2)
XIII torrent (x2)
xiii no cd crack
"ack too long"
shogun total war bit torrent
i want look for gay fried (x2)
"living in Boston"
bit torrent books
Last Samurai bit torrent
taboo bit torrent

William's Aggregated Feeds

Creative Commons License
This work is licensed under a Creative Commons License.


Fri, 19 Dec 2003
Burrows-Wheeler Transform

So we've been talking a little about this at work and I just found a good page describing it here:

It's like magic. Basically, the BWT is a completely reversible O (n log n) transformation that increases the compressibility of text. It's pretty mind-blowing to follow through the algorithm because it's really not clear, at least on the first pass, how it works, much less why it works. But the basic idea is that it permutes characters by sorting on their (left or right) contexts, which tends to group similar characters together, as they typically have similar contexts. Of course, doing this in a reversible manner is the brilliant part.

So this will work on any data that exhibits local contexts, which is the vast majority of data that we, as humans, care about (though not all—there are apparently certainly types of image data for which this doesn't hold true, though I'm not clear on the specifics). It would be interesting to compare the effect this has on, say, English vs. Chinese at the character level vs. Chinese at the byte level in different encodings.

I think bzip2 uses the BWT as a first pass, so they've apparently decided it is useful for arbitrary binary data.

Posted at 20:22 | trackback | 1 comment | back


Re: Burrows-Wheeler Transform
William wrote on Wed, 07 Jan 2004 14:46

apparently you can do it O(N) with suffix trees


Your Comment

URL/Email: [http://... or mailto:you@wherever] (optional)
Title: (optional)
Save my Name and URL/Email for next time