The All-Thing

All stick and no carrot, since ought-three.

各位能夠讀中文得來賓您好。小的這還在學中文中,恐怕中文寫得不太好,希望你們還看得懂。


山居秋暝 (王維)

空山新雨後,天氣晚來秋。
明月鬆間照,清泉石上流。
竹喧歸浣女,蓮動下漁舟。
隨意春芳歇,王孫自可留。

Contact:
| web page

Other views:
RSS 1.0
RSS 0.91
Plain (good for lynx)

Past posts:

March
Sun Mon Tue Wed Thu Fri Sat
  4 5 6
7 8 9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31      

Recent comments:

Recent search referers:
last samurai mp3 torrent
korean bit torrent movies
pr0n torrent
gay torrent (x2)
"chinese movies" torrent
老女人torrent
"xiii crack" (x2)
XIII torrent (x2)
xiii no cd crack
"ack too long"
shogun total war bit torrent
i want look for gay fried (x2)
"living in Boston"
bit torrent books
Last Samurai bit torrent
taboo bit torrent

Exits:
William's Aggregated Feeds



Creative Commons License
This work is licensed under a Creative Commons License.

       

Fri, 19 Dec 2003
Burrows-Wheeler Transform

So we've been talking a little about this at work and I just found a good page describing it here: http://dogma.net/markn/articles/bwt/bwt.htm

It's like magic. Basically, the BWT is a completely reversible O (n log n) transformation that increases the compressibility of text. It's pretty mind-blowing to follow through the algorithm because it's really not clear, at least on the first pass, how it works, much less why it works. But the basic idea is that it permutes characters by sorting on their (left or right) contexts, which tends to group similar characters together, as they typically have similar contexts. Of course, doing this in a reversible manner is the brilliant part.

So this will work on any data that exhibits local contexts, which is the vast majority of data that we, as humans, care about (though not all—there are apparently certainly types of image data for which this doesn't hold true, though I'm not clear on the specifics). It would be interesting to compare the effect this has on, say, English vs. Chinese at the character level vs. Chinese at the byte level in different encodings.

I think bzip2 uses the BWT as a first pass, so they've apparently decided it is useful for arbitrary binary data.

Posted at 20:22 | trackback | 1 comment | back

Comments

Re: Burrows-Wheeler Transform
William wrote on Wed, 07 Jan 2004 14:46

apparently you can do it O(N) with suffix trees


Reply

Your Comment

 
Name:
URL/Email: [http://... or mailto:you@wherever] (optional)
Title: (optional)
Comment:
Save my Name and URL/Email for next time