M17n in Ruby 2.0

Unicode/i18n/m17n support has always struck me as one of Ruby’s weakest areas. (That and the lack of any kind of API documentation past 1.6, for fuck’s sake.) Matz has recently been describing the future API over on ruby-core via the Socratic method: 1, 2, 3, 4, 5, and 6 (and undoubtedly more by now, so follow the threads), and ever since a mention on RedHanded, I’ve been pondering the possibility of a Ruby 1.8 library that emulates 2.0’s expected m17n support.

Without an m17n library beyond iconv, though, it’s hard to see how one would accomplish this. I started writing a version of String that keeps everything in a “normalized form” of UTF-8, implements all the String operations on that UTF-8 data, and converts to and from the target encoding via iconv, but the expected Regexp behavior has me at a loss. Without the equivalent of iconv for regexes, I can’t really emulate that, can I? And is a partial compatibility library really better than no compatibility library? (Especially given that it’s going to be wicked slow?)
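For the curious, here is a rough sketch of the shape that String wrapper might take. It is not the actual code, just an illustration. The class name NormalizedString and its methods are made up for this example. The sketch also cheats: it uses String#encode and force_encoding, which do not exist in 1.8, so that it runs on a current Ruby. On 1.8 those conversion steps would be something like Iconv.conv("UTF-8", external, bytes) instead.

```ruby
# Hypothetical wrapper: keep the text as UTF-8 internally and convert
# only at the edges. NormalizedString is an illustrative name, not a real library.
class NormalizedString
  # bytes:    raw bytes in the external encoding
  # external: name of that encoding, e.g. "ISO-8859-1"
  def initialize(bytes, external)
    @external = external
    # Assumption: String#encode stands in for iconv here.
    # On 1.8 this would be Iconv.conv("UTF-8", external, bytes).
    @utf8 = bytes.dup.force_encoding(external).encode("UTF-8")
  end

  # Length in characters (code points), not bytes.
  def length
    @utf8.unpack("U*").length
  end

  # Reverse by code point so multibyte characters stay intact.
  def reverse
    reversed = @utf8.unpack("U*").reverse.pack("U*")
    self.class.new(reversed.encode(@external), @external)
  end

  # Convert back to the external encoding on the way out.
  def to_external
    @utf8.encode(@external)
  end
end

# "café" as ISO-8859-1 bytes: the é is the single byte 0xE9.
latin1 = [0x63, 0x61, 0x66, 0xE9].pack("C*")
s = NormalizedString.new(latin1, "ISO-8859-1")

puts s.length
p s.reverse.to_external.bytes
```

Every operation is a decode, a walk over code points, and a re-encode, which is exactly where the "wicked slow" worry comes from. And none of it helps with Regexp, which still matches against raw bytes in the pattern's own idea of an encoding.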