
Calculating string display width in Ruby

Most programmers are by now familiar with the difference between the number of bytes in a string and the number of characters. Depending on the string’s encoding, the relationship between these two measures can be either trivially computable or complicated and compute-heavy.

With the advent of Ruby 1.9, the Ruby world at last has this distinction formally encoded at the language level: String#bytesize is the number of bytes in the string, and String#length and String#size the number of characters.
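For example, with a UTF-8 string containing one accented character:

```ruby
s = "héllo"     # "é" is two bytes in UTF-8
s.bytesize      # => 6 bytes
s.length        # => 5 characters
```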

But when you’re writing console applications, there’s a third measure you have to worry about: the width of the string on the display. ASCII characters take up one column when displayed on screen, but super-ASCII characters, such as Chinese, Japanese and Korean characters, can take up multiple columns. This display width is not trivially computable from the byte size of the character.

Finding the display width of a string is critical to any kind of console application that cares about the width of the screen, i.e. is not simply printing stuff and letting the terminal wrap. Personally, I’ve been needing it forever:

  1. Trollop needs it because it tries to format the help screen nicely.
  2. Sup needs it in a million places because it is a full-fledged console application and people use it for reading mail in all sorts of funny languages.

The actual mechanics of how to compute string width make for an interesting lesson in UNIX archaeology, but suffice it to say that I’ve travelled the path for you, with help from Tanaka Akira of pp fame, and I am happy to announce the release of the Ruby console gem.

The console gem currently provides these two methods:

  • Console.display_width: calculates the display width of a string
  • Console.display_slice: returns a substring according to display offset and display width parameters.
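I won't reproduce the gem's internals here, but the basic idea can be sketched in pure Ruby: count codepoints in the major CJK ranges as two columns and everything else as one. (The ranges below are a rough approximation for illustration only; a real implementation consults the full Unicode East Asian Width tables, and `toy_display_width` is a made-up name, not the gem's API.)

```ruby
# Toy approximation of display width: wide (two-column) CJK codepoints
# vs. everything else. Real implementations use the full East Asian
# Width data and also handle zero-width characters.
def toy_display_width(str)
  str.each_char.sum do |c|
    o = c.ord
    wide = (0x1100..0x115F).cover?(o) ||  # Hangul Jamo
           (0x2E80..0xA4CF).cover?(o) ||  # CJK radicals through Yi
           (0xAC00..0xD7A3).cover?(o) ||  # Hangul syllables
           (0xF900..0xFAFF).cover?(o) ||  # CJK compatibility ideographs
           (0xFF00..0xFF60).cover?(o)     # fullwidth forms
    wide ? 2 : 1
  end
end

toy_display_width("hello")   # => 5
toy_display_width("日本語")  # => 6, even though "日本語".length is 3
```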

There is one horrible caveat outstanding, which is that I haven’t managed to get it to work on Ruby 1.8. Patches to this effect are most welcome, as are, of course, comments and suggestions.

Try it out!

Ruby, Ncurses and blocked threads

If you’re writing a multithreaded Ruby program that uses ncurses, you might be curious why the program stops running when you call Ncurses.getch. Sup has been plagued by this issue since 2005. Thankfully, I think I finally understand it.

The problem is that there is a bug in the Ruby ncurses library: blocking input blocks all Ruby threads while waiting for the user, instead of just the calling thread. So Ncurses.getch will cause everything to grind to a halt. This is probably due to the library not releasing the GVL when blocking on stdin.

This bug is present in the latest rubygems version of curses, 0.9.1. It has been fixed in the latest libncurses-ruby Debian packages (1.1-3).

To see if you have a buggy, blocking version of the ruby ncurses library, run this program:

require 'rubygems'
require 'ncurses'
require 'thread'

Ncurses.initscr
Ncurses.noecho
Ncurses.curs_set 0

t = Thread.new do
  sleep 0.1
  Ncurses.stdscr.mvaddstr 0, 0, "library is GOOD."
  Ncurses.refresh
end

Ncurses.stdscr.mvaddstr 0, 0, "library is BAD."
Ncurses.getch # press any key once you've seen the verdict
t.join

Ncurses.curs_set 1
Ncurses.endwin
puts "bye"

(I purposely require rubygems in there to load the rubygems ncurses library if it’s present; you can drop this if you don’t use rubygems.)

There are two workarounds to this problem. First, you can simply tell ncurses to use nonblocking input:

Ncurses.nodelay Ncurses.stdscr, true

But if you’re writing a multithreaded app, you probably aren’t interested in nonblocking input, unless you want a nasty polling loop.

The better choice is to add a call to IO.select before getch, which will block only the calling thread until there’s an actual keypress, and then allow getch to pick it up:

if IO.select([$stdin], nil, nil, 1)
  Ncurses.getch
end

IO.select requires a delay, so you’ll have to handle the periodic nils it generates. But the background threads should no longer block.
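The reason this works is that Ruby’s IO.select yields to other threads while it waits, so only the caller sleeps. A quick demonstration with a pipe standing in for $stdin (all the names here are just for the demo):

```ruby
r, w = IO.pipe

# a background thread that makes progress while the main thread waits
progress = []
bg = Thread.new { 5.times { progress << :tick; sleep 0.05 } }

# a stand-in for the user: "presses a key" after 0.2 seconds
Thread.new { sleep 0.2; w.write "x" }

ready = IO.select([r], nil, nil, 2)  # blocks only this thread
ticks_during_select = progress.size  # background thread kept running: >= 1
bg.join
```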

There is one further complication, which is that you won’t be able to receive the pseudo-keypresses Ncurses emits when the terminal size changes, since they don’t show up on $stdin and thus the select won’t pass. The solution is to install your own signal handler:

trap("WINCH") { ... handle sigwinch  ... }

You will still see the resize events coming from getch, but only once the user presses a key. You can drop them at this point.
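A sketch of the handler approach: do as little as possible inside the trap, and act on the flag from your event loop (the redraw step is hypothetical and app-specific):

```ruby
resized = false
trap("WINCH") { resized = true }  # just record it; real work happens later

# ... then in your event loop, after IO.select returns, you'd check:
#   if resized
#     resized = false
#     redraw_everything  # hypothetical: re-measure the terminal and repaint
#   end

# the handler can be exercised by signaling ourselves:
Process.kill("WINCH", Process.pid)
sleep 0.1
resized  # now true
```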

That should be enough to make any multithreaded Ruby ncurses app able to function. Of course, once everyone’s using a fixed version of the ncurses library, you can do away with the select and set nodelay to false.

(One last hint for the future: I’ve found it necessary to set nodelay to false before every call to getch; otherwise a ctrl-c will magically change it back to nonblocking mode. Not sure why.)

What’s cooking in Sup next

The 0.7 release ain’t the only exciting Sup news. Here’s a list of interesting features that are currently cooking in Sup next, along with the associated branch name.

  • zsh completion for sup commandline commands, thanks to Ingmar Vanhassel. (zsh-completion)
  • Undo support for many commands, thanks to Mike Stipicevic. (undo-manager)
  • You can now remove labels from multiple tagged threads, thanks to Nicolas Pouillard, using the syntax -label. (multi-remove-labels)
  • Sup works on terminals with transparent backgrounds (and that’s fixed copy-and-paste for me too!), thanks to Mark Alexander. (default-colors)
  • Pressing ‘b’ now lets you roll buffers both forward and backward, also thanks to Nicolas Pouillard. (roll-buffers)
  • Duplicate messages (including messages you send to a mailing list, and then receive a copy of) should now have their labels merged, except for unread and inbox labels. So if you automatically label messages from mailing lists via the before-add-hook, that should work better for you now. (merge-labels)
  • Saving message state is now backgrounded, so pressing ‘$’ after reading a big thread shouldn’t interfere with your life. It still blocks when closing a buffer, though, so I have to make that work. (background-save)
  • An end to email address canonicalization, also thanks to Nicolas Pouillard: the mapping between email addresses and names is no longer maintained across multiple emails. (dont-canonicalize-email-addresses)

The canonicalization one is a weird one. There’s been a long-standing problem in Sup where names associated with email addresses are saved and reused. Unfortunately many automated systems like JIRA, evite, blogger, etc. will send you email on behalf of someone else, using the same email address but different names. The issue was compounded because Sup decided that longer names should always replace shorter ones, so receiving some spam claiming to be from your address but with a random name would have all sorts of crazy effects.

Addresses are still stored in the index, both for search purposes, and for thread-index-mode. (Otherwise thread-index-mode has to reread the headers from the message source, which is slow.) Once thread-view-mode is opened, the headers must be read from the source anyways, so the email address is updated to the correct version.

So, incoming new email should be fine. Sup will store whatever name is in the headers, and won’t do any canonicalization.

For older email, you can update the index manually by viewing the message in thread-view-mode, and forcing Sup to re-save it, e.g. by changing the labels and then changing them back. Marking it as read, and then reading it, is an easy way to accomplish this, at least for read messages.

You can also make judicious use of sup-sync to do this for all messages in your index.

Sup 0.7 released

Sup 0.7 has been released.

You can read the announcement here.

The big win in this release is that Ferret index corruption issues should now be fixed, thanks to an extensive program of locking and thread-safety work.

The other nice change is that text entry will now scroll to the right upon overflow, thanks to some arcane Curses magic.

Sharing Conflict Resolutions in Git

Development of Sup is done with Git. Sup follows a topic branch methodology: features and bugfixes typically start off as “topic” branches from master, and are merged into an “integration”/“version” branch next for integration testing. After n cycles of additional bugfix commits to the topic branch, and re-merges into next, the topic branches are finally merged down to master, to be included in the next release.

I really like this approach because I think it evinces the real power of Git: that merges are so foolproof that I can pick and choose, on a feature-by-feature basis, which bits of code I want at each level of integration. That’s crazy cool. And users can stick to master if they want something stable, and next if they want the latest-and-greatest features.

The biggest problem I’ve had, though, is that long-lived topic branches often conflict with each other. This happens both when merging into next and when merging into master. I don’t think there’s a way around it; isolating features in this way has all the benefits above, but it also means that when they touch the same bits of code, you’ll get a conflict.

As a lazy maintainer, the biggest question I’ve had is: is there a way to push the burden of conflict resolution to the patch submitter? Is there a way for me to say: hey, your change conflicts with Bob’s. Can you resolve the conflict and send it to me?

One option I’ve considered is to have contributors publish not only their feature branches, but their next branch as well. Assuming they aren’t mucking about with their next branch otherwise, if it contains just the merge commit, I can merge it into mine, and it should be a fast-forward that gets me the merge commit, conflict resolution and all.

But I don’t like that idea because, in every other case, I’m merging in the feature branches directly. Why should I suddenly start merging in next just because you have a conflict?

Furthermore, Sup primarily receives email contributions via git format-patch, and I do the dirty deed of sorting them into branches and merging things around. Requiring everyone to host a git repo iff they produce a conflicting patch seems silly. (And git format-patch, unfortunately, produces nothing for merge commits, even if they have conflict resolution changes. Maybe there’s a good reason for this, or maybe not. I’m not sure.)

After some effort, and some git-talk discussion, I have a solution. And no, it doesn’t involve sharing git-rerere caches. (Which it seems that some people do!)

For the contributor: once you have resolved the conflict, do a git diff HEAD^. This will output the conflict resolution changes. Email that to the maintainer along with your patch.

For the maintainer:

$ git checkout next
$ git merge <offending branch>
[... you have a conflict, yada yada ...]
$ git checkout next .
$ git apply --index <resolution patch filename>
$ git commit

Running git merge gets you to the point where you have a conflict. Running git checkout next . sets your working directory to the state it was before you merged. And git apply applies the resolution changes.

You lose authorship of the conflict resolution, but you can use git commit --author to set it.
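Here’s the whole round trip as a self-contained demo in a throwaway repository (all file, branch and patch names are invented for the demo):

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b master repo
cd repo
git config user.email dev@example.com
git config user.name Dev

echo 'original line' > app.rb
git add app.rb
git commit -qm 'base'

# two topic branches touching the same line
git checkout -qb topic-a master
echo 'change from A' > app.rb
git commit -qam 'topic A'
git checkout -qb topic-b master
echo 'change from B' > app.rb
git commit -qam 'topic B'

# contributor: merge both into next, resolve by hand, export the resolution
git checkout -qb next master
git merge -q topic-a
git merge topic-b || true             # conflict!
echo 'merged A and B' > app.rb        # the hand resolution
git add app.rb
git commit -qm 'merge topic-b into next'
git diff HEAD^ > ../resolution.patch  # resolution relative to first parent

# maintainer: reproduce the conflict, reset the tree, apply the resolution
git checkout -qb maint-next master
git merge -q topic-a
git merge topic-b || true             # same conflict
git checkout maint-next .             # working dir back to pre-merge state
git apply --index ../resolution.patch
git commit -qm 'merge topic-b (contributor resolution)'

cat app.rb                            # prints: merged A and B
```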

I think the ideal solution would be for git format-patch to produce something usable in this case. I see some traffic on the Git list that suggests this is being considered, so hopefully one day this rigmarole will not be necessary.

Rethinking Sup part II

In Rethinking Sup part I, I concluded that Sup the MUA is an evolutionary dead end, and that the future lies in Sup the Service (STS). But what does that mean?

One thing I want to make clear it does not mean is any abandonment of the Sup curses UI. That particular “user experience” has been refined over the past few years to become my ideal email interface. It would be silly to throw that away.

What will happen to the curses code is that it will become one client among (hopefully) many. Once there’s a clear delineation between UI and backend, you can make a UI choice independent of making a choice to use Sup in the first place. You can run sup-curses-client if you want. Or you can build a web interface, or an Openmoko interface. Working with ncurses has always been the least enjoyable part of Sup, so maybe I’ll actually enjoy learning Javascript.

What backend functionality will STS actually provide? If I were simply reworking Sup into a client and a server, the obvious answer would be “a searchable, labelable, threaded view of large amounts of email”.

But reworking Sup is a great time to extend its original goals. In particular, I would love for STS to handle other types of documents besides email. I’ve always used my inbox as a mechanism for writing notes to myself. I’ve experimented briefly with reading RSS feeds through it. I’d like STS to support email, of course, but not to be limited by it.

My grand vision: STS will be a searchable, labelable, threaded view of large numbers of documents.

You can throw whatever you want in there, and STS will store it, thread it, and let you label and search for it. Email, RSS feeds, notes, jabber and IRC logs, web pages, RI documents—I want you to be able to throw them all in there. I want you to be able to annotate any of those things by adding notes and threading them against the original objects. Basically I want STS to be the primary tool you use for organizing and recalling all the textual information you’ve ever encountered in your life.

Cool, huh?

There’s another convenient benefit to this transformation: no one will expect STS to act like a MUA. STS does its own storage. You add your email and your other documents to the server and then you can throw those files away (or not). There are no more questions of supporting IMAP or various mbox dialects or “why doesn’t Sup treat Maildir correctly”. The files are in STS, and once they’re there, they’re out of your hands. You’ll be able to export them, of course, and if you’re crazy you might be able to write an IMAP server translation layer for STS, but there will be no more expectation of realtime Maildir handling. As I explained in part I, that’s a game I don’t want to play.

STS is a grander vision than a MUA, and it no longer has to be hobbled by the constraints of being expected to act like one.

Some other nice benefits of reworking Sup into STS:

  • You’ll be able to run multiple clients at once.
  • It’s an opportunity to rework some things. For example, one of the most noticeably slow operations in Sup (“Classic”) is assembling a large thread. This is because I made a decision early on to do all threading at search time. That made certain things easier (in particular, I could change the threading model without having to rescan the entire index), but in retrospect the cost is too high. STS will maintain document trees directly.
  • I can replace Ferret with Sphinx. It’s been a good couple years, but the periodic non-deterministic index corruption that’s been an issue for over a year is an exit sign to me. Working with Sphinx is nowhere near as nice as working with Ferret, but speed and stability go a long way.

I’ve been working on the code for STS on and off for the past couple weeks and it’s slowly starting to come together. Once all the major components have at least been sketched, I will host a git repo.

Rethinking Sup

It’s been clear to me for a while now that Sup has been trying to be two very different things at once, thus pleasing no one and irritating everyone. There’s Sup the email client, which is kind of the standard view of things. And then there’s Sup the service: a threaded, fielded, searchable, labelable view into your email.

Sup the email client is lacking in many ways, as many people have been very quick to point out to me. The most obvious of these is that it refuses to, you know, actually write back any state to your mailstore. Specifically, read/unread state is never written anywhere except its internal index. Furthermore, mailstore rescans of most any type are incredibly slow. These two features make using it in conjunction with other clients near impossible, which pretty much breaks one of the primary principles of tool design: don’t break other tools. (Then there’s also the problem of IMAP connections being terrifically slow and prone to crashes, but I lay most of that blame on IMAP being a crappy protocol and the Ruby IMAP libraries leaving a lot to be desired.)

Sup the service, on the other hand, suffers from the rather obvious flaw of not being exposed in any manner other than through Sup itself (and irb, I suppose).

I think the reason for this bizarre situation stems from my goal of fusing two very different things together: mutt and Gmail. Mutt is a client; Gmail is a service; Sup cherry-picks functionality, and lack of functionality, from both. Examples: I refused to have Sup write back to mailstores because Gmail didn’t have to export to your local Maildir or mbox file, so why should I? (Well technically, I said I would accept patches that did that, but that I wouldn’t be working on that feature myself. A fine distinction!) At the same time, I pooh-poohed the notion of a Sup server because mutt didn’t have a server, and so why should Sup? And so on.

For Sup to evolve into something more useful than it is, and that appeals to a broader audience than it currently does, I believe it has to go down one of these routes completely. And I believe I know which one, and I believe this can be done without compromising the basic user experience, which I would be very reluctant to do because it has been lovingly tweaked over the years to be William’s Ideal Email Experience.

The first option is to make Sup more of a client. In order to be a real email client, Sup must be able to interoperate with other clients. This means it has to write back all its state to the mailstores: read/unread status in whatever manner the mailstore supports, and probably something like all labels in a special header. It must also be able to do a full rescan in a fast manner, so that changes by other clients are reflected.

Right off the bat, that seems impossible, redundant with other software, and not that interesting. As I wrote in a sup-talk thread from a few months ago:

Sup is never going to be able to compete with programs like Mutt in terms of operations like “open up a mailstore of some format X, and mark a bunch of messages as read, and move a bunch of messages to this other mailstore.” That’s a tremendous amount of work to get right, get safe and get fast, and Mutt’s already done it well, and I sure don’t want to have to reimplement it. Competing with mutt on grounds of speed, stability, and breadth of Mailstore usage is a recipe for fail. Ruby sure as shit ain’t gonna come close to C for speed (at least until Rubinius gets LLVM working), and mutt’s already hammered out all the quirkinesses with Exchange, etc.

But not only would it be impossible, it wouldn’t be interesting. The things that make Sup valuable are the UI, the indexing and the flags, and those simply don’t translate to external mailstores. Furthermore, Sup is aimed at the mailstores of the future (my present mailstores), which are so big that mutt can’t handle them anyways.

So that leaves Sup as a service. And that’s where things get interesting. But I’ll save that for a later post.