Old comments are in

I’ve finally pulled in all the old comments from the Blogspot blog. It was a painful process of semi-automated Atom-to-YAML+Textile conversion, and the resulting comments aren’t threaded, but at least they’re here now.

As a side note, I’m really liking having my posts stored in a git repo. I can write them locally, tweak them and see how things look, and push when they’re finally ready to be published.

As another side note, MathML is being a shitshow as usual. Firefox 3.1 (but not 3.0?) apparently craps out at embedded style sheets in XML (craps out as in, refuses to display the blog and displays a big red error instead), or some shit. So I’ve removed some stylesheet line from the master template and now everything seems to work in both Firefoxes. But that line is critical according to Putting mathematics on the Web with MathML, so god only knows what I’ve broken in the process.

The big problem with all this MathML stuff is that the XML wonks apparently managed to trick everyone into violating Postel’s law and failing hard when the browser doesn’t like something about the XML it sees. So the moment anything is slightly out of whack, no one can see your blog. Maybe that’s why no one in the world uses MathML except for me?

That brings to mind an old Mark Pilgrim post about XML and Postel’s Law which is a good read, and includes this memorable quote:

Various people have tried to mandate this principle out of existence, some going so far as to claim that Postel’s Law should not apply to XML, because (apparently) the three letters “X”, “M”, and “L” are a magical combination that signal a glorious revolution that somehow overturns the fundamental principles of interoperability.

Good stuff. Too bad that was five fucking years ago and I’m still dealing with this shit.

Whisper 0.1 released

I’ve released Whisper 0.1. Now you can blog like me. It will happily serve static files, though if you’re expecting heavy traffic, you might put it behind something like Nginx. (See instructions in the configuration file for more.)

How to do it:

  1. sudo gem install whisper --source http://masanjin.net/
  2. whisper-init <blog directory>
  3. Follow the instructions!

Whisper benchmarks

I’ve done some benchmarking on Whisper. Here are the results, with a few points of comparison:

system                  req/s   ms/req   delta ms/req
nginx static         13736.04    7.280
rack/thin             3065.24   32.624         25.344
whisper/no logging    1918.56   52.123         19.499
whisper               1833.40   54.544          2.421

Nginx static is nginx serving a static file. We see it can handle 13k requests per second, and takes about 7ms for a single request. If we add a simple Thin server on top of that, going through Rack, we immediately drop requests/second by an order of magnitude, and it takes us an extra 25ms/request. That’s the cost of using Ruby.

Adding Whisper on top of that requires another 19.5 ms/request, bringing our rate down to 1919 requests/second, or over 7 times slower than Nginx serving static files. And if you want logging with that, add another 2.4 ms/request.

That 2.4ms/request is interesting, because it’s basically the result of a few puts statements. Yes, Ruby is expensive. The bare Rack/Thin performance shows the headroom I have on the Ruby side (i.e. without rewriting the whole thing in C). If a puts is that expensive, then stripping out a couple of debugging statements and caching some regexp results would probably result in a very noticeable improvement in performance.

But how many requests/second do you need to be able to survive being Slashdotted? A brief web search suggests a high estimate of “several hundred”. Let’s say that means 300 req/s. That means Whisper can already handle 6 times the Slashdot-effect requirement. So it’s almost definitely not worth complicating the code for the sake of performance.

Experiment parameters: these are all tests using ab (the Apache benchmark tool) with 100 concurrent requests, averaged over 50k requests. The tests were performed by connecting to localhost (i.e. going over the network stack but not over the network itself), on a quad-core 2GHz Intel (Q8200) running 64-bit Linux 2.6.27. YMMV.
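
For reference, each run was a single ab invocation, something like the following (the port here is just a placeholder for wherever your server happens to be listening):

  ab -c 100 -n 50000 http://localhost:8080/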

What William Reads

When you start studying Chinese, one of the first decisions you have to make is whether you want to learn simplified characters or traditional characters. The best wisdom I received on the subject was this: if you learn simplified characters, you can read what Mao Zedong wrote. If you learn traditional characters, you can read what he read.

To that end, I’ve thrown together all my RSS feeds into one HTML page using Planet. Now you can read what I read.

What William Reads.

Another month, another blog platform

Over the past month or so I’ve been spending some time hacking together yet another blogging platform, to satisfy all my (admittedly weird) blogging desires. It’s finally at the point where I can host my fascinating insights on it, so here you go. It’s called Whisper, and you’re looking at it now.

Interesting features:

  1. No RDBMS. Storing your blog entries in an RDBMS is like driving to work in the Space Shuttle.
  2. YAML+Textile, sitting on a disk. As in Hobix, blog posts and comments are stored on disk in regular files, using a mix of YAML and Textile. This means you can keep your content under version control, and you can edit it with whatever editor you desire. Unlike in Hobix, the entry content is stored in a separate file from the metadata, so there’s none of the trickiness of embedding Textile in YAML.
  3. Sits directly on top of Rack (or Thin). No intermediate layer to slow things down. These particular bits are served from Thin over a unix socket to Nginx.
  4. Lazy cached dependency graph: every bit of content is cached, built lazily, and a part of a big dependency graph. That means almost every request is served directly from memory, and making a change, like adding or updating an entry, forces a regeneration of only those bits that require it. Infrequently-requested bits of content eventually expire. (See the sketch after this list for the general idea.)
  5. Markup enhancements: I’ve added some extra processing on top of Textile to do the things I’ve always wanted to do. Ruby code is automatically syntax-highlighted, LaTeX math expressions are turned into MathML (via RiTeX), etc. Finally I can write purty-lookin’ math and code without a ridiculous amount of effort.
  6. Threaded comments. Why would you not have this?
  7. Comments via email. This is still a work in progress: currently, you comment by entering your email address and replying to the resulting email. This allows you to quote, thread, and generally have a reasonable discussion, which is what email is good at, and what typing shit into little text areas in your web browser is not. The eventual goal is to automatically mirror the entire conversation, but right now it just mirrors individual replies.
  8. Multiformat support. In addition to HTML and RSS output, there’s a plain text mode for the hard-core.
  9. Pagination, labels, per-label and per-author indices, etc.
  10. The whole thing amounts to a little over 1200 lines of code.
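
Since “lazy cached dependency graph” (item 4) is a bit abstract, here’s a minimal sketch of the general idea. Whisper itself is Ruby; this is just my illustration in C, not Whisper’s actual code, and expiration of infrequently-requested content is left out. Each node caches its built content, serving a node rebuilds it only if its cache is stale, and an edit dirties the node and everything downstream of it:

  /* A hypothetical node in the content dependency graph. */
  #include <stdlib.h>

  typedef struct node node;
  struct node {
      char *cached;                 /* built content; NULL when dirty */
      char *(*build)(node *self);   /* regenerates this node's content */
      node **dependents;            /* nodes built from this one */
      size_t ndependents;
  };

  /* Serve a request: rebuild only if the cache is stale. */
  char *get(node *n) {
      if (!n->cached) n->cached = n->build(n);
      return n->cached;
  }

  /* An entry was added or updated: dirty this node and every node
     downstream of it, so each is rebuilt on its next request. */
  void invalidate(node *n) {
      if (!n->cached) return;       /* already dirty; stop propagating */
      free(n->cached);
      n->cached = NULL;
      for (size_t i = 0; i < n->ndependents; i++)
          invalidate(n->dependents[i]);
  }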

The code’s still a while away from being ready for public consumption, but I’ve put up a git repo here: git://masanjin.net/whisper.

The next steps are to flesh out the code enough to make it usable by other people, make a gem, and maybe publish some performance numbers.

Trollop 1.11 released

Trollop 1.11 has been released. This is a minor release with only one new feature: when an option <opt> is actually given on the commandline, a new key <opt>_given is inserted into the return hash (in addition to <opt> being set to the actual argument(s) specified).

This allows you to detect which options were actually specified on the commandline. This is necessary for situations where you want one option to override or somehow influence other options. For example, configure’s --exec-prefix and --bindir flags: if --exec-prefix is specified, you want to override the default value for --bindir, unless that’s also given. If neither is given, you want to use the default values.

This should be a backwards-compatible release, except for namespace issues if you actually had options called <something>_given.

Copying objects in memory

One of the fundamental questions in VM design for OO languages is how you represent your object handles internally. Regardless of what the language itself exposes in terms of fundamental types, boxing, and the like, the VM still has to shuffle objects around on the stack frame, pass them between methods/functions, etc.

The traditional way to do this is the “tagged union”, where you use a two-element struct consisting of a type field and a union of values for each possible type. One of these types is probably an object pointer; the other types let you represent unboxed fundamental types like ints and floats. This is the approach used by Rubinius, by Lua, and probably many others.
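
For concreteness, a tagged union in C looks something like this. The names are mine, for illustration, not Lua’s or Rubinius’s actual definitions:

  /* A sketch of a tagged-union object handle. */
  typedef enum { T_NIL, T_INT, T_FLOAT, T_OBJECT } obj_type;

  typedef struct {
      obj_type type;    /* which member of the union is live */
      union {
          int i;        /* unboxed integer */
          double d;     /* unboxed float */
          void *ptr;    /* pointer to a heap-allocated object */
      } as;
  } value;

The double forces the union to 8 bytes, and the type field grows the whole struct to 12 bytes on typical 32-bit x86 (16 on x86-64, thanks to alignment), which is where the sizes below come from.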

The Neko VM instead uses a single pointer-sized value for everything, and fixes the lowest bit as the integer bit. If this bit is on, the value represents a 31-bit integer; if it’s off, the value is a pointer. Of course this means that Neko objects can only be at even addresses in memory. (I’m not sure what happens on 64-bit machines; either ints stay at 31 bits or they grow to 63-bit longs; the pointers certainly grow to 64 bits.)
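
The bit-twiddling behind that scheme is simple. Here’s my reconstruction of the idea, not Neko’s actual macros:

  /* A sketch of lowest-bit integer tagging. A handle is one machine
     word; objects must be at least 2-byte aligned so bit 0 is free. */
  #include <stdint.h>

  typedef intptr_t value;

  #define IS_INT(v)    ((v) & 1)
  #define MAKE_INT(i)  ((value)((i) << 1) | 1)  /* drops one bit of range */
  #define GET_INT(v)   ((v) >> 1)
  #define MAKE_PTR(p)  ((value)(p))             /* bit 0 already zero */
  #define GET_PTR(v)   ((void *)(v))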

The result is that Neko object handles are the size of pointers, hence small, but Neko loses the ability to handle unboxed floats. All float operations will require lots of heap allocation and dereferencing. On the other hand, Lua object handles are much larger, but Lua can do float arithmetic on the stack. (The VM stack, not the system stack.)

The Neko folks claim that their representation is better, because it’s smaller, and faster when you’re copying things around. But what value do you really get by sacrificing floats? And what about when we take into account different architectures?

Comparing sizes is easy. On a 32-bit machine, Lua objects take up 12 bytes: a double is 8 bytes, and the tag grows the struct to 12. So Lua object handles are three times the size of Neko handles. On a 64-bit machine, Lua objects take up 16 bytes, and Neko objects take up 8. Note that Lua handles are now only twice as big as Neko handles.

Comparing speed is a little more interesting. How much is lost, exactly, by copying around those extra 8 bytes, for each architecture? I did some simple experiments where I copied objects of various sizes 10 million times, picking a random start and end point for each copy within a block of allocated memory on the heap, and measuring how long everything took.
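
In outline, the measurement loop looks something like this (a simplified sketch, not the exact code linked below; time it with and without the copy line to isolate the cost of the copy itself):

  #include <stdlib.h>

  #define ITERS (10 * 1000 * 1000)
  #define SLOTS (1 << 20)

  /* stand-in for a tagged-union handle: 12 bytes on 32-bit x86, 16 on
     x86-64; swap in a 4- or 8-byte type for the pointer-sized case */
  typedef struct { int type; double d; } handle;

  int main(void) {
      handle *block = calloc(SLOTS, sizeof(handle));
      for (int i = 0; i < ITERS; i++) {
          size_t from = (size_t)rand() % SLOTS;  /* random start... */
          size_t to   = (size_t)rand() % SLOTS;  /* ...and end point */
          block[to] = block[from];               /* the copy under test */
      }
      free(block);
      return 0;
  }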

On my 32-bit machine, taking 10m random numbers took 2.874 seconds; copying 12-byte objects from one location to the other each time took an additional 91ms. Copying 4-byte objects took only an extra 77ms. That works out to a 15.3% slowdown for Lua.

On my 64-bit machine, taking 10m random numbers took 517ms; copying 16-byte objects each time took an additional 1.85 seconds; copying 8-byte objects took an additional 1.81 seconds. That works out to a 2.2% slowdown for tagged unions.

So the 32-bit case is maybe arguable, but the 64-bit case doesn’t seem that compelling to me. Copying object handles around is one of the very many things a VM spends its time on, so the overall slowdown is going to be much less than 2.2%. I don’t know that sacrificing float performance, and half of your integer space, is really worth it.

If you want to run these experiments for yourself, the code is here. Please let me know if I’m doing something wrong!

Damn you, _why!

Topics: vm

I swear to god that two weeks ago I started writing a VM for a classless OO language with scoped mixins. And now you go and release Potion.

Indirect threading for VM opcode dispatch

There’s a good discussion with lots of interesting details on a recent patch submission for adding indirect threading to the Python VM. (And by “discussion” I mean a single, un-threaded sequence of comments where you have to manually figure out who’s replying to what, which apparently is what everyone in the world is happy with nowadays except for me. Email clients have had threading since 1975, bitches, so get with the fucking program. [Hence, Whisper—ed.]) Pointed to by programming.reddit.com, which remains surprisingly useful, as long as you cut yourself off once the comment thread devolves (as it invariably does) into meta-argumentation.

Indirect threading is a vaguely-neat trick that I first learned about around the time I was getting into the Rubinius code. The idea is that, in the inner loop of your VM, which is going through and interpreting the opcodes one at a time (dispatching each to a block of handler code), instead of jumping back to the top of the loop at the end of each handler’s code section, you jump directly to the location of the handler code for the next opcode. The big benefit is not so much that you save a jump per opcode (which may be optimized away for you anyways), but that the CPU can do branch prediction on a per-opcode basis. So common opcode sequences will all be pipelined together.
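
In C, using GCC’s computed-goto extension (the same extension the Python patch relies on), the trick looks something like this toy example of mine:

  #include <stdio.h>

  enum { OP_INC, OP_DEC, OP_HALT };

  int run(const unsigned char *pc) {
      /* jump table: one handler address per opcode, via GCC's
         "labels as values" extension */
      static void *dispatch[] = { &&op_inc, &&op_dec, &&op_halt };
      int acc = 0;

      goto *dispatch[*pc++];      /* dispatch the first opcode */

  op_inc:
      acc++;
      goto *dispatch[*pc++];      /* jump straight to the next handler; */
  op_dec:                         /* no trip back through a loop head, */
      acc--;                      /* and each goto gets its own branch- */
      goto *dispatch[*pc++];      /* predictor entry */
  op_halt:
      return acc;
  }

  int main(void) {
      unsigned char prog[] = { OP_INC, OP_INC, OP_DEC, OP_HALT };
      printf("%d\n", run(prog)); /* prints 1 */
      return 0;
  }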

But the discussion shows that this kind of thing is very compiler- and architecture-dependent, and you have to spend a lot of time making sure that GCC is optimizing the “right way” for your particular architecture, isn’t over-optimizing by collapsing the jumps back together, etc. OTOH, the submitter is reporting a 20% speedup, and this is the very heart of the VM, so it could very well be worth spending time on such trickery.

More information:

  • The structure and performance of efficient interpreters [pdf]
  • Inline threading, Tracemonkey, etc.
  • A Pypy-dev thread on threaded interpretation.
  • Various performance-specific bits of the V8 Javascript interpreter design.

A spot of bright news in the MA wine-shipping situation

One of the “fun” things about living in MA (besides the obvious fun of “the weather” and “the people”) is that you can’t get wine shipped directly to your house anywhere in the state. Until 2005 it was illegal; until very recently it was effectively illegal; and now, thanks to a district court decision overturning a state law, it’s merely uncertain.

But uncertainty is a positive step in this state. If you read the “factual background” section of the text of the decision itself, you’ll get a fun overview of how, in typical Massachusetts fashion, the current situation is the result of a culture of cronyism and old-boys-club-ism, with wine wholesalers in the state controlling the legislative process and protecting their own monopoly at the expense of both wineries and consumers, typically while justifying their actions by appealing to the state’s deep-seated Puritan anti-alcohol sentiment.

The specifics of the legislation that was just overturned should give you an idea of the crass, absolutely unsubtle gerrymandering the state legislature is willing to stoop to, in this case to circumvent another state law that was overturned as unconstitutional in 2005 (by the US Supreme court, no less!):

The detailed account sheds light on a fact that we’ve known all along—that the 30,000 gallon capacity cap was set conveniently above the production capacity of the largest winery in Massachusetts (24,000 gallons). This cap was designed to allow the Massachusetts wineries to ship directly to consumers, while simultaneously protecting Massachusetts wholesalers by prohibiting out-of-state medium and large wineries from doing the same.

Of course, we’re still a ways away from being able to join the Screaming Eagle wine-of-the-month club: MA still has a host of other regulations that make delivery services like FedEx and UPS either unable or unwilling to deliver wine, like requiring a special permit for each vehicle that might have wine on it. But maybe we’re getting closer. If nothing else, we can hope the increased attention will have a “sunlight is the best disinfectant” kind of effect on the issue.
