
Whisper benchmarks

I’ve done some benchmarking on Whisper. Here are the results, with a few points of comparison:

system                 req/s    ms/req   delta ms/req
nginx static        13736.04     7.280              -
rack/thin            3065.24    32.624         25.344
whisper/no logging   1918.56    52.123         19.499
whisper              1833.40    54.544          2.421

Nginx static is nginx serving a static file. We see it can handle about 13.7k requests per second, and takes about 7 ms per request. If we add a simple Thin server on top of that, going through Rack, we immediately drop requests/second by a factor of about 4.5, and it takes us an extra 25 ms/request. That’s the cost of using Ruby.

Adding Whisper on top of that costs another 19.5 ms/request, bringing our rate down to 1919 requests/second, or over 7 times slower than Nginx serving static files. And if you want logging with that, add another 2.4 ms/request.
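The ms/req and delta columns can be rebuilt from the req/s column alone. Each ms/req value is, to within about 0.001 ms, 100 concurrent requests divided by the measured rate, which is how ab reports mean time per request. Here is a minimal Ruby sketch of that arithmetic, using only the numbers from the table (the constant and method names are mine, not from Whisper):

```ruby
# Requests/second as reported in the table above.
RATES = {
  "nginx static"       => 13736.04,
  "rack/thin"          => 3065.24,
  "whisper/no logging" => 1918.56,
  "whisper"            => 1833.40,
}

CONCURRENCY = 100 # ab was run with 100 concurrent requests

# Mean time per request in ms at the given concurrency level.
def ms_per_request(rate, concurrency = CONCURRENCY)
  concurrency * 1000.0 / rate
end

# Returns [system, ms/req, delta vs. previous row] for each system.
# The first row has no previous row, so its delta is nil.
def latency_rows(rates = RATES)
  previous = nil
  rates.map do |system, rate|
    ms = ms_per_request(rate)
    delta = previous && ms - previous
    previous = ms
    [system, ms.round(3), delta && delta.round(3)]
  end
end

latency_rows.each do |system, ms, delta|
  puts format("%-20s %8.3f %8s", system, ms, delta ? format("%.3f", delta) : "-")
end
```

Dividing the first rate by the third (13736.04 / 1918.56) gives the "over 7 times slower" figure as well.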

That 2.4 ms/request is interesting, because it’s basically the result of a few puts statements. Yes, Ruby is expensive. The bare Rack/Thin performance shows the headroom I have on the Ruby side (i.e. without rewriting the whole thing in C). If a puts is that expensive, then stripping out a couple of debugging statements and caching some regexp results would probably result in a very noticeable improvement in performance.
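One way to check how expensive a puts really is on a given machine is to time it in isolation. The following is only a sketch under my own assumptions: the log line is made up, and it writes to an in-memory StringIO rather than to wherever Whisper actually logs, so the absolute numbers will differ from a real server.

```ruby
require "benchmark"
require "stringio"

# Hypothetical log line; Whisper's real log format is not shown in the post.
LOG_LINE = "GET /index.html 200"

# Average wall-clock seconds per call of the given block.
def seconds_per_call(iterations)
  Benchmark.realtime { iterations.times { yield } } / iterations
end

# Compares a puts into an in-memory sink against an empty block, so the
# loop overhead can be subtracted out by eye.
def measure_puts_cost(iterations = 100_000)
  sink = StringIO.new
  with_puts = seconds_per_call(iterations) { sink.puts LOG_LINE }
  baseline  = seconds_per_call(iterations) { LOG_LINE }
  { with_puts: with_puts, baseline: baseline, lines_written: sink.string.count("\n") }
end

result = measure_puts_cost
puts format("puts: %.3f us/call, empty block: %.3f us/call",
            result[:with_puts] * 1e6, result[:baseline] * 1e6)
```

Swapping the StringIO for $stdout or a File would bring the measurement closer to what a server pays per log line.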

But how many requests/second do you need to be able to survive being Slashdotted? A brief web search suggests a high estimate of “several hundred”. Let’s say that means 300 req/s. That means that Whisper, even with logging, already handles about 6 times the Slashdot effect requirement. So it’s almost certainly not worth complicating the code for the sake of performance.

Experiment parameters: these are all tests using ab (ApacheBench, the Apache benchmarking tool) with 100 concurrent requests, averaged over 50k requests. The tests were performed by connecting to localhost (i.e. going over the network stack but not over the network itself), on a quad-core Intel 2 GHz (Q8200) running 64-bit Linux 2.6.27. YMMV.
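For anyone wanting to repeat the measurement, the parameters above map onto ab’s -n (total requests) and -c (concurrency) flags. The sketch below builds that command line and pulls the two headline figures out of ab’s text report. The URL and port are placeholders of mine, since the post does not say which address each server listened on, and the method names are made up for illustration.

```ruby
# Build the ab invocation described above. The URL is a placeholder.
def ab_command(url, requests: 50_000, concurrency: 100)
  ["ab", "-n", requests.to_s, "-c", concurrency.to_s, url]
end

# Extract requests/second and mean time per request from ab's report.
# Returns nil for a figure whose line is not present.
def parse_ab(report)
  {
    req_per_s:  report[/^Requests per second:\s+([\d.]+)/, 1]&.to_f,
    ms_per_req: report[/^Time per request:\s+([\d.]+) \[ms\] \(mean\)$/, 1]&.to_f,
  }
end

# Usage (not run here; needs ab installed and a server on that port):
#   report = IO.popen(ab_command("http://localhost:3000/"), &:read)
#   p parse_ab(report)
p ab_command("http://localhost:3000/")
```

The second regexp is anchored on "(mean)" so that it skips ab’s other "Time per request" line, the one reported across all concurrent requests.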

Preliminary Rubinius inliner benchmarks

I’ve done some very preliminary benchmarking on the inliner I’ve been hacking into Rubinius.

For the very simple case it can handle so far (guaranteed dispatch to self, a fixed number of arguments with no splats or defaults, and no blocks), here’s what we get for 10 million iterations of a simple function calling another simple function:
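The benchmark code itself isn’t shown here, so the following is only a guess at its shape: a method that calls another method on self, once with no arguments and once with four, timed with Ruby’s standard Benchmark module. The class and method names are hypothetical, and this harness has no way to switch the inliner on or off, so it produces one row per case rather than an inlined/uninlined pair.

```ruby
require "benchmark"

# Hypothetical stand-in for the benchmarked methods; the post does not
# show the actual code.
class Example
  def inner0
    1
  end

  def outer0
    inner0 # implicit self receiver, no arguments
  end

  def inner4(a, b, c, d)
    a
  end

  def outer4(a, b, c, d)
    inner4(a, b, c, d) # implicit self receiver, four fixed arguments
  end
end

# Times `iterations` calls of each outer method and prints a report in
# the user/system/total/real layout used by Benchmark.
def run_benchmark(iterations)
  obj = Example.new
  Benchmark.bm(8) do |x|
    x.report("no-args") do
      i = 0
      while i < iterations
        obj.outer0
        i += 1
      end
    end
    x.report("4-args") do
      i = 0
      while i < iterations
        obj.outer4(1, 2, 3, 4)
        i += 1
      end
    end
  end
end

run_benchmark(1_000_000) # the post used 10 million iterations
```

A while loop is used instead of Integer#times so that the timed region contains no block calls other than the report block itself.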

name                 user   system   total    real
uninlined-no-args   22.49        0   22.49   22.49
inlined-no-args     21.74        0   21.74   21.74
uninlined-4-args    27.74        0   27.74   27.74
inlined-4-args      24.59        0   24.59   24.59

So inlining results in a 3.5% speedup on method dispatch with no arguments, and a 12.8% speedup when there are four arguments.
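Those percentages correspond to speedup measured as uninlined time divided by inlined time, minus one. A short sketch of the arithmetic, with the share of runtime saved shown alongside for contrast (the method names are mine):

```ruby
# Speedup in percent: how much more work per second the inlined version
# does, i.e. (old time / new time - 1) * 100.
def speedup_percent(uninlined, inlined)
  (uninlined / inlined - 1.0) * 100.0
end

# Alternative view: percentage of the original runtime that was saved.
def time_saved_percent(uninlined, inlined)
  (uninlined - inlined) / uninlined * 100.0
end

[["no-args", 22.49, 21.74], ["4-args", 27.74, 24.59]].each do |name, before, after|
  puts format("%-8s speedup %.2f%%, time saved %.2f%%",
              name, speedup_percent(before, after), time_saved_percent(before, after))
end
```

With the table’s numbers the speedup works out to about 3.45% and 12.81%, matching the figures quoted above after rounding. The time-saved view gives slightly smaller numbers.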

Of course this is the best case for the inliner. Guaranteed dispatch to self means that I don’t even add any guard code, which would definitely slow things down. But this actually is a fairly common case: it occurs whenever you call accessors on self, or any helper functions that don’t take blocks or varargs.

And the real boost of inlining, presumably, is going to be in conjunction with JIT, since the CPU can pipeline the heck out of everything.