
“sudo gem install” considered harmful

Update 2010/10/02: see here for a real-life example.

If you habitually type sudo gem install on your development box, you are potentially exposing yourself to nasty behavior. If you have sudo gem install as part of your automated deploy process, you are begging for something tragic to happen.

Consider:

  1. A gem can execute arbitrary code at install time.1
  2. Anyone with the proper permissions on rubygems.org can publish a new version of a gem at any point. This code is not reviewed or audited by anyone before publication.
  3. gem install pulls in the latest version of any dependencies that it can, for the entire dependency graph.
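To make the first point concrete, here is a minimal sketch of where install-time code usually lives: a gemspec lists an extension script, and RubyGems runs that script during gem install with whatever privileges the installing user has. The gem name and file path are hypothetical, made up for this illustration.

```ruby
require 'rubygems'

# Hypothetical gemspec for illustration only; "example_gem" and the
# ext/ path are made-up names, not a real published gem.
EXAMPLE_SPEC = Gem::Specification.new do |s|
  s.name       = "example_gem"
  s.version    = "0.0.1"
  s.summary    = "Shows where install-time code lives"
  s.authors    = ["nobody"]
  s.files      = ["ext/example_gem/extconf.rb"]
  # Every script listed here is executed by `gem install`.
  # Under sudo, that means it runs as root.
  s.extensions = ["ext/example_gem/extconf.rb"]
end

# ext/example_gem/extconf.rb is plain Ruby, so it could contain
# anything at all, for example:
#   puts "running as uid #{Process.uid}"

puts "install-time scripts: #{EXAMPLE_SPEC.extensions.inspect}"
```

Nothing in the spec restricts what the script does, which is the whole problem.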

All it takes is for one malicious or incompetent gem writer to do something wrong, even in a gem you don’t directly depend on, and sudo gem install will destroy your box.

Happily, RubyGems works perfectly well in non-root mode. For local development, you can leave out the sudo and gems will be installed in your home directory. For production use, you should be running servers and apps as non-root users anyway.
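If you want to check where a non-root install would land on your own machine, something like the following sketch may help. The two helper methods are hypothetical names of mine; Gem.default_dir and Gem.user_dir are the RubyGems calls that report the system-wide and per-user gem directories.

```ruby
require 'rubygems'

# Hypothetical helper: true when the effective user id is root (0).
def running_as_root?(euid = Process.euid)
  euid.zero?
end

# Hypothetical helper: summarizes where gems can be installed.
def gem_install_locations
  {
    system_dir:          Gem.default_dir,  # shared, often root-owned
    user_dir:            Gem.user_dir,     # per-user, used by --user-install
    system_dir_writable: File.writable?(Gem.default_dir)
  }
end

if running_as_root?
  warn "Running as root: please do not install gems from this account."
else
  gem_install_locations.each { |name, value| puts "#{name}: #{value}" }
end
```

The exact paths depend on your Ruby and RubyGems versions, so treat the output as informational.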

Please, stop propagating the sudo gem install meme.

1 See http://github.com/wmorgan/killergem.

Calculating string display width in Ruby

Most programmers are by now familiar with the difference between the number of bytes in a string and the number of characters. Depending on the string’s encoding, the relationship between these two measures can be either trivially computable or complicated and compute-heavy.

With the advent of Ruby 1.9, the Ruby world at last has this distinction formally encoded at the language level: String#bytesize is the number of bytes in the string, and String#length and String#size the number of characters.
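A quick illustration of the two measures, assuming UTF-8 strings. The helper name is hypothetical; the only real API involved is String#bytesize and String#length.

```ruby
# "日本語" written with escapes so the example does not depend on the
# encoding of the source file. Each character is 3 bytes in UTF-8.
JAPANESE = "\u65E5\u672C\u8A9E"

# Hypothetical helper that reports both measures side by side.
def byte_and_char_counts(str)
  { bytes: str.bytesize, chars: str.length }
end

[["ruby", "ruby"], ["japanese", JAPANESE]].each do |label, str|
  counts = byte_and_char_counts(str)
  puts "#{label}: #{counts[:bytes]} bytes, #{counts[:chars]} characters"
end
# ruby: 4 bytes, 4 characters
# japanese: 9 bytes, 3 characters
```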

But when you’re writing console applications, there’s a third measure you have to worry about: the width of the string on the display. ASCII characters take up one column when displayed on screen, but super-ASCII characters, such as Chinese, Japanese and Korean characters, can take up multiple columns. This display width is not trivially computable from the byte size of the character.
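To give a feel for the problem, here is a deliberately naive sketch. It is not how the console gem works; it simply assumes that characters in a handful of well-known East Asian blocks take two columns and everything else takes one. The method name and the range table are mine, and the table is far from complete (it ignores combining characters, control characters and many wide symbols).

```ruby
# Rough approximation only: a few East Asian blocks counted as two
# columns wide, everything else as one.
WIDE_RANGES = [
  0x1100..0x115F,  # Hangul Jamo initial consonants
  0x3040..0x30FF,  # Hiragana and Katakana
  0x4E00..0x9FFF,  # CJK Unified Ideographs
  0xAC00..0xD7A3,  # Hangul Syllables
  0xFF01..0xFF60   # Fullwidth ASCII variants
].freeze

# Hypothetical helper: sums the assumed column width of each character.
def naive_display_width(str)
  str.each_char.sum do |ch|
    WIDE_RANGES.any? { |range| range.cover?(ch.ord) } ? 2 : 1
  end
end

puts naive_display_width("abc")                 # 3 columns
puts naive_display_width("\u65E5\u672C\u8A9E")  # 6 columns for 3 characters
```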

Finding the display width of a string is critical to any kind of console application that cares about the width of the screen, i.e. is not simply printing stuff and letting the terminal wrap. Personally, I’ve been needing it forever:

  1. Trollop needs it because it tries to format the help screen nicely.
  2. Sup needs it in a million places because it is a full-fledged console application and people use it for reading mail in all sorts of funny languages.

The actual mechanics of how to compute string width make for an interesting lesson in UNIX archaeology, but suffice it to say that I’ve travelled the path for you, with help from Tanaka Akira of pp fame, and I am happy to announce the release of the Ruby console gem.

The console gem currently provides these two methods:

  • Console.display_width: calculates the display width of a string
  • Console.display_slice: returns a substring according to display offset and display width parameters.

There is one horrible caveat outstanding, which is that I haven’t managed to get it to work on Ruby 1.8. Patches to this effect are most welcome, as are, of course, comments and suggestions.

Try it out!

Simple breakpoints in Ruby

Sometimes it’s nice to have a simple breakpointing function that will dump you into an interactive session with all your local variables in place.

There are more sophisticated solutions for the world of multiple servers and daemonized code, but after some fighting with IRB, I find myself using this little snippet of code in many projects:

require 'irb'

module IRB
  def IRB.start_with_binding binding
    IRB.setup __FILE__
    w = WorkSpace.new binding
    irb = Irb.new w
    @CONF[:MAIN_CONTEXT] = irb.context
    irb.eval_input
  end
end

## call me like this: breakpoint binding
def breakpoint binding; IRB.start_with_binding binding end

As the comment states, you can invoke the breakpoint at any point by inserting a breakpoint binding statement anywhere in your code. Once that line is reached, you’ll be dumped into an IRB session with local variables intact. Quitting the session resumes execution.
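If it is not obvious what passing the binding buys you, this small sketch shows what a Binding object carries around, without starting IRB at all. The method and variable names are hypothetical.

```ruby
# Hypothetical method whose locals we want to inspect from outside.
def compute_order
  quantity = 3
  unit_price = 14
  binding  # hand the caller our execution context
end

b = compute_order
puts b.local_variables.sort.inspect        # [:quantity, :unit_price]
puts b.local_variable_get(:quantity)       # 3
puts b.eval("quantity * unit_price")       # 42
```

An IRB session started on that binding evaluates whatever you type in the same context, which is why the locals are all there. Newer versions of Ruby and IRB also ship binding.irb, which does much the same job as the snippet above.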

Obviously with this method I’m having you pass in your binding explicitly. There are fancier tricks for capturing the binding of the caller (involving kernel trace functions and continuations), but I’m opting for the simpler solution here.

Works with Ruby 1.9, of course.

Ruby, Ncurses and blocked threads

If you’re writing a multithreaded Ruby program that uses ncurses, you might be curious why your program stops running when you call Ncurses.getch. Sup has been plagued by this issue since 2005. Thankfully, I think I finally understand it.

The problem is that there is a bug in the Ruby ncurses library such that using blocking input will block all Ruby threads when it waits for user input, instead of just the calling thread. So Ncurses.getch will cause everything to grind to a halt. This is probably due to the library not releasing the GVL when blocking on stdin.

This bug is present in the latest rubygems version of ncurses, 0.9.1. It has been fixed in the latest libncurses-ruby Debian packages (1.1-3).

To see if you have a buggy, blocking version of the ruby ncurses library, run this program:

require 'rubygems'
require 'ncurses'
require 'thread'

Ncurses.initscr
Ncurses.noecho
Ncurses.cbreak
Ncurses.curs_set 0

Thread.new do
  sleep 0.1
  Ncurses.stdscr.mvaddstr 0, 0, "library is GOOD."
end

begin
  Ncurses.stdscr.mvaddstr 0, 0, "library is BAD."
  Ncurses.getch
ensure
  Ncurses.curs_set 1
  Ncurses.endwin
  puts "bye"
end

(I purposely require rubygems in there to load the rubygems ncurses library if it’s present; you can drop this if you don’t use rubygems.)

There are two workarounds to this problem. First, you can simply tell ncurses to use nonblocking input:

Ncurses.nodelay Ncurses.stdscr, true

But if you’re writing a multithreaded app, you probably aren’t interested in nonblocking input, unless you want a nasty polling loop.

The better choice is to add a call to IO.select before getch, which will block the calling thread until there’s an actual keypress, and then allow getch to pick it up:

if IO.select [$stdin], nil, nil, 1
  Ncurses.getch
end

IO.select is given a one-second timeout here, and it returns nil whenever that timeout expires without input, so you’ll have to handle those periodic nils. But the background threads should no longer block.
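Here is a sketch of that pattern you can run without a terminal. It assumes a pipe as a stand-in for $stdin and a hypothetical read_key helper in place of Ncurses.getch; the point is only to show the nil on timeout and that another thread keeps running while we wait.

```ruby
# Hypothetical helper mirroring the workaround: wait up to `timeout`
# seconds for `io` to become readable, then read a single byte.
# Returns nil when the timeout expires with nothing to read.
def read_key(io, timeout)
  return nil unless IO.select([io], nil, nil, timeout)
  io.read_nonblock(1)
end

reader, writer = IO.pipe  # stand-in for $stdin so this runs unattended
ticks = 0
worker = Thread.new { 5.times { ticks += 1; sleep 0.01 } }

first = read_key(reader, 0.2)   # nothing written yet: times out
worker.join
writer.write "k"
writer.flush
second = read_key(reader, 0.2)  # data is waiting: returns it

puts "first=#{first.inspect} second=#{second.inspect} ticks=#{ticks}"
# first=nil second="k" ticks=5
```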

There is one further complication, which is that you won’t be able to receive the pseudo-keypresses Ncurses emits when the terminal size changes, since they don’t show up on $stdin and thus the select won’t pass. The solution is to install your own signal handler:

trap("WINCH") { ... handle sigwinch  ... }

You will still see the resize events coming from getch, but only once the user presses a key. You can drop them at this point.
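One way to fill in that handler, as a sketch: do as little as possible inside the trap block (just set a flag) and let the main loop act on it. The ResizeWatcher module is a hypothetical name, and what you do on resize is up to your app.

```ruby
# Hypothetical module: records that a resize happened so the main loop
# can redraw at a safe point, rather than drawing inside the handler.
module ResizeWatcher
  @pending = false

  # Installs the SIGWINCH handler; the block only flips a flag.
  def self.install
    Signal.trap("WINCH") { @pending = true }
  end

  # Returns true once per noticed resize, then resets the flag.
  def self.consume
    was_pending = @pending
    @pending = false
    was_pending
  end
end

ResizeWatcher.install

# In the input loop, next to the IO.select call, something like:
#   redraw_screen if ResizeWatcher.consume   # redraw_screen is hypothetical
```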

That should be enough to make any multithreaded Ruby ncurses app able to function. Of course, once everyone’s using a fixed version of the ncurses libraries, you can do away with the select and set nodelay to false.

(One last hint for the future: I’ve found it necessary to set it to false before every call to getch; otherwise a ctrl-c will magically change it back to nonblocking mode. Not sure why.)

Fibers via Continuations

In the last post I talked about some differences between fibers and continuations. What may not have been clear is that continuations are more primitive and flexible than fibers are. In fact, you can implement fibers using continuations.

Here’s how. The basic idea is that we want to maintain two variables with continuations in them, inside and outside. The first one will transfer execution into the block of code that forms the fiber. The second will transfer control back to the outside world.

When the outside world calls #resume, we save our continuation point as outside, and call the current inside continuation. When, within the block, #yield is called, we save our current continuation point as inside, and transfer control back to the current outside.

There are a few more details in terms of passing values from #yield to #resume, handling the return value of the block, and handling excessive calls to #resume, but that’s the basic story. Here’s the code:

require 'continuation'

class CFiber
  class Error < StandardError; end

  def initialize &block
    @block = block
    callcc do |cc|
      @inside = cc
      return
    end
    @var = @block.call self
    @inside = nil
    @outside.call
  end

  def resume
    raise Error, "dead cfiber called!" unless @inside
    callcc do |cc|
      @outside = cc
      @inside.call
    end
    @var
  end

  def yield var
    callcc do |cc|
      @var = var
      @inside = cc
      @outside.call
    end
  end
end

This is also runnable on Ruby 1.8—just remove the require.

So why does Ruby 1.9 bother to implement fibers, when we can just use continuations? I don’t know what the real answer is, but “speed” is at least a good answer. Let’s do some benchmarking to compare the two:

require 'benchmark'
n = ARGV.shift.to_i
Benchmark.bm do |bm|
  bm.report " fibers" do
    f = Fiber.new do
      x, y = 0, 1
      loop do
        Fiber.yield y
        x, y = y, x + y
      end
    end

    n.times { |i| f.resume }
  end

  bm.report "cfibers" do
    f = CFiber.new do |c|
      x, y = 0, 1
      loop do
        c.yield y
        x, y = y, x + y
      end
    end

    n.times { |i| f.resume }
  end
end

We’ll start with backporting that code to the Ruby 1.8.7 that Ubuntu provides (ruby 1.8.7 (2008-08-11 patchlevel 72)). For 10000 Fibonacci numbers, we see:

               user     system      total       real
cfibers    0.810000   0.070000   0.880000   0.879930

That’s roughly 11.4kfps (that’s thousand Fibonacci numbers per second) that we can produce using continuation-based fibers.

Let’s try the ancient Ruby 1.9.0 that Ubuntu provides (Ruby 1.9.0 (2008-06-20 revision 17482)):

               user     system      total       real
fibers     0.040000   0.000000   0.040000   0.037583
cfibers   18.680000   1.770000  20.450000  20.482006

Wow, fibers are fast: 250kfps. But things have gotten significantly worse for cfibers, which clock in at a measly 0.489kfps.

Finally let’s try the latest and greatest Ruby 1.9.1 (ruby 1.9.1p129 (2009-05-12 revision 23412)):

               user     system      total       real
fibers     0.040000   0.000000   0.040000   0.035148
cfibers    0.150000   0.000000   0.150000   0.155890

Fibers are just as fast as before, but continuations have improved dramatically—from 11.4kfps to 66.6kfps. Still, native fibers are more than three times faster.
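For anyone checking the arithmetic, the kfps figures come from dividing the 10000 numbers produced by the “total” column of each run. A small sketch, with a hypothetical kfps helper and the totals copied from the tables above:

```ruby
# Throughput in thousands of Fibonacci numbers per second ("kfps"),
# computed from the "total" column of the benchmark output.
def kfps(count, total_seconds)
  count / total_seconds / 1000.0
end

# Totals in seconds, copied from the benchmark runs in this post.
BENCH_TOTALS = {
  "1.8.7 cfibers" => 0.88,
  "1.9.0 fibers"  => 0.04,
  "1.9.0 cfibers" => 20.45,
  "1.9.1 fibers"  => 0.04,
  "1.9.1 cfibers" => 0.15
}.freeze

BENCH_TOTALS.each do |label, seconds|
  puts format("%-14s %8.3f kfps", label, kfps(10_000, seconds))
end
```

On the 1.9.1 numbers, 250 divided by roughly 66.7 is about 3.75, which is where “more than three times faster” comes from.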

So perhaps Ruby 1.9.1 is the best of both worlds. When you need fast non-preemptive concurrency, you can use native fibers; when you need to implement your own crazy control structures, you can use continuations and be assured that they’re still pretty darn fast (at least, as far as Ruby operations are concerned).

Fibers vs Continuations

Ruby 1.9 has both fibers and continuations. The two are often mentioned in the same breath. They do vaguely similar-sounding things, and are implemented in Ruby 1.9 with similar mechanics underneath the hood, much as how continuations and threads were implemented with the same underlying mechanics in Ruby 1.8 [PDF, p. 14].

But implementation similarities aside, continuations and fibers have very different semantics. A fiber behaves as a thread without preemption. Like a thread, you create it, and it eventually dies; unlike a thread, you must manually call yield and resume to transfer control in and out of it, instead of just letting the runtime call them for you whenever it feels like it. Like a thread, when you resume a fiber, you have the same call stack and heap state (local variables) as when you left.
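A small trace makes the back-and-forth visible. This is only a sketch with a hypothetical fiber_trace method; it records who ran when, and shows that resuming a finished fiber raises FiberError.

```ruby
# Returns a log of who ran when, to show control bouncing between
# the main code and the fiber.
def fiber_trace
  log = []
  fiber = Fiber.new do
    log << "fiber: started"
    Fiber.yield 1   # pause here and hand 1 to the caller
    log << "fiber: resumed where it left off"
    2               # the block's value goes to the final resume
  end

  log << "main: got #{fiber.resume}"
  log << "main: got #{fiber.resume}"
  begin
    fiber.resume    # the fiber has finished, so this raises
  rescue FiberError
    log << "main: fiber is dead"
  end
  log
end

puts fiber_trace
# fiber: started
# main: got 1
# fiber: resumed where it left off
# main: got 2
# main: fiber is dead
```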

What’s nice about fibers is that, since you keep explicit control of the order of execution, you can get thread-like behavior without all the hassle of mutexes and synchronization. Of course you have to deal with the hassle of ordering all your operations, but you at least have the option of avoiding the fun race-condition game that always seems to crop up in threaded programming.

What about continuations? Instead of fibers’ create, kill, yield, and resume operations, a continuation only really has two operations: capture and resume. A continuation is captured once, and may be resumed multiple times. When you resume a continuation, the call stack is reverted to what it looked like when it was captured, but the heap state stays the same. There’s no exit point or death for a continuation (at least until Ruby gets bounded continuations); execution simply continues from the capture point.

What’s nice about continuations is that you can use them to implement control structures. Loops, exceptions, cross-procedure gotos… almost every control structure you can come up with can be implemented with continuations. In fact, you can implement fibers using continuations!

Let’s look at an example. Here’s the fiber-based Fibonacci computation from the InfoQ article on Fibers in Ruby 1.9:

fib = Fiber.new do  
  x, y = 0, 1 
  loop do  
    Fiber.yield y 
    x, y = y, x + y 
  end 
end 
20.times { puts fib.resume }

Here we call yield from within the fiber once we’ve computed a number. That transfers control to the main program, which prints out the number yielded and then calls resume to transfer control back to the fiber. A thread version looks very similar:

require 'thread'
q = SizedQueue.new 1
fib = Thread.new do  
  x, y = 0, 1 
  loop do  
    q.push y 
    x, y = y, x + y 
  end 
end 

20.times { puts q.pop }

Since we don’t have explicit control over the scheduling, we implicitly scheduled the order of operations by using a synchronized SizedQueue data structure, which blocks the computation thread from computing a new number until the printing thread is ready to receive it. (There are many ways we could’ve accomplished this.)

Here’s the version using continuations:

require 'continuation'
c, x, y, i = callcc { |cc| [cc, 0, 1, 1] }
puts y
c.call c, y, x + y, i + 1 if i < 20

You’ll notice there are no loops, and variables are never changed after assignment. In fact the code is starting to look suspiciously like an inductive proof, with one line that looks like a base case and another line that looks like a recursive case. You can see why continuations make functional-programming enthusiasts get excited!

This implementation works because resuming the continuation (the call to c.call) replaces the call stack and point of execution with what they were at the point it was captured (the call to callcc). In contrast, resume-ing the fiber moved us back to the point we were when the fiber called yield, and so the outer loop in the fiber implementation was necessary.

Beyond call stacks, another major difference between fibers and continuations is the way the heap is treated. Multiple fibers on the same section of code do not share local variables. Multiple continuations on the same section of code do. Here’s a brief example. First, the fibers version:

fib = (0 ... 5).map do |i|
  Fiber.new do
    x = 0
    Fiber.yield x
    x += 1
  end
end

fib.each { |f| puts f.resume }

We create five fibers, and call resume on them once each. As you’d expect, this prints out a series of 0’s. The variable x is not shared between the multiple fibers. Of course, the fiber constructor here is a block, and blocks are closures, so we could make them share state by moving the x = 0 line outside the map line. But that’s a result of having closures, not of fibers per se.
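For comparison, here is a sketch of the closure variant just described, with x hoisted outside the map so that all five fibers close over the same variable. The wrapping method name is hypothetical.

```ruby
# Five fibers that all close over the same x, so each resume sees
# the increments made by the fibers before it.
def shared_counter_outputs
  x = 0
  fibers = (0...5).map do
    Fiber.new do
      x += 1
      Fiber.yield x
    end
  end
  fibers.map { |f| f.resume }
end

puts shared_counter_outputs.inspect  # [1, 2, 3, 4, 5]
```

The sharing here comes entirely from the closure; the fibers themselves contribute nothing to it.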

Let’s try an example with multiple continuations, all jumping into the same point in the code:

require 'continuation'

x = 0
c = callcc { |cc| cc }
d = callcc { |cc| cc } if c
e = callcc { |cc| cc } if c && d
f = callcc { |cc| cc } if c && d && e
x += 1

puts x
c.call if c
d.call if d
e.call if e
f.call if f

We initialize x to 0, create 4 separate continuations, add one to x, and call the continuations in order. (The postfix if statements ensure that the continuation variables aren’t set or called more than once. Calling c.call without arguments will jump back to the c = callcc line and set c to nil.)

Silly, but it illustrates the point: the output is “1 2 3 4 5”, meaning that the four continuations all share the same heap. When d is called, its x is the same as the x of c, and even though it was 0 when d was captured, it has since been modified by the resumption of c. When e is called, its x is also the same x, and so on. (In fact this whole example depends on this behavior: each of the continuation variables is only set once, and must “retain” its value across all re-entries to continuations above it.)

In addition to multiple continuations being able to share state, the converse is true too: multiple resumes on the same continuation will share state:

require 'continuation'

x = 0
c = callcc { |cc| cc }
x += 1
puts x
c.call c while x < 5

This outputs the same thing as the examples above.

Hopefully that clears up some of the confusion. Here’s the summary:

Fibers:

  • Four operations: create, exit, yield, resume.
  • Upon resume, call stack is wherever it was at the last yield.
  • Do not share state except via closure.

Continuations:

  • Two operations: capture and resume.
  • Upon resume, call stack is where it was when captured.
  • Multiple continuations and multiple invocations of the same continuation can share state.

Vim ruby syntax comment reformatting

The vim ruby syntax seems to screw up comments that have multiple hashes. E.g. I like to differentiate

### section heading comments,
## non-inline comments, and
x = a + b # inline comments

But reformatting the comments (e.g. with “gq}”) always screws them up, unless you do:

$ mkdir -p ~/.vim/after/syntax
$ cat > ~/.vim/after/syntax/ruby.vim
set comments=n:#

which tells vim that multiple hash marks are ok.

ruby readline filename tab completion

Navigating the ancient Readline interface is a bit complicated. Here’s how to get filename completion when you hit the Tab key:

require 'readline'
def ask_for_filename question, start_dir=""
  Readline.completion_append_character = nil
  Readline.completion_proc = lambda do |prefix|
    files = Dir["#{start_dir}#{prefix}*"]
    files.
      map { |f| File.expand_path(f) }.
      map { |f| File.directory?(f) ? f + "/" : f }
  end
  Readline.readline question
end
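To see what the completion proc hands back to Readline, here is a stand-alone sketch of its body that you can run against a temporary directory. The completion_candidates name is hypothetical, and this variant skips File.expand_path and sorts the results so they are easy to inspect.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical stand-alone version of the completion lambda's body:
# glob for the prefix and mark directories with a trailing slash.
def completion_candidates(prefix, start_dir = "")
  Dir["#{start_dir}#{prefix}*"].sort.map do |f|
    File.directory?(f) ? f + "/" : f
  end
end

Dir.mktmpdir do |tmp|
  FileUtils.touch File.join(tmp, "notes.txt")
  FileUtils.mkdir File.join(tmp, "nodes")
  FileUtils.touch File.join(tmp, "other.txt")

  base = tmp + "/"
  puts completion_candidates("no", base).map { |f| f.sub(base, "") }
  # nodes/
  # notes.txt
end
```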

A ruby puzzle

Name this function:

  inject({}) { |h, o| h[yield(o)] = o; h }.values

Hints:

  1. It’s a variant of a common stdlib function.
  2. The name has 7 characters, one of which is an underscore.

A survey of my rubyist colleagues suggests this is a hard question. Much harder than writing the function given the name, which took about 10 seconds.
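If you’d like to poke at it before guessing, here is a sketch that wraps the expression in a method under the deliberately unhelpful (and hypothetical) name mystery, taking the collection as an argument instead of being mixed into Enumerable.

```ruby
# The puzzle expression, wrapped so it can be called on any Enumerable.
# The name "mystery" is a placeholder, not the answer.
def mystery(enum)
  enum.inject({}) { |h, o| h[yield(o)] = o; h }.values
end

words = %w[apple avocado banana blueberry cherry]
puts mystery(words) { |w| w[0] }.inspect
# ["avocado", "blueberry", "cherry"]
```

Note that for each key it keeps the last element seen, in the order the keys first appeared.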