Iterators, Generators and Continuations In Ruby

Continuations are weird things. They give you an ability to tweak program control flow in such potentially mind-boggling ways that I feel like people who really understand them tend to avoid them, or only use them in very structured and hidden ways.

Akinori Musha’s generator.rb in Ruby’s standard library brings together the concepts of iterators, generators and continuations in one nice juxtaposition, and also serves as a very compelling case for why continuations can be useful. I think it’s neat and I’d like to describe how it works.

Many people have written detailed descriptions of continuations (for example, see a page about call/cc). The simplistic definition is: they’re kinda like gotos, but without most of the problems/ambiguities that goto has. In other words, they let you jump to arbitrary points in your code, but with consistent and predictable behavior in terms of scoping, variable definitions, etc.

Continuations have traditionally been used for things like coroutines and robust error handling. Most modern languages that support continuations, like Ruby, also come with built-in threads and exception handling, so I don’t think continuations are quite as useful as they once were. But they still have their uses, such as generator.rb’s Generator class.

What are generators? In Ruby,[1] generators convert internal iterators to external iterators. An internal iterator is a method like Enumerable#each, which takes a block as an argument; an external iterator is an object with next and next? methods. Java, for example, only has external Iterators, defined by the Enumeration and Iterator interfaces.

Ruby’s syntax, standard library, and entire philosophy is built around the concept of the internal iterator, and rightly so: they’re vastly easier to write and easier to read than external iterators. Consider

  array.each { |e| puts e }
vs

  i = array.iterator();
  while(i.hasNext()) print(i.next());
You can see which the difference in clarity. Now what about this excerpt from RubyTorrent:

  @peers.find_all { |p| p.running? }.sort_by { |p| -p.start_time }.each do |p|
    # ...
  end
Imagine the grunt work and “temporary variable clutter” involved in rewriting that with external iterators.

So it should be clear that internal iterators are more clear, concise and elegant than external iterators. There’s only one problem: there are some things you just can’t do with internal iterators. Namely, you can’t run more than one internal iterator in parallel.

The classic problem demonstrating this is interleaving the results of two internal iterators. In Ruby, that means: given two Enumerables, yield the first element of the first Enumerable, then the first element of the second, then the second element of the first, then the second element of the second, then the third element of the first, and so on.

Of course, now that we have Enumerable#zip, I have to think of something else. :) So here’s another, less elegant example: given two Enumerables, the first of which is a sequence of integers, write a function select() that returns a sequence of arrays of i elements of the second enumerable, for each i given by the first. For example, select([1,4,0,2,3], [1,2,3,4,5,6,7,8,9,10]) should result in [[1], [2, 3, 4, 5], [], [6, 7], [8, 9, 10]]. And remember, we want it to work via each. Arrays happen to have random access so we could do this in other ways for them, but ideally we’d like to be able to do the same operation on, say, two network traffic streams, or the results of two expensive computations.

This can’t be done with internal iterators. Try it at home until you’ve convinced yourself of this. (Ok, it can be done by using continuations. But I’m going to call that cheating for the time being.) However, if we assume we have an external iterator for the second argument, then the problem is trivial:

  def select_with_extit(enumerable1, extit)
    enumerable1.map do |i|
      a = []
      i.times { a << extit.next }
      a
    end
  end
And how do we create that external iterator? With a Generator, like so:

  require 'generator'

  def select(enumerable1, enumerable2)
    select_with_extit(enumerable1, Generator.new(enumerable2))
  end

Well that’s pretty cool. As you can see, a Generator takes an Enumerable (or even a block, inline) and returns and object with #next and #next? methods. Problem solved.

You might want to take a break at this point and pour yourself a stiff drink.

Next question: how do Generators work? The answer, of course, is continuations. Let’s take a look inside the Generator class to figure out what’s going on. If we go through the #initialize, #yield and #next, we’ll understand the internals, and learn a few things about continuations and the idioms for using them in understandable and manageable ways.

The way to create a continuation is with Kernel#callcc. The continuation is passed as an argument to a block, rather than returned directly. This might seem strangle and clunky, but it becomes very useful when you want to pass arguments around via the continuations. We won’t get into that level of complexity here, but if this article leaves you hungry for more, consider my implementation of coroutines for future reading. Anyways, we’re not worried about passing values around with continuations here, so the idiom we’ll use to get a continuation at the current point in the code is this:

  c = callcc { |c| c }
The only trick to understanding continuations is this: the instant you see a callcc like that, alarm bells should be going off in your head. That callcc call is not an ordinary method call—it’s our goto label equivalent. The first time through, c is going to be the continuation object. We can jump back to this point later on in our program by calling the continuation (i.e. invoking c.call), and c is going to be something else then. If we call the continuation with a value x, we’ll jump back to that statement and c will be x. If we call the continuation without any value, c will be nil.

(This paragraph intentionally left blank.)

One common way to use this last feature (the nil value) is in the context of a simple if statement:

  if @cont = callcc { |c| c }
    ## stuff here is executed the first time only
  end
  ## if you call @cont.call, you'll jump right to here.
  ## the first time, @cont is the continuation.
  ## if you got here by calling @cont.call, @cont is now nil.
Armed with these two idioms, let’s see what happens in Generator#initialize.

  def initialize(enum = nil, &block)
    if enum
      @block = proc { |g|
        enum.each { |x| g.yield x }
      }
    else
      @block = block
    end

    @index = 0
    @queue = []
    @cont_next = @cont_yield = @cont_endp = nil

    if @cont_next = callcc { |c| c }
      @block.call(self)

      @cont_endp.call(nil) if @cont_endp
    end

    self
  end

  def yield(value)
    if @cont_yield = callcc { |c| c }
      @queue << value
      @cont_next.call(nil)
    end

    self
  end
We have three continuations, i.e. three points in the code we can jump back to: @cont_next, @cont_yield, and @cont_endp. Here’s what happens when you call the initializer: # @cont_next is set within #initialize. # @block.call(self) is called, which calls #yield with the value yielded by the internal iterator. # @cont_yield is set within #yield and the value is added to @queue. This is the cool part. #yield is running in the context of the block’s #each method. Every time we return from #yield, we’ll be immediately thrust in to it again with the next element of the block, until the block runs out of elements. This first time, however, we don’t return control to the the block; rather, we call @cont_next. # control is given back to #initialize. @cont_next is now nil, and we return self.

Phew! So at this point @cont_yield is set within the context of a the given internal iterator’s #each method, @cont_next and @cont_endp are nil, and we have one yielded value in @queue.

Now for #next?.

  def end?()
    if @cont_endp = callcc { |c| c }
      @cont_yield.nil? && @queue.empty?
    else
      @queue.empty?
    end
  end

  def next?()
    !end?
  end
Here’s what happens when we call it: # #end? is called. # @cont_endp is set within #end?. # @cont_yield.nil? && @queue.empty? is returned. Don’t worry about @cont_endp for now. We’ll see how that comes in to play a little later. The important thing to take away from this method is: as long as @cont_yield is non-nil, #end? will never return true. Finally, let’s look at #next.

  def next()
    if end?
      raise EOFError, "no more element is supplied" 
    end

    if @cont_next = callcc { |c| c }
      @cont_yield.call(nil) if @cont_yield
    end

    @index += 1

    @queue.shift
  end
When we call #next, here’s what happens: # #end? is called, setting @cont_endp. control is returned normally to #next. # @cont_next is set. # @cont_yield is called, and control is passed to #yield. # In #yield, we jump immediately to the end of the if statement, so we simply return self to the block’s #each method. # Two things happen now: if the block has more elements, #yield is called with the new element. @cont_yield is set, the new element is added to the queue, and control is passed back to #next by calling @cont_next, which simply shifts the first element off the queue and returns it. # If there are no more elements, what do you think happens? When #yield returns control to the block, the block returns control to… #initialize! (Hold your skull together.) We emerge from the @block.call(self) call. Then we call @call_endp, which transfers control back to #end, which simply evaluates @queue.empty?, returning that value all the way back to #next, which raises the exception. So there you see the where @cont_endp comes in to play.

There’s still a little bit I’ve left unexplained, but if you do a little gedankenexperiment and trace through the code in the cases where there are zero or only one element yielded by the block, you’ll see how it all plays out.

You can almost think of the continuation usage here as primitive, non-concurrent threads, where you explicitly transfer control between different contexts. We need to have one context “within” the internal iterator, and a separate context corresponding to the external program’s control flow, and we need to be able to hop around between these two at will. I think you can also see why people tend to avoid using continuations: they can make it difficult to discern how the code operates. But in this case, continuations were necessary to write the Generator class—it could not have been created using traditional control flow mechanisms.

1 Unfortunately, this terminology seems to vary across different languages. Python generators and iterators, for example, are related but very different concepts.

Comments

This entry has been archived and comments are no longer accepted.