Friday, December 12, 2014

Garbage collection and purity

One of my favorite things in language design is when a high-level language feature lets you change your low-level runtime system in interesting ways. In this post, I'll talk about one such idea, which I probably won't have the time to work out in detail any time soon, but which is pretty enough that I want to offer it to the interweb.

In 2004, David Bacon, Perry Cheng, and V.T. Rajan published their paper A Unified Theory of Garbage Collection, which showed that tracing collectors and reference counting were the two extreme ends of a spectrum of memory management strategies.

Consider Cheney's two-space collector. When performing a garbage collection, it will start with the root objects, copy them from old-space to new-space, and then recursively traverse the objects reachable from the roots, moving them from old-space to new-space as it goes.
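To make this concrete, here's a toy Python sketch of a Cheney-style evacuation. The object representation (plain lists of fields) and all names are made up for illustration; a real collector works over raw memory and stores forwarding pointers in the evacuated objects themselves.

```python
# Toy sketch of Cheney's two-space copying collector.
# Objects are modeled as Python lists; a list-valued field is a reference.

def cheney_collect(roots):
    """Copy everything reachable from `roots` into to-space.

    Returns the relocated roots and to-space. A dict keyed by object
    identity stands in for forwarding pointers.
    """
    to_space = []   # copied objects, in copy order
    forward = {}    # id(old object) -> its copy in to-space

    def copy(obj):
        if id(obj) in forward:          # already evacuated
            return forward[id(obj)]
        new = list(obj)                 # shallow copy into to-space
        forward[id(obj)] = new
        to_space.append(new)
        return new

    new_roots = [copy(r) for r in roots]

    # Cheney's trick: to-space itself is the work queue. A scan pointer
    # chases the copied-but-unscanned objects; no recursion, no stack.
    scan = 0
    while scan < len(to_space):
        obj = to_space[scan]
        for i, field in enumerate(obj):
            if isinstance(field, list):     # a reference, not a scalar
                obj[i] = copy(field)        # evacuate (or reuse forward)
        scan += 1

    return new_roots, to_space
```

Note that dead objects are never even looked at: anything unreachable from the roots simply stays behind in old-space.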

An important property of Cheney's algorithm is that it never touches any object that needs to be freed; it only follows pointers from live objects to live objects.

On the other hand, consider the naive reference counting algorithm. When the reference count on an object goes to zero, the algorithm will decrement the reference counts of everything the object points to, recursively invoke itself on all of the objects whose reference counts also went to zero, and then free the original object.
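The same idea as a toy Python sketch (the `Obj` class and the `freed` flag are purely illustrative):

```python
# Toy sketch of naive reference counting.

class Obj:
    def __init__(self, refs=()):
        self.refs = list(refs)
        self.rc = 0
        self.freed = False
        for r in self.refs:     # constructing a reference bumps the count
            r.rc += 1

def dec_ref(obj):
    """Drop one reference to obj; free it (and recurse) on reaching zero."""
    obj.rc -= 1
    if obj.rc == 0:
        # Only dead objects are traversed: we follow obj's pointers
        # precisely because obj itself is about to be freed.
        for child in obj.refs:
            dec_ref(child)
        obj.freed = True
```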

Bacon, Cheng, and Rajan observed that this reference counting algorithm has the opposite property to Cheney's algorithm -- it only traverses dead objects, and never follows a pointer inside a live object.

When object lifetimes are very short, tracing collectors beat reference counting, because very few objects will be live at gc time, and tracing only follows live objects. On the other hand, when object lifetimes are very long, reference counting wins, because very few objects are dead at each collection, and reference counting only follows dead objects.

So (a) the best memory management strategy to use depends on the lifetime of the object in question, and (b) every memory management algorithm can be seen as a hybrid of reference counting and tracing, based on which objects it chooses to follow.

In 2003, Stephen Blackburn and Kathryn McKinley gave an incredibly slick application of this idea in their paper Ulterior Reference Counting. (Yes, the chronology is backwards: research is asynchronous.)

Most production garbage collectors are based on what is called "the generational hypothesis". This says that in typical programs, most objects have a very short lifetime, and only a few have a long lifetime. So it's a good idea to allocate objects into a region of memory called "the nursery", and when it fills up, to copy live objects out of it. Because of the generational hypothesis, most objects in the nursery will be dead, and so collecting the nursery will be very fast.
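In miniature, a nursery plus minor collection looks something like the Python sketch below. Everything here (`Heap`, `Node`, tracing from the roots at every minor collection) is invented for illustration; a real generational collector would use a remembered set rather than retracing the whole root set.

```python
# Toy generational nursery: bump allocation plus a minor collection
# that promotes survivors to the old generation.

class Node:
    def __init__(self, refs=()):
        self.refs = list(refs)

class Heap:
    def __init__(self, nursery_size=4):
        self.nursery = []
        self.old = []
        self.nursery_size = nursery_size
        self.roots = []

    def alloc(self, node):
        if len(self.nursery) >= self.nursery_size:
            self.minor_collect()
        self.nursery.append(node)
        return node

    def minor_collect(self):
        # Under the generational hypothesis most nursery objects are
        # dead, so we only pay for the (few) survivors.
        live = {id(o) for o in self.reachable()}
        self.old.extend(o for o in self.nursery if id(o) in live)
        self.nursery = []

    def reachable(self):
        seen, stack = {}, list(self.roots)
        while stack:
            o = stack.pop()
            if id(o) not in seen:
                seen[id(o)] = o
                stack.extend(o.refs)
        return seen.values()
```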

Blackburn and McKinley observed that the generational hypothesis also implies that if an object survives a young collection, it's likely to live for a very long time. So in their algorithm, they have a nursery as usual for tracing collectors. But for objects copied out of the nursery, they use reference counting. That is, for objects likely to have a short lifetime, they use tracing, and for objects likely to have a long lifetime, they use reference counting!

Now, if you're a functional programmer, the mere mention of reference counting very likely rings some alarm bells --- what about cycles? Reference counting is, after all, notorious for its inability to handle cyclic data.

Blackburn and McKinley handle this issue with a backup mark-and-sweep algorithm that is periodically run on the old generation. But wouldn't it be nice if we could just know that there isn't any cyclic data in the heap? Then we could do away with the backup collector, and implement a very simple "trace young, reference count old" collector.

Surprisingly, this is possible! If we program in a pure functional language, then under the usual implementation strategies, there will never be nontrivial cycles in the heap. The only way a cyclic reference could occur is in the closure of a recursive function definition, and we can simply mark such recursive pointers as something the gc doesn't need to follow.
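The invariant can be made concrete with a toy sketch: if cells are immutable and a constructor can only mention already-built values, then every reference points to an older object, so chasing pointers strictly decreases allocation time and no cycle can form. (`Cell`, `stamp`, and the global counter are all invented for illustration; this is the invariant, not an implementation.)

```python
# Sketch of the "no cycles under pure, eager construction" argument.

counter = 0

class Cell:
    """Immutable cons cell; fields are fixed at construction time."""
    def __init__(self, head, tail=None):
        global counter
        self.stamp = counter    # allocation timestamp
        counter += 1
        self.head, self.tail = head, tail

def acyclic(cell):
    """Check that every tail reference points backwards in allocation order."""
    while cell.tail is not None:
        if cell.tail.stamp >= cell.stamp:
            return False
        cell = cell.tail
    return True
```

Without mutation there is simply no way to make a `Cell` whose tail has a later timestamp than itself, which is exactly why naive reference counting would suffice.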

So a very high-level property (purity) seems to imply something about our low-level runtime (the gc strategy)! Proving this works (and benchmarking it) is something I don't have room on my plate for, but it's something I wish I could do...

1. Seems like a relevant citation would be Armstrong & Virding "One Pass Real-Time Generational Mark-Sweep Garbage Collection" 1995, which exploits Erlang's purity for its GC

2. There's a similar trick in the Agda FRP runtime, which uses reference counting but has an explicit constructor for cyclic structures. This is all built on top of the JavaScript gc, so it probably isn't desperately efficient.

3. The Garbage Collection Handbook has another cool instance of this sort of trick: because a pure program's references always point "backwards" in time, when a Prolog program backtracks, all data allocated since the backtracking point is dead, and can be reclaimed in bulk.
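In miniature, that bulk reclamation is just an arena with a high-water mark. (`Arena` and its methods are made-up names for a Python sketch, not anything from the Handbook.)

```python
# Toy arena with backtracking-style bulk reclamation: because pure data
# only points backwards in time, everything allocated after a saved
# point is unreachable once we backtrack, and can be freed in one go.

class Arena:
    def __init__(self):
        self.cells = []

    def alloc(self, value):
        self.cells.append(value)
        return len(self.cells) - 1   # "address" = allocation index

    def mark(self):
        return len(self.cells)       # remember the high-water mark

    def release(self, mark):
        del self.cells[mark:]        # reclaim everything since the mark
```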

I guess the same sort of trick could be applied to use a region system behind the scenes for bulk reclamation of ST-allocated data.

Unfortunately, as it is wont to do, the real world undermines the beautiful dream. Even ignoring mutable state, features like letrec and mdo are hugely useful, because cyclic structures are sometimes a property of the domain that a given piece of code is trying to model. Such as... well, an interpreter for a language with letrec ;-) or a state machine implemented as a tuple of mutually recursive functions. If the host language forbids cyclic heap structures in pure code, programmers will be forced to work around it by re-implementing a local heap in terms of, e.g., IntMap, perhaps wrapped up in a state monad. One could argue that forcing the potential-nontermination of code operating on cyclic data structures to appear as an explicitly typed effect is a good thing, but I'm not sure I buy it...
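For what it's worth, the workaround looks something like this Python rendering of the state-machine example (all names here are invented): the cycle lives in the map's integer keys rather than in actual heap pointers.

```python
# Sketch of a "local heap" of integer-keyed nodes: two mutually
# recursive states tied together by ids instead of direct references.

def make_state_machine():
    heap = {
        0: ("even", 1),   # state "even" transitions to node 1
        1: ("odd", 0),    # state "odd" transitions back to node 0
    }
    def run(node, steps):
        for _ in range(steps):
            _, node = heap[node]
        return heap[node][0]
    return run

run = make_state_machine()
```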

4. Google tried to stop Bob Harper from writing:

insofar as one considers haskell pure, it isn’t true that pure fp gives you the “no back-links” property. the problem is that haskell is lazy, and this means that it performs assignments at a blistering rate, violating the condition you mention. a truly pure functional language such as ml does enjoy this property, precisely because it is not lazy. guy blelloch and i take advantage of this important property in our popl paper on i/o efficiency of functional algorithms. our results are for truly pure fp, not for lazy fp.

btw, it’s a myth that fix induces cycles in the heap. haskell implements recursion this way, to its detriment both semantically and operationally, but there is no need to have cycles to implement recursion. instead use the Y combinator, or its equivalents, which rely only on self-application, and do not involve any cycles in the store. sml/nj has used this method since its inception, and it works perfectly well.

a corollary is that for a truly pure fl, you need never worry about cycles at all. does this imply that copying collection is unnecessary? no. the reason is i/o efficiency, as previously mentioned, and ties in with another point in your post about the “generational hypothesis”. funnily enough, it’s not even true for haskell, because of the assignments, so i wonder why people keep bringing it up? dan spoonhower in his phd developed what i consider to be the right idea, which is to track the density of live objects in a page. dense pages should not be copied, but could be traced or pinned; sparse pages should be copied, but not traced. the question is how to know when a page is dense or sparse. age is only a weak indicator of density; better to track the density directly, which dan did by using an off-by-one method similar to the way tex manages cross-references.

so, yes, truly pure functional languages (those that don't do you the favor of assignments behind your back) do indeed have good storage management properties, and, as a consequence, enjoy good i/o efficiency. yet more reasons not to be lazy.
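Harper's self-application point can be sketched in Python rather than ML (nothing below is from his comment): the Z combinator, the strict-evaluation form of Y, ties the recursive knot with no cyclic reference anywhere in the store.

```python
# Recursion via self-application: no closure ever points to itself.

def fix(f):
    # Z combinator: eta-expanded Y, suitable for a strict language.
    return (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# Factorial written without letrec-style heap cycles.
fact = fix(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))
```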