Monday, May 9, 2016

Löb's theorem is (almost) the Y combinator

As a computer scientist, one of the stranger sociological facts you'll encounter is that regular mathematicians tend to dislike computer science (and its cousin, formal logic). The reason for that is nicely encapsulated in the following anecdote that Danny Hillis tells about Richard Feynman and the design of the Connection Machine computer:

[...] Richard had completed his analysis of the behavior of the router, and much to our surprise and amusement, he presented his answer in the form of a set of partial differential equations. To a physicist this may seem natural, but to a computer designer, treating a set of boolean circuits as a continuous, differentiable system is a bit strange. Feynman's router equations were in terms of variables representing continuous quantities such as "the average number of 1 bits in a message address." I was much more accustomed to seeing analysis in terms of inductive proof and case analysis than taking the derivative of "the number of 1's" with respect to time. Our discrete analysis said we needed seven buffers per chip; Feynman's equations suggested that we only needed five. We decided to play it safe and ignore Feynman.

The decision to ignore Feynman's analysis was made in September, but by next spring we were up against a wall. The chips that we had designed were slightly too big to manufacture and the only way to solve the problem was to cut the number of buffers per chip back to five. Since Feynman's equations claimed we could do this safely, his unconventional methods of analysis started looking better and better to us. We decided to go ahead and make the chips with the smaller number of buffers.

Fortunately, he was right. When we put together the chips the machine worked.

The relevant bit is Hillis's remark that he was "much more accustomed to seeing analysis in terms of inductive proof and case analysis." To most mathematicians, inductive proofs are the tool of last resort. You always try to weave together the known properties of your object in a creative way, and fall back on an exhaustive case analysis only if all else fails. However, to logicians and computer scientists, induction is the tool of first resort – as soon as we have an object, we grind it to dust with induction and reassemble it bit by bit with the properties we want.

Regular mathematicians find this process roughly as appealing as filling in tax forms, but it does come with its compensations. One of those compensations is that we get to collect a wide variety of fixed point theorems, to justify an equally wide variety of recursive constructions.

The modal version of Löb's theorem is one of the more exotic fixed point theorems in logic. It is often glossed as representing the "abstract content" of Gödel's incompleteness theorem, which is the sort of explanation that conceals as much as it reveals. If you look at the proof, though, you'll see that its structure closely matches something very familiar to computer scientists – the proof of Löb's theorem is (almost) a derivation of the Y combinator!

To understand this, let's try formulating the fragment of provability logic we need as a natural deduction system. The types we will use are the following: $$ \begin{array}{lcl} A & ::= & b \bnfalt A \to B \bnfalt \mu a.\,A(a) \bnfalt \Box A \end{array} $$

This looks like the simply-typed lambda calculus, extended with recursive types and a special $\Box A$ type former. This is almost, but not quite, correct! $\Box A$ is indeed a modal operator -- in particular, it satisfies the axioms of the K4 modality. However, the recursive type $\mu a.\,A$ is not an ordinary recursive type, because unrestricted recursive types lead to logical inconsistency. Instead, $\mu a.\,A$ is a modal fixed point. That is, it satisfies the isomorphism: $$ \mu a.\,A \simeq A{[\Box(\mu a.\,A)/a]} $$
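
For example, instantiating $A(a)$ with $a \to P$ (the type we will use below to derive Löb's theorem) gives: $$ \mu a.\, a \to P \;\simeq\; \Box(\mu a.\, a \to P) \to P $$ So an inhabitant of $\mu a.\,a \to P$ is, up to this isomorphism, a function that takes a boxed copy of itself and produces a $P$.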

Hopefully, this will become a bit clearer if we present typing and inference rules for this logic. We'll use a variant of Davies and Pfenning's judgemental formulation of modal logic, modified to support the K4 modality rather than the S4 modality. The basic judgement will be $\judge{\Delta; \Gamma}{e}{A}$, where $\Delta$ are modal variables and $\Gamma$ are ordinary variables. $$ \begin{array}{ll} \rule{ x:A \in \Gamma} { \judge{\Delta; \Gamma}{x}{A} } & \\[1em] \rule{ \judge{\Delta; \Gamma, x:A}{e}{B} } { \judge{\Delta; \Gamma}{\fun{x}{e}}{A \to B} } & \rule{ \judge{\Delta; \Gamma}{e}{A \to B} & \judge{\Delta; \Gamma}{e'}{A} } { \judge{\Delta; \Gamma}{e\;e'}{B} } \\[1em] \rule{ \judge{\Delta; \Gamma}{e}{A{[\Box(\mu a.\,A)/a]}} } { \judge{\Delta; \Gamma}{\fold(e)}{\mu a.\,A} } & \rule{ \judge{\Delta; \Gamma}{e}{\mu a.\,A} } { \judge{\Delta; \Gamma}{\unfold(e)}{A{[\Box(\mu a.\,A)/a]}} } \\[1em] \rule{ \judge{\Delta; \Delta}{e}{A} } { \judge{\Delta; \Gamma}{\box(e)}{\Box A} } & \rule{ \judge{\Delta; \Gamma}{e}{\Box A} & \judge{\Delta, x:A; \Gamma}{e'}{C} } { \judge{\Delta; \Gamma}{\letv{\box(x)}{e}{e'}}{C} } \end{array} $$

The two key rules are the introduction and elimination rules for $\Box A$.

The elimination rule is the standard Davies-Pfenning rule. It says that if you can prove $\Box A$, you can add an $A$ to the modal context $\Delta$ and carry on with your proof.

The introduction rule is different, and weaker than the DP rule for S4. It says you can introduce a $\Box A$ if you can prove $A$ using only the modal hypotheses in $\Delta$: the variables in $\Gamma$ are deleted, and the ones in $\Delta$ are copied into the ordinary hypothetical context. This weakening captures the fact that in K4 you can't derive the axiom T, $\Box A \to A$ (the comonadic counit).
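
To get a feel for these rules, note that the K4 axiom $\Box A \to \Box\Box A$ is derivable: after unboxing, a modal hypothesis can be re-boxed as often as we like, because the $\box$ introduction rule copies $\Delta$ into the ordinary context. $$ \judge{\cdot; \cdot}{\fun{x}{\letv{\box(x')}{x}{\box(\box(x'))}}}{\Box A \to \Box\Box A} $$ By contrast, there is no term of type $\Box A \to A$: after $\letv{\box(x')}{x}{\ldots}$, the hypothesis $x'$ lives only in $\Delta$, and the variable rule consults only $\Gamma$, so the only way to use $x'$ is under another $\box$.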

Once we have this, it becomes possible to derive Löb's theorem. Before we do that, let's look at the derivation of the Y combinator. To derive a fixed point at a type $P$, we introduce the recursive type $\mu a.\,a \to P$, and then we can add folds and unfolds to the standard definition of the Y combinator:

let y : (P → P) → P = 
  λf. let f' = f in                   
      let g =                         
         (λr. let r' = r in          
              f' (unfold r' (r')))
      in 
      (g (fold g))     

You could also write this more compactly as λf. let g = λx. f (unfold x x) in g (fold g), but I have introduced some spurious let-bindings for a reason which will become apparent in a second.
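
(As an aside, the non-modal construction really does compute. Here is a rough, runnable OCaml transcription -- my own sketch, not from the post -- with a datatype standing in for $\mu a.\,a \to P$. Since OCaml is call-by-value, we η-expand the recursive call and so get the Z combinator rather than Y.)

(* The recursive type: a value of type ('a, 'b) mu wraps a function that
   expects another such value and returns a function 'a -> 'b. *)
type ('a, 'b) mu = Fold of (('a, 'b) mu -> 'a -> 'b)

let unfold (Fold f) = f

(* z f is the fixed point of f, taken at a function type 'a -> 'b. *)
let z : (('a -> 'b) -> 'a -> 'b) -> 'a -> 'b =
  fun f ->
    let g r = f (fun x -> unfold r r x) in
    g (Fold g)

(* Example: factorial without let rec. *)
let fact = z (fun self n -> if n = 0 then 1 else n * self (n - 1))
let () = assert (fact 5 = 120)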

That reason is that virtually the same construction works with Löb's theorem! We will take the modal recursive type $\mu a.\,a \to P$, and then add some boxing and unboxing to manage the boxes (I'll annotate the variables with their types so that you can see what's going on):

let lob : □(□P → P) → □P = 
  λf. let box(f') = f in                      // f' modal, □P → P
      let box(g) =                            // g modal, □(μa. a → P) → P
         box(λr. let box(r') = r in           // r' modal, μa. a → P
                 f' (box (unfold r' (box r'))))  // application has type P
      in 
      box (g (box (fold g)))

Note that if you take $\Box$ to be the identity constructor and erase all the $\box$'s from this term, then this proof of Löb's theorem is exactly the same as Curry's Y combinator!

However, people more typically take $\Box$ to mean something like "quotation" or "provability". What this proof shows is that there's nothing mysterious going on with self-reference and quotation -- or rather, it is precisely the same mystery as the universality of the lambda calculus.

Postscript: One of the nice things about blogging is that you can be more informal than in a paper, but that's also one of its pitfalls. I got an email asking about the history of this system, and of course a blog post doesn't have a related work section! Basically, a few years ago I worked out a calculus giving a syntax for applicative functors. An applicative is basically a monoidal functor with a strength, and I noticed that the modal logic K corresponds to a monoidal functor without a strength. There matters stood until last year, when I read Kenneth Foner's Haskell Symposium paper Getting a Quick Fix on Comonads, which was about programming with Löb's theorem. This was interesting, and after reading Wikipedia's proof of Löb's theorem, I saw that (a) the proof was basically Curry's paradox, and (b) I could adapt the rules for applicatives to model the modal logic K4, which (with a small bit of tinkering) admits a calculus very similar to the one for applicatives. (In fact, the proof I give of Löb's theorem is a transliteration of the proof on Wikipedia into the system in this blog post!)

I should emphasize that I haven't really proved anything about these systems: I think the proofs should be straightforward, but I haven't done them. Luckily, at least for K, I don't have to -- recently, Alex Kavvos has put a draft paper on the arXiv, System K: a simple modal λ-calculus, which works out the metatheory of K with a dual-zone Davies/Pfenning-style calculus. If you need syntax for applicative functors minus strength, do take a look!

Friday, May 6, 2016

Life Update: Moving to Cambridge

This August I'll be changing jobs, by leaving Birmingham and joining the Computer Laboratory at the University of Cambridge.

I lived in Cambridge when I did a postdoc with Nick Benton at Microsoft Research, so it'll be nice to be able to reconnect with friends I haven't been able to see regularly in a while. At the same time, of course, I will miss the friends I have made here, such as Dan Ghica and Paul Levy.

Thursday, April 21, 2016

The Essence of Event-Driven Programming

Together with Jennifer Paykin and Steve Zdancewic, I have a new draft paper, The Essence of Event-Driven Programming:

Event-driven programming is based on a natural abstraction: an event is a computation that can eventually return a value. This paper exploits the intuition relating events and time by drawing a Curry-Howard correspondence between a functional event-driven programming language and a linear-time temporal logic. In this logic, the eventually proposition ◇A describes the type of events, and Girard’s linear logic describes the effectful and concurrent nature of the programs. The correspondence reveals many interesting insights into the nature of event-driven programming, including a generalization of selective choice for synchronizing events, and an implementation in terms of callbacks where ◇A is just ¬□¬A.

If you've been following the work on temporal logic and functional reactive programming that I and others (e.g., Alan Jeffrey, Wolfgang Jeltsch, Andrew Cave et al) have been pursuing for a while, then you'll have heard that there's a really compelling view of reactive programs as proof terms for linear temporal logic.

However, the programs that come out seem to be most naturally conceived of as a kind of "higher-order synchronous dataflow" (in the same space as languages like ReactiveML or Lucid). These languages are based on the synchrony hypothesis, which postulates that programs do little work relative to the clock rate of the event loop, and so a polling implementation where programs wake up once on each trip through the event loop is reasonable. (If you've done game programming, you may be familiar with the term immediate mode GUI, which is basically the same idea.)

Most GUI toolkits, however, are not implemented in this style. Instead, they work via callbacks, in an event-driven style. That is, you write little bits of code that get invoked for you whenever an event occurs. The advantage of this is that it lets your program sleep until an event actually happens, and when one does, only the relevant pieces of your program run. For applications (like a text editor) where the display doesn't change often and events happen at a relatively low rate, this avoids a lot of useless computation.

In this paper, we show that the event-based style also fits into the temporal logic framework! Basically, we can encode events (aka promises or futures) using the double-negation encoding of the possibility monad from modal logic. And then the axioms of temporal logic (such as the linearity of time) work out to be natural primitives of event-based programming, such as the select operator.
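
To give a rough sense of the callback encoding, here is an OCaml sketch (my own illustration, not code from the paper; in particular it ignores the linearity and the □ modality the paper actually uses). Reading negation as "a function into unit", an event that will eventually deliver an 'a becomes a function that accepts an 'a-callback:

(* An event is something you can register a callback with. *)
type 'a event = ('a -> unit) -> unit

(* An event that has already happened: it invokes the callback immediately. *)
let now (x : 'a) : 'a event = fun k -> k x

(* Sequencing: once e fires with x, start the event f x and wait for it. *)
let bind (e : 'a event) (f : 'a -> 'b event) : 'b event =
  fun k -> e (fun x -> f x k)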

Also, before you ask: an implementation is in progress, but not yet done. I'll post a link when it works!

Tuesday, April 19, 2016

Postdoctoral Position in Recursion, Guarded Recursion and Computational Effects

We are looking for a Postdoctoral Research Fellow to work on an EPSRC-funded project Recursion, Guarded Recursion and Computational Effects.

We shall be investigating fine-grained typed calculi and operational, denotational and categorical semantics for languages that combine either guarded or general recursion and type recursion with a range of computational effects. This will build on existing work such as call-by-push-value and Nakano's guarded recursion calculus.

The successful candidate will have a PhD in programming language semantics or a related discipline, and an ability to conduct collaborative mathematical research. Knowledge of operational and denotational semantics, category theory, type theory and related subjects are all desirable.

You will work with Paul Levy (principal investigator) and Neelakantan Krishnaswami (co-investigator) as part of the Theoretical Computer Science group at the University of Birmingham.

The position lasts from 1 October 2016 until 30 September 2019.

You can read more and apply for the job at:

Don't hesitate to contact us (particularly Paul Levy) informally to discuss this position.

The closing date is soon: 22 May 2016.

Reference: 54824. Grade: 7. Starting salary: £28,982 a year, in a range up to £37,768 a year.

Monday, March 21, 2016

Agda is not a purely functional language

One of the more surprising things I've learned as a lecturer is the importance of telling lies to students. Basically, the real story regarding nearly everything in computer science is diabolically complicated, and if we try to tell the whole truth up front, our students will get too confused to learn anything.

So what we have to do is to start by telling a simple story which is true in broad outline, but which may be false in some particulars. Then, when they understand the simple story, we can point out the places where the story breaks down, and then use that to tell a slightly less-false story, and so on and so on, until they have a collection of models, which range from the simple-but-inaccurate to the complex-but-accurate. Then our students can use the simplest model they can get away with to solve their problem.

The same thing happens when teaching physics, where we start with Newtonian mechanics, and then refine it with special relativity and quantum mechanics, and then refine it further (in two incompatible ways!) with general relativity and quantum field theories. The art of problem solving is to pick the least complicated model that will solve the problem.

But however essential these lies are, there's always a risk that our students will discover we are lying too early -- because their understanding is fragile, they don't have a sense of the story's zone of applicability, and so an early failure can erode the confidence they need to try tackling problems. Happily, though, one advantage that computer science has over physics is that computers are more programmable than reality: we can build artificial worlds which actually do exactly obey the rules we lay out.

This is why my colleague Martín Escardó says that when teaching beginners, he prefers teaching Agda to teaching Haskell -- he has to tell fewer lies. The simple story that we want to start with when teaching functional programming is that data types are sets, and computer functions are mathematical functions.

In the presence of general recursion, though, this is a lie. Martín has done a lot of work on more accurate stories, such as data types are topological spaces and functions are continuous maps, but this is not what we want to tell someone who has never written a computer program before (unless they happen to have a PhD in mathematics)!

Agda's termination checking means that every definable function is total, and so in Agda it's much closer to true that types are sets and functions are functions. But it's not quite true!

Here's a little example, as an Agda program. First, let's get the mandatory bureaucracy out of the way -- we declare the module name, and import the booleans and propositional equality from the standard library.

module wat where 

open import Data.Bool
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

Next, let's define the identity function on booleans.


f : Bool → Bool
f true = true
f false = false

In fact, let's define it a second time!

g : Bool → Bool
g true = true
g false = false

So now we have two definitions of the identity function, f and g, so they must be equal, right? Let's see if we can write a proof of this fact.

wat : f ≡ g
wat = refl

Oops! We see that refl is in red -- Agda doesn't think it's a proof. In other words, two functions which are identical except for their name are not considered equal -- α-equivalence is not a program equivalence as far as Agda is concerned!

This is somewhat baffling, but does have an explanation. Basically, an idea that was positively useful in non-dependent functional languages has turned toxic in Agda. Namely, type definitions in ML and Haskell are generative -- two declarations of the same type will create two fresh names which cannot be substituted for each other. For example, in OCaml the following two type definitions

type foo = A | B 
type bar = A | B 

are not interchangeable -- OCaml will complain if you try to use a foo where a bar is expected:

OCaml version 4.02.3

let x : foo = A 

let bar_id (b : bar) = b

let _ = bar_id x;;

Characters 96-97:
  let _ = bar_id x;;
                 ^
Error: This expression has type foo but an expression was expected of type bar

In other words, type definitions have an effect -- they create fresh type names. Haskell alludes to this fact with the newtype keyword, and in fact that side-effect is essential to its design: the type class mechanism permits overloading by type, and the generativity of type declarations is the key feature that lets users define custom overloads for their application-specific types.

Basically, Agda inherited this semantics of definitions from Haskell and ML. Unfortunately, this creates problems for Agda, by complicating its equality story. So we can't quite say that Agda types are sets and Agda functions are functions. We could say that they are sets and functions internal to the Schanuel topos, but (a) that's something we may not want to hit students with right off the bat, and (b) Agda doesn't actually include the axioms that would let you talk about this fact internally. (In fact, Andy Pitts tells me that the right formulation of syntax for nominal dependent types is still an open research question.)

Thursday, March 17, 2016

Datafun: A Functional Datalog

Together with Michael Arntzenius, I have a new draft paper, Datafun: a Functional Datalog:

Datalog may be considered either an unusually powerful query language or a carefully limited logic programming language. It has been applied successfully in a wide variety of problem domains thanks to its "sweet spot" combination of expressivity, optimizability, and declarativeness. However, most use-cases require extending Datalog in an application-specific manner. In this paper we define Datafun, an analogue of Datalog supporting higher-order functional programming. The key idea is to track monotonicity via types.

I've always liked domain specific languages, but have never perpetrated one before. Thanks to Michael, now I have! Even better, it's a higher-order version of Datalog, which is the language behind some of my favorite applications, such as John Whaley and Monica Lam's BDDBDDB tool for writing source code analyses.

You can find the Github repo here, as well. Michael decided to implement it in Racket, which I had not looked at closely in several years. It's quite nice how little code it took to implement everything!

Wednesday, November 18, 2015

Error Handling in Menhir

My favorite category of paper is the "cute trick" paper, where the author describes a programming trick that is both easy to implement and much better than the naive approach. One of my favorite cute trick papers is Clinton L. Jeffery's 2003 ACM TOPLAS paper, Generating LR Syntax Error Messages from Examples. The abstract explains the idea:
LR parser generators are powerful and well-understood, but the parsers they generate are not suited to provide good error messages. Many compilers incur extensive modifications to the source grammar to produce useful syntax error messages. Interpreting the parse state (and input token) at the time of error is a nonintrusive alternative that does not entangle the error recovery mechanism in error message production. Unfortunately, every change to the grammar may significantly alter the mapping from parse states to diagnostic messages, creating a maintenance problem.

Merr is a tool that allows a compiler writer to associate diagnostic messages with syntax errors by example, avoiding the need to add error productions to the grammar or interpret integer parse states. From a specification of errors and messages, Merr runs the compiler on each example error to obtain the relevant parse state and input token, and generates a yyerror() function that maps parse states and input tokens to diagnostic messages. Merr enables useful syntax error messages in LR-based compilers in a manner that is robust in the presence of grammar changes.

Basically, you can take a list of syntactically incorrect programs and error messages for them, and the tool will run the parser to figure out which state the LR automaton will get stuck in for each broken program, and then ensure that whenever you hit that state, you report the error message you wrote. I like this idea an awful lot, because it means that even if you use the full power of LR parsing, you still have a better error message story than any other approach, including hand-coded recursive descent!
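
To make the mechanism concrete: the artifact such a tool produces is morally just a table from the automaton's "stuck" states (and, in Merr's case, the lookahead token) to hand-written diagnostics, consulted when the parser reports an error. Here is a purely hypothetical OCaml sketch (the state numbers and messages are made up, and this is not the actual output format of Merr or Menhir):

(* Hypothetical: map the LR automaton state in which the parser got stuck to
   a hand-written diagnostic, ignoring the lookahead token for simplicity.
   The point of these tools is that this table is generated from example
   programs, rather than written against raw state numbers by hand. *)
let syntax_error_message (state : int) : string =
  match state with
  | 42 -> "Expected an expression after '('."
  | 57 -> "A declaration must end with ';'."
  | _  -> "Syntax error."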

Recently, Francois Pottier has studied a generalization of this idea, in his new draft Reachability and error diagnosis in LR(1) automata.

Given an LR(1) automaton, what are the states in which an error can be detected? For each such “error state”, what is a minimal input sentence that causes an error in this state? We propose an algorithm that answers these questions. Such an algorithm allows building a collection of pairs of an erroneous input sentence and a diagnostic message, ensuring that this collection covers every error state, and maintaining this property as the grammar evolves. We report on an application of this technique to the CompCert ISO C99 parser, and discuss its strengths and limitations.

Basically, in Jeffery's approach you write the broken example programs and their error messages by hand; Pottier has worked out an algorithm that generates a covering set of broken programs for you. Even more excitingly, Pottier has implemented this idea in production --- it ships in the most recent version of his Menhir parser generator.

So now there's no excuse for using parser combinators, since they offer worse runtime performance and worse error messages! :)