Joachim Breitner

veggies: Haskell code generation from scratch

Published 2017-04-21 in sections English, Haskell.

How hard it is to write a compiler for Haskell Core? Not too hard, actually!

I wish we had a formally verified compiler for Haskell, or at least for GHC’s intermediate language Core. Now formalizing that part of GHC itself seems to be far out of reach, with the many phases the code goes through (Core to STG to CMM to Assembly or LLVM) and optimizations happening at all of these phases and the many complicated details to the highly tuned GHC runtime (pointer tagging, support for concurrency and garbage collection).

Introducing Veggies

So to make that goal of a formally verified compiler more feasible, I set out and implemented code generation from GHC’s intermediate language Core to LLVM IR, with simplicity as the main design driving factor.

You can find the result in the GitHub repository of veggies (the name derives from “verifiable GHC”). If you clone that and run ./boot.sh some-directory, you will find that you can use the program some-directory/bin/veggies just like like you would use ghc. It comes with the full base library, so your favorite variant of HelloWorld might just compile and run.

As of now, the code generation handles all the Core constructs (which is easy when you simply ignore all the types). It supports a good number of primitive operations, including pointers and arrays – I implement these as need – and has support for FFI calls into C.

Why you don’t want to use Veggies

Since the code generator was written with simplicity in mind, performance of the resulting code is abysmal: Everything is boxed, i.e. represented as pointer to some heap-allocated data, including “unboxed” integer values and “unboxed” tuples. This is very uniform and simplifies the code, but it is also slow, and because there is no garbage collection (and probably never will be for this project), will fill up your memory quickly.

Also, the code is currently only supports 64bit architectures, and this is hard-coded in many places.

There is no support for concurrency.

Why it might be interesting to you nevertheless

So if it is not really usable to run programs with, should you care about it? Probably not, but maybe you do for one of these reasons:

  • You always wondered how a compiler for Haskell actually works, and reading through a little over a thousands lines of code is less daunting than reading through the 34k lines of code that is GHC’s backend.
  • You have wacky ideas about Code generation for Haskell that you want to experiment with.
  • You have wacky ideas about Haskell that require special support in the backend, and want to prototype that.
  • You want to see how I use the GHC API to provide a ghc-like experience. (I copied GHC’s Main.hs and inserted a few hooks, an approach I copied from GHCJS).
  • You want to learn about running Haskell programs efficiently, and starting from veggies, you can implement all the trick of the trade yourself and enjoy observing the speed-ups you get.
  • You want to compile Haskell code to some weird platform that is supported by LLVM, but where you for some reason cannot run GHC’s runtime. (Because there are no threads and no garbage collection, the code generated by veggies does not require a runtime system.)
  • You want to formally verify Haskell code generation. Note that the code generator targets the same AST for LLVM IR that the vellvm2 project uses, so eventually, veggies can become a verified arrow in the top right corner map of the DeepSpec project.

So feel free to play around with veggies, and report any issues you have on the GitHub repository.

Comments

System FC Pro with coercion and equality is Haskell type system. There’s a paper, “Verification of System FC in Coq” in which they present a mechanized proof for the type system. They’ve also proved Haskell type safety through formalization of Haskell Core. Your information is incomplete.

#1 Insane XYZT am 2017-04-21

I am confused. Where am I saying something to the contrary? A type system proof is not a compiler correctness proof.

Also, they do not model Core, but rather a variant of System FC. For example there is no recursive let. letrec.

#2 Joachim Breitner am 2017-04-21

Have something to say? You can post a comment by sending an e-Mail to me at <mail@joachim-breitner.de>, and I will include it here.