Joachim Breitner

Winter is coming even more quickly

Published 2019-11-17 in sections English, Haskell.

TL;DR: I explain how I improved the performance of an interpreter for WebAssembly written in Haskell by plucking some low-hanging fruit.

Background

Motivated by my work at the DFINITY Foundation, I was looking into interpreters for WebAssembly written in Haskell, and found my colleague John Wiegley’s winter: A straight-forward port of the WebAssembly reference interpreter, written in Ocaml by Andreas Rossberg (another colleague of mine … I guess there is a common theme here.)

Obviously, an interpreter will never be as fast as a real compiler implementation (such as the one in v8, lucet or wasmtime). But for my purposes an interpreter is fine. Nevertheless, I don’t want it to be needlessly slow, and when I picked up wasm, it was clear that I had to work at least a little bit on performance.

None of this is to blame John, of course: I have no doubt that he could have sped it up at least as well! I assume it just wasn’t a priority at that time, and winter is a nicely done piece of Haskell.

A graph

I like statistics and graphs! In this blog post, I am always running the following very simple C program:

all the commits I produce while trying to make the code faster. This yields the following graph:

The fruit of my labor

Note that I had to draw this with a logarithmic y axis, because some of the improvements were so drastic that a linear graph would not be helpful.

But how did I get there? I will take you through a guided tour through all the commits that have improved performance, explain how I found an performance issue, what I did to improve things and why that improved things. To keep this post at a reasonable size, I split it into a series of blog post, faithful to the chronological order in which I performed them:

Here they are:

  1. Vectors
  2. SPECIALIZE
  3. Difference lists
  4. Export lists
  5. Eta-Expanding ReaderT
  6. Simpler code
  7. A Zipper
  8. Statistics (the making-of)

Was it worth it? I’d say so! Initially, running the program above would take 44 minutes (slow!) and continuously grows to require 6,7 GB of memory (Houston, we have a space leak!).

After all the steps above, memory consumption is constant and below 2MB, and the program finishes in 3,6s. Overall, this yields the following, very satisfying, statistics:

Improvement: Allocations: -99.60% Memory: -99.97% Time: -99.86% (Commits de9f4f8…5406efd)

By the way, the Ocaml reference interpreter takes 2,92s to run this (and it does a bit more, such as validation of the module). So there is still room for improvement…

Comments

Have something to say? You can post a comment by sending an e-Mail to me at <mail@joachim-breitner.de>, and I will include it here.