Joachim Breitner


Template Haskell recompilation

Published 2020-07-01 in sections English, Haskell.

I was wondering: What happens if I have a Haskell module with Template Haskell that embeds some information from the environment (time, environment variables). Will such a module be reliable recompiled? And what if it gets recompiled, but the source code produced by Template Haskell is actually unchanged (e.g., because the environment variable has not changed), will all depending modules be recompiled (which would be bad)?

Here is a quick experiment, using GHC-8.8:

/tmp/th-recom-test $ cat Foo.hs
{-# LANGUAGE TemplateHaskell #-}
{-# OPTIONS_GHC -fforce-recomp #-}
module Foo where

import Language.Haskell.TH
import Language.Haskell.TH.Syntax
import System.Process

theMinute :: String
theMinute = $(runIO (readProcess "date" ["+%M"] "") >>= stringE)
[jojo@kirk:2] Mi, der 01.07.2020 um 17:18 Uhr ☺
/tmp/th-recom-test $ cat Main.hs
import Foo
main = putStrLn theMinute

Note that I had to set {-# OPTIONS_GHC -fforce-recomp #-} – by default, GHC will not recompile a module, even if it uses Template Haskell and runIO. If you are reading from a file you can use addDependentFile to tell the compiler about that depenency, but that does not help with reading from the environment.

So here is the test, and we get the desired behaviour: The Foo module is recompiled every time, but unless the minute has changed (see my prompt), Main is not recomipled:

/tmp/th-recom-test $ ghc --make -O2 Main.hs -o test
[1 of 2] Compiling Foo              ( Foo.hs, Foo.o )
[2 of 2] Compiling Main             ( Main.hs, Main.o )
Linking test ...
[jojo@kirk:2] Mi, der 01.07.2020 um 17:20 Uhr ☺
/tmp/th-recom-test $ ghc --make -O2 Main.hs -o test
[1 of 2] Compiling Foo              ( Foo.hs, Foo.o )
Linking test ...
[jojo@kirk:2] Mi, der 01.07.2020 um 17:20 Uhr ☺
/tmp/th-recom-test $ ghc --make -O2 Main.hs -o test
[1 of 2] Compiling Foo              ( Foo.hs, Foo.o )
[2 of 2] Compiling Main             ( Main.hs, Main.o ) [Foo changed]
Linking test ...

So all well!

Update: It seems that while this works with ghc --make, the -fforce-recomp does not cause cabal build to rebuild the module. That’s unfortunate.

Managed by an eleven year old

Published 2020-06-07 in sections English, Digital World.

This weekend I had some pretty good uncle time. My eleven year old nephew (after beating me in a fair match of tennis) wanted to play with his Lego Mindstorms set. He never even touches it unless I am around to help him, but then he quiet enjoys. As expected, when I ask him what we should try to build, he tends to come up with completely unrealistic or impossible ideas, and we have to somehow reduce the scope to something doable.

This time round, inspired by their autonomous vacuum cleaner, he wanted to build something like that. That was convenient, because now I could sell him what I wanted to try to do, namely make the robot follow a wall at more or less constant distance, as a useful first step.

We mounted the infra red distance sensor at an 45° angle to the front-left, and then I built a very simple control loop – measure the distance, subtract 40, apply a factor, and feed that into the “steering” input of the drive action, and repeat. I must admit that I was pretty proud to see just how well that very simple circuit worked: The little robot turned sharp corners in both directions, and otherwise drove nicely parallel and with constant distance to the wall.

I was satisfied with that achievement and would have happily ended it here before we might be disappointed by more ambitious and then failing goals.

But my nephew had not forgotten about the vacuum cleaner, and casually asked if I could make the robot draw an outline of its path, and thus of the room, on the display. Ok, where do I begin to explain just how unfeasible that is … yes, it seemed one can draw to the display. But how should the robot know where it is? How far it turns? How far it went? This is programming by connecting boxes, and I would not expect such an interface to allow for complex logic (right?). And I would need trigonometry and stuff.

But he didn’t quite believe it, and thus got me thinking … indeed, the arithmetic block can evaluate more complex formulas, involving sin  and cos  … so maybe if I can somehow keep track of where the robot is heading … well, let’s give it a try.

So by dragging boxes and connecting wires, I implemented a simple logic (three variables, x and y for current position, alpha for current heading; in each loop add the “steering” input onto alpha, and add (sin(alpha),cos(alpha)) onto the current position; throw in some linear factors to calibrate; draw pixel at (x, y)). And, to my nephew’s joy and my astonishment, the robot was drawing a curvy line that clearly correlates with the taken path!

It wasn’t respecting the angles perfectly, a square might not properly close, but half an hour earlier I would have actively bet against that we would pull this off!

After a few turns

After a few turns

I was, I must admit, a bit proud. But not only about the technical nerding, but also about my uncleing: That night, according to his parents, my nephew said that he can’t remember ever being so happy!

A Telegram bot in Haskell on Amazon Lambda

Published 2020-04-06 in sections English, Haskell.

I just had a weekend full of very successful serious geekery. On a whim I thought: “Wouldn't it be nice if people could interact with my game Kaleidogen also via a telegram bot?” This led me to learn about how I write a Telegram bot in Haskell and how I can deploy such a Haskell program to Amazon Lambda. In particular the latter bit might be interesting to some of my readers, so here is how went about it.


Kaleidogen is a little contemplative game (or toy game where, starting from just unicolored disks, you combine abstract circular patterns to breed more interesting patterns. See my FARM 2019 talk for more details, or check out the source repository. BTW, I am looking for help turning it into an Android app!

KaleidogenBot in action

KaleidogenBot in action

Amazon Lambda

Amazon Lambda is the “Function as a service” offering of Amazon Web Services. The idea is that you don’t rent a server, where you have to deal with managing the whole system and that you are paying for constantly, but you just upload the code that responds to outside requests, and AWS takes care of the rest: Starting and stopping instances, providing a secure base system etc. When nobody is using the service, no cost occurs.

This sounds ideal for hosting a toy Telegram bot: Most of the time nobody will be using it, and I really don't want to have to baby sit yet another service on my server. On Amazon Lambda, I can probably just forget about it.

But Haskell is not one of the officially supported languages on Amazon Lambda. So to run Haskell on Lambda, one has to solve two problems:

  • how to invoke the Haskell code on the server, and
  • how to build Haskell so that it runs on the Amazon Linux distribution

A Haskell runtime for Lambda

For the first we need a custom runtime. While this sounds complicated, it is actually a pretty simple concept: A runtime is an executable called bootstrap that queries the Lambda Runtime Interface for the next request to handle. The Lambda documentation is phrased as if this runtime has to be a dispatcher that calls the separate function’s handler. But it could just do all the things directly.

I found the Haskell package aws-lambda-haskell-runtime which provides precisely that: A function

runLambda :: (LambdaOptions -> IO (Either String LambdaResult)) -> IO ()

that talks to the Lambda Runtime API and invokes its argument on each message. The package also provides Template Haskell magic to collect “handlers“ of any JSON-able type and generates a dispatcher, like you might expect from other, more dynamic languages. But that was too much magic for me, so I ignored that and just wrote the handler manually:

main :: IO ()
main = runLambda run
   run ::  LambdaOptions -> IO (Either String LambdaResult)
   run opts = do
    result <- handler (decodeObj (eventObject opts)) (decodeObj (contextObject opts))
    either (pure . Left . encodeObj) (pure . Right . LambdaResult . encodeObj) result

data Event = Event
  { path :: T.Text
  , body :: Maybe T.Text
  } deriving (Generic, FromJSON)

data Response = Response
  { statusCode :: Int
  , headers :: Value
  , body :: T.Text
  , isBase64Encoded :: Bool
  } deriving (Generic, ToJSON)

handler :: Event -> Context -> IO (Either String Response)
handler Event{body, path} context =

I expose my Lambda function to the world via Amazon’s API Gateway, configured to just proxy the HTTP requests. This means that my code receives a JSON data structure describing the HTTP request (here called Event, listing only the fields I care about), and it will respond with a Response, again as JSON.

The handler can then simply pattern-match on the path to decide what to do. For example this code handles URLs like /img/CAFFEEFACE.png, and responds with an image.

handler :: TC -> Event -> Context -> IO (Either String Response)
handler Event{body, path} context
    | Just bytes <- isImgPath path >>= T.decodeHex = do
        let pngData = genPurePNG bytes
        pure $ Right Response
            { statusCode = 200
            , headers = object [ "Content-Type" .= ("image/png" :: String) ]
            , isBase64Encoded = True
            , body = T.decodeUtf8 $ LBS.toStrict $ Base64.encode pngData

isImgPath :: T.Text -> Maybe T.Text
isImgPath  = T.stripPrefix "/img/" >=> T.stripSuffix ".png"

If this program would grow more, then one should probably use something more structured for routing here; maybe servant, or bridging towards wai apps (amost like wai-lamda, but that still assumes an existing runtime, instead of simply being the runtime). But for my purposes, no extra layers of indirection or abstraction are needed!

Deploying Haskell to Lambda

Building Haskell locally and deploying to different machiens is notoriously tricky; you often end up depending on a shared library that is not available on the other platform. The aws-lambda-haskell-runtime package, and similar projects like serverless-haskell, solve this using stack and Docker – two technologies that are probably great, but I never warmed up to them.

So instead adding layers and complexities, can I solve this instead my making things simpler? If I compiler my bootstrap into a static Linux binary, it should run on any Linux, including Amazon Linux.

Unfortunately, building Haskell programs statically is also notoriously tricky. But it is made much simpler by the work of Niklas Hambüchen and others in the context of the Nix package manager, coordinated in the static-haskell-nix project. The promise here is that once you have set up building your project with Nix, then getting a static version is just one flag away. The support is not completely upstreamed into nixpkgs proper yet, but their repository has a nix file that contains a nixpkgs set with their patches:

let pkgs = (import (sources.nixpkgs-static + "/survey/default.nix") {}).pkgs; in

This, plus a fairly standard nix setup to build the package, yields what I was hoping for:

$ nix-build -A kaleidogen
$ file result/bin/kaleidogen-amazon-lambda
result/bin/kaleidogen-amazon-lambda: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, stripped
$ ls -sh result/bin/kaleidogen-amazon-lambda
6,7M result/bin/kaleidogen-amazon-lambda

If we put this file, named bootstrap, into a zip file and upload it to Amazon Lambda, then it just works! Creating the zip file is easily scripted using nix:

  function-zip = pkgs.runCommandNoCC "kaleidogen-lambda" {
    buildInputs = [ ];
  } ''
    mkdir -p $out
    cp ${kaleidogen}/bin/kaleidogen-amazon-lambda bootstrap
    zip $out/ bootstrap

So to upload this, I use this one-liner (line-wrapped for your convenience):

nix-build -A function-zip &&
aws lambda update-function-code --function-name kaleidogen \
  --zip-file fileb://result/

Thanks to how Nix pins all dependencies, I am fairly confident that I can return to this project in 4 months and still be able to build it.

Of course, I want continuous integration and deployment. So I build the project with GitHub Actions, using a cachix nix cache to significantly speed up the build, and auto-deploy to Lambda using aws-lambda-deploy; see my workflow file for details.

The Telegram part

The above allows me to run basically any stateless service, and a Telegram bot is nothing else: When configured to act as a WebHook, Telegram will send a request with a message to our Lambda function, where we can react on it.

The telegram-api package provides bindigs for the Telegram Bot API (although I had to use the repository version, as the version on Hackage has some bitrot). Slightly simplified I can write a handler for an Update:

handleUpdate :: Update -> TelegramClient ()
handleUpdate Update{ message = Just m } = do
  let c = ChatId (chat_id (chat m))
  liftIO $ printf "message from %s: %s\n" (maybe "?" user_first_name (from m)) (maybe "" T.unpack (text m))
  if "/start" `T.isPrefixOf` fromMaybe "" (text m)
  then do
    rm <- sendMessageM $ sendMessageRequest c "Hi! I am @KaleidogenBot. …"
    return ()
  else do
    m1 <- sendMessageM $ sendMessageRequest c "One moment…"
    withPNGFile  $ \pngFN -> do
      m2 <- uploadPhotoM $ uploadPhotoRequest c
        (FileUpload (Just "image/png") (FileUploadFile pngFN))
      return ()
handleUpdate _ u =
  liftIO $ putStrLn $ "Unhandled message: " ++ show u

and call this from the handler that I wrote above:

    | path == "/telegram" =
      case eitherDecode (LBS.fromStrict (T.encodeUtf8 (fromMaybe "" body))) of
        Left err -> …
        Right update -> do
          runTelegramClient token manager $ handleUpdate Nothing update
          pure $ Right Response
            { statusCode = 200
            , headers = object [ "Content-Type" .= ("text/plain" :: String) ]
            , isBase64Encoded = False
            , body = "Done"

Note that the Lambda code receives the request as JSON data structure with a body that contains the original HTTP request body. Which, in this case, is itself JSON, so we have to decode that.

All that is left to do is to tell Telegram where this code lives:

curl --request POST \
  --header 'content-type: application/json'
  --data '{"url": ""}'

As a little add on, I also created a Telegram game for Kaleidogen. A Telegram game is nothing but a webpage that runs inside Telegram, so it wasn’t much work to wrap the Web version of Kaleidogen that way, but the resulting Telegram game (which you can access via still looks pretty neat.

No /dev/dri/renderD128

I am mostly happy with this setup: My game is now available to more people in more ways. I don’t have to maintain any infrastructure. When nobody is using this bot no resources are wasted, and the costs of the service are neglectible -- this is unlikely to go beyond the free tier, and even if it would, the cost per generated image is roughly USD 0.000021.

There is one slight disappointment, though. What I find most intersting about Kaleidogen from a technical point of view is that when you play it in the browser, the images are not generated by my code. Instead, my code creates a WebGL shader program on the fly, and that program generates the image on your graphics card.

I even managed to make the GL rendering code work headlessly, i.e. from a command line program, using EGL and libgbm and a helper written in C. But it needs access to a graphics card via /dev/dri/renderD128. Amazon does not provide that to Lambda code, and neither do the other big Function-as-a-service providers. So I had to swallow my pride and reimplement the rendering in pure Haskell.

So if you think the bot is kinda slow, then that’s why. Despite properly optimizing the pure implementation (the inner loop does not do allocations and deals only with unboxed Double# values), the GL shader version is still three times as fast. Maybe in a few years GPU access will be so ubiquitous that it’s even on Amazon Lambda; then I can easily use that.

30 years of Haskell

Published 2020-04-01 in sections English, Haskell.

Vitaly Bragilevsky, in a mail to the GHC Steering Committee, reminded me that the first version of the Haskell programming language was released exactly 30 years ago. On April 1st. So that raises the question: Was Haskell just an April fool's joke that was never retracted?

The cover of the 1.0 Haskell report

The cover of the 1.0 Haskell report

My own first exposure to Haskell was in April 2005; the oldest piece of Haskell I could find on my machine is this part of a university assignment from April:

> pascal 1 = [1]
> pascal (n+1) = zipWith (+) (x ++ [0]) (0 : x) where x = pascal n

This means that I now have witnessed half of Haskell's existence. I have never regretted getting into Haskell, and every time I come back from having worked in other languages (which all have their merits too), I greatly enjoy the beauty and elegance of expressing my ideas in a lazy and strictly typed language with a concise syntax.

I am looking forward to witnessing (and, to a very small degree, shaping) the next 15 years of Haskell.

Animations in Kaleidogen

Published 2020-03-31 in sections English, Haskell.

A while ago I wrote a little game (or toy) called Kaleidogen. It is a relatively contemplative game where, starting from just unicolored disks, you combine abstract circular patterns to breed more interesting patterns. See my FARM 2019 talk for more details, or check out the source repository.

It has mostly been quiet with this game, but I finally got around to add a little bit of animation: When you have bred one of these patterns, you can animate its genesis, from nothing to a complex patterns, as you can see in this screencast:

Kaleidogen, animated

By the way: I am looking for collaborators who help me to get this into the Play Store properly, so let me know if you want to play around with Haskell, Android, Nix, OpenGL and cross-compilation.

git post-squash

Published 2020-02-03 in sections English, Digital World.

I wrote a little git tool that helps in an environment where PRs are merged using squash merges, but you still want to deal with feature branches properly.

What is a squash merge?

One popular workflow involving Git and Github is a squash-merge based workflow: You develop your feature on a feature branch (say featureA), adding commits as you go, possibly merging from master a few times:

M1 ─ M2 ─────────── M3 ────── M4            (master)
       ╲              ╲
        A1 ─ A2 ─ A3 ─ A4 ─ A5              (featureA)

When the feature is ready, you merge master into featureA a last time (e.g. to check on your CI infrastructure that this merge does not break the build):

M1 ─ M2 ─────────── M3 ────── M4            (master)
       ╲              ╲         ╲
        A1 ─ A2 ─ A3 ─ A4 ─ A5 ─ A6         (featureA)

and now you do a squash merge (or use Github’s green “Squash merge” button, or’s squash merge action). The result is a new commit M5 on master that contains all the changes from featureA:

M1 ─ M2 ─────────── M3 ────── M4 ─ M5       (master)
       ╲              ╲         ╲
        A1 ─ A2 ─ A3 ─ A4 ─ A5 ─ A6         (featureA)

Note that there is no line from A6 to M5. This means that the git history of master is clean, and does not contain the usually boring and unhelpful history of how featureA came to be; no “fix typo” commits, no “merge master into featureA” commit.

But the downside is that, as far as git is concerned, this commit is totally unrelated to the featureA branch. This is not a problem as long as featureA lives on its own. But it becomes a problem if there are feature branches building off featureA:

What is the problem with squash merge?

Consider the situation above, but add featureB to the mix, a feature branch that was created off featureA:

M1 ─ M2 ─────────── M3 ────── M4 ── M5      (master)
       ╲              ╲         ╲
        A1 ─ A2 ─ A3 ─ A4 ─ A5 ─ A6         (featureA)
               ╲         ╲
                B1 ────── B2 ─ B3           (featureB)

We now want to bring the latest changes from featureA and master into featureB. Merging featureA into featureB is straight-forward:

M1 ─ M2 ─────────── M3 ────── M4 ── M5      (master)
       ╲              ╲         ╲
        A1 ─ A2 ─ A3 ─ A4 ─ A5 ─ A6         (featureA)
               ╲         ╲         ╲
                B1 ────── B2 ─ B3 ─ B4      (featureB)

But if we run git merge master now, we are likely running into very unfortunate git conflicts. Because to git, M5 is unrelated to featureA, it does not know that all the changes already have been merged into featureB when we created the merge commit B4!

But we know that M5 contains nothing that isn’t already in featureB, because M5 was a squash commit of A6.

The manual way of resolving this is to run

git merge -s ours master

which tells git: Pretend that we merged master into this, but don’t actually touch any of the files, everything on the current branch is already in the form we want. This way, we get

M1 ─ M2 ─────────── M3 ────── M4 ── M5      (master)
       ╲              ╲         ╲     ╲
        A1 ─ A2 ─ A3 ─ A4 ─ A5 ─ A6    ╲    (featureA)
               ╲         ╲         ╲    ╲
                B1 ────── B2 ─ B3 ─ B4 ─ B5 (featureB)

Note: git merge -s ours is not the same as the git merge -X ours! See the manpage for git merge for details.

How does git-post-squash help?

While the manual way works, one needs to be careful: If master has progressed further, or if featureA was not fully up-to-date before the squash merge, using git merge -s ours will easily and silently undo changes that were already committed to master.

So instead run

git post-squash master

which will do git merge -s ours, but it will

  • find the right commit on master to use (it may not be the latest) and
  • double-check that nothing is lost.

It does so by picking the latest commit on master that has the same tree as some commit on the current branch that is not yet on master.

In the example above, it would pick M5 because it has the same tree as A6, which is a commit that exists on featureB, but not on master.

Convinced? Go and get it on

Faster Winter: Statistics (the making-of)

Published 2019-11-24 in sections English, Haskell.

(This is an appendix to the “faster winter” series, please see that post for background information.)

Did you like the graph and the stats that I produced? Just for completeness, I am including the various scripts I used. Nothing super exciting to see here, but maybe someone finds this useful.

This little shell one-liner collects the run-time statistics for each commit in the interesting range (line-wrapped for your convenience):

for h in $(git log 1cea7652f48fad348af914cb6a56b39f8dd99c6a^..5406efd9e057aebdcf94d14b1bc6b5469454faf3 --format=%H)
  echo -n "$h"
  git checkout -q "$h"
  cabal new-build -v0
  echo -n ":"
  rm -f stats/$h.txt
  for i in $(seq 1 5)
    cabal -v0 new-run exe:wasm-invoke -- -w loop.wasm  -f canister_init +RTS -t >/dev/null 2>> stats/$h.txt
    echo -n .

A small Perl script takes the minimum for each measurement across the five runs, and produces a CSV file:


use List::Util qw(min);

my @alloc;
my @in_use;
my @time;

while (<>) {
  m!<<ghc: (\d+) bytes, \d+ GCs, \d+/\d+ avg/max bytes residency \(\d+ samples\), (\d+)M in use, [\d.]+ INIT \(([\d.]+) elapsed\), [\d.]+ MUT \(([\d.]+) elapsed\), [\d.]+ GC \(([\d.]+) elapsed\) :ghc>>! or die $!;
  push @alloc, 0+$1;
  push @in_use, $2;
  push @time, $3+$4+$5;

printf "%d;%d;%f\n", min(@alloc), min(@in_use), min(@time);

To create a full file for all the commits in the range that have files, I used this bash one-liner (again line-wrapped for your convenience):

echo 'commit;allocations;memory;time' > stats.csv
for h in $(git log 1cea7652f48fad348af914cb6a56b39f8dd99c6a^..5406efd9e057aebdcf94d14b1bc6b5469454faf3 --format=%H|tac)
  git log -n 1 --oneline $h
  test -f stats/$h.txt && echo "$(echo $h|cut -c-7);$(./ < stats/$h.txt)" | tee -a stats.csv

The stats can be turned into the graphc using pgfplots by compiling this LaTeX file:


% From
    /pgfplots/flexible xticklabels from table/.code n args={3}{%


\pgfplotsset{every axis/.style={ymin=0}}
  scale only axis,
  axis y line*=left,
  ylabel=Allocation (bytes),
  flexible xticklabels from table={stats.csv}{[index]0}{col sep=semicolon},
  xticklabel style={rotate=90, anchor=east, text height=1.5ex, font=\ttfamily, color=black},
\addplot[const plot mark mid, color=skyblue1]
  table [x expr=\coordindex+1, y index=1, col sep=semicolon] {stats.csv};

  scale only axis,
  axis y line*=right,
  ylabel=Memory (MB),
  x tick style={draw=none},
\addplot[const plot mark mid, color=green]
  table [x expr=\coordindex+1, y index=2, col sep=semicolon] {stats.csv};

  scale only axis,
  axis y line*=right,
  ylabel=Time (seconds),
  x tick style={draw=none},
\pgfplotsset{every outer y axis line/.style={xshift=2cm}, every tick/.style={xshift=2cm}, every y tick label/.style={xshift=2cm} }
\addplot[const plot mark mid, color=red]
  table [x expr=\coordindex+1, y index=3, col sep=semicolon] {stats.csv};

And finally this Perl script allows me to paste any two lines from the CSV file and produces appropriate Markdown for the “improvement” lines in my posts:


my $first = 1;

my $commit;
my $alloc;
my $in_use;
my $time;

while (<>) {
  /(.*);(.*);(.*);(.*)/ or die;
  unless ($first) {
    printf "**Improvement**: Allocations: %+.2f%%  Memory: %+.2f%%  Time: %+.2f%% (Commit [%s...%s](\n",
      (100 * ($2/$alloc - 1)),
      (100 * ($3/$in_use - 1)),
      (100 * ($4/$time - 1)),
  $first = 0;
  $commit = $1;
  $alloc = $2;
  $in_use = $3;
  $time = $4;

Faster Winter 7: The Zipper

Published 2019-11-24 in sections English, Haskell.

(This is the seventh optimization presented in the “faster winter” series, please see that post for background information.)

The last bit of performance could be considered a domain-specific optimization, as one might describe it as “introducing a control stack to the interpreter”. But in a different light, one could consider it the introduction of a more appropriate data structure, by applying a “zipper”-like transformation to the existing data structure.

The problem is that the state of the system (datatype Code) is a stack of values and a stack of instructions. Simplifying this for this post, we might have

data Code = Code [Value] [Instr]
data Instr
  = Const Value | Add | Sub | Return
  | Labled Int Code

The interpreter gets such a Code, but does not always just execute the top-most instruction: If that is a Labeled instruction, it has to execute the next instruction in the argument of that Labeled. (It is not very important at this point to understand what a Labeled is used for). So we might have a Code that looks like the following:

c1 = Code [] [Labeled 1 (Code [2] [Labeled 2 (Code [3,4] [Add]), Sub]), Return]

The next instruction to execute is actually the Add. But in order to find that, the function step has to look under the first Labeled, look under the second Labeled, then execute step (Code [3,4] [Add]) = Code [7] [], and finally wrap it again in the two Labeled, to produce:

c2 = Code [] [Labeled 1 (Code [2] [Labeled 2 (Code [7] []), Sub]), Return]

Then eval calls step again, and step has to look inside the Labeled again to find the next instruction to execute.

It would be easier if the next instruction to execute would be presented to step more conveniently, right as a field of Code. But in order to do this, we have to move the Labeled entries “out of the way”. I do that by adding a new, first parameter to Code where I keep a stack of all the Label constructor that were in the way, in reverse order. So the c1 above would now be

data Code = Code Control [Value] Instr
data Control = Empty | Labeled Int Code

c1' = Code (Labeled 2 (Code (Labeled 1 (Code Empty [] [Return])) [2] [Sub]) [3,4] [Add]

Can you see how this relates to c1 above? The important piece here is that the interpreter finds the next instruction to execute always at the head of the instruction list right of the outermost code, and as long as there is something to execute there, it doesn't have to touch the Control stack at all.

This required touching some more lines of code than the previous optimizations, but doing so didn't require much creativity, as the old and new Code types are in clear correspondance, and that guides me in how to use adjust the code. And it’s certainly worth it:

Improvement: Allocations: -46.17% Time: -36.88% (Commit e66f1e0...57f3e9d)

Faster Winter 6: Simpler Code

Published 2019-11-24 in sections English, Haskell.

(This is the sixth optimization presented in the “faster winter” series, please see that post for background information.)

The Wasm reference interpreter has a function step that performs one single one step of evaluation, by taking the state before (in a type called code), and returning the state after the step. The function eval calls step, checks if the result is “done” (no instructions left to do), and if it is not done, recursively calls eval again. This way, evaluation progresses step by step until it is done.

The Haskell port follows the same pattern of a single-step step and a driver eval, but it chose to write the code continuation passing style: Instead of returning, the function step takes a function that it passes the result to. So the code looks like this (slightly simplified):

type CEvalT m a = ReaderT Config (ExceptT EvalError m) a

step :: PrimMonad => Code -> (Code -> CEvalT m r) -> CEvalT m r
step c k =  … k new_c …

eval :: PrimMonad => Code -> CEvalT m (Stack Value)
eval c =
    if is_done c
    then stack c
    else step c eval

There must have been a good reason to prefer this style over the plain style, but I was curious if it was really helpful. So I changed it to the following, more straight-forward code:

type CEvalT m a = ReaderT Config (ExceptT EvalError m) a

step :: PrimMonad => Code -> CEvalT m Code
step c k =  … new_c …

eval :: PrimMonad => Code f m -> CEvalT f m (Stack Value)
eval c =
    if is_done c
    then stack c
    else step c >>= eval

And indeed, the simpler code worked better:

Improvement: Allocations: -9.6 Time: -16.91% (Commit eb8add3...e66f1e0)

Again, avoiding function values (as we construct them as the contination closure) might have helped here.

Or maybe the simpler code allowed GHC to notice that the Code value, which is simply a tuple with a different name, is constructed by step but immediatelly deconstructed by eval, and thus GHC could optimize that away (by “unboxing” the argument and/or result of step and simply passing the components of the tuple).

And finally, independent of all the performance questions, it also makes the code easier to understand.

Faster Winter 5: Eta-Expanding ReaderT

Published 2019-11-22 in sections English, Haskell.

(This is the fifth optimization presented in the “faster winter” series, please see that post for background information.)

Another good approach to performance turning is look at the code after GHC optimized it. So I planted a

{-# OPTIONS_GHC -ddump-simpl -dsuppress-coercions -dsuppress-unfoldings -dsuppress-module-prefixes #-}

at the top of Eval.hs, and looked through the code. This is not Haskell, but rather “GHC Core”, a simpler functional programming language that GHC uses internally. It is more verbose and less pretty, so it takes a bit of getting used to, but it’s nothing to be afraid of when you are a somewhat experienced Haskell developer.

There I found (much simplified) the following pattern:

step :: Instr -> Config -> IO Result
step e = case e of
  Add -> \c ->do stuff …
  Del -> \c ->do stuff …

That’s bad! Yes, Haskell is a functional language, and passing around anonymous functions is very nice to write expressive code, and for most purposes it is not too slow … but in an inner loop, you really don’t want any such closures. So where did this come from? And as expected, the Haskell source did not have those inner lambdas. Instead, it was using a very innocent looking monad transformer:

step :: Instr -> ReaderT Config IO Result
step e = case e of
  Add -> do stuff …
  Del -> do stuff …

A ReaderT r m a is just a different way of writing r -> m a that allows us to use do-notation or the monad combinators without having to pass the r around explicity, and as such it is indeed very convenient. But not as efficient as if we had written

step :: Instr -> Config -> IO Result
step e c = case e of
  Add ->do stuff …
  Del ->do stuff …

where the step function takes two arguments right away, and no anonymous functions are created.

Why doesn’t our amazing Haskell compiler figure out that this would be better? Because it is not better in all situations: If we store step e :: ReaderT Config IO Result somewhere and and use it many times, with the same e but passing many different c :: Config, then we have to do the case e analysis only once. This can sometimes be better, so the compiler has to leave it in that form, in case we did it intentionally.

(Incidentially, the question of how to allow the compiler to eta-expand more functions seems to eternally haunt me, and its pursuit even led to a PhD thesis.

So how can we fix it? One relatively crude way is to shove it into the compiler face that we really want step to be a function with two parameters by wrapping the whole body in, well, a lambda.. But we still want to use the Reader monad in the body of step

So I came up with this:

step :: Instr -> ReaderT Config IO Result
step e = ReaderT $ \c -> ($ c) $ runReaderT $ case e of
  Add ->do stuff …
  Del ->do stuff …

Now the \c -> is outside the case, the compiler adds it to the arguments of step and we get the code that we want (confirmed by a quick peek at the Core).

Improvement: Allocations: -23.20% Time: -23.00% (Commit f5a0dd2...894070f)

I used this pattern in more than once place, so I wanted to abstract it into a little helper definition. But that’s not so easy: If I just write

etaReaderT :: ReaderT r m a -> ReaderT r m a
etaReaderT m = ReaderT $ \c -> ($ c) $ runReaderT m

step :: Instr -> ReaderT Config IO Result
step e = etaReaderT $ case e of
  Add ->do stuff …
  Del ->do stuff …

then the whole thing doesn't work any more! Because now, the case e is again “outside” the \c ->.

I whined on twitter about this, and Sebastian Graf reminded me helpfully of GHC.Exts.oneShot, a little magic function that was added to GHC 5 years ago … by some forgetful person: me.

If we use this in the right place inside etaReaderT it tells GHC in a soothing voice “hey! it’s ok! you can move this lambda out of cases. believe me. it’s gonna be ok”. And with this, it works:

etaReaderT :: ReaderT r m a -> ReaderT r m a
etaReaderT = ReaderT . oneShot . runReaderT

I wonder if this function would make a good addition to Control.Monad.Trans.Reader.

Incidentally, if you look at the code at the end of all my optimizations, there is no mention of etaReaderT any more: Subsequent optimizations simplified the code so much that eventually GHC was able to do this transformation without my help.