Joachim Breitner

Blog

veggies: Haskell code generation from scratch

Published 2017-04-21 in sections English, Haskell.

How hard it is to write a compiler for Haskell Core? Not too hard, actually!

I wish we had a formally verified compiler for Haskell, or at least for GHC’s intermediate language Core. Now formalizing that part of GHC itself seems to be far out of reach, with the many phases the code goes through (Core to STG to CMM to Assembly or LLVM) and optimizations happening at all of these phases and the many complicated details to the highly tuned GHC runtime (pointer tagging, support for concurrency and garbage collection).

Introducing Veggies

So to make that goal of a formally verified compiler more feasible, I set out and implemented code generation from GHC’s intermediate language Core to LLVM IR, with simplicity as the main design driving factor.

You can find the result in the GitHub repository of veggies (the name derives from “verifiable GHC”). If you clone that and run ./boot.sh some-directory, you will find that you can use the program some-directory/bin/veggies just like like you would use ghc. It comes with the full base library, so your favorite variant of HelloWorld might just compile and run.

As of now, the code generation handles all the Core constructs (which is easy when you simply ignore all the types). It supports a good number of primitive operations, including pointers and arrays – I implement these as need – and has support for FFI calls into C.

Why you don't want to use Veggies

Since the code generator was written with simplicity in mind, performance of the resulting code is abysmal: Everything is boxed, i.e. represented as pointer to some heap-allocated data, including “unboxed” integer values and “unboxed” tuples. This is very uniform and simplifies the code, but it is also slow, and because there is no garbage collection (and probably never will be for this project), will fill up your memory quickly.

Also, the code is currently only supports 64bit architectures, and this is hard-coded in many places.

There is no support for concurrency.

Why it might be interesting to you nevertheless

So if it is not really usable to run programs with, should you care about it? Probably not, but maybe you do for one of these reasons:

  • You always wondered how a compiler for Haskell actually works, and reading through a little over a thousands lines of code is less daunting than reading through the 34k lines of code that is GHC’s backend.
  • You have wacky ideas about Code generation for Haskell that you want to experiment with.
  • You have wacky ideas about Haskell that require special support in the backend, and want to prototype that.
  • You want to see how I use the GHC API to provide a ghc-like experience. (I copied GHC’s Main.hs and inserted a few hooks, an approach I copied from GHCJS).
  • You want to learn about running Haskell programs efficiently, and starting from veggies, you can implement all the trick of the trade yourself and enjoy observing the speed-ups you get.
  • You want to compile Haskell code to some weird platform that is supported by LLVM, but where you for some reason cannot run GHC’s runtime. (Because there are no threads and no garbage collection, the code generated by veggies does not require a runtime system.)
  • You want to formally verify Haskell code generation. Note that the code generator targets the same AST for LLVM IR that the vellvm2 project uses, so eventually, veggies can become a verified arrow in the top right corner map of the DeepSpec project.

So feel free to play around with veggies, and report any issues you have on the GitHub repository.

Birthday greetings communication behaviour

Published 2017-03-28 in sections English.

Randall Munroe recently mapped how he communicated with his social circle. As I got older recently, I had an opportunity to create a similar statistics that shows how people close to me chose to fulfil their social obligations (given the current geographic circumstances):

Communication variants

Communication variants

(Diagram created with the xkcd-font and using these two stackoverflow answers.)

In related news: Heating 3½ US cups of water to a boil takes 7 minutes and 40 seconds on one particular gas stove, but only 3 minutes and 50 seconds with an electric kettle, despite the 110V-induced limitation to 1.5kW.

(Diagram updated on March 30, as the actual mail is slower than the other channels.)

Why prove programs equivalent when your compiler can do that for you?

Published 2017-02-06 in sections English, Haskell.

Last week, while working on CodeWorld, via a sequence of yak shavings, I ended up creating a nicely small library that provides Control.Applicative.Succs, a new applicative functor. And because I am trying to keep my Haskell karma good1, I wanted to actually prove that my code fulfills the Applicative and Monad laws.

This led me to inserted writing long comments into my code, filled with lines like this:

The second Applicative law:

  pure (.) <*> Succs u us <*> Succs v vs <*> Succs w ws
= Succs (.) [] <*> Succs u us <*> Succs v vs <*> Succs w ws
= Succs (u .) (map (.) us) <*> Succs v vs <*> Succs w ws
= Succs (u . v) (map ($v) (map (.) us) ++ map (u .) vs) <*> Succs w ws
= Succs (u . v) (map (($v).(.)) us ++ map (u .) vs) <*> Succs w ws
= Succs ((u . v) w) (map ($w) (map (($v).(.)) us ++ map (u .) vs) ++ map (u.v) ws)
= Succs ((u . v) w) (map (($w).($v).(.)) us ++ map (($w).(u.)) vs ++ map (u.v) ws)
= Succs (u (v w)) (map (\u -> u (v w)) us ++ map (\v -> u (v w)) vs ++ map (\w -> u (v w)) ws)
= Succs (u (v w)) (map ($(v w)) us ++ map u (map ($w) vs ++ map v ws))
= Succs u us <*> Succs (v w) (map ($w) vs ++ map v ws)
= Succs u us <*> (Succs v vs <*> Succs w ws)

Honk if you have done something like this before!

I proved all the laws, but I was very unhappy. I have a PhD on something about Haskell and theorem proving. I have worked with Isabelle, Agda and Coq. Both Haskell and theorem proving is decades old. And yet, I sit here, and tediously write manual proofs by hand. Is this really the best we can do?

Of course I could have taken my code, rewritten it in, say, Agda, and proved it correct there. But (right now) I don’t care about Agda code. I care about my Haskell code! I don’t want to write it twice, worry about copying mistakes and mismatchs in semantics, and have external proofs to maintain. Instead, I want to prove where I code, and have the proofs checked together with my code!

Then it dawned to me that this is, to some extent, possible. The Haskell compiler comes with a sophisticated program transformation machinery, which is meant to simplify and optimize code. But it can also be used to prove Haskell expressions to be equivalent! The idea is simple: Take two expressions, run both through the compiler’s simplifier, and check if the results are the same. If they are, then the expressions are, as far as the compiler is concerned, equivalent.

A handful of hours later, I was able to write proof tasks like

app_law_2 = (\ a b (c::Succs a) -> pure (.) <*> a <*> b <*> c)
        === (\ a b c -> a <*> (b <*> c))

and others into my source file, and the compiler would tell me happily:

[1 of 1] Compiling Successors       ( Successors.hs, Successors.o )
GHC.Proof: Proving getCurrent_proof1 …
GHC.Proof: Proving getCurrent_proof2 …
GHC.Proof: Proving getCurrent_proof3 …
GHC.Proof: Proving ap_star …
GHC.Proof: Proving getSuccs_proof1 …
GHC.Proof: Proving getSuccs_proof2 …
GHC.Proof: Proving getSuccs_proof3 …
GHC.Proof: Proving app_law_1 …
GHC.Proof: Proving app_law_2 …
GHC.Proof: Proving app_law_3 …
GHC.Proof: Proving app_law_4 …
GHC.Proof: Proving monad_law_1 …
GHC.Proof: Proving monad_law_2 …
GHC.Proof: Proving monad_law_3 …
GHC.Proof: Proving return_pure …
GHC.Proof proved 15 equalities

This is how I want to prove stuff about my code!

Do you also want to prove stuff about your code? I packaged this up as a GHC plugin in the Haskell library ghc-proofs (not yet on Hackage). The README of the repository has a bit more detail on how to use this plugin, how it works, what its limitations are and where this is heading.

This is still only a small step, but finally there is a step towards low threshold program equivalence proofs in Haskell.


  1. Or rather recover my karma after such abominations such as ghc-dup, seal-module or ghc-heap-view.

Trump is so mean

Published 2017-01-29 in sections Philadelphia, Deutsch.

Vor zweieinhalb Monaten berichtete ich an dieser Stelle von meinen Eindrücken nach der Präsidentschaftswahl (Danke für die motivierenden Rückmeldungen!). Inzwischen ist Donald Trump Präsident, und nun geht es so richtig los.

Am Tag der Amtseinführung habe ich nicht viel mitbekommen. In Philadelphia gab wohl es Proteste, insbesondere am Tag drauf, als hier wie in vielen anderen Städten in und außerhalb der USA der „Women’s March“ statt fand. Über Facebook konnte ich mitverfolgen, welche meiner Bekannten teilnahmen, das war es aber auch erst mal.

Zumindest in Philadelphia halten viele wenig von Trumps Verordnungen

Zumindest in Philadelphia halten viele wenig von Trumps Verordnungen

Interessant wurden die nächsten Tage, als sich zeigte, dass Trump es ernst meint und mit einer Verfügung nach der anderen seine Vorstellungen in die Tag umsetzen wollte. Ich will hier nicht alles wiederholen, was schon ausführlichst in den Nachrichten, auch in Deutschland, behandelt wurde (die BBC hat eine gute Zusammenfassung der ersten Woche).

Für mich und mein akademisches Umfeld am relevantesten waren zwei Entwicklungen: Zum einen verfügte Trump einen Maulkorberlass für viele Behörden, und für das Umweltamt sogar einen kompletten Einstellungs- und Auftragstop, und von der Seite des Weißen Hauses verschwanden Abschnitte zum Klimawandel. Auch wenn sich ein paar Nationalparkranger dem Anfangs widersetzten – das ist nicht mehr ein Umfeld von freier Wissenschaft und Ratio-geleiteter Politik.

Noch heftiger war die Reaktion dann als Trump plötzlich und ohne Vorwarnung die Einreise von Bürgern sieben vorwiegend muslimischer Staaten, darunter dem Iran, stark einschränkte. Einige Reisende oder Wiederkehrende aus diesen Staaten wurden nun am Flughafen länger befragt, abgewiesen oder direkt zurückgeschickt. In meinem Freundeskreis hier sind auch Wissenschaftler aus dem Iran, deren Vertrauen in eine verlässliche Zukunft in den USA nun schwer beschädigt ist. Denn betroffen sind nicht etwa nur Bewerber um neue Visas, die sich nun auf längere Wartezeiten einstellen müssten, sondern auch solche, die schon längst ein Visum – oder gar eine permanente Aufenthaltsgenehmigung (green card)1 – haben und sich in den Staaten eine Existenz aufgebaut haben.

Eine Anspielung auf das bekannte Niemöller-Zitat

Eine Anspielung auf das bekannte Niemöller-Zitat

Über Facebook erfuhr ich dass einige meiner Freunde, insbesondere eben jene Iranerin, an einer spontanen Demonstration heute Nachmittag am Flughafen von Philadelphia teilnehmen wollten, und dem schloss ich mich an. Allerings nicht nur ich, so dass schon die Autobahn zum Flughafen von den Autos empörter Bürger verstopft war und zeiweise die Autobahnausfahrten gesperrt waren.

Einige parkten weiter weg und liefen zum Flughafen (und waren schneller)

Einige parkten weiter weg und liefen zum Flughafen (und waren schneller)

Die Empfangshalle des Flughafens, dem angedachten Ort für den Protest, war angesichts von über 5000 Teilnehmern schon längst voll. Die in großer Zahl (und mit Pferden) aufmaschierte Polizei sperrte den Zugang und verlegte den Protest vor das Gebäude.

Eines von vielen Kindern auf der Demo

Eines von vielen Kindern auf der Demo

Ich mischte mich also eine Weile unter das dort protestierende Volk, lauschte den Sprechchören („No hate, no fear – refugees are welcome here“, „Let them in“, „This is what democracy looks like“) und bewunderte die unzähligen Pappschilder. Die Bilder hier sind eine Auswahl der besten (und weitere gibt es auf meiner Bilderseite).

So etwas wie Beamtenbeleidigung kennt man im Land der free speech nicht

So etwas wie Beamtenbeleidigung kennt man im Land der free speech nicht

Global almost-constants for Haskell

Published 2017-01-20 in sections English, Haskell.

More than five years ago I blogged about the “configuration problem” and a proposed solution for Haskell, which turned into some Template Haskell hacks in the seal-module package.

With the new GHC proposal process in plase, I am suddenly much more inclined to write up my weird wishes for the Haskell language in proposal form, to make them more coherent, get feedback, and maybe (maybe) actually get them implemented. But even if the proposal is rejected it is still a nice forum to discuss these ideas.

So I turned my Template Haskell hack into a proposed new syntactic feature. The idea is shamelessly stolen from Isabelle, including some of the keywords, and would allow you to write

context fixes progName in
  foo :: Maybe Int -> Either String Int
  foo Nothing  = Left $ progName ++ ": no number given"
  foo (Just i) = bar i

  bar :: Int -> Either String Int
  bar 0 = Left $ progName ++ ": zero no good"
  bar n = Right $ n + 1

instead of

foo :: String -> Maybe Int -> Either String Int
foo progName Nothing  = Left $ progName ++ ": no number given"
foo progName (Just i) = bar progName  i

bar :: String -> Int -> Either String Int
bar progName 0 = Left $ progName ++ ": zero no good"
bar progName n = Right $ n + 1

when you want to have an “almost constant” parameter.

I am happy to get feedback at the GitHub pull request.

TikZ aesthetics

Published 2017-01-06 in sections English, Digital World.

Every year since 2012, I typeset the problems and solutions for the German math event Tag der Mathematik, which is organized by the Zentrum für Mathematik and reaches 1600 students from various parts of Germany. For that, I often reach to the LaTeX drawing package TikZ, and I really like the sober aesthetics of a nicely done TikZ drawing. So mostly for my own enjoyment, I collect the prettiest here.

On a global scale they are still rather mundane, and for really impressive and educating examples, I recommend the TikZ Gallery.

microG on Jolla

Published 2016-11-23 in sections English, Digital World.

I am a incorrigibly in picking non-mainstream, open smartphones, and then struggling hard. Back then in 2008, I tried to use the OpenMoko FreeRunner, but eventually gave up because of hardware glitches and reverted to my good old Siemens S35. It was not that I would not be willing to put up with inconveniences, but as soon as it makes live more difficult for the people I communicate with, it becomes hard to sustain.

Two years ago I tried again, and got myself a Jolla phone, running Sailfish OS. Things are much nicer now: The hardware is mature, battery live is good, and the Android compatibility layer enables me to run many important apps that are hard to replace, especially the Deutsche Bahn Navigator and various messengers, namely Telegram, Facebook Messenger, Threema and GroupMe.

Some apps that require Google Play Services, which provides a bunch of common tasks and usually comes with the Google Play store would not run on my phone, as Google Play is not supported on Sailfish OS. So far, the most annoying ones of that sort were Uber and Lyft, making me pay for expensive taxis when others would ride cheaper, but I can live with that. I tried to install Google Play Services from shady sources, but it would regularly crash.

Signal on Jolla

Now in Philadelphia, people urged me to use the Signal messenger, and I was convinced by its support for good end-to-end crypto, while still supporting offline messages and allowing me to switch from my phone to my desktop and back during a conversation. The official Signal app uses Google Cloud Messaging (GCM, part of Google Play Services) to get push updates about new posts, and while I do not oppose this use of Google services (it really is just a ping without any metadata), this is a problem on Sailfish OS.

Luckily, the Signal client is open source, and someone created a “LibreSignal” edition that replaced the use of GCM with websockets, and indeed, this worked on my phone, and I could communicate.

Things were not ideal, though: I would often have to restart the app to get newly received messages; messages that I send via Signal Desktop would often not show up on the phone and, most severe, basically after every three messages, sending more messages from Desktop would stop working for my correspondents, which freaked them out. (Strangely it continued working from their phone app, so we coped for a while.)

So again, my choice of non-standard devices causes inconveniences to others. This, and the fact that the original authors of Signal and the maintainers of LibreSignal got into a fight that ended LibreSignal discontinued, meant that I have to change something about this situation. I was almost ready to give in and get myself a Samsung S7 or something boring of the sort, but then I decided to tackle this issue once more, following some of the more obscure instructions out there, trying to get vanilla Signal working on my phone. About a day later, I got it, and this is how I did it.

microG

So I need Google Play Services somehow, but installing the “real thing” did not seem to be very promising (I tried, and regularly got pop-ups telling me that Play Services has crashed.) But I found some references to a project called “microG”, which is an independent re-implementation of (some of) of the play services, in particular including GCM.

Installing microG itself was easy, as you can add their repository to F-Droid. I installed the core services, the services framework and the fake store apps. If this had been all that was to do, things would be easy!

Play Store detection work arounds

But Signal would still complain about the lack of Google Play Services. It asks Android if an app with a certain name is installed, and would refuse to work if this app does not exist. For some reason, the microG apps cannot just have the names of the “real” Google apps.

There seem to be two ways of working around this: Patching Signal, or enabling Signature Spoofing.

The initially most promising instructions (which are in a README in a tarball on a fishy file hoster linked from an answer on the Jolla support forum…) suggested patching Signal, and actually came both with a version of an app called “Lucky Patcher” as well as a patched Android package, but both about two years old. I tried a recent version of the Lucky Patcher, but it failed to patch the current version of Signal.

Signature Spoofing

So on to Signature Spoofing. This is a feature of some non-standard Android builds that allow apps (such as microG) to fake the existence of other apps (the Play Store), and is recommended by the microG project. Sailfish OS’s Android compatibility layer “Alien Dalvik” does not support it out of the box, but there is a tool “tingle” that adds this feature to existing Android systems. One just has to get the /system/framework/framework.jar file, put it into the input folder of this project, run python main.py, select 2, and copy the framework.jar from output/ back. Great.

Deodexing Alien Dalvik

Only that it only works on “deodexed” files. I did not know anything about odexed Android Java classes (and did not really want to know), but there was not way around. Following this explanation I gathered that one finds files foo.odex in the Android system folder, runs some tool on them to create a classes.dex file, and adds that to the corresponding foo.jar or foo.apk file, copies this back to the phone and deletes the foo.odex file.

The annoying this is that one does not only have to do it for framework.jar in order to please tingle, because if one does it to one odex file, one has to do to all! It seems that for people using Windows, the Universal Deodexer V5 seems to be a convenient tool, but I had to go more manually.

So I first fetched “smali”, compiled it using ./gradlew build. Then I fetched the folders /opt/alien/system/framework and /opt/alien/system/app from the phone (e.g. using scp). Keep a backup of these in case something breaks. Then I ran these commands (disclaimer: I fetched these from my bash history and slightly cleaned them up. This is not a fire-and-forget script! Use it when you know what it and you are doing):

cd framework
for file in *.odex
do
  java -jar ~/build/smali/baksmali/build/libs/baksmali.jar deodex $file -o out
  java -jar ~/build/smali/smali/build/libs/smali.jar a out -o classes.dex
  zip -u $(basename $file .odex).jar classes.dex
  rm -rf out classes.dex $file
done
cd ..

cd app
for file in *.odex
do
  java -jar ~/build/smali/baksmali/build/libs/baksmali.jar deodex -d ../framework $file -o out
  java -jar ~/build/smali/smali/build/libs/smali.jar a out -o classes.dex
  zip -u $(basename $file .odex).apk classes.dex
  rm -rf out classes.dex $file
done
cd ..

The resulting framework.jar can now be patched with tingle:

mv framework/framework.jar ~/build/tingle/input
cd ~/build/tingle
./main.py
# select 2
cd -
mv ~/build/tingle/output/framework.jar framework/framework.jar

Now I copy these framework and app folders back on my phone, and restart Dalvik:

devel-su systemctl restart aliendalvik.service

It might start a bit slower than usually, but eventually, all the Android apps should work as before.

The final bit that was missing in my case was that I had to reinstall Signal: If it is installed before microG is installed, it does not get permission to use GCM, and when it tries (while registering: After generating the keys) it just crashes. I copied /data/data/org.thoughtcrime.secretsms/ before removing Signal and moved it back after (with cp -a to preserve permissions) so that I could keep my history.

And now, it seems, vanilla Signal is working just fine on my Jolla phone!

What’s missing

Am I completely happy with Signal? No! An important feature that it is lacking is a way to get out all data (message history including media files) in a file format that can be read without Signal; e.g. YAML files or clean HTML code. I do want to be able to re-read some of the more interesting conversations when I am 74 or 75, and I doubt that there will be a Signal App, or even Android, then. I hope that this becomes available in time, maybe in the Desktop version.

I would also hope that pidgin gets support to the Signal protocol, so that I conveniently use one program for all my messaging needs on the desktop.

Finally it would be nice if my Signal identity was less tied to one phone number. I have a German and a US phone number, and would want to be reachable under both on all my clients. (If you want to contact me on Signal, use my US phone number.)

Alternatives

Could I have avoided this hassle by simply convincing people to use something other than Signal? Tricky, at the moment. Telegram (which works super reliable for me, and has a pidgin plugin) has dubious crypto and does not support crypto while using multiple clients. Threema has no desktop client that I know of. OTR on top of Jabber does not support offline messages. So nothing great seems to exist right now.

In the long run, the best bet seems to be OMEMO (which is, in essence, the Signal protocol) on top of Jabber. It is currently supported by one Android Jabber client (Conversations) and one Desktop application (gajim, via a plugin). I should keep an eye on pidgin support for OMEMO and other development around this.

Drei Jahreszeiten

Published 2016-11-20 in sections Philadelphia, Deutsch.

Hier in Pennsylvania tickt das Wetter etwas anders als bei uns…

Sommer

Dieses Wochenende fuhr ich mit einer Freundin in die Poconos Mountains. Wir bezogen eine kleine Holzhütte im Wald. Samstag morgen frühstückten wir draußen. Wir wanderten im T-Shirt durch den Beltzville Park und picknickten mittags am Ufer von der Sonne beschienen und unter blauem Himmel. Temperatur vielleicht 26°C.

Frühstück in der Sonne

Frühstück in der Sonne

Herbst

Im Anschluss wanderten wir durch die Lehigh Gorge. Ein kühler Ostwind blies uns um die Ohren und lupfte das Laub in die Höhe. Wir futterten in Jim Thorpe und als wir wieder im Auto waren, fielen die ersten Regentropfen aufs Dach. Temperatur vielleicht 10°C.

Winter

Irgendwann hörte ich abends den Regen nicht mehr – er wurde zu Schnee. Und Sonntag morgen war die Hütte, das Auto und der Wald um uns herum mit einer 15cm dicken Schneeschicht bedeckt. Das Thermometer zeigte -3°C. Wir gingen wieder spazieren, diesmal um den Gipfel des Camelback Mountain herum. Alles war weiß und dick eingepudert,

Kein Frühstück im Schnee

Kein Frühstück im Schnee

Drei Jahreszeiten an einem Wochenende – das nenn ich effizienten Urlaub.

(Mehr Bilder auf meiner Bilderseite.)

Nach der Wahl

Published 2016-11-15 in sections Deutsch, Philadelphia.

„Wie sieht’s aus in Trump-Land?“

Gute Frage. Ich kann ja mal ein paar Eindrücke schildern.

Wahlkampf

Anfangs war alles noch nett und lustig. Mit Arbeitskollegen schaute ich das erste Fernsehduell zwischen Clinton und Trump. Es wurde viel getrunken, vor allem wenn Trump den Mund aufmachte, aber insgesamt machte Hillary Trump fertig. Sah also gut aus.

Auf Facebook (und den auf Facebook verlinkten Medien) wurde sich über Trump’s Persönlichkeit ausgelassen, und ich wundere mich, warum so wenig über seine Politik geredet wird – lieber ein Arschloch als Präsident, der das richtige macht, als ein Traumschwiegersohn mit den falschen Zielen, würde ich sagen – aber wohl zurecht wurde ich darauf hingewiesen, dass man auf seine Wahlversprechen eh nichts geben kann. Trotzdem halte ich fest dass die Übermoralisierung des Politikers hier weiter fortgeschritten ist als in Deutschland.

Die letzte Woche vor der Wahl war überraschend ruhig. Die Prognosen sahen gut aus (Gewinnwahrscheinlichkeit für Clinton war bei ⅔). Viele meine Kollegen beteiligten sich ehrenamtlich am Wahlkampf, klingelten Telefonlisten durch oder marschierten von Haus zu Haus. Dabei geht es nicht darum, Wechselwähler oder gar Trumpwähler umzustimmen, sondern lediglich, die demokratisch gesinnten Wähler auch zum Wählen zu kriegen.

Wahltag

Am Wahltag war -- soweit ich das erkennen kann -- gerade zu heitere Stimmung. Das Wetter war gut, und auf dem Campus trug jeder stolz seinen „I voted“-Aufkleber auf der Brust, den man hier bekommt. Man fragte sich gegenseitig, ob man schon wählen war.

Abends verfolgte eine Freundin und ich das Geschehen in einer Kneipe hier in West-Philadelphia, es lief CNN. Als verkündet wurde, dass Michigan an die Republikaner ging, war ihr klar, dass hier etwas gewaltig schief lief. CNN berichtete noch von neuen Meldungen aus den verschiedenen Staaten, aber die Prognose auf nytimes.com sagte schon eine Wahrscheinlichkeit von >95% für Trump. Die Wahl war gelaufen, und sie lief anders als von allen in meiner Umgebung hier erhofft, und auch anders als erwartet. Für mich fühlte es sich wie eine Wiederholung des Brexit-Wahlabends an. Auch Pennsylvania kippte und wählte insgesamt für Trump. Wir hatten keine Lust mehr auf Kneipe und verzogen uns.

Nach der Wahl

Selbst das Wetter war deprimiert. Davor noch herrlich sonnig, war es jetzt plötzlich kalt und regnerisch. Tränen hier in meiner WG. Eine Mitbewohnerin macht sich Sorgen um ihre Krankenversicherung wenn Obamacare gestrichen wird. An der Uni gedrückte Stimmung. Auf Facebook gingen die Emotionen um. Freunde luden Freunde zum gemeinsamen Wundenlecken ein.

Am zweiten Abend gab es in der Innenstadt Protestmärsche gegen Trump mit mehreren Tausend Teilnehmern. Gleichzeitig gab es Berichte über sogenannte „hate crimes“ im ganzen Land (eine von Schwarzen genutzte Kirche wurde noch vor der Wahl abgefackelt, weitere andere nach der Wahl beschmiert und in Philadelphia (Hakenkreuz-Graffiti und Nazi-Sprüche an Wänden, sexistische Sprüche in die Autofenster einer Professorin geritzt). Facebook-Freunde von mir erkundigen sich nach geeigneten Kampfsportarten und Selbstverteidigungskursen um bei eventuellen Auseinandersetzungen ihre Freunde und sich selbst schützen zu können. Es werden Parallelen zu Deutschland nach dem ersten Weltkrieg und Hitlers Machtübernahme gezogen.

An der Universität werden schwarze Studenten per „GroupMe“ mit rassistischen Sprüchen überzogen. Große Aufregung mit mehreren Rundmails des Präsidiums an alle Mitglieder der Studenten, FBI eingeschaltet, und Ermittlungen in Oklahoma, wo die Täter wohl sind.

Eine andere Rundmail der Präsidentin an alle Mitarbeiter, dass einige Studenten jetzt nach der Wahl sicherlich eine schwere Zeit durchmachen, und an solle doch bitte nachsichtig und einfühlsam zu sein, und flexibel was Hausaufgaben oder so angeht.

Und ich?

Unmittelbar betrifft mich die Wahl von Trump nicht. Ich bin kein Muslim, ich bin kein Mexikaner, ich bin keine Frau, nicht Queer und ich bin legal hier. Noch fühle ich mich durch den Straßenverkehr und die in Deutschland sicherlich so verboten schmalen Treppen stärker bedroht. Mittelfristig macht mir die zu erwartende Klimapolitik und Trumps Einstellung zur NATO durchaus sorgen. Aber ich bin Optimist, vielleicht hat ein unberechenbarer US-Präsident eher einen Fukushima-Moment mit 180°-Wende als die Parteipolitikerin.

Ich finde es bedenklich, wie segregiert das Land hier ist. In Pennsylvania gibt es zwei große Städte, und hier habe ich kein einziges Trump-Schild in den Vorgärten gesehen. Keiner, den ich kenne, hat offen für die Republikaner gestimmt. Die knapp 50% der Wähler, die Trump an die Macht gebracht haben -- wir kennen sie hier nicht, und sie kennen uns nicht. Das ist in Deutschland doch anders: Da gehen CDU- und SPD-Wähler gemeinsam zum Sport, auch ein Grünenwähler kennt vermutlich einen FDP-Anhänger, und selbst AFDler leben unter uns, und man kann mit ihnen reden. Ich wünsche den USA, irgendwie diese Spaltung zu überwinden und eine gesündere Demokratie zu entwickeln, vielleicht sogar eine mit Verhältniswahlrecht, vielleicht mit Wahlbeteiligung deutlich über 50%, besonders nach einem so kontroversen Wahlkampf. Man wird ja wohl noch träumen dürfen.

Showcasing Applicative

Published 2016-10-26 in sections English, Haskell.

My plan for this week’s lecture of the CIS 194 Haskell course at the University of Pennsylvania is to dwell a bit on the concept of Functor, Applicative and Monad, and to highlight the value of the Applicative abstraction.

I quite like the example that I came up with, so I want to share it here. In the interest of long-term archival and stand-alone presentation, I include all the material in this post.1

Imports

In case you want to follow along, start with these imports:

import Data.Char
import Data.Maybe
import Data.List

import System.Environment
import System.IO
import System.Exit

The parser

The starting point for this exercise is a fairly standard parser-combinator monad, which happens to be the result of the student’s homework from last week:

newtype Parser a = P (String -> Maybe (a, String))

runParser :: Parser t -> String -> Maybe (t, String)
runParser (P p) = p

parse :: Parser a -> String -> Maybe a
parse p input = case runParser p input of
    Just (result, "") -> Just result
    _ -> Nothing -- handles both no result and leftover input

noParserP :: Parser a
noParserP = P (\_ -> Nothing)

pureParserP :: a -> Parser a
pureParserP x = P (\input -> Just (x,input))

instance Functor Parser where
    fmap f p = P $ \input -> do
	(x, rest) <- runParser p input
	return (f x, rest)

instance Applicative Parser where
    pure = pureParserP
    p1 <*> p2 = P $ \input -> do
        (f, rest1) <- runParser p1 input
        (x, rest2) <- runParser p2 rest1
        return (f x, rest2)

instance Monad Parser where
    return = pure
    p1 >>= k = P $ \input -> do
        (x, rest1) <- runParser p1 input
        runParser (k x) rest1

anyCharP :: Parser Char
anyCharP = P $ \input -> case input of
    (c:rest) -> Just (c, rest)
    []       -> Nothing

charP :: Char -> Parser ()
charP c = do
    c' <- anyCharP
    if c == c' then return ()
               else noParserP

anyCharButP :: Char -> Parser Char
anyCharButP c = do
    c' <- anyCharP
    if c /= c' then return c'
               else noParserP

letterOrDigitP :: Parser Char
letterOrDigitP = do
    c <- anyCharP
    if isAlphaNum c then return c else noParserP

orElseP :: Parser a -> Parser a -> Parser a
orElseP p1 p2 = P $ \input -> case runParser p1 input of
    Just r -> Just r
    Nothing -> runParser p2 input

manyP :: Parser a -> Parser [a]
manyP p = (pure (:) <*> p <*> manyP p) `orElseP` pure []

many1P :: Parser a -> Parser [a]
many1P p = pure (:) <*> p <*> manyP p

sepByP :: Parser a -> Parser () -> Parser [a]
sepByP p1 p2 = (pure (:) <*> p1 <*> (manyP (p2 *> p1))) `orElseP` pure []

A parser using this library for, for example, CSV files could take this form:

parseCSVP :: Parser [[String]]
parseCSVP = manyP parseLine
  where
    parseLine = parseCell `sepByP` charP ',' <* charP '\n'
    parseCell = do
        charP '"'
        content <- manyP (anyCharButP '"')
        charP '"'
        return content

We want EBNF

Often when we write a parser for a file format, we might also want to have a formal specification of the format. A common form for such a specification is EBNF. This might look as follows, for a CSV file:

cell = '"', {not-quote}, '"';
line = (cell, {',', cell} | ''), newline;
csv  = {line};

It is straightforward to create a Haskell data type to represent an EBNF syntax description. Here is a simple EBNF library (data type and pretty-printer) for your convenience:

data RHS
  = Terminal String
  | NonTerminal String
  | Choice RHS RHS
  | Sequence RHS RHS
  | Optional RHS
  | Repetition RHS
  deriving (Show, Eq)

ppRHS :: RHS -> String
ppRHS = go 0
  where
    go _ (Terminal s)     = surround "'" "'" $ concatMap quote s
    go _ (NonTerminal s)  = s
    go a (Choice x1 x2)   = p a 1 $ go 1 x1 ++ " | " ++ go 1 x2
    go a (Sequence x1 x2) = p a 2 $ go 2 x1 ++ ", "  ++ go 2 x2
    go _ (Optional x)     = surround "[" "]" $ go 0 x
    go _ (Repetition x)   = surround "{" "}" $ go 0 x

    surround c1 c2 x = c1 ++ x ++ c2

    p a n | a > n     = surround "(" ")"
          | otherwise = id

    quote '\'' = "\\'"
    quote '\\' = "\\\\"
    quote c    = [c]

type Production = (String, RHS)
type BNF = [Production]

ppBNF :: BNF -> String
ppBNF = unlines . map (\(i,rhs) -> i ++ " = " ++ ppRHS rhs ++ ";")

Code to produce EBNF

We had a good time writing combinators that create complex parsers from primitive pieces. Let us do the same for EBNF grammars. We could simply work on the RHS type directly, but we can do something more nifty: We create a data type that keeps track, via a phantom type parameter, of what Haskell type the given EBNF syntax is the specification:

newtype Grammar a = G RHS

ppGrammar :: Grammar a -> String
ppGrammar (G rhs) = ppRHS rhs

So a value of type Grammar t is a description of the textual representation of the Haskell type t.

Here is one simple example:

anyCharG :: Grammar Char
anyCharG = G (NonTerminal "char")

Here is another one. This one does not describe any interesting Haskell type, but is useful when spelling out the special characters in the syntax described by the grammar:

charG :: Char -> Grammar ()
charG c = G (Terminal [c])

A combinator that creates new grammar from two existing grammars:

orElseG :: Grammar a -> Grammar a -> Grammar a
orElseG (G rhs1) (G rhs2) = G (Choice rhs1 rhs2)

We want the convenience of our well-known type classes in order to combine these values some more:

instance Functor Grammar where
    fmap _ (G rhs) = G rhs

instance Applicative Grammar where
    pure x = G (Terminal "")
    (G rhs1) <*> (G rhs2) = G (Sequence rhs1 rhs2)

Note how the Functor instance does not actually use the function. How should it? There are no values inside a Grammar!

We cannot define a Monad instance for Grammar: We would start with (G rhs1) >>= k = …, but there is simply no way of getting a value of type a that we can feed to k. So we will do without a Monad instance. This is interesting, and we will come back to that later.

Like with the parser, we can now begin to build on the primitive example to build more complicated combinators:

manyG :: Grammar a -> Grammar [a]
manyG p = (pure (:) <*> p <*> manyG p) `orElseG` pure []

many1G :: Grammar a -> Grammar [a]
many1G p = pure (:) <*> p <*> manyG p

sepByG :: Grammar a -> Grammar () -> Grammar [a]
sepByG p1 p2 = ((:) <$> p1 <*> (manyG (p2 *> p1))) `orElseG` pure []

Let us run a small example:

dottedWordsG :: Grammar [String]
dottedWordsG = many1G (manyG anyCharG <* charG '.')
*Main> putStrLn $ ppGrammar dottedWordsG
'', ('', char, ('', char, ('', char, ('', char, ('', char, ('', …

Oh my, that is not good. Looks like the recursion in manyG does not work well, so we need to avoid that. But anyways we want to be explicit in the EBNF grammars about where something can be repeated, so let us just make many a primitive:

manyG :: Grammar a -> Grammar [a]
manyG (G rhs) = G (Repetition rhs)

With this definition, we already get a simple grammar for dottedWordsG:

*Main> putStrLn $ ppGrammar dottedWordsG
'', {char}, '.', {{char}, '.'}

This already looks like a proper EBNF grammar. One thing that is not nice about it is that there is an empty string ('') in a sequence (…,…). We do not want that.

Why is it there in the first place? Because our Applicative instance is not lawful! Remember that pure id <*> g == g should hold. One way to achieve that is to improve the Applicative instance to optimize this case away:

instance Applicative Grammar where
    pure x = G (Terminal "")
    G (Terminal "") <*> G rhs2 = G rhs2
    G rhs1 <*> G (Terminal "") = G rhs1
    (G rhs1) <*> (G rhs2) = G (Sequence rhs1 rhs2)
	```
Now we get what we want:
*Main> putStrLn $ ppGrammar dottedWordsG
{char}, '.', {{char}, '.'}

Remember our parser for CSV files above? Let me repeat it here, this time using only Applicative combinators, i.e. avoiding (>>=), (>>), return and do-notation:

parseCSVP :: Parser [[String]]
parseCSVP = manyP parseLine
  where
    parseLine = parseCell `sepByP` charG ',' <* charP '\n'
    parseCell = charP '"' *> manyP (anyCharButP '"') <* charP '"'

And now we try to rewrite the code to produce Grammar instead of Parser. This is straightforward with the exception of anyCharButP. The parser code for that inherently monadic, and we just do not have a monad instance. So we work around the issue by making that a “primitive” grammar, i.e. introducing a non-terminal in the EBNF without a production rule – pretty much like we did for anyCharG:

primitiveG :: String -> Grammar a
primitiveG s = G (NonTerminal s)

parseCSVG :: Grammar [[String]]
parseCSVG = manyG parseLine
  where
    parseLine = parseCell `sepByG` charG ',' <* charG '\n'
    parseCell = charG '"' *> manyG (primitiveG "not-quote") <* charG '"'

Of course the names parse… are not quite right any more, but let us just leave that for now.

Here is the result:

*Main> putStrLn $ ppGrammar parseCSVG
{('"', {not-quote}, '"', {',', '"', {not-quote}, '"'} | ''), '
'}

The line break is weird. We do not really want newlines in the grammar. So let us make that primitive as well, and replace charG '\n' with newlineG:

newlineG :: Grammar ()
newlineG = primitiveG "newline"

Now we get

*Main> putStrLn $ ppGrammar parseCSVG
{('"', {not-quote}, '"', {',', '"', {not-quote}, '"'} | ''), newline}

which is nice and correct, but still not quite the easily readable EBNF that we saw further up.

Code to produce EBNF, with productions

We currently let our grammars produce only the right-hand side of one EBNF production, but really, we want to produce a RHS that may refer to other productions. So let us change the type accordingly:

newtype Grammar a = G (BNF, RHS)

runGrammer :: String -> Grammar a -> BNF
runGrammer main (G (prods, rhs)) = prods ++ [(main, rhs)]

ppGrammar :: String -> Grammar a -> String
ppGrammar main g = ppBNF $ runGrammer main g

Now we have to adjust all our primitive combinators (but not the derived ones!):

charG :: Char -> Grammar ()
charG c = G ([], Terminal [c])

anyCharG :: Grammar Char
anyCharG = G ([], NonTerminal "char")

manyG :: Grammar a -> Grammar [a]
manyG (G (prods, rhs)) = G (prods, Repetition rhs)

mergeProds :: [Production] -> [Production] -> [Production]
mergeProds prods1 prods2 = nub $ prods1 ++ prods2

orElseG :: Grammar a -> Grammar a -> Grammar a
orElseG (G (prods1, rhs1)) (G (prods2, rhs2))
    = G (mergeProds prods1 prods2, Choice rhs1 rhs2)

instance Functor Grammar where
    fmap _ (G bnf) = G bnf

instance Applicative Grammar where
    pure x = G ([], Terminal "")
    G (prods1, Terminal "") <*> G (prods2, rhs2)
        = G (mergeProds prods1 prods2, rhs2)
    G (prods1, rhs1) <*> G (prods2, Terminal "")
        = G (mergeProds prods1 prods2, rhs1)
    G (prods1, rhs1) <*> G (prods2, rhs2)
        = G (mergeProds prods1 prods2, Sequence rhs1 rhs2)

primitiveG :: String -> Grammar a
primitiveG s = G (NonTerminal s)

The use of nub when combining productions removes duplicates that might be used in different parts of the grammar. Not efficient, but good enough for now.

Did we gain anything? Not yet:

*Main> putStr $ ppGrammar "csv" (parseCSVG)
csv = {('"', {not-quote}, '"', {',', '"', {not-quote}, '"'} | ''), newline};

But we can now introduce a function that lets us tell the system where to give names to a piece of grammar:

nonTerminal :: String -> Grammar a -> Grammar a
nonTerminal name (G (prods, rhs))
  = G (prods ++ [(name, rhs)], NonTerminal name)

Ample use of this in parseCSVG yields the desired result:

parseCSVG :: Grammar [[String]]
parseCSVG = manyG parseLine
  where
    parseLine = nonTerminal "line" $
        parseCell `sepByG` charG ',' <* newline
    parseCell = nonTerminal "cell" $
        charG '"' *> manyG (primitiveG "not-quote") <* charG '"
*Main> putStr $ ppGrammar "csv" (parseCSVG)
cell = '"', {not-quote}, '"';
line = (cell, {',', cell} | ''), newline;
csv = {line};

This is great!

Unifying parsing and grammar-generating

Note how simliar parseCSVG and parseCSVP are! Would it not be great if we could implement that functionality only once, and get both a parser and a grammar description out of it? This way, the two would never be out of sync!

And surely this must be possible. The tool to reach for is of course to define a type class that abstracts over the parts where Parser and Grammer differ. So we have to identify all functions that are primitive in one of the two worlds, and turn them into type class methods. This includes char and orElse. It includes many, too: Although manyP is not primitive, manyG is. It also includes nonTerminal, which does not exist in the world of parsers (yet), but we need it for the grammars.

The primitiveG function is tricky. We use it in grammars when the code that we might use while parsing is not expressible as a grammar. So the solution is to let it take two arguments: A String, when used as a descriptive non-terminal in a grammar, and a Parser a, used in the parsing code.

Finally, the type classes that we except, Applicative (and thus Functor), are added as constraints on our type class:

class Applicative f => Descr f where
    char :: Char -> f ()
    many :: f a -> f [a]
    orElse :: f a -> f a -> f a
    primitive :: String -> Parser a -> f a
    nonTerminal :: String -> f a -> f a

The instances are easily written:

instance Descr Parser where
    char = charP
    many = manyP
    orElse = orElseP
    primitive _ p = p
    nonTerminal _ p = p

instance Descr Grammar where
    char = charG
    many = manyG
    orElse = orElseG
    primitive s _ = primitiveG s
    nonTerminal s g = nonTerminal s g

And we can now take the derived definitions, of which so far we had two copies, and define them once and for all:

many1 :: Descr f => f a -> f [a]
many1 p = pure (:) <*> p <*> many p

anyChar :: Descr f => f Char
anyChar = primitive "char" anyCharP

dottedWords :: Descr f => f [String]
dottedWords = many1 (many anyChar <* char '.')

sepBy :: Descr f => f a -> f () -> f [a]
sepBy p1 p2 = ((:) <$> p1 <*> (many (p2 *> p1))) `orElse` pure []

newline :: Descr f => f ()
newline = primitive "newline" (charP '\n')

And thus we now have our CSV parser/grammar generator:

parseCSV :: Descr f => f [[String]]
parseCSV = many parseLine
  where
    parseLine = nonTerminal "line" $
        parseCell `sepBy` char ',' <* newline
    parseCell = nonTerminal "cell" $
        char '"' *> many (primitive "not-quote" (anyCharButP '"')) <* char '"'

We can now use this definition both to parse and to generate grammars:

*Main> putStr $ ppGrammar2 "csv" (parseCSV)
cell = '"', {not-quote}, '"';
line = (cell, {',', cell} | ''), newline;
csv = {line};
*Main> parse parseCSV "\"ab\",\"cd\"\n\"\",\"de\"\n\n"
Just [["ab","cd"],["","de"],[]]

The INI file parser and grammar

As a final exercise, let us transform the INI file parser into a combined thing. Here is the parser (another artifact of last week’s homework) again using applicative style2:

parseINIP :: Parser INIFile
parseINIP = many1P parseSection
  where
    parseSection =
        (,) <$  charP '['
            <*> parseIdent
            <*  charP ']'
            <*  charP '\n'
            <*> (catMaybes <$> manyP parseLine)
    parseIdent = many1P letterOrDigitP
    parseLine = parseDecl `orElseP` parseComment `orElseP` parseEmpty

    parseDecl = Just <$> (
        (,) <*> parseIdent
            <*  manyP (charP ' ')
            <*  charP '='
            <*  manyP (charP ' ')
            <*> many1P (anyCharButP '\n')
            <*  charP '\n')

    parseComment =
        Nothing <$ charP '#'
                <* many1P (anyCharButP '\n')
                <* charP '\n'

    parseEmpty = Nothing <$ charP '\n'

Transforming that to a generic description is quite straightforward. We use primitive again to wrap letterOrDigitP:

descrINI :: Descr f => f INIFile
descrINI = many1 parseSection
  where
    parseSection =
        (,) <*  char '['
            <*> parseIdent
            <*  char ']'
            <*  newline
            <*> (catMaybes <$> many parseLine)
    parseIdent = many1 (primitive "alphanum" letterOrDigitP)
    parseLine = parseDecl `orElse` parseComment `orElse` parseEmpty

    parseDecl = Just <$> (
        (,) <*> parseIdent
            <*  many (char ' ')
            <*  char '='
            <*  many (char ' ')
            <*> many1 (primitive "non-newline" (anyCharButP '\n'))
	    <*  newline)

    parseComment =
        Nothing <$ char '#'
                <* many1 (primitive "non-newline" (anyCharButP '\n'))
		<* newline

    parseEmpty = Nothing <$ newline

This yields this not very helpful grammar (abbreviated here):

*Main> putStr $ ppGrammar2 "ini" descrINI
ini = '[', alphanum, {alphanum}, ']', newline, {alphanum, {alphanum}, {' '}…

But with a few uses of nonTerminal, we get something really nice:

descrINI :: Descr f => f INIFile
descrINI = many1 parseSection
  where
    parseSection = nonTerminal "section" $
        (,) <$  char '['
            <*> parseIdent
            <*  char ']'
            <*  newline
            <*> (catMaybes <$> many parseLine)
    parseIdent = nonTerminal "identifier" $
        many1 (primitive "alphanum" letterOrDigitP)
    parseLine = nonTerminal "line" $
        parseDecl `orElse` parseComment `orElse` parseEmpty

    parseDecl = nonTerminal "declaration" $ Just <$> (
        (,) <$> parseIdent
            <*  spaces
            <*  char '='
            <*  spaces
            <*> remainder)

    parseComment = nonTerminal "comment" $
        Nothing <$ char '#' <* remainder

    remainder = nonTerminal "line-remainder" $
        many1 (primitive "non-newline" (anyCharButP '\n')) <* newline

    parseEmpty = Nothing <$ newline

    spaces = nonTerminal "spaces" $ many (char ' ')
*Main> putStr $ ppGrammar "ini" descrINI
identifier = alphanum, {alphanum};
spaces = {' '};
line-remainder = non-newline, {non-newline}, newline;
declaration = identifier, spaces, '=', spaces, line-remainder;
comment = '#', line-remainder;
line = declaration | comment | newline;
section = '[', identifier, ']', newline, {line};
ini = section, {section};

Recursion (variant 1)

What if we want to write a parser/grammar-generator that is able to generate the following grammar, which describes terms that are additions and multiplications of natural numbers:

const = digit, {digit};
spaces = {' ' | newline};
atom = const | '(', spaces, expr, spaces, ')', spaces;
mult = atom, {spaces, '*', spaces, atom}, spaces;
plus = mult, {spaces, '+', spaces, mult}, spaces;
expr = plus;

The production of expr is recursive (via plus, mult, atom). We have seen above that simply defining a Grammar a recursively does not go well.

One solution is to add a new combinator for explicit recursion, which replaces nonTerminal in the method:

class Applicative f => Descr f where
    
    recNonTerminal :: String -> (f a -> f a) -> f a

instance Descr Parser where
    
    recNonTerminal _ p = let r = p r in r

instance Descr Grammar where
    
    recNonTerminal = recNonTerminalG

recNonTerminalG :: String -> (Grammar a -> Grammar a) -> Grammar a
recNonTerminalG name f =
    let G (prods, rhs) = f (G ([], NonTerminal name))
    in G (prods ++ [(name, rhs)], NonTerminal name)

nonTerminal :: Descr f => String -> f a -> f a
nonTerminal name p = recNonTerminal name (const p)

runGrammer :: String -> Grammar a -> BNF
runGrammer main (G (prods, NonTerminal nt)) | main == nt = prods
runGrammer main (G (prods, rhs)) = prods ++ [(main, rhs)]

The change in runGrammer avoids adding a pointless expr = expr production to the output.

This lets us define a parser/grammar-generator for the arithmetic expressions given above:

data Expr = Plus Expr Expr | Mult Expr Expr | Const Integer
    deriving Show

mkPlus :: Expr -> [Expr] -> Expr
mkPlus = foldl Plus

mkMult :: Expr -> [Expr] -> Expr
mkMult = foldl Mult

parseExpr :: Descr f => f Expr
parseExpr = recNonTerminal "expr" $ \ exp ->
    ePlus exp

ePlus :: Descr f => f Expr -> f Expr
ePlus exp = nonTerminal "plus" $
    mkPlus <$> eMult exp
           <*> many (spaces *> char '+' *> spaces *> eMult exp)
           <*  spaces

eMult :: Descr f => f Expr -> f Expr
eMult exp = nonTerminal "mult" $
    mkPlus <$> eAtom exp
           <*> many (spaces *> char '*' *> spaces *> eAtom exp)
           <*  spaces

eAtom :: Descr f => f Expr -> f Expr
eAtom exp = nonTerminal "atom" $
    aConst `orElse` eParens exp

aConst :: Descr f => f Expr
aConst = nonTerminal "const" $ Const . read <$> many1 digit

eParens :: Descr f => f a -> f a
eParens inner =
    id <$  char '('
       <*  spaces
       <*> inner
       <*  spaces
       <*  char ')'
       <*  spaces

And indeed, this works:

*Main> putStr $ ppGrammar "expr" parseExpr
const = digit, {digit};
spaces = {' ' | newline};
atom = const | '(', spaces, expr, spaces, ')', spaces;
mult = atom, {spaces, '*', spaces, atom}, spaces;
plus = mult, {spaces, '+', spaces, mult}, spaces;
expr = plus;

Recursion (variant 2)

Interestingly, there is another solution to this problem, which avoids introducing recNonTerminal and explicitly passing around the recursive call (i.e. the exp in the example). To implement that we have to adjust our Grammar type as follows:

newtype Grammar a = G ([String] -> (BNF, RHS))

The idea is that the list of strings is those non-terminals that we are currently defining. So in nonTerminal, we check if the non-terminal to be introduced is currently in the process of being defined, and then simply ignore the body. This way, the recursion is stopped automatically:

nonTerminalG :: String -> (Grammar a) -> Grammar a
nonTerminalG name (G g) = G $ \seen ->
    if name `elem` seen
    then ([], NonTerminal name)
    else let (prods, rhs) = g (name : seen)
         in (prods ++ [(name, rhs)], NonTerminal name)

After adjusting the other primitives of Grammar (including the Functor and Applicative instances, wich now again have nonTerminal) to type-check again, we observe that this parser/grammar generator for expressions, with genuine recursion, works now:

parseExp :: Descr f => f Expr
parseExp = nonTerminal "expr" $
    ePlus

ePlus :: Descr f => f Expr
ePlus = nonTerminal "plus" $
    mkPlus <$> eMult
           <*> many (spaces *> char '+' *> spaces *> eMult)
           <*  spaces

eMult :: Descr f => f Expr
eMult = nonTerminal "mult" $
    mkPlus <$> eAtom
           <*> many (spaces *> char '*' *> spaces *> eAtom)
           <*  spaces

eAtom :: Descr f => f Expr
eAtom = nonTerminal "atom" $
    aConst `orElse` eParens parseExp

Note that the recursion is only going to work if there is at least one call to nonTerminal somewhere around the recursive calls. We still cannot implement many as naively as above.

Homework

If you want to play more with this: The homework is to define a parser/grammar-generator for EBNF itself, as specified in this variant:

identifier = letter, {letter | digit | '-'};
spaces = {' ' | newline};
quoted-char = non-quote-or-backslash | '\\', '\\' | '\\', '\'';
terminal = '\'', {quoted-char}, '\'', spaces;
non-terminal = identifier, spaces;
option = '[', spaces, rhs, spaces, ']', spaces;
repetition = '{', spaces, rhs, spaces, '}', spaces;
group = '(', spaces, rhs, spaces, ')', spaces;
atom = terminal | non-terminal | option | repetition | group;
sequence = atom, {spaces, ',', spaces, atom}, spaces;
choice = sequence, {spaces, '|', spaces, sequence}, spaces;
rhs = choice;
production = identifier, spaces, '=', spaces, rhs, ';', spaces;
bnf = production, {production};

This grammar is set up so that the precedence of , and | is correctly implemented: a , b | c will parse as (a, b) | c.

In this syntax for BNF, terminal characters are quoted, i.e. inside '…', a ' is replaced by \' and a \ is replaced by \\ – this is done by the function quote in ppRHS.

If you do this, you should able to round-trip with the pretty-printer, i.e. parse back what it wrote:

*Main> let bnf1 = runGrammer "expr" parseExpr
*Main> let bnf2 = runGrammer "expr" parseBNF
*Main> let f = Data.Maybe.fromJust . parse parseBNF. ppBNF
*Main> f bnf1 == bnf1
True
*Main> f bnf2 == bnf2
True

The last line is quite meta: We are using parseBNF as a parser on the pretty-printed grammar produced from interpreting parseBNF as a grammar.

Conclusion

We have again seen an example of the excellent support for abstraction in Haskell: Being able to define so very different things such as a parser and a grammar description with the same code is great. Type classes helped us here.

Note that it was crucial that our combined parser/grammars are only able to use the methods of Applicative, and not Monad. Applicative is less powerful, so by giving less power to the user of our Descr interface, the other side, i.e. the implementation, can be more powerful.

The reason why Applicative is ok, but Monad is not, is that in Applicative, the results do not affect the shape of the computation, whereas in Monad, the whole point of the bind operator (>>=) is that the result of the computation is used to decide the next computation. And while this is perfectly fine for a parser, it just makes no sense for a grammar generator, where there simply are no values around!

We have also seen that a phantom type, namely the parameter of Grammar, can be useful, as it lets the type system make sure we do not write nonsense. For example, the type of orElseG ensures that both grammars that are combined here indeed describe something of the same type.


  1. It seems to be the week of applicative-appraising blog posts: Brent has posted a nice piece about enumerations using Applicative yesterday.

  2. I like how in this alignment of <*> and <* the > point out where the arguments are that are being passed to the function on the left.