Joachim Breitner

Blog

Left recursive parser combinators via sharing

Published 2023-09-10 in sections English, Haskell.

At this year’s ICFP in Seattle I gave a talk about my rec-def Haskell library, which I have blogged about before here. While my functional pearl paper focuses on a concrete use-case and the tricks of the implementation, in my talk I put the emphasis on the high-level idea: it behooves a declarative lazy functional language like Haskell that recursive equations just work whenever they describe a (unique) solution. Like in the paper, I used equations between sets as the running example, and only conjectured that it should also work for other domains, in particular parser combinators.

Naturally, someone called my bluff and asked if I had actually tried it. I had not, but I should have, because it works nicely and is actually more straightforward than with sets. I wrote up a prototype and showed it off a few days later as a lightning talk at the Haskell Symposium; here is the write-up that goes along with that.

Parser combinators

Parser combinators are libraries of small functions (combinators) that you compose to define your parser directly in your programming language, as opposed to using external tools that read a grammar description and generate parser code. They are quite popular in Haskell (e.g. parsec, attoparsec, megaparsec).

Let us define a little parser that recognizes sequences of ‘a’s:

ghci> let aaa = tok 'a' *> aaa <|> pure ()
ghci> parse aaa "aaaa"
Just ()
ghci> parse aaa "aabaa"
Nothing

Left-recursion

This works nicely, but only because we were lucky: We wrote the parser to recurse on the right (of the *>), and this happens to work. If we put the recursive call first, it doesn’t anymore:

ghci> let aaa = aaa <* tok 'a' <|> pure ()
ghci> parse aaa "aaaa"
^CInterrupted.

This is a well-known problem (see for example Nicolas Wu’s overview paper); none of the common parser combinator libraries can handle it, and the usual advice is to refactor your grammar to avoid left recursion.

But there are some libraries that can handle left recursion, at least with a little help from the programmer. I found two variations:

  • The library provides an explicit fix point combinator, and as long as that is used, left-recursion works. This is for example described by Frost, Hafiz and Callaghan, and (of course) Oleg Kiselyov has an implementation of this too.

  • The library expects explicit labels on recursive productions, so that the library can recognize left-recursion. I found an implementation of this idea in the Agda.Utils.Parser.MemoisedCPS module in the Agda source code; the gll library seems to follow this style, and Jaro discusses it as well.

I took the module from the Agda source and simplified it a bit for the purposes of this demonstration (Parser.hs). Indeed, I can make the left-recursive grammar work:

ghci> let aaa = memoise ":-)" $ aaa <* tok 'a' <|> pure ()
ghci> parse aaa "aaaa"
Just ()
ghci> parse aaa "aabaa"
Nothing

It does not matter what I pass to memoise, as long as I do not pass the same key when memoising a different parser.
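For example (a sketch of mine, not taken from the library’s documentation; Int keys work, since Int is in Ord), two distinct left-recursive parsers each get their own key, and can be combined freely:

ghci> let as = memoise 1 $ as <* tok 'a' <|> pure ()
ghci> let bs = memoise 2 $ bs <* tok 'b' <|> pure ()
ghci> parse (as *> bs) "aabb"
Just ()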

For reference, an excerpt of the API of Parser:

data Parser k tok a -- k is type of keys, tok type of tokens (e.g. Char)
instance Functor (Parser k tok)
instance Applicative (Parser k tok)
instance Alternative (Parser k tok)
instance Monad (Parser k tok)
parse :: Parser k tok a -> [tok] -> Maybe a
sat :: (tok -> Bool) -> Parser k tok tok
tok :: Eq tok => tok -> Parser k tok tok
memoise :: Ord k => k -> Parser k tok a -> Parser k tok a
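Note that tok could be defined in terms of sat – a sketch, the actual module may define it differently:

tok t = sat (== t)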

Left-recursion through sharing

To follow the agenda set out in my talk, I now want to wrap that parser in a way that relieves me from having to insert the calls to memoise. To start, I import that parser qualified, define a newtype around it, and start lifting some of the functions:

import qualified Parser as P

newtype Parser tok a = MkP { unP :: P.Parser Unique tok a }

parse :: Parser tok a -> [tok] -> Maybe a
parse (MkP p) = P.parse p

sat :: Typeable tok => (tok -> Bool) -> Parser tok tok
sat p = MkP (P.sat p)

tok :: Eq tok => tok -> Parser tok tok
tok t = MkP (P.tok t)

So far, nothing interesting had to happen, because with these combinators alone I cannot build recursive parsers. The first interesting combinator that allows me to do that is <*> from the Applicative class, so I should use memoise there. The question is: Where does the unique key come from?

Proprioception

As with the rec-def library, pure code won’t do, and I have to get my hands dirty: I really want a fresh unique label out of thin air. To that end, I define the following combinator, with naming aided by Richard Eisenberg:

propriocept :: (Unique -> a) -> a
propriocept f = unsafePerformIO $ f <$> newUnique

A thunk defined with propriocept will know about its own identity, and will be able to tell itself apart from other such thunks. This gives us a form of observable sharing, precisely what we need. But before we return to our parser combinators, let us briefly explore this combinator.

Using propriocept I can define an operation cons :: [Int] -> [Int] that records (the hash of) this Unique in the list:

ghci> let cons xs = propriocept (\x -> hashUnique x : xs)
ghci> :t cons
cons :: [Int] -> [Int]

This lets us see the identity of a list cell, that is, of the concrete object in memory.

Naturally, if we construct a finite list, each list cell is different:

ghci> cons (cons (cons []))
[1,2,3]

And if we do that again, we see that fresh list cells are allocated:

ghci> cons (cons (cons []))
[4,5,6]

We can create an infinite list; if we do it without sharing, every cell is separate:

ghci> let acyclic n = cons (acyclic (n+1))
ghci> take 20 (acyclic 0)
[7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26]

but if we tie the knot using sharing, all the cells in the list are actually the same:

ghci> let cyclic = cons cyclic
ghci> take 20 cyclic
[27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27]

We can achieve the same using fix from Data.Function:

ghci> import Data.Function
ghci> take 20 (fix cons)
[28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28]

I explore these heap structures more visually in a series of screencasts.

So with propriocept we can distinguish different heap objects, and also recognize when we come across the same heap object again.

Left-recursion through sharing (cont.)

With that we return to our parser. We define a smart constructor for the new Parser that passes the unique from propriocept to the underlying parser’s memoise function:

withMemo :: P.Parser Unique tok a -> Parser tok a
withMemo p = propriocept $ \u -> MkP $ P.memoise u p

If we now use this in the definition of all possibly recursive parsers, then the necessary calls to memoise will be in place:

instance Functor (Parser tok) where
  fmap f p = withMemo (fmap f (unP p))

instance Applicative (Parser tok) where
  pure x = MkP (pure x)
  p1 <*> p2 = withMemo (unP p1 <*> unP p2)

instance Alternative (Parser tok) where
  empty = MkP empty
  p1 <|> p2 = withMemo (unP p1 <|> unP p2)

instance Monad (Parser tok) where
  return = pure
  p1 >>= f = withMemo $ unP p1 >>= unP . f

And indeed, it works (see RParser.hs for the full code):

ghci> let aaa = aaa <* tok 'a' <|> pure ()
ghci> parse aaa "aaaa"
Just ()
ghci> parse aaa "aabaa"
Nothing

A larger example

Let us try this on a larger example, and parse (simple) BNF grammars. Here are data types describing them:

type Ident = String
type RuleRhs = [Seq]
type Seq = [Atom]
data Atom = Lit String | NonTerm Ident deriving Show
type Rule = (Ident, RuleRhs)
type BNF = [Rule]
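For example, the rule digit := '0' | pdigit; of the grammar below corresponds to this value (the name digitRule is mine, for illustration):

digitRule :: Rule
digitRule = ("digit", [[Lit "0"], [NonTerm "pdigit"]])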

For the concrete syntax, I’d like to be able to parse something like

numExp :: String
numExp = unlines
    [ "term   := sum;"
    , "pdigit := '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9';"
    , "digit  := '0' | pdigit;"
    , "pnum   := pdigit | pnum digit;"
    , "num    := '0' | pnum;"
    , "prod   := atom | atom '*' prod;"
    , "sum    := prod | prod '+' sum;"
    , "atom   := num | '(' term ')';"
    ]

so here is a possible parser; mostly straight-forward use of parser combinators:

type P = Parser Char

snoc :: [a] -> a -> [a]
snoc xs x = xs ++ [x]

l :: P a -> P a
l p = p <|> l p <* sat isSpace
quote :: P Char
quote = tok '\''
quoted :: P a -> P a
quoted p = quote *> p <* quote
str :: P String
str = some (sat (not . (== '\'')))
ident :: P Ident
ident = some (sat (\c -> isAlphaNum c && isAscii c))
atom :: P Atom
atom = Lit     <$> l (quoted str)
   <|> NonTerm <$> l ident
eps :: P ()
eps = void $ l (tok 'ε')
sep :: P ()
sep = void $ some (sat isSpace)
sq :: P Seq
sq = []   <$ eps
 <|> snoc <$> sq <* sep <*> atom
 <|> pure <$> atom
ruleRhs :: P RuleRhs
ruleRhs = pure <$> sq
      <|> snoc <$> ruleRhs <* l (tok '|') <*> sq
rule :: P Rule
rule = (,) <$> l ident <* l (tok ':' *> tok '=') <*> ruleRhs <* l (tok ';')
bnf :: P BNF
bnf = pure <$> rule
  <|> snoc <$> bnf <*> rule

I somewhat sillily used snoc rather than (:) to build my lists, just so that I can show off all the left-recursion in this grammar.

Sharing is tricky

Let’s try it:

ghci> parse bnf numExp
^CInterrupted.

What a pity, it does not work! What went wrong?

The underlying library can handle left-recursion if it can recognize it by seeing a memoise label passed again. This works fine in all the places where we re-use a parser definition (e.g. in bnf), but it really requires that values are shared!

If we look carefully at our definition of l (which parses a lexeme, i.e. something possibly followed by whitespace), we see that it recurses via a fresh function call, and the program will keep expanding the definition – just like acyclic above:

l :: P a -> P a
l p = p <|> l p <* sat isSpace

The fix (sic!) is to make sure that the recursive call is using the parser we are currently defining, which we can easily do with a local definition:

l :: P a -> P a
l p = p'
  where p' = p <|> p' <* sat isSpace
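Equivalently, since fix is defined as fix f = let x = f x in x and thus shares the result with its argument, we could tie the knot with fix from Data.Function, just like in the cyclic list example above:

l :: P a -> P a
l p = fix $ \p' -> p <|> p' <* sat isSpace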

With this little fix, the parser can parse the example grammar:

ghci> parse bnf numExp
Just [("term",[[NonTerm "sum"]]),("pdigit",[[Lit "1"],…

Going meta

The main demonstration is over, but since we now already have a parser for grammar descriptions at hand, let’s go a bit further and dynamically construct a parser from such a description. The parser should only accept strings according to that grammar, and return a parse tree annotated with the non-terminals used:

interp :: BNF -> P String
interp bnf = parsers M.! start
  where
    start :: Ident
    start = fst (head bnf)

    parsers :: M.Map Ident (P String)
    parsers = M.fromList [ (i, parseRule i rhs) | (i, rhs) <- bnf ]

    parseRule :: Ident -> RuleRhs -> P String
    parseRule ident rhs = trace <$> asum (map parseSeq rhs)
      where trace s = ident ++ "(" ++ s ++ ")"

    parseSeq :: Seq -> P String
    parseSeq = fmap concat . traverse parseAtom

    parseAtom :: Atom -> P String
    parseAtom (Lit s) = traverse tok s
    parseAtom (NonTerm i) = parsers M.! i

Let’s see it in action (full code in BNFEx.hs):

ghci> Just bnfp = parse bnf numExp
ghci> :t bnfp
bnfp :: BNF
ghci> parse (interp bnfp) "12+3*4"
Just "term(sum(prod(atom(num(pnum(pnum(pdigit(1))digit(pdigit(2))))))+sum(prod(atom(num(pnum(pdigit(3))))*prod(atom(num(pnum(pdigit(4)))))))))"
ghci> parse (interp bnfp) "12+3*4+"
Nothing

It is worth noting that the numExp grammar is also left-recursive, so implementing interp with a conventional parser combinator library would not have worked. But thanks to our propriocept trick, it does! Again, the sharing is important; in the code above it is the map parsers that is defined in terms of itself, and it ensures that the left-recursive productions work.

Closing thoughts

I am using unsafePerformIO, so I need to justify its use. Clearly, propriocept is not a pure function, and its type is a lie. In general, using it will break the nice equational properties of Haskell, as we have seen in our experiments with cons.

In the case of our parser library, however, we use it in specific ways, namely to feed a fresh name to memoise. Assuming the underlying parser library’s behavior does not observably depend on where and with which key memoise is used, this results in a properly pure interface, and all is well again. (NB: I did not investigate if this assumption actually holds for the parser library used here, or if it may for example affect the order of parse trees returned.)

I also expect that this implementation, which will memoise every parser involved, will be rather slow. It seems plausible to analyze the graph structure and figure out which memoise calls are actually needed to break left-recursion (at least if we drop the Monad instance or always memoise >>=).

If you liked this post, you might enjoy reading the paper about rec-def, watching one of my talks about it (MuniHac, BOBKonf, ICFP23; the presentation evolved over time), or, if you just want to see more about how things are laid out on Haskell’s heap, going to my screencasts exploring the Haskell heap.

Generating bibtex bibliographies from DOIs via DBLP

Published 2023-07-12 in sections English, Digital World.

I sometimes write papers, and part of paper writing is assembling the bibliography. In my case, this is done using BibTeX. So when I need to add another citation, I have to find suitable data in BibTeX format.

Often I copy snippets from .bib files from earlier papers.

Or I search for the paper on DBLP, which in my experience has the highest-quality BibTeX entries and the best coverage of computer science related publications, copy the entry to my .bib file, and change the key to whatever I want to refer to the paper by.

But in these days of pervasive use of DOIs (digital object identifiers) for almost all publications, manually keeping the data in BibTeX files seems outdated. Instead I’d rather record just the two pieces of data I care about: the key that I want to use for citation, and the DOI. The rest I do not want to be bothered with.

So I wrote a small script that takes a .yaml file like

entries:
  unsafePerformIO: 10.1007/10722298_3
  dejafu: 10.1145/2804302.2804306
  runST: 10.1145/3158152
  quickcheck: 10.1145/351240.351266
  optimiser: 10.1016/S0167-6423(97)00029-4
  sabry: 10.1017/s0956796897002943
  concurrent: 10.1145/237721.237794
  launchbury: 10.1145/158511.158618
  datafun: 10.1145/2951913.2951948
  observable-sharing: 10.1007/3-540-46674-6_7
  kildall-73: 10.1145/512927.512945
  kam-ullman-76: 10.1145/321921.321938
  spygame: 10.1145/3371101
  cocaml: 10.3233/FI-2017-1473
  secrets: 10.1017/S0956796802004331
  modular: 10.1017/S0956796817000016
  longley: 10.1145/317636.317775
  nievergelt: 10.1145/800152.804906
  runST2: 10.1145/3527326
  polakow: 10.1145/2804302.2804309
  lvars: 10.1145/2502323.2502326
  typesafe-sharing: 10.1145/1596638.1596653
  pure-functional: 10.1007/978-3-642-14162-1_17
  clairvoyant: 10.1145/3341718
subs:
  - replace: Peyton Jones
    with: '{Peyton Jones}'

and turns it into a nice .bib file:

$ ./doi2bib.py < doibib.yaml > dblp.bib
$ head dblp.bib
@inproceedings{unsafePerformIO,
  author       = {Simon L. {Peyton Jones} and
                  Simon Marlow and
                  Conal Elliott},
  editor       = {Pieter W. M. Koopman and
                  Chris Clack},
  title        = {Stretching the Storage Manager: Weak Pointers and Stable Names in
                  Haskell},
  booktitle    = {Implementation of Functional Languages, 11th International Workshop,
                  IFL'99, Lochem, The Netherlands, September 7-10, 1999, Selected Papers},

The last bit allows me to do some fine-tuning of the file, because unfortunately, not even DBLP BibTeX files are perfect, for example in the presence of two family names.

Now I have fewer moving parts to worry about, and a more consistent bibliography.

The script is rather small, so I’ll just share it here:

#!/usr/bin/env python3

import sys
import yaml
import requests
import requests_cache
import re

# Cache HTTP responses locally, so that re-runs do not hit DBLP again
requests_cache.install_cache(backend='sqlite')

data = yaml.safe_load(sys.stdin)

for key, doi in data['entries'].items():
    # Fetch the BibTeX entry for this DOI from DBLP
    bib = requests.get(f"https://dblp.org/doi/{doi}.bib").text
    # Replace the DBLP citation key with our own key
    bib = re.sub('{DBLP.*,', '{' + key + ',', bib)
    # Apply the fine-tuning substitutions from the yaml file
    for subs in data['subs']:
        bib = re.sub(subs['replace'], subs['with'], bib)
    print(bib)

There are similar projects out there, e.g. dblpbibtex in C++ and dblpbib in Ruby. These allow direct use of \cite{DBLP:rec/conf/isit/BreitnerS20} in Latex, which is also nice, but for now I like to choose more speaking citation keys myself.

ICFP Pearl preprint on rec-def

Published 2023-06-22 in sections English, Haskell.

I submitted a Functional Pearl to this year’s ICFP and it got accepted!

It is about the idea of using Haskell’s inherent ability to define recursive equations, and use them for more than just functions and lazy data structures. I blogged about this before (introducing the idea, behind the scenes, applications to program analysis, graph algorithms and minesweeper), but hopefully the paper brings out the idea even more clearly. The constructive feedback from a few friendly readers (Claudio, Sebastian, and also the anonymous reviewers) certainly improved the paper.

Abstract

Haskell’s laziness allows the programmer to solve some problems naturally and declaratively via recursive equations. Unfortunately, if the input is “too recursive”, these very elegant idioms can fall into the dreaded black hole, and the programmer has to resort to more pedestrian approaches.

It does not have to be that way: We built variants of common pure data structures (Booleans, sets) where recursive definitions are productive. Internally, the infamous unsafePerformIO is at work, but the user only sees a beautiful and pure API, and their pretty recursive idioms – magically – work again.

If you are curious, please have a look at the current version of the paper. Any feedback is welcome; even more if it comes before July 11, because then I can include it in the camera ready version.


There are still some open questions around this work. What bothers me maybe most is the lack of a denotational semantics that unifies the partial order underlying the Haskell fragment, and the partial order underlying the domain of the embedded equations.

The crux of the problem is maybe best captured by this question:

Imagine an untyped lambda calculus with constructors, lazy evaluation, and an operation rseq that recursively evaluates constructors, but terminates in the presence of cycles. So for example

rseq (let x    = 1 :: x    in x   ) ≡ ()
rseq (let x () = 1 :: x () in x ()) ≡ ⊥

In this language, knot tying is observable. What is the “nicest” denotational semantics?

Update: I made some progress via a discussion on the Haskell Discourse and started some rough notes on a denotational semantics.

The curious case of the half-half Bitcoin ECDSA nonces

Published 2023-06-07 in sections English, Digital World.

This is the week of the Gulaschprogrammiernacht, a yearly Chaos Computer Club event in Karlsruhe, so it was exactly a year ago that I sat in my AirBnB room and went over the slides for my talk “Lattice Attacks on Ethereum, Bitcoin, and HTTPS” that I would give there.

It reports on research done with Nadia Heninger while I was in Philadelphia, and I really liked giving that talk: At some point we look at some rather odd signatures we found on the Bitcoin blockchain, and part of the signature (the “nonce”) happens to share some bytes with the secret key. A clear case of some buffer overlap in a memory-unsafe language, which I, as a fan of languages like Haskell, am very happy to sneer at!

A sneery slide

But last year, as I was going over the slides, I looked at the raw data again for some reason, and I found that we had overlooked something: Not only was the upper half of the nonce equal to the lower half of the secret key, but the lower half of the nonce was also equal to the upper half of the message hash!

This now looks much less like an accident to me, and more like an (overly) simple form of deterministic nonce creation… so much for my nice anecdote. (I still used the anecdote in my talk, followed up with an “actually”.)

When I told Nadia about this, she got curious as well, and quickly saw that from a signature with such a nonce, one can rather easily extract the secret key. So together with her student Dylan Rowe, we implemented this analysis and searched the Bitcoin blockchain for more instances of such signatures. We did find a few, and were even able to trace them back to a somewhat infamous Bitcoin activist going under the pseudonym Amaclin.

This research and sleuthing turned into another paper, “The curious case of the half-half Bitcoin ECDSA nonces”, to be presented at AfricaCrypt 2023. Enjoy!

Giving back to OPLSS

Published 2023-06-04 in sections English, Haskell.

Nine years ago, when I was a PhD student, I attended the Oregon Programming Language Summer School in Eugene. I had a great time and learned a lot.

The OPLSS’14 full image

Some of the things I learned there, and some of the people I met there, also led to me graduating, which led to me becoming a PostDoc at UPenn, which led to me later joining DFINITY to implement the Motoko programming language and help design and specify the public interface of their “Internet Computer”, including the response certification (video).

So when the ICDevs non-profit offered a development bounty for a Motoko library implementing the Merkle trees involved in certification, it sounded like a fun little coding task, and I completed it; likely with less effort than it would have taken someone who first had to get into these topics.

The bounty was quite generous, at US$ 10k, and I was too vain to “just” have it donated to some large charity, as I recently did with a few coding and consulting gigs, so I looked for something more personal. The ICDevs guys and I agreed to donate the money to this year’s OPLSS, where I heard it can cover the cost of about 8 students, and hopefully helps the PL cause.

(You will not find us listed as sponsors because, for some reason, a “donation” instead of a “sponsorship” comes with fewer strings attached for the organizers.)

More thoughts on a bootstrappable GHC

Published 2023-04-26 in sections English, Haskell.

The bootstrappable builds project tries to find ways of building all our software from source, without relying on binary artifacts. A noble goal, and one that is often thwarted by languages with self-hosting compilers, like GHC: In order to build GHC, you need GHC. A Pull Request against nixpkgs, adding first steps of the bootstrapping pipeline, reminded me of the issue with GHC, about which I have noted down some thoughts before, and I played around a bit more.

The most promising attempt to bootstrap GHC was done by rekado in 2017. He observed that Hugs is maybe the most recently maintained bootstrappable (since written in C) Haskell compiler, but noticed that “it cannot deal with mutually recursive module dependencies, which is a feature that even the earliest versions of GHC rely on. This means that running a variant of GHC inside of Hugs is not going to work without major changes.” He then tries to bootstrap another very old Haskell compiler (nhc) with Hugs, and makes good but incomplete progress.

This made me wonder: What if Hugs supported mutually recursive modules? Would that make a big difference? Anthony Clayden keeps advocating Hugs as a viable Haskell implementation, so maybe if that was the main blocker, then adding support to Hugs for that is probably not too hard (at least in a compile-the-strongly-connected-component-as-one-unit mode) and worthwhile?

All of GHC in one file?

That reminded me of a situation where I had to combine multiple Haskell modules into one before: For my talk “Lock-step simulation is child’s play” I wrote a multi-player game, a simulation environment for it, and a presentation tool around it, all in the CodeWorld programming environment, which supports only a single module. So I hacked up a small tool, hs-all-in-one, that takes multiple Haskell modules and combines them into one, mangling the names to avoid name clashes.
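To illustrate the kind of transformation (a made-up example of mine; the tool’s actual mangling scheme may differ): if module A defines foo = 1 and module B imports A and defines bar = foo + 1, the combined output would look roughly like

module AllInOne where

a_foo :: Int
a_foo = 1

b_bar :: Int
b_bar = a_foo + 1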

This made me wonder: Can I turn all of GHC into one module, and compile that?

At this point I have probably left the direct path towards bootstrapping, but I kinda got hooked.

  1. Using GHC’s hadrian/ghci tool, I got it to produce the necessary generated files (e.g. from happy grammars) and spit out the list of modules that make up GHC, which I could feed to hs-all-in-one.

  2. It uses haskell-src-exts for parsing, and it was almost able to parse all of that. It has a different opinion about how MultiWayIf should be indented and whether EmptyCase needs {}, and it has issues pretty-printing some promoted values, but otherwise the round-tripping worked fine, and I was able to generate a large file (680,000 loc, 41 MB) that passes GHC’s parser.

  3. It also uses haskell-names to resolve names.

    This library is less up-to-date with various Haskell features, so I added support for renaming in some pragmas (ANN, SPECIALIZE), pattern signatures etc.

    For my previous use-case I could just combine all the imports, knowing that I would not introduce conflicts. For GHC, this is far from true: Some modules import Data.Map.Strict, others Data.Map.Lazy, and yet others introduce names that clash with stuff imported from the prelude… so I had to change the tool to fully qualify all imported values. This isn’t so bad, I can do that using haskell-names, if I somehow know what all the modules in base, containers, transformers and array export.

    The haskell-names library itself comes with a hard-coded database of base exports, but it is incomplete and doesn’t help me with, say, containers.

    I then wrote a little parser for the .txt files that haddock produces for the benefit of hoogle, and that are conveniently installed along the packages (at least on nix). This would have been great, if these files didn’t simply omit all reexported entities! I added some manual hacks (Data.Map has the same exports as Data.IntMap; Prelude exports all entities as known by haskell-names, but those that are also exported from Data.List use the symbol from there…)

    I played this game of whack-a-mole for a while, solving many of the problems that GHC’s renamer reports, but eventually stopped to write this blog post. I am fairly confident that this could be pulled through, though.

Back to bootstrapping

So what if we could pull this through? We’d have a very large code file that GHC may be able to compile to produce a ghc binary without exhausting my RAM. But that doesn’t help with bootstrapping yet.

If lack of support for recursive modules is all that Hugs is missing, we’d be done indeed. But quite the contrary: it is probably the least of our worries, given that contemporary GHC uses many, many other features not supported by Hugs.

Some of them are syntactic and can easily be rewritten to more normal Haskell in a preprocessing step (e.g. MultiWayIf).

Others are deep and hard (GADTs, pattern synonyms, type families), and prohibit attempting to compile a current version of GHC (even if it is all one module) with Hugs. So one would certainly have to go back in time and find a version of GHC that is not yet using all these features. For example, the first use of GADTs was introduced by Simon Marlow in 2011, so this suggests going back to at least GHC 7.0.4, maybe earlier.

Still, being able to mangle the source code before passing it to Hugs is probably a useful thing. This poses the question whether Hugs can compile such a tool; in particular, is it capable of compiling haskell-src-exts, which I am not too optimistic about either. Did someone check this already?

So one plan of attack could be

  1. Identify an old version of GHC that

    • One can use to bootstrap subsequent versions until today.
    • Is old enough to use as few features not supported by hugs as possible.
    • Is still new enough so that one can obtain a compatible toolchain.
  2. Wrangle the build system to tell you which files to compile, with which preprocessor flags etc.

  3. Bootstrap all pre-processing tools used by GHC (cpphs or use plain cpp, happy, alex).

  4. For every language feature not supported by Hugs, either

    • Implement it in Hugs,
    • Manually edit the source code to avoid compiling the problematic code, if it is optional (e.g. in an optimization pass)
    • Rewrite the problematic code
    • Write a pre-processing tool (like the one above) that compiles the feature away
  5. Similarly, since Hugs probably ships a base that differs from what GHC, or the libraries used by GHC, expect, either adjust Hugs’ base, or modify the GHC code that uses it.

My actual plan, though, for now is to throw these thoughts out, maybe make some noise on Discourse, Mastodon, Twitter and lobste.rs, and then let it sit and hope someone else will pick it up.

rclone, WebDav, Mailbox.org

Published 2023-04-19 in sections English, Digital World.

Just a small googleable note for those trying to set this up too:

If you try to access your “My Files” mailbox.org WebDav storage using rclone, then the URL is not just https://dav.mailbox.org/servlet/webdav.infostore/, as written on the mailbox.org documentation, but actually https://dav.mailbox.org/servlet/webdav.infostore/Userstore/Your Name.

(You can start with https://dav.mailbox.org/servlet/webdav.infostore/ and then use rclone ls to find out the full path, but then it may be more convenient to change it to point directly to the “My Files” space where you can actually create files.)

Farewell quimby and fry, welcome richard

Published 2023-04-17 in sections English, Digital World.

For a little more than two decades, I have been running one or two dedicated servers for a fair number of services. At one time or another, they were hosting

  • A web server for my personal website
  • A web server for various private and professional webpages for friends and family
  • An email server with IMAP, SMTP, Spam filtering, for me and family
  • A mailing list server for various free software projects
  • A DNS server
  • A Half-Life and Counter Strike server, with a statistics web page
  • The web page for my Counter Strike clan, running on a custom Perl-and-MySQL based CMS
  • The web page for my high school class, running on the same system (this was before everyone used, or had used, Facebook, and even supported tagging people on images…)
  • A Docbook-and-SVN-based documentation management system that my brother and I built and even sold to a few customers.
  • A custom SVN-, Perl and Template-Toolkit based static site generating CMS, before that was cool, also with one or two actual customers.
  • An SVN- and later Git-based site for collaborative editing of math and computer science lecture notes; back then there was no Overleaf
  • A Jabber server
  • The backend for an online adaptation of the board game “Sim Serim”, which got the author to gift me the real thing
  • An SVN server
  • A darcs server
  • A git server
  • A tool to track darcs patches that were submitted via mailing lists
  • A blog aggregator (a “planet”) for friends, and one for Ghana’s Free Software community
  • An Owncloud instance for family
  • Virtual machines maintained by friends, to share resources more cheaply
  • A OpenVPN and later tinc based VPN for my machines
  • Jobs that feed RSS feeds to IMAP (using feed2imap and later feed2imap-go)
  • Jobs that send email greetings, collected from his wedding guests, to a friend of mine; they are now delivered at a decreasing rate over the next decades.
  • Online questionnaires to gather data for a genealogy project
  • Jobs that send an email with a daily summary of family events from that date.
  • A Django app to organize a larger family gathering
  • Multiple Mediawiki instances
  • A Freenet node and a Tor node
  • Code that demonstrated the Cross-site authentication attack
  • … and probably more stuff that I don’t remember anymore

It’s not you, it’s me

Running this kind of service on my own was great fun and a huge learning experience, and in some cases, at the time, the only way to have a certain feature. But in recent years my interests shifted a bit, more towards programming language theory (and practice) than DevOps, and I was no longer paying as much attention as these services require. This gave me a bad conscience, especially in terms of security updates.

Especially running your own email server, while still possible, isn’t fire-and-forget: One has to stay on top of spam protection measures, both on the receiving end (spamassassin etc.) as well as when sending email (DKIM, configuring mailing lists to rewrite sender etc.)

Also some family members were commercially relying on these servers and services working, which was no longer tenable.

Weaning off

Therefore, more than a year ago, I decided to wind this down. Turns out that getting rid of responsibilities takes at least as long as taking them on, especially if your “customers” are content and a move to something else is but an annoyance. But last weekend I was finally able to turn the machines, called quimby and fry, off.

Many of the services above were already no longer active when I started the weaning off (Jabber, Freenet, Tor). So what happened to the rest?

  • For emails, we all moved to https://mailbox.org. Happy to pay for such a crucial service.
  • For the mailing lists, especially for the Tip-Toi-Hacking project, https://jpberlin.de/ has decent enough rates that I don’t feel bad for paying for it.
  • Managing DNS is made super easy by dnscontrol; I’m using Hetzner’s DNS servers, but thanks to that tool, that does not matter a lot
  • For those websites that are static enough, Github pages is nice. This includes the lecture notes site.
  • For those websites that need a little bit of server-side logic, e.g. for complex redirects and normalization procedures, or access control, but no state, I found that netlify introduced their Edge functions feature just in time. This was crucial for my main website.
  • I got rid of the backend of the two-player game Sum Serum completely, by using WebRTC. In terms of hosting, it is now just a boring static website, which is the best kind of website.
  • I converted all SVN and Darcs repositories to git, and pushed them to GitHub.
  • Although I was mildly proud to have kept the websites of my high school class and Counter Strike clan live and functional for many years after anyone stopped caring about them, I decided it is silly to keep doing that. I briefly thought about entombing that running system in a locked down virtual machine or so, but in the end simply used wget to create a static mirror of them, which is now hosted on netlify (using Edge Functions to restrict public access to a few pages).

In the end, I was not able to get rid of all services, so there is still a server running for me (smaller now, and virtual):

  • My photo album, https://bilder.joachim-breitner.de/, which is a bit too big for something like netlify.
  • Some dumb static webspace used by a family member’s business for internal reasons, which likewise is a bit too large for something like netlify or github pages, and not as critical as other sites.
  • The feed2imap jobs
  • For one Mediawiki instance used for a genealogy project of a relative of mine I could not find a suitable hosted alternative. However, it is HTTP-AUTH-password-protected, so I am a bit less worried about staying on top of security updates for this PHP-driven site.
  • Also, I am still running the daily email job from the genealogy project, but now through nullmailer and Amazon SNS, to worry a bit less about the intricacies of modern email.

Debian → Nix

I took this opportunity to set up a new server for the residual services. I have been using Debian since 2001 and was a Debian Developer from 2003 to 2022, and it is a great operating system and a great project.

But after learning about Nix at DFINITY, and using NixOS on my laptop for almost two years, managing a system in a non-declarative way simply feels … wrong. Similar to programming in a non-functional programming language.

Hence I have shut down my Debian-based systems (two virtual machines called quimby and fry and the surrounding Dom0 freddy – my machines are named after Simpsons sidekicks and (sometimes) Futurama characters), and am using the NixOS-based aarch64 host richard (because of the “Nix” in the name) instead.

git-multisect

Published 2023-02-27 in sections English, Digital World.

Because it seemed useful, and it seems it did not exist yet, I created a small tool called git-multisect. You point it to a git repo, give it a version range and a command to run, and it will list those commits that affect the output of the command.

Here is an example (with a slightly complex invocation):

$ git-multisect.py \
	-C ~/build/nixpkgs \
	-f $(nix flake metadata ~/nixos --json|jq -r ".locks.nodes[.locks.nodes[.locks.root].inputs.nixpkgs].locked.rev") \
	-t release-22.11 \
	-c 'nix path-info --derivation '\
	   '~/nixos#nixosConfigurations.richard.config.system.build.toplevel '\
	   '--no-update-lock-file '\
	   '--override-input nixpkgs ~/"build/nixpkgs?rev=$REV"' \
	--hide-stderr --log-option=--pretty=medium
Found 39 commits
[39 total,  0 relevant,  0 irrelevant,  0 skipped, 39 unknown] inspecing 2fb7d74 ...
[39 total,  0 relevant,  0 irrelevant,  0 skipped, 39 unknown] inspecing 569163e ...
[39 total,  0 relevant,  0 irrelevant,  0 skipped, 39 unknown] inspecing 5642ce8 ...
[39 total,  0 relevant,  0 irrelevant,  0 skipped, 39 unknown] inspecing e0c8cf8 ...
[39 total,  0 relevant,  1 irrelevant,  8 skipped, 30 unknown] inspecing 89d0361 ...
[39 total,  0 relevant,  1 irrelevant,  8 skipped, 30 unknown] inspecing d1c7e63 ...
[39 total,  0 relevant,  2 irrelevant,  9 skipped, 28 unknown] inspecing e6d5772 ...
[39 total,  0 relevant,  3 irrelevant,  9 skipped, 27 unknown] inspecing a099526 ...
[39 total,  1 relevant,  4 irrelevant,  9 skipped, 25 unknown] inspecing 854312a ...
[39 total,  1 relevant,  5 irrelevant, 10 skipped, 23 unknown] inspecing 95043dc ...
[39 total,  1 relevant,  6 irrelevant, 10 skipped, 22 unknown] inspecing 0cf4274 ...
[39 total,  2 relevant,  8 irrelevant, 29 skipped,  0 unknown] done

commit a0995268af8ba0336a81344a3bf6a50d6d6481b2
Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Date:   Sat Feb 18 10:45:11 2023 -0800

    linux_{5_15,6_1}: revert patch to fix Equinix Metal bonded networking with `ice` driver (#216955)
…
commit 0cf4274b5d06325bd16dbf879a30981bc283e58a
Merge: 95043dc713d 532f3aa6052
Author: Pierre Bourdon <delroth@gmail.com>
Date:   Sun Feb 19 23:37:48 2023 +0900

    Merge pull request #217121 from NixOS/backport-216463-to-release-22.11

    [Backport release-22.11] sudo: 1.9.12p2 -> 1.9.13

As you can see, it tries to be clever and not even look at commits that come between two commits where the output is equal (like git-bisect).
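The pruning logic could look roughly like this sketch (my own illustration in Haskell, not the tool’s actual code, which is a Python script and additionally caches command outputs; it assumes the output does not change back and forth within a range): if the outputs at both ends of a range agree, the whole range is skipped; otherwise the range is split and both halves are inspected.

-- out computes the command output at a commit; cs is the commit range in order
relevantCommits :: Eq o => (c -> o) -> [c] -> [c]
relevantCommits out = go
  where
    go cs
      | length cs < 2                  = []
      | out (head cs) == out (last cs) = []   -- ends agree: skip everything in between
    go [_, b]                          = [b]  -- adjacent pair, differing output: b is relevant
    go cs = go left ++ go right              -- otherwise split the range and recurse
      where
        mid   = length cs `div` 2
        left  = take (mid + 1) cs
        right = drop mid cs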

Over at https://github.com/nomeata/git-multisect you’ll find more explanations and examples. I hope you’ll remember this when you find you need it, and that it will prove useful.

Pro-charity consulting

Published 2023-01-08 in sections English.

A few times last year I was asked to help out with software projects – debugging a particularly gnarly nix issue, helping with GHC internals or reviewing Canister code of applications running on the Internet Computer. Usually, these are relatively small jobs, at least reasonably interesting, and it was often friends who approached me.

So I do want to help.

But I did not want to help for free. Incentives matter, and I wanted to incentivize careful use of my time. And of course I like to see my work valued. So normally, one would just charge an hourly fee.

But having to go through all the necessary legalese (setting up a contract, maybe an NDA, thinking about taxes and liabilities, possibly getting permission from my current employer) tilts the scale and would not have made it worth it for me anymore.

But instead of just turning these requests down, a few times now I have used the following model: pro-charity consulting, where you still pay for what you get, but you pay a charity instead of me.

The rules of pro-charity consulting

The rough rules are as follows:

  • There is no contract. I trust you and you trust me, else we wouldn’t even be doing this.
  • There is no liability. You probably have a rough idea of what I can deliver and you trust me, so you don’t get to sue me if I screw up.
  • There is (usually) no NDA. Of course I’ll keep your secrets, but again you don’t get to sue me if I screw up.
  • I tell you about my usual hourly rate if this was a commercial agreement. (This may depend a lot on the field I’m helping you in.)
  • Afterwards I’ll also let you know how much time I have spent on helping you.
  • You’ll make a donation to a charity of our choosing. We’ll pick something that we both find worth supporting.
  • You can donate whatever the work was worth to you, but the expectation is that it shouldn’t be cheaper for you than if you’d have hired me commercially. (Unless I failed to fulfill your expectations.)
  • Legally, the donation is by you (e.g. you get to keep any tax benefits).
  • Morally, it is by me. For example if donations can come with a message (like on https://donate.openstreetmap.org/) it would be appreciated to mention me.
  • According to the German saying “Tue Gutes und rede darüber!” (“Do good and talk about it!”), we both get to brag about this donation (target charity, amount, client, kind of work), e.g. on blogs, Mastodon, Twitter, unless we have a different agreement. See below for some examples.

I’m writing the rules out here so that I can just point to this blog post when I next have to explain this scheme.

Pro-charity gigs so far