Joachim Breitner

Don’t think, just defunctionalize

Published 2020-12-22 in sections English, Haskell.

TL;DR: CPS-conversion and defunctionalization can help you to come up with a constant-stack algorithm.

Update: Turns out I inadvertedly plagiarized the talk The Best Refactoring You’ve Never Heard Of by James Koppel. Please consider this a form of sincere flattery.

The starting point

Today, I’ll take you on a another little walk through the land of program transformations. Let’s begin with a simple binary tree, with value of unknown type in the leaves, as well as the canonical map function:

MuniHac 2019 and Haskell Love 2020.

Comments

You probably are already familiar with this, but this idea goes back to “Definitional Interpreters for Higher-Order Languages” by Reynolds. His purpose was to avoid the defined interpreter depending on details of the language it’s defined within, e.g. evaluation order.

It was then focused on Olivier Danvy and Darius Biernacki and others in the dual papers “A Functional Correspondence between Evaluators and Abstract Machines” and “From Interpreter to Compiler and Virtual Machine: a Functional Derivation”.

There were several papers using the same idea in different contexts throughout the years afterward which I collected here: http://lambda-the-ultimate.org/node/2423#comment-38384

I use this technique in a similar way to explain a GCC optimization, and here I go through an almost identical derivation as yours for the same reason. “Defunctionalization at Work” also by Danvy (and Nielsen) shows similar examples and others including the bidirectional nature of these transforms.

Finally, the result isn’t “stack-less”. It’s just that we’ve made the stack explicit, namely K' is the stack. It’s, of course, well-known that that recursion can be eliminated with an explicit stack, and CPS transforming followed by defunctionalization is one of the nicer ways of showing this. Defunctionalizing CPSed code will always produce a type isomorphic to a list, i.e. a stack (or something simpler if there is no recursion). This won’t be true if the original code had control operators like callCC.

At any rate, this transformation, which turns control structure into an explicit data structure, is indeed often very enlightening. It’s also reasonably reversible allowing you to understand and/or reimplement low-level code in a high-level manner.

#1 Derek Elkins am 2020-12-22

I think it’s instructive to at least briefly mention possible performance implications in the case of GHC/Haskell. One would expect that by playing clever tricks we could save at least something in running rime. In reality, map6 performs more than 2x worse than map1 on my (rather old) laptop with a (rather recent) GHC 8.10.2 with -O2 on a complete tree of height 23 and (+ 1) as the function I learned it the hard way when trying to optimize Haskell’s binary-trees in the Benchmarks Game once…

#2 Artem Pelenitsyn am 2020-12-26

Thanks for your defunctionalising blog post, linked below.

Three thoughts:

  1. You might want to mention The Zipper (due to Huet). It’s a standard, repeatable way of doing the bit that made your head spin under “Think?”.

  2. Under “The imperative loop” I think you actually make the program worse. Before you have up and down, two loops that jump to each other – they are both join points. Super efficient. (I have not actually compiled it to check, but I’m reasonably sure.) But by using an Either as the parameter you add extra allocation and pattern matching to every iteration.

  3. Indeed SpecConstr may well precisely undo the change you have just made.

  4. If memory serves, Martin C. Henson’s book, Elements of Functional Languages, Blackwell Scientific Publications (1987) takes defunctionalisation as its central theme. It’s barely known but it’s an excellent book. I have a copy but I don’t know how available it is.

#3 Simon Peyton Jones am 2020-12-28

Thanks for these points!

I guess I should have mentioned the Zipper (but several commenters have since :-)).

I am not sure if it would really involve no thinking – a zipper allows the program to navigate efficiently into all directions, and (typically) has the same type of values everywhere. But the tree map function has to change value types, and there is a type-based distinction between “visited” and “to be visited” nodes. So that somehow needs to be taken into account here…

#4 Joachim Breitner am 2020-12-28

the tree map function has to change value types, and there is a type-based distinction between “visited” and “to be visited” nodes. So that somehow needs to be taken into account here…

The relevant extension of the zipper technique is Clowns to the Left of me, Jokers to the Right.

#5 Dr. ÉRDI Gergő am 2021-01-23

Have something to say? You can post a comment by sending an e-Mail to me at <mail@joachim-breitner.de>, and I will include it here.