Joachim Breitner

Faster Winter 7: The Zipper

Published 2019-11-24 in sections English, Haskell, Internet Computer.

(This is the seventh optimization presented in the “faster winter” series, please see that post for background information.)

The last bit of performance optimization could be considered a domain-specific optimization, as one might describe it as “introducing a control stack to the interpreter”. But in a different light, one could consider it the introduction of a more appropriate data structure, by applying a “zipper”-like transformation to the existing data structure.

The problem is that the state of the system (datatype Code) is a stack of values and a stack of instructions. Simplifying this for this post, we might have

data Code = Code [Value] [Instr]
data Instr
  = Const Value | Add | Sub | Return
  | Labeled Int Code

The interpreter gets such a Code, but does not always just execute the top-most instruction: If that is a Labeled instruction, it has to execute the next instruction in the argument of that Labeled. (It is not very important at this point to understand what a Labeled is used for). So we might have a Code that looks like the following:

c1 = Code [] [Labeled 1 (Code [2] [Labeled 2 (Code [3,4] [Add]), Sub]), Return]

The next instruction to execute is actually the Add. But in order to find that, the function step has to look under the first Labeled, look under the second Labeled, then execute step (Code [3,4] [Add]) = Code [7] [], and finally wrap it again in the two Labeled, to produce:

c2 = Code [] [Labeled 1 (Code [2] [Labeled 2 (Code [7] []), Sub]), Return]

Then eval calls step again, and step has to look inside the Labeled again to find the next instruction to execute.

It would be easier if the next instruction to execute would be presented to step more conveniently, right as a field of Code. But in order to do this, we have to move the Labeled entries “out of the way”. I do that by adding a new, first parameter to Code where I keep a stack of all the Label constructor that were in the way, in reverse order. So the c1 above would now be

data Code = Code Control [Value] [Instr]
data Control = Empty | Labeled Int Code

c1' = Code (Labeled 2 (Code (Labeled 1 (Code Empty [] [Return])) [2] [Sub]) [3,4] [Add]

Can you see how this relates to c1 above? The important piece here is that the interpreter finds the next instruction to execute always at the head of the instruction list right of the outermost code, and as long as there is something to execute there, it doesn’t have to touch the Control stack at all.

This required touching some more lines of code than the previous optimizations, but doing so didn’t require much creativity, as the old and new Code types are in clear correspondance, and that guides me in how to use adjust the code. And it’s certainly worth it:

Improvement: Allocations: -46.17% Time: -36.88% (Commit e66f1e0…57f3e9d)

Comments

Have something to say? You can post a comment by sending an e-Mail to me at <mail@joachim-breitner.de>, and I will include it here.