Joachim Breitner

Faster Winter 2: SPECIALIZE

Published 2019-11-18 in sections English, Haskell.

(This is the second optimization presented in the “faster winter” series, please see that post for background information.)

The evaluator of the WebAssembly interpreter is highly polymorphic, as you can tell from the type signature of its main function:

type CEvalT f m a = ReaderT (Config f m) (ExceptT EvalError m) a
eval :: (Regioned f, MonadRef m, Show1 f)
     => Code f m -> CEvalT f m (Stack Value)

This mean the caller can choose whatever monad m this should use. This is very flexible, but it is also really hard to make that run fast: Every line in do notation is a call to the monadic bind operator (>>=), and since this operator is provided by the caller, the code of eval has to do do many calls to this unknown function.

GHC can do a much better job at optimizing this code if it knows which concrete monad is used, so that it knows the actual definition of that monad’s (>>=). Because then the compiler can inline it and make the whole monad overhead go away.

The wasm-invoke program actually uses the plain IO monad for m. We could now rewrite the whole module to not be polymorphic in m, but use IO instead. But that’s a lot of work and some user might genuinely want a different monad (e.g. ST s, or something else).

Luckily we can tell the compiler: “Let’s keep eval with this general type signature, but please also provide versions specifically for IO, and use that if possible”, by giving a SPECIALIZE pragma:

{-# SPECIALIZE eval ::
       Code Identity IO -> CEvalT Identity IO (Stack Value) #-}

This runs much faster now:

Improvement: Allocations: -48.65% Memory: +52.56% Time: -50.29% (Commit 4061fe6...2e284f8)

Again, as when we introduced mutable vectors, memory usage increases a bit, likely due to more stuff happening between garbage collections.

The code runs faster, but memory consumption is still absurd. So it is high time to hunt down the space leak, as the interpreter should not need any significant amounts of memory, and instead execute the program using a constant and low amount of memory. More about that in the next entry of this series.

Comments

Have something to say? You can post a comment by sending an e-Mail to me at <mail@joachim-breitner.de>, and I will include it here.