Joachim Breitner

Faster Winter 1: Vectors

Published 2019-11-18 in sections English, Haskell.

(This is the first optimization presented in the “faster winter” series, please see that post for background information.)

Immutable vectors

The first very low hanging fruit was to be found in the Wasm.Runtime.Memory module. The memory of the Wasm module was defined as follows:

import Data.Vector (Vector)

data MemoryInst m = MemoryInst
  { _miContent :: Mutable m (Vector Word8)
  , _miMax :: Maybe Size
  }

An immutable vector in a mutable box (Mutable m is something like IORef or STRef, whatever is appropriate for the monad m).

This means that every write to the memory has to copy the whole memory into a new vector. This is nice from a semantic point of view (e.g. you can take a snapshot of the memory at a specific point in time, evaluate the code some more, and the immutable vector is untouched), but obviously very inefficient.

Mutable vectors

So I changed it to a mutable vector (MVector) that allows in-place updates:

import Data.Vector (MVector)

data MemoryInst m = MemoryInst
  { _miContent :: Mutable m (MVector (PrimState m) Word8)
  , _miMax :: Maybe Size
  }

I still need to place that mutable vector in a mutable box, because an MVector still has fixed size but Wasm programs can resize their memory. This mean we still copy the whole memory when the memory grows, but not during normal writes.

Improvement: Allocations: -97.71% Memory: +16.87% Time: -98.88% (Commit af6adfc...254d700)

This is much more … usable!

Oddly, memory consumption increases,and I am not entirely sure why: My guess is that because it runs faster, the garbage collector runs less often, the program can allocate more stuff in between garbage collector runs.

Unboxed vectors

In a second step I also replaced the vector with an unboxed vector: Above we have the normal Data.Vector vector, which stores pointers (8 bytes!) to elements, even if the element is a single byte like here. By importing Data.Vector.Unboxed (no other change!) I get a nicely packed array of bytes.

Interestingly, while it improved maximum memory usage quite a little bit, it looks like it made the program actually a bit slower – potentially because it now had to box and unbox these Word8 values.

Improvement: Memory: -27.04% Time: +15.82% (4061fe6)

There may be more improvements possible here (e.g. when writing a Word32, maybe that can be done in one go, instead of writing the four bytes separately), but I did not follow this clue any further for now.

Instead, the next thing that looked like it was worth investigating was overly polymorphic code.

Comments

Have something to say? You can post a comment by sending an e-Mail to me at <mail@joachim-breitner.de>, and I will include it here.