# Joachim Breitner's Homepage

## Constructing a list in a Monad

In Haskell, lists are ubiquitous. In real-world Haskell, monads are ubiquitous. So it should be an easy task to fill a list with values obtained in a monad, e.g. a list of n random numbers (in the `Rand` monad), or reading input from the user and storing it in a list until the user chooses to finish.

### The task

For now, assume a function `getInput :: IO Int`; we want to run it 10000 times and then have the list of the returned values. Sounds simple and straightforward, but it is actually quite hard to get the run-time behaviour that we naively and optimistically expect.
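The post treats `getInput` as a given; for experimenting with the examples below, a deterministic stand-in can be built from an `IORef` counter (the name `mkGetInput` is my own, not from the post):

```haskell
import Data.IORef

-- Hypothetical stand-in for getInput (not from the post): each call
-- returns the next integer from a mutable counter, so experiments
-- are reproducible.
mkGetInput :: IO (IO Int)
mkGetInput = do
  ref <- newIORef 0
  return $ do
    n <- readIORef ref
    writeIORef ref (n + 1)
    return n
```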

### The expectation

In an imperative setting, we would have a simple counting loop. In the loop, we run `getInput`, allocate a new cell for the linked list, and append it to the list. Clearly, this

- consumes no stack space and
- at the end of the loop, the list is already fully evaluated in memory, ready to be used.

Can we achieve that with Haskell? Let’s look at a few alternatives (complete code available).

### Attempt 1: The accumulator

I’ll start with the probably worst attempt. Unfortunately, from what I have seen from correcting Haskell exams, this is what beginners would write, especially if they have the above imperative algorithm in mind, and know about the “convert loops to recursion with accumulators” approach to solving Haskell problems:

```haskell
accumAppend :: Int -> IO [Int]
accumAppend n = go n []
  where
    go 0 l = return l
    go n l = do { x <- getInput; go (n-1) (l ++ [x]) }
```

To check the space usage of this, I pass `-K` to the GHC runtime and see what limit is just enough to let this program run `accumAppend 10000` and fully evaluate the resulting list. It turns out that we need **328336** bytes of stack space.

So what about the second requirement, that the list is fully evaluated? Experienced Haskellers see it immediately: the accumulator is never forced inside the loop, so each iteration wraps the previous value in a thunk that would append the next entry to the end. Only when the list is actually used are these thunks evaluated, each traversing the whole list, causing quadratic cost. We can verify this using ghc-vis.
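A pure analogue (my illustration, not from the post) shows the same shape: `foldl` with `(++)` builds a left-nested chain of append thunks, each of which must retraverse the accumulator when finally forced:

```haskell
-- Building a list by repeated appending at the end: O(n^2), because
-- every (++) copies the entire accumulator built so far.
buildWithAppend :: Int -> [Int]
buildWithAppend n = foldl (\l x -> l ++ [x]) [] [1 .. n]
```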

Clearly (and not surprisingly) this is not the best solution.

### Attempt 2: The accumulator (forced)

Let’s try to fix the second issue by making sure that the list is always fully evaluated. We import `Control.DeepSeq` and write

```haskell
accumAppendSeq :: Int -> IO [Int]
accumAppendSeq n = go n []
  where
    go 0 l = return l
    go n l = do { x <- getInput; go (n-1) $!! (l ++ [x]) }
```

As the accumulator gets fully evaluated before being passed to the next invocation of `go`, the result will not contain thunks. But note that this has also fixed the stack consumption: the code now runs with the smallest setting for `-K`, **8** bytes. This works because both `go` and `deepseq` are tail-recursive.

But this is still not the solution we are looking for, as the quadratic cost caused by using `++` is still there.

### Attempt 3: The accumulator (reversed)

Since we know that `l ++ [x]` is expensive but `x:l` is cheap, we can simply use the latter to update the accumulator. This has the slightly annoying consequence that the entries of the list end up in reverse order, but we can fix that in the base case:

```haskell
accumReverse :: Int -> IO [Int]
accumReverse n = go n []
  where
    go 0 l = return (reverse l)
    go n l = do { x <- getInput; go (n-1) (x:l) }
```

And indeed, we get noticeably faster code that still runs in **8** bytes of stack. Unfortunately, we still don’t construct the final list directly.

### Attempt 4: Naive recursion

Maybe we can do without an accumulator. The most naive such code would be

```haskell
listRecurse :: Int -> IO [Int]
listRecurse n = go n
  where
    go 0 = return []
    go n = do { x <- getInput; r <- go (n-1); return (x:r) }
```

and I’d expect that most experienced Haskellers would write that code (if they are asked not to use a library function). And indeed, this code does create the final list directly. But it requires a large stack of **164616** bytes. The reason is that this is no longer tail-recursive: after calling `go` again, we need to take the result and prepend `x` to it.

### Attempt 5: library functions

Maybe we should simply not worry about the implementation, and use library functions? Clearly, the authors of such libraries have found the best solution. So how does

```haskell
listReplicateM :: Int -> IO [Int]
listReplicateM n = Control.Monad.replicateM n getInput
```

fare? Unfortunately, no better than the naive recursion. In fact, the stack space requirement is the same **164616** bytes. The same holds when using specialised libraries for generating and consuming lists, e.g. the Streams module from the vector package:

```haskell
listStreams :: Int -> IO [Int]
listStreams n = MStream.toList $ MStream.replicateM n getInput
```
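Why does `replicateM` behave like the naive recursion? Modulo strictness and inlining, it works like the following sketch (my paraphrase, not the actual library source), which has the same non-tail-recursive shape as `listRecurse`:

```haskell
-- A simplified model of Control.Monad.replicateM: the recursive call
-- happens before the cons, so each iteration keeps a stack frame alive
-- until the rest of the list has been produced.
replicateM' :: Monad m => Int -> m a -> m [a]
replicateM' 0 _ = return []
replicateM' n m = do
  x <- m
  r <- replicateM' (n - 1) m
  return (x : r)
```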

### Attempt 6: Difference lists

It is time to dig deeper into our box of tricks. What if we had lists with constant-time snoc (appending a single element at the end)? Then Attempt 3 would not require reversing the list. Such lists are provided by difference lists; there are libraries for them, but we can simply implement them using lambdas:

```haskell
accumDList :: Int -> IO [Int]
accumDList n = go n id
  where
    go 0 l = return (l [])
    go n l = do { x <- getInput; go (n-1) (l . (x:)) }
```

Indeed, no stack is required (**8** bytes are enough). But we are cheating here: we are still not constructing the final list, but rather combining functions that append elements to lists. So when `accumDList` returns, what we have in memory is a thunk and a sequence of partial function applications *in the wrong order*, and evaluating the thunk does basically the same work as `reverse` in Attempt 3. It might be a bit faster, but it is still not satisfying.
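To make the representation concrete, here is a minimal standalone sketch (the names `emptyD`, `snocD`, and `toListD` are mine, not from the post): a list is represented as a function that prepends its elements, snoc is just function composition, and the actual list only materializes when the composed function is applied to `[]`:

```haskell
-- A difference list is a function that prepends its elements.
type DList a = [a] -> [a]

emptyD :: DList a
emptyD = id

-- Appending at the end is composition, and therefore O(1).
snocD :: DList a -> a -> DList a
snocD l x = l . (x :)

-- Only here is the real list built, by one pass over the
-- accumulated closures.
toListD :: DList a -> [a]
toListD l = l []
```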

### Attempt 7: Unsafe functions

Even deeper in the box of tricks, somewhere in the grit and dust where one usually should not reach, there is a function that can help us out here: `unsafeInterleaveIO`. It allows us to modify the naive recursion in a way that *first* creates the cons cell and *then* continues the loop:

```haskell
listUnsafeInterleave :: Int -> IO [Int]
listUnsafeInterleave n = do { l <- go n; return $!! l }
  where
    go 0 = return []
    go n = do
      x <- getInput
      r <- unsafeInterleaveIO (go (n-1))
      return (x:r)
```

The stack consumption is **8** bytes. It is worth noting that the call to `deepseq` (in `$!!`) actually drives the evaluation of `go`. It also guarantees that the unsafe effects of `unsafeInterleaveIO` are confined to the execution of `listUnsafeInterleave`, i.e. are actually safe.
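Here is a self-contained variant of the same idea (a sketch under assumptions: a counter passed in for `getInput`, and the spine forced with `seq` on `length` instead of `deepseq`, to keep it dependency-free):

```haskell
import System.IO.Unsafe (unsafeInterleaveIO)

-- Variant of listUnsafeInterleave with getInput passed in explicitly.
-- Each recursive step is suspended with unsafeInterleaveIO, so the
-- cons cell exists before the rest of the list is produced; forcing
-- the spine then drives the remaining effects in order.
listInterleaved :: IO Int -> Int -> IO [Int]
listInterleaved getInput n = do
    l <- go n
    length l `seq` return l  -- force the spine before leaving IO
  where
    go 0 = return []
    go k = do
      x <- getInput
      r <- unsafeInterleaveIO (go (k - 1))
      return (x : r)
```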

### Run time statistics

I ran criterion over the functions (generating lists of length 1000), and found these results:

- `accumAppend`: 6.5 ms
- `accumAppendSeq`: 8.2 ms
- `accumReverse`: 14.0 µs
- `listRecurse`: 8.0 µs (same as `listReplicateM` and `listStreams`)
- `accumDList`: 12.0 µs
- `listUnsafeInterleave`: 174.0 µs

### Conclusion

There are many ways to write a list-constructing loop in Haskell, but none are perfect: they either fill the heap, or do not construct the list directly (usually requiring a second pass through it), or rely on `IO`-specific unsafe functions. The naive recursion performs best, but may cause a stack overflow (note, though, that from GHC 7.8 on, the stack will be unlimited). The next best option seems to be the difference list solution.

I’m still wondering: What would be required from Haskell, GHC or the monads in question to have a fully satisfactory solution here, i.e. one that is as fast as the naive recursion, as stack efficient as the difference list solution, and allocates no thunks, only list cells?

## Comments

(Note that I noticed that the numbering was broken and I fixed that; I believe you are referring to the difference list solution.)

AFAIK, there is no way around the stack space vs. heap space trade-off when building lists, at least for non-mutating code. You have to pick one or the other. I don’t think this is a Haskell-specific problem.

I fail to see what the difficulty is for a (Haskell) compiler to recognize that the written algorithm is best compiled to a (possibly partly unrolled) loop, with the thunks immediately evaluated (or not even built), since the data is used afterwards anyway.

> go n l = do {x <- getInput ; go (n-1) $!! (x : l)}

what about

```haskell
liftList :: Int -> IO [Int]
liftList n = go n (return [])
  where
    go 0 = id
    go n = liftM2 (:) getInput . go (n-1)
```

?

It seems to execute in reasonable time but I'm not sure about the stack consumption... (by the way, how exactly do you do that?)

You can check the stack by limiting it (+RTS -K...), and see if it uses a non-trivial amount of it.

If I just write it

```haskell
go 0 = return []
go n = liftM2 (:) getInput $ go $ n-1
```

it is exactly the same as listRecurse... I guess I’m the naive kind!

Small talk explaining that: https://www.youtube.com/watch?v=EiIZlX_k89Y&ab_channel=ErlangSolutions or https://www.youtube.com/watch?v=VDA1kyWZYHc&ab_channel=LambdaWorld

(same talk on 2 conferences, slides linked in the comments).

At least that would be my 2 cents :)

Have something to say? You can post a comment by sending an e-Mail to me at <mail@joachim-breitner.de>, and I will include it here.