Joachim Breitner

Constructing a list in a Monad

Published 2013-11-13 in sections English, Haskell.

In Haskell, lists are ubiquitous. In real-world Haskell, monads are ubiquitous. So it should be an easy task to fill a list with values obtained in a monad, i.e. a list of n random numbers (in the Rand monad), or getting some input from the user, and storing it in a list, until the user chooses to finish the input.

The task

For now, assume a function getInput :: IO Int; we want to run it 10000 and then have the list of the values returned. Sounds simple and straight forward, but it is actually really hard to get the run-time behaviour that we naively and optimistically expect.

The expectation

In an imperative setting, we would have a simple counting loop. In the loop, we run getInput, allocate a new cell for the linked list, and append it to the list. Clearly, this

  • consumes no stack space and
  • at the end of the loop, the list is already fully evaluated in memory, ready to be used.

Can we achieve that with Haskell? Let’s look at a few alternatives (complete code available)

Attempt 1: The accumulator

I’ll start with the probably worst attempt. Unfortunately, from what I have seen from correcting Haskell exams, this is what beginners would write, especially if they have the above imperative algorithm in mind, and know about the “convert loops to recursion with accumulators” approach to solving Haskell problems:

accumAppend :: Int -> IO [Int]
accumAppend n = go n []
  where
    go 0 l = return l
    go n l = do {x <- getInput; go (n-1) (l++[x])}

To check the space usage of this, I pass -Kn to the GHC runtime and see what limit is just enough to let this program calculate accumAppend 10000 and fully evaluate the resulting list. It turns out that we need 328336 bytes of stack space.

So what about the second requirement; that the list is fully evaluated? Experienced Haskellers see it immediatelly: The accumulator is not used in the loop, so each iteration adds a thunk around the previous value, that would add the next entry to the end. Only when the list is actually used, these are run, each traversing the whole list, causing quadratic cost. We can verify this using ghc-vis:

Clearly (and not surprisingly) this is not the best solution.

Attempt 2: The accumulator (forced)

Lets try to fix the second issue, by making sure that the list is always fully evaluated. We import Control.DeepSeq and write

accumAppendSeq :: Int -> IO [Int]
accumAppendSeq n = go n []
  where
    go 0 l = return l
    go n l = do {x <- getInput ; go (n-1) $!! (l++[x])}

As the accumulator gets fully evaluated before being passed to the next invocation of go, the result will not contain thunks. But note that this has also fixed the stack consumption: The code now runs the smallest setting for -K, 8 bytes. This works as both go and deepseq are tail-recursive.

But this is still not the solution we are looking for, as the quadradic cost caused by using ++ is still there.

Attempt 3: The accumulator (reversed)

Since we know that l++[x] is expensive, but x:l is cheap, we can simply use this to update the accumulator. This has the slightly annoying consequence that the entries of the list are in the reverse order, but we can fix that in the base case:

accumReverse :: Int -> IO [Int]
accumReverse n = go n []
  where
    go 0 l = return (reverse l)
    go n l = do {x <- getInput ; go (n-1) (x:l)}

And indeed, we get noticably faster code that still runs in 8 bytes of stack. Unfortunately, we still don’t construct the final list directly.

Attempt 4: Naive recursion

Maybe we can do without an accumulator. The most naive such code would be

listRecurse :: Int -> IO [Int]
listRecurse n = go n
  where
    go 0 = return []
    go n = do {x <- getInput; r <- go (n-1); return (x:r)}

and I’d expect that most experienced Haskellers would write that code (if they are asked not to use a library function). And indeed, this code does create the final list directly. But it requires a large stack or 164616 bytes. The reason is that this is no longer tail recursive: After calling go again we need to take the result and prepend x

Attempt 5: library functions

Maybe we should simply not worry about the implementation, and use library functions? Clearly, the authors of such libraries have found the best solution. So how does

listReplicateM :: Int -> IO [Int]
listReplicateM n = Control.Monad.replicateM n getInput

fare? Unfortunately, not better than the naive recursion. In fact, the stack space requirement is the same 164616 bytes. The same holds when using special libraries for generating and consuming list, e.g. the Streams module from the vector package:

listStreams :: Int -> IO [Int]
listStreams n = MStream.toList $ MStream.replicateM n getInput

Attempt 6: Difference lists

It is time to dig deeper into our box of tricks. What if we have lists with constant time Snoc (a.k.a. appending a new element)? Then Attempt 3 would not require reversing the list. Such lists are provided by difference lists; there are libraries for that, but we can simply implement them using lambdas:

accumDList :: Int -> IO [Int]
accumDList n = go n id
  where
    go 0 l = return (l [])
    go n l = do {x <- getInput; go (n-1) (l . (x:))}

Indeed, no stack is required (8 bytes are enough). But we are cheating here: We are still not constructing the final list, but rather we combine functions that append elements to lists. So when accumDList returns, what we have in memory, is a thunk and sequence of partial function applications in the wrong order, and evaluating the thunk does basically the same thing as reverse in Attempt 3. It might be a bit faster, is still not satisfying enough:

Attempt 7: Unsafe functions

Even deepter in the box of tricks, somewhere in the grit and dust where one usually should not reach, there is a function that can help us out here: unsafeInterleaveIO. This allow us to modify the naive recursion in a way that first creates the Cons-cell, and then continues the loop:

listUnsafeInterleave :: Int -> IO [Int]
listUnsafeInterleave n = do {l <- go n; return $!! l}
  where
    go 0 = return []
    go n = do x <- getInput
              r <- unsafeInterleaveIO (go (n-1))
              return (x:r)

The stack consumption is 8 bytes. It is worth noting that the all to deepseq (in $!!) will actually drive the evaluation of go. Also, it guarantees that the unsafe effects of unsafeInterleaveIO are confined to the execution of listUnsafeInterleave, i.e. are actually safe.

 Run time statistics

I ran criterion over the functions (generating lists of length 1000), and found these results:

  • accumAppend: 6.5 ms
  • accumAppendSeq: 8.2 ms
  • accumReverse: 14.0 us
  • listRecurse: 8.0 us (same as listReplicateM and listStreams)
  • accumDList: 12.0 us
  • listUnsafeInterleave: 174.0 us

Conclusion

There are many ways to write a list-constructing list in Haskell, none are perfect – they either fill the heap, or do not construct the list directly (usually requiring a second pass through it), or rely on IO-specific unsafe functions. The naive recursion performs the best, but may cause a stack overflow (note, though that from GHC 7.8, the stack will be unlimited).  The next best option seems to be the difference lists

I’m still wondering: What would be required from Haskell, GHC or the monads in question to have a fully satisfactory solution here, i.e. one that is as fast as the naive recursion, as stack efficient as the difference list solution, and allocates no thunks, only list cells?

Comments

Rewrite your get function to provide a conduit.
#1 Anonymous am 2013-11-13
And then?
#2 Joachim Breitner (Homepage) am 2013-11-13
How is attempt 5 not a perfect solution? Is the cost of evaluating the thunk compared to regular cons cell really significant in any practical way? I highly doubt it. Have you compared attempt 5 and 6 in a benchmark?
#3 CJ van den Berg am 2013-11-13
I planned to add some benchmarks, but I’m mostly aiming at the beauty of functional code that will, on the machine, be the slickest possible, or at least as good as naive imperative code. And for this task, there should be no need to add thunks.

(Note that I noticed that the numbering was broken and I fixed that; I believe you are referring to the difference list solution.)
#4 Joachim Breitner (Homepage) am 2013-11-13
Statistics added.
#5 Joachim Breitner (Homepage) am 2013-11-13
Yes, I was referring to the difference list solution as ‘perfect’. Your benchmarks look good to me. For a real world appication though, I would probably even prefer the reverse based solution, purely because it is the simplest solution with acceptable performance (IMO). Unless of course you’re not worried about stack space, in which case replicateM wins for both simplicity and performance.

AFAIK, there is no way around the stack space or heap space trade off when building lists, at least for non-mutating code. You have to pick one or the other. I don’t think this is a haskell specific problem.
#6 CJ van den Berg am 2013-11-13
That's exactly the reason why I stopped learning more Haskell. It's rarely possible, to have an intuitive, correct and reasonable efficient solution with Haskell. And that's what I was looking for when learning Haskell.

I fail to see, what the difficulty is for a (Haskell) compiler to recognize, that the written algorithm is best compiled to a (possibly partly unrolled) loop and that the thunks are immediately evaluated (not even built), as the data is used afterwards anyway.
#7 Patrick am 2013-11-13
Use Pipes, it's designed exactly for this purpose. Not constructing a list as such, but producing/consuming streams from monadic code in constant space.
#8 Anonymous am 2013-11-13
I know about pipes and conduits; I was nevertheless curious about how well constructing a list will work.
#9 Joachim Breitner (Homepage) am 2013-11-14
Why do you not just force the accumulator in example 3?

> go n l = do {x <- getInput ; go (n-1) $!! (x : l)}
#10 Anonymous am 2013-11-14
You mean the accumulator with `reverse`? Because it is not needed: The Cons-Operation never creates thunks, to the accumulator is indeed created fully evaluated!
#11 Joachim Breitner (Homepage) am 2013-11-14
Hi,
what about

liftList :: Int -> IO [Int]
liftList n = go n (return [])
where
go 0 = id
go n = liftM2 (:) getInput . go (n-1)

?

It seems to execute in reasonable time but I'm not sure about the stack consumption... (by the way, how exactly do you do that?)
#12 malarbol am 2013-11-27
This will also consume stack space; the liftM does nothing different from listRecurse.

You can check the stack by limiting it (+RTS -K...), and see if it uses a non-trivial amount of it.
#13 Joachim Breitner (Homepage) am 2013-11-27
oh, of course!
If I just write it

go 0 = return []
go n = liftM2 (:) getInput $ go $ n-1

it is exactly the same as listRecurse... I guess I'm a the naive kind!
#14 malarbol am 2013-11-27
But the weird thing is that it does use less memory and time that listRecurse...
#15 malarbol am 2013-11-27

Have something to say? You can post a comment by sending an e-Mail to me at <mail@joachim-breitner.de>, and I will include it here.