Just some small thoughts about cyclic lists in Haskell. Every Haskell beginner knows that you can use infinite lists, as long as you don’t fully evalute them. So, this is perfectly valid
endless = [0..] -- All natural numbers!
main = print (endless !! 10)
It will not crash, but print "10" as the list has not been fully used.
What happens if we take a piece of the list further down, let’s say at position 1000000000:
endles = [0..]
main = print (endless !! (10^9) )
If you try this at home, better run "ulimit -S -v 1000000" before, because then you’ll get "test: out of memory (requested 1048576 bytes)" instead of a sluggishly swapping machine. What happened? The long list will be evaluated, and fills the memory.
Does this mean that we can not use arbitrary long lists? Let’s try a special case: A list that is infinitely long, but repeating the same values all over again:
list = "Oh, happy day! Oh, happy day. " cycle' list = list ++ cycle' list main = do let rlist = cycle' list print ( rlist !! (10^9), rlist !! 0 )
We repeat the (finite) list endlessly, and then try to pick the 1000000000th element. We also pick the first element again, to make sure the compiler does not cheat by forgetting the first 999999999 elements (It’s actually pretty nice that the compiler will forget these elements, but not what I want to demonstrate here). Running that code, sure enough, fills up the memory.
But maybe I was coding badly. At least I have re-implemented a function that already exists, which is bad practice. Let’s try with Haskell’s own cycle:
list = "Oh, happy day! Oh, happy day. " main = do let rlist = cycle list print ( rlist !! (10^9), rlist !! 0 )
Now it takes a while, but surprisingly, the memory gauge does not skyrocket, and in the end I’m told that the 1000000000th character in my infinite character string is 'd'. This leads me to the conclusion that the Haskell library uses black magic. Or does it? Here is the definition of cycle:
cycle xs = xs' where xs' = xs ++ xs'
What is the difference to our cycle'? Here, the result is given a name (xs'), which is used again inside the function. So while our cycle' appends the list over and over again, filling up the RAM, their cycle ties a loop and makes the end of the list refer to it’s beginning. And my list lookup will no longer evaluate the list up to infinity, but just run around in, well, cycles until it has counted down from 1000000000. I could even ask for the last element of this list, and it will not use any more RAM than a small, finite list, while endlessly searching for the end of the list.
Despite Haskell being a very high level language, I sometimes wonder how my data will look like on the physical memory. And as you can see, it can make a difference. Some more thoughts on this were written down by Duncan Coutts.
Unfortunately, issues like this mean that the user of a functional language has to keep non-functional issues in mind, and must have some model of how language features are going to be implemented, to have a hope of producing code that will be executed efficiently. Sometimes these issues will result in a requirement on the implementation, like the requirement that Scheme compilers and interpreters handle tail-recursion without an ever-growing stack.
The definition of `cycle' in the standard library creates a circular data structure. The seminal paper on this is Bird 84, "Using circular programs to eliminate multiple traversals of data" and Lloyd Allison has additional work online at http://www.csse.monash.edu.au/~lloyd/tildeFP/1989SPE/