This is the first of several posts on the topic of Haskell’s laziness. After several weeks of playing, I’m coming to the conclusion that laziness-by-default is a hinderance rather than a virtue. Let’s start at the start though by trying to add some numbers together.
-- Non tail recursive; 5Mb of live objects at end.
mysum [] = 0
mysum (x:xs) = x + mysum xs
main = putStrLn $ show $ mysum $ take 100000 $ [1..]
As the comment says, this is a dumb version. It consumes 5Mb of memory because it’s not tail recursive.
Incidentally, after causing my machine to thrash several time during my experiments, I found it useful to use ‘ulimit’ to restrict the maximum heap size available to the process. Also, you can pass extra args to your haskell app to get it to report real-time memory stats, like this:
ghc sum.hs && /bin/bash -c 'ulimit -Sv 100000; ./a.out +RTS -Sstderr'
Anyhow, the memory blowup is easy to fix; just pass an ‘accumulator’ parameter when you do the recursive call:
-- Tail recursive, but 3.5Mb of live objects at end.
mysuma acc [] = acc
mysuma acc (x:xs) = mysuma (acc+x) xs
main = putStrLn $ show $ mysuma 0 $ take 100000 $ [1..]
Hmm, it’s now tail recursive but it still consumes 3.5Mb? This is where Haskell’s laziness makes things quite different from ocaml and other strict languages. When we pass the accumulated value, haskell does not actually evaluate the addition prior to making the recursive call. It will delay the computation until its value is actually required. So, on each recursive call, the accumulator looks like an unevaluated “1+2” and then “1+2+3” etc.
We can fix this by explicitly telling haskell to evaluate the addition prior to making the call:
-- Tail recursive, with 'seq' to force immediate evaluation of addition.
-- 40k of live objects at end.
mysumas acc [] = acc
mysumas acc (x:xs) = (acc+x) `seq` mysumas (acc+x) xs
main = putStrLn $ show $ mysumas 0 $ take 100000 $ [1..]
Finally we have a program which only consumes a tiny amount of heap space. But it took a surprising amount of effort. There’s lots more information about this situation on the haskell wiki.