Joachim Breitner

More recursive definitions

Published 2022-09-03 in sections English, Haskell.

Haskell is a pure and lazy programming language, and the laziness allows us to write some algorithms very elegantly, by recursively referring to already calculated values. A typical example is the following definition of the Fibonacci numbers, as an infinite stream:

fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
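
For instance, taking a finite prefix in GHCi forces just as much of the stream as is needed:

ghci> take 10 fibs
[0,1,1,2,3,5,8,13,21,34]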

Elegant graph traversals

A perhaps more practical example is the following calculation of the transitive closure of a graph:

import qualified Data.Set as S
import qualified Data.Map as M

type Graph = M.Map Int [Int]

transitive1 :: Graph -> Graph
transitive1 g = M.map S.toList sets
  where
    sets :: M.Map Int (S.Set Int)
    sets = M.mapWithKey (\v vs -> S.insert v (S.unions [ sets M.! v' | v' <- vs ])) g

We represent graphs as maps from each vertex to its list of successors, and define the resulting map sets recursively: the set of vertices reachable from a vertex v consists of v itself, plus everything reachable from its successors vs, for which we query sets.

And, despite this apparent self-referential recursion, it works!

ghci> transitive1 $ M.fromList [(1,[3]),(2,[1,3]),(3,[])]
fromList [(1,[1,3]),(2,[1,2,3]),(3,[3])]

Cyclic graphs ruin it all

These tricks can be very impressive … until someone tries them on a cyclic graph and the program just hangs until we abort it:

ghci> transitive1 $ M.fromList [(1,[2,3]),(2,[1,3]),(3,[])]
fromList [(1,fromList ^CInterrupted.

At this point we are thrown back to implement a more pedestrian graph traversal, typically keeping explicit track of vertices that we have seen already:

transitive2 :: Graph -> Graph
transitive2 g = M.fromList [ (v, S.toList (go S.empty [v])) | v <- M.keys g ]
  where
    go :: S.Set Int -> [Int] -> S.Set Int
    go seen [] = seen
    go seen (v:vs) | v `S.member` seen = go seen vs
    go seen (v:vs) = go (S.insert v seen) (g M.! v ++ vs)

I have written that seen/todo recursion idiom so often in the past that I can almost write it blindly. And indeed, this code handles cyclic graphs just fine:

ghci> transitive2 $ M.fromList [(1,[2,3]),(2,[1,3]),(3,[])]
fromList [(1,[1,2,3]),(2,[1,2,3]),(3,[3])]

But this is a bit anticlimactic – Haskell is supposed to be a declarative language, and transitive1 declares my intent just fine!

We can have it all

It turns out that there actually is a way to write essentially the code in transitive1 and still get the right result in all cases, and I have just published a possible implementation as rec-def. In the module Data.Recursive.Set we find an API that resembles that of Data.Set, with a type RSet a; in addition to conversion functions from and to plain sets (such as get), we find the two operations that we needed in transitive1:

insert :: Ord a => a -> RSet a -> RSet a
unions :: Ord a => [RSet a] -> RSet a
get :: RSet a -> Set a

Let’s try that:

import qualified Data.Recursive.Set as RS

transitive3 :: Graph -> Graph
transitive3 g = M.map (S.toList . RS.get) sets
  where
    sets :: M.Map Int (RS.RSet Int)
    sets = M.mapWithKey (\v vs -> RS.insert v (RS.unions [ sets M.! v' | v' <- vs ])) g

And indeed it works! Magic!

ghci> transitive3 $ M.fromList [(1,[3]),(2,[1,3]),(3,[])]
fromList [(1,[1,3]),(2,[1,2,3]),(3,[3])]
ghci> transitive3 $ M.fromList [(1,[2,3]),(2,[1,3]),(3,[])]
fromList [(1,[1,2,3]),(2,[1,2,3]),(3,[3])]

To show off some more, here are two smaller examples:

ghci> let s = RS.insert 42 s in RS.get s
fromList [42]
ghci> :{
  let s1 = RS.insert 23 s2
      s2 = RS.insert 42 s1
  in RS.get s1
 :}
fromList [23,42]

How is that possible? Is it still Haskell?

The internal workings of the RSet a type will be the topic of a future blog post; let me just briefly mention that it uses unsafe features under the hood, and keeps applying the equations you gave until a fixed point is reached. Because it starts with the empty set and all operations provided by Data.Recursive.Set are monotone (which is why, for example, there is no difference operation), it will eventually find the least fixed point.
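
To give an intuition of what that least-fixed-point computation produces (this is only an illustration, not the library's actual implementation, and transitiveFix is a name I made up), here is a naive version using plain Data.Set: start every vertex at the empty set and re-apply the defining equations until nothing changes any more.

import qualified Data.Map as M
import qualified Data.Set as S

-- Illustrative only: naive least-fixed-point iteration over plain sets.
-- Start all vertices at the empty set and re-apply the defining
-- equations until the result no longer changes.
transitiveFix :: M.Map Int [Int] -> M.Map Int (S.Set Int)
transitiveFix g = go (M.map (const S.empty) g)
  where
    step sets = M.mapWithKey
        (\v vs -> S.insert v (S.unions [ sets M.! v' | v' <- vs ])) g
    go sets | sets' == sets = sets
            | otherwise     = go sets'
      where sets' = step sets

Because each step is monotone and the sets are bounded by the finite vertex set, this iteration is guaranteed to terminate in the least fixed point, even on cyclic graphs.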

Despite the unsafe machinery under the hood, I claim that Data.Recursive.Set is itself nicely safe, and does not destroy Haskell’s nice properties like purity, referential transparency and equational reasoning. If you disagree, I’d like to hear about it (here, on Twitter, Reddit or Discourse)! There is a brief discussion at the end of the tutorial in Data.Recursive.Example.

More than sets

The library also provides Data.Recursive.Bool for recursive equations with booleans (preferring False) and Data.Recursive.DualBool (preferring True), and some operations like member :: Ord a => a -> RSet a -> RBool can actually connect different types. I plan to add other data types (natural numbers, maps, Maybe, with suitable orders) as demand arises and as I come across nice small example use-cases for the documentation (e.g. finding shortest paths in a graph).

I believe this idiom is practically useful in a wide range of applications (which of course all have some underlying graph structure – but then almost everything in Computer Science is a graph). My original motivation was a program analysis. Imagine you want to find out from where in your program you can run into a division by zero. As long as your program does not have recursion, you can simply traverse the program, keeping track of a mapping from function names to whether they can divide by zero – all nice and elegant. But once you allow mutually recursive functions, things become tricky. Unless you use RBool! Simply use laziness, pass the analysis result down when analyzing the functions’ right-hand sides, and it just works!
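
To make that concrete, here is a rough sketch of such an analysis for a toy expression language. Note that the RBool operations I use (RB.true, RB.false, RB.or and RB.get) are assumed names, chosen by analogy with the RSet API above; check the documentation of Data.Recursive.Bool for its actual exports.

import qualified Data.Map as M
import qualified Data.Recursive.Bool as RB  -- RB.true/false/or/get are assumed names

-- A toy language: integer literals, division, and calls to named functions.
data Expr = Lit Int | Div Expr Expr | Call String

type Prog = M.Map String Expr   -- function name to function body

-- For every function: can evaluating its body divide by a literal zero?
-- The map results is defined recursively, just like sets above, so
-- mutually recursive functions are handled for free.
canDivByZero :: Prog -> M.Map String Bool
canDivByZero prog = M.map RB.get results
  where
    results = M.map go prog
    go (Lit _)         = RB.false
    go (Div _ (Lit 0)) = RB.true
    go (Div a b)       = RB.or [go a, go b]
    go (Call f)        = results M.! f

The knot-tying is exactly the same as in the transitive closure example: go for one function refers to results for the functions it calls, and the resulting boolean equations are resolved as a least fixed point, preferring False.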
