Joachim Breitner

rec-def: Program analysis case study

Published 2022-09-14 in sections English, Haskell.

At this week’s International Conference on Functional Programming I showed my rec-def Haskell library to a few people. As this crowd appreciates writing compilers, and example from the realm of program analysis is quite compelling.

To Throw or not to throw

Here is our little toy language to analyze: It has variables, lambdas and applications, non-recursive (lazy) let bindings and, so that we have something to analyze, a way to throw and to catch exceptions:

type Var = String

data Exp
  = Var Var
  | Throw
  | Catch Exp
  | Lam Var Exp
  | App Exp Exp
  | Let Var Exp Exp

Given such an expression, we would like to know whether it might throw an exception. Such an analysis is easy to write: We traverse the syntax tree, remembering in the env which of the variables may throw an exception:

canThrow1 :: Exp -> Bool
canThrow1 = go M.empty
  where
    go :: M.Map Var Bool -> Exp -> Bool
    go env (Var v)       = env M.! v
    go env Throw         = True
    go env (Catch e)     = False
    go env (Lam v e)     = go (M.insert v False env) e
    go env (App e1 e2)   = go env e1 || go env e2
    go env (Let v e1 e2) = go env' e2
      where
        env_bind = M.singleton v (go env e1)
        env' = M.union env_bind env

The most interesting case is the one for Let, where we extend the environment env with the information about the additional variable env_bind, which is calculated from analyzing the right-hand side e1.

So far so good:

ghci> someVal = Lam "y" (Var "y")
ghci> canThrow1 $ Throw
True
ghci> canThrow1 $ Let "x" Throw someVal
False
ghci> canThrow1 $ Let "x" Throw (App (Var "x") someVal)
True

Let it rec

To spice things up, let us add a recursive let to the language:

data Exp

  | LetRec [(Var, Exp)] Exp

How can we support this new constructor in canThrow1? Let use naively follow the pattern used for Let: Calculate the analysis information for the variables in env_bind, extend the environment with that, and pass it down:

    go env (LetRec binds e) = go env' e
      where
        env_bind = M.fromList [ (v, go env' e) | (v,e) <- binds ]
        env' = M.union env_bind env

Note that, crucially, we use env', and not just env, when analyzing the right-hand sides. It has to be that way, as all the variables are in scope in all the right-hand sides.

In a strict language, such a mutually recursive definition, where env_bind uses env' which uses env_bind is basically unthinkable. But in a lazy language like Haskell, it might just work.

Unfortunately, it works only as long as the recursive bindings are not actually recursive, or if they are recursive, they are not used:

ghci> canThrow1 $ LetRec [("x", Throw)] (Var "x")
True
ghci> canThrow1 $ LetRec [("x", App (Var "y") someVal), ("y", Throw)] (Var "x")
True
ghci> canThrow1 $ LetRec [("x", App (Var "x") someVal), ("y", Throw)] (Var "y")
True

But with genuine recursion, it does not work, and simply goes into a recursive cycle:

ghci> canThrow1 $ LetRec [("x", App (Var "x") someVal), ("y", Throw)] (Var "x")
^CInterrupted.

That is disappointing! Do we really have to toss that code and somehow do an explicit fixed-point calculation here? Obscuring our nice declarative code? And possibly having to repeat work (such as traversing the syntax tree) many times that we should only have to do once?

rec-def to the rescue

Not with rec-def! Using RBool from Data.Recursive.Bool instead of Bool, we can write the exact same code, as follows:

import qualified Data.Recursive.Bool as RB

canThrow2 :: Exp -> Bool
canThrow2 = RB.get . go M.empty
  where
    go :: M.Map Var RBool -> Exp -> R Bool
    go env (Var v)       = env M.! v
    go env Throw         = RB.true
    go env (Catch e)     = RB.false
    go env (Lam v e)     = go (M.insert v RB.false env) e
    go env (App e1 e2)   = go env e1 RB.|| go env e2
    go env (Let v e1 e2) = go env' e2
      where
        env_bind = M.singleton v (go env e1)
        env' = M.union env_bind env
    go env (LetRec binds e) = go env' e
      where
        env_bind = M.fromList [ (v, go env' e) | (v,e) <- binds ]
        env' = M.union env_bind env

And it works!

ghci> canThrow2 $ LetRec [("x", App (Var "x") someVal), ("y", Throw)] (Var "x")
False
ghci> canThrow2 $ LetRec [("x", App (Var "x") Throw), ("y", Throw)] (Var "x")
True

I find this much more pleasing than the explicit naive fix-pointing you might do otherwise, where you stabilize the result at each LetRec independently: Not only is all that extra work hidden from the programmer, but now also a single traversal of the syntax tree creates, thanks to the laziness, a graph of RBool values, which are then solved “under the hood”.

The issue with x=x

There is one downside worth mentioning: canThrow2 fails to produce a result in case we hit x=x:

ghci> canThrow2 $ LetRec [("x", Var "x")] (Var "x")
^CInterrupted.

This is, after all the syntax tree has been processed and all the map lookups have been resolved, equivalent to

ghci> let x = x in RB.get (x :: RBool)
^CInterrupted.

which also does not work. The rec-def machinery can only kick in if at least one of its function is used on any such cycle, even if it is just a form of identity (which I ~ought to add to the library~ since have added to the library):

ghci> idR x = RB.false ||| x
ghci> let x = idR x in getR (x :: R Bool)
False

And indeed, if I insert a call to idR in the line

        env_bind = M.fromList [ (v, idR (go env' e)) | (v,e) <- binds ]

then our analyzer will no longer stumble over these nasty recursive equations:

ghci> canThrow2 $ LetRec [("x", Var "x")] (Var "x")
False

It is a bit disappointing to have to do that, but I do not see a better way yet. I guess the def-rec library expects the programmer to have a similar level of sophistication as other tie-the-know tricks with laziness (where you also have to ensure that your definitions are productive and that the sharing is not accidentally lost).

Comments

Have something to say? You can post a comment by sending an e-Mail to me at <mail@joachim-breitner.de>, and I will include it here.