# Joachim Breitner's Homepage

## rec-def: Program analysis case study

At this week’s International Conference on Functional Programming I showed my `rec-def`

Haskell library to a few people. As this crowd appreciates writing compilers, and example from the realm of program analysis is quite compelling.

### To Throw or not to throw

Here is our little toy language to analyze: It has variables, lambdas and applications, non-recursive (lazy) let bindings and, so that we have something to analyze, a way to throw and to catch exceptions:

```
type Var = String
data Exp
= Var Var
| Throw
| Catch Exp
| Lam Var Exp
| App Exp Exp
| Let Var Exp Exp
```

Given such an expression, we would like to know whether it might throw an exception. Such an analysis is easy to write: We traverse the syntax tree, remembering in the `env`

which of the variables may throw an exception:

```
canThrow1 :: Exp -> Bool
= go M.empty
canThrow1 where
go :: M.Map Var Bool -> Exp -> Bool
Var v) = env M.! v
go env (Throw = True
go env Catch e) = False
go env (Lam v e) = go (M.insert v False env) e
go env (App e1 e2) = go env e1 || go env e2
go env (Let v e1 e2) = go env' e2
go env (where
= M.singleton v (go env e1)
env_bind = M.union env_bind env env'
```

The most interesting case is the one for `Let`

, where we extend the environment `env`

with the information about the additional variable `env_bind`

, which is calculated from analyzing the right-hand side `e1`

.

So far so good:

```
ghci> someVal = Lam "y" (Var "y")
ghci> canThrow1 $ Throw
True
ghci> canThrow1 $ Let "x" Throw someVal
False
ghci> canThrow1 $ Let "x" Throw (App (Var "x") someVal)
True
```

### Let it rec

To spice things up, let us add a *recursive* let to the language:

```
data Exp
…| LetRec [(Var, Exp)] Exp
```

How can we support this new constructor in `canThrow1`

? Let use naively follow the pattern used for `Let`

: Calculate the analysis information for the variables in `env_bind`

, extend the environment with that, and pass it down:

```
LetRec binds e) = go env' e
go env (where
= M.fromList [ (v, go env' e) | (v,e) <- binds ]
env_bind = M.union env_bind env env'
```

Note that, crucially, we use `env'`

, and not just `env`

, when analyzing the right-hand sides. It has to be that way, as all the variables are in scope in all the right-hand sides.

In a strict language, such a mutually recursive definition, where `env_bind`

uses `env'`

which uses `env_bind`

is basically unthinkable. But in a lazy language like Haskell, it might just work.

Unfortunately, it works only as long as the recursive bindings are *not actually recursive*, or if they are recursive, they are not used:

```
ghci> canThrow1 $ LetRec [("x", Throw)] (Var "x")
True
ghci> canThrow1 $ LetRec [("x", App (Var "y") someVal), ("y", Throw)] (Var "x")
True
ghci> canThrow1 $ LetRec [("x", App (Var "x") someVal), ("y", Throw)] (Var "y")
True
```

But with genuine recursion, it does not work, and simply goes into a recursive cycle:

```
ghci> canThrow1 $ LetRec [("x", App (Var "x") someVal), ("y", Throw)] (Var "x")
^CInterrupted.
```

That is disappointing! Do we really have to toss that code and somehow do an explicit fixed-point calculation here? Obscuring our nice declarative code? And possibly having to repeat work (such as traversing the syntax tree) many times that we should only have to do once?

### rec-def to the rescue

Not with rec-def! Using `RBool`

from `Data.Recursive.Bool`

instead of `Bool`

, we can write the exact same code, as follows:

```
import qualified Data.Recursive.Bool as RB
canThrow2 :: Exp -> Bool
= RB.get . go M.empty
canThrow2 where
go :: M.Map Var RBool -> Exp -> R Bool
Var v) = env M.! v
go env (Throw = RB.true
go env Catch e) = RB.false
go env (Lam v e) = go (M.insert v RB.false env) e
go env (App e1 e2) = go env e1 RB.|| go env e2
go env (Let v e1 e2) = go env' e2
go env (where
= M.singleton v (go env e1)
env_bind = M.union env_bind env
env' LetRec binds e) = go env' e
go env (where
= M.fromList [ (v, go env' e) | (v,e) <- binds ]
env_bind = M.union env_bind env env'
```

And it works!

```
ghci> canThrow2 $ LetRec [("x", App (Var "x") someVal), ("y", Throw)] (Var "x")
False
ghci> canThrow2 $ LetRec [("x", App (Var "x") Throw), ("y", Throw)] (Var "x")
True
```

I find this much more pleasing than the explicit naive fix-pointing you might do otherwise, where you stabilize the result at each `LetRec`

independently: Not only is all that extra work hidden from the programmer, but now also a *single* traversal of the syntax tree creates, thanks to the laziness, a graph of `RBool`

values, which are then solved “under the hood”.

### The issue with x=x

There is one downside worth mentioning: `canThrow2`

fails to produce a result in case we hit `x=x`

:

```
ghci> canThrow2 $ LetRec [("x", Var "x")] (Var "x")
^CInterrupted.
```

This is, after all the syntax tree has been processed and all the map lookups have been resolved, equivalent to

```
ghci> let x = x in RB.get (x :: RBool)
^CInterrupted.
```

which also does not work. The `rec-def`

machinery can only kick in if at least one of its function is used on any such cycle, even if it is just a form of identity (which I ~ought to add to the library~ since have added to the library):

```
ghci> idR x = RB.false ||| x
ghci> let x = idR x in getR (x :: R Bool)
False
```

And indeed, if I insert a call to `idR`

in the line

`= M.fromList [ (v, idR (go env' e)) | (v,e) <- binds ] env_bind `

then our analyzer will no longer stumble over these nasty recursive equations:

```
ghci> canThrow2 $ LetRec [("x", Var "x")] (Var "x")
False
```

It is a bit disappointing to have to do that, but I do not see a better way yet. I guess the `def-rec`

library expects the programmer to have a similar level of sophistication as other tie-the-know tricks with laziness (where you also have to ensure that your definitions are productive and that the sharing is not accidentally lost).

Have something to say? You can post a comment by sending an e-Mail to me at <mail@joachim-breitner.de>, and I will include it here.