## rec-def: Behind the scenes

Published 2022-09-10 in sections English, Haskell.

A week ago I wrote about the `rec-def` Haskell library, which allows you to write more recursive definitions, such as in this small example:

``````let s1 = RS.insert 23 s2
s2 = RS.insert 42 s1
in RS.get s1``````

This will not loop (as it would if you’d just used `Data.Set`), but rather correctly return the set `S.fromList [23,42]`. See the previous blog post for more examples and discussion of the user-facing side of this.

For quick reference, these are the types of the functions involved here:

``````insert :: Ord a => a -> RSet a -> RSet a
get :: RSet a -> Set a``````

The type of `s1` and `s2` above is not `Set Int`, but rather `RSet Int`, and in this post I’ll explain how `RSet` works internally.

### Propagators, in general

The conceptual model behind an recursive equation like above is

• There are a multiple cells that can hold values of an underlying type (here `Set`)
• These cells have relations that explain how the values in the cells should relate to each other
• After registering all the relations, some form of solving happens.
• If the solving succeeds, we can read off the values from the cells, and they should satisfy the registered relation.

This is sometimes called a propagator network, and is a quite general model that can support different kind of relations (e.g. equalities, inequalities, functions), there can be various solving strategies (iterative fixed-points, algebraic solution, unification, etc.) and information can flow on along the edges (and hyper-edges) possibly in multiple directions.

For our purposes, we only care about propagator networks where all relations are functional, so they have a single output cell that is declared to be a function of multiple (possibly zero) input cells, without affecting these input cells. Furthermore, every cell is the output of exactly one such relation.

### IO-infested propagator interfaces

This suggests that an implementation of such a propagator network could provide an interface with the following three operations:

• Functions to declare cells
• Functions to declare relations
• Functions to read values off cells

This is clearly an imperative interface, so we’ll see monads, and we’ll simply use `IO`. So concretely for our small example above, we might expect

``````data Cell a
newCell :: IO (Cell a)
-- declareInsert x c1 c2 declares that the value of c1 ought to be S.insert x c2
declareInsert :: Ord a => a -> Cell a -> Cell a -> IO ()
getCell :: Cell a -> IO (Set a)``````

There is no need for an explicit “solve” function: solving can happen when `declareInsert` or `getCell` is called – as a User I do not care about that.

You might be curious about the implementation of `newCell`, `declareInsert` and `getCell`, but I have to disappoint you: This is not the topic of this article. Instead, I want to discuss how to turn this IO-infested interface into the pure interface seen above?

### Pure, but too strict

Obviously, we have to get rid of the `IO` somehow, and have to use `unsafePerformIO :: IO a -> a` somehow. This dangerous function creates a pure-looking value that, when used the first time, will run the IO-action and turn into that action’s result.

So maybe we can simply write the following:

``````type RSet a = Cell a
insert x c2 = unsafePerformIO \$ do
c1 <- newCell
declareInsert x c1 c2
return c1
get c = unsafePerformIO \$
getCell c``````

Indeed, the types line up, but if we try to use that code, nothing will happen. Our `insert` is too strict to be used recursively: It requires the value of `c2` (as it is passed to `declareInsert`, which we assume to be strict in its arguments) before it can return `c1`, so the recursive example at the top of this post will not make any progress.

### Pure, lazy, but forgetful

To work around this, maybe it suffices if we do not run `declareInsert` right away, but just remember that we have to do it eventually? So let’s introduce a new data type for `RSet a` that contains not just the cell (`Cell a`), but also an action that we still have to run before getting a value:

``````data RSet a = RSet (Cell a) (IO ())
insert x r2 = unsafePerformIO \$ do
c1 <- newCell
let todo = do
let (RSet c2 _) = r2
declareInsert x c1 c2
return (RSet c1 todo)
get (RSet c todo) = unsafePerformIO \$ do
todo
getCell c``````

This is better: `insert` is now lazy in its arguments (for this it is crucial to pattern-match on `RSet` only inside the `todo` code, not in the pattern of `insert`!) This means that our recursive code above does not get stuck right away.

### Pure, lazy, but runs in circles

But it is still pretty bad: Note that we do not run `get s2` in the example above, so that cell’s `todo`, which would `declareInsert 42`, will never run. This cannot work! We have to (eventually) run the declaration code from all involved cells before we can use `getCell`!

We can try to run the todo action of all the dependencies as part of a cell’s todo action:

``````data RSet a = RSet (Cell a) (IO ())
insert x r2 = unsafePerformIO \$ do
c1 <- newCell
let todo = do
let (RSet c2 todo2) = r2
declareInsert x c1 c2
todo2
return (RSet c1 todo)
get (RSet c todo) = unsafePerformIO \$ do
todo
getCell c``````

Now we certainly won’t forget to run the second cell’s todo action, so that is good. But that cell’s todo action will run the first cell’s todo action, and that again the second cell’s, and so on.

### Pure, lazy, terminating, but not thread safe

This is silly: We only need (and should!) run that code once! So let’s keep track of whether we ran it already:

``````data RSet a = RSet (Cell a) (IO ()) (IORef Bool)
insert x r2 = unsafePerformIO \$ do
c1 <- newCell
done <- newIORef False
let todo = do
unless is_done \$ do
writeIORef done True
let (RSet c2 todo2 _) = r2
declareInsert x c1 c2
todo2
return (RSet c1 todo done)
get (RSet c todo _) = unsafePerformIO \$ do
todo
getCell c``````

Ah, much better: It works! Our call to `get c1` will trigger the first cell’s todo action, which will mark it as done before calling the second cell’s todo action. When that now invokes the first cell’s todo action, it is already marked done and we break the cycle, and by the time we reach `getCell`, all relations have been correctly registered.

In a single-threaded world, this would be all good and fine, but we have to worry about multiple threads running `get` concurrently, on the same or on different cells.

In fact, because we use `unsafePerformIO`, we have to worry about this even when the program is not using threads.

And the above code has problems. Imagine a second call to `get c1` while the first one has already marked it as done, but has not finished processing all the dependencies yet: It will call `getCell` before all relations are registered, which is bad.

### Recursive do-once IO actions

Making this thread-safe seems to be possible, but would clutter both the code and this blog post. So let’s hide that problem behind a nice and clean interface. Maybe there will be a separate blog post about its implementation (let me know if you are curious), or you can inspect the code in `System.IO.RecThunk` module yourself). The interface is simply

``````data Thunk
thunk :: IO [Thunk] -> IO Thunk
force :: Thunk -> IO ()``````

and the idea is that `thunk act` will defer the action `act` until the thunk is passed to `force` for the first time, and `force` will not return until the action has been performed (possibly waiting if another thread is doing that at the moment), and also until the actions of all the thunks returned by `act` have performed, recursively, without running into cycles.

We can use this in our definition of `RSet` and get to the final, working solution:

``````data RSet a = RSet (Cell a) Thunk
insert x r2 = unsafePerformIO \$ do
c1 <- newCell
t1 <- thunk \$ do
let (RSet c2 t2) = r2
declareInsert x c1 c2
return [t2]
return (RSet c1 t1)
get (RSet c t) = unsafePerformIO \$ do
force t
getCell c``````

This snippet captures the essential ideas behind `rec-def`:

• Use laziness to allow recursive definition to describe the propagator graph naturally
• Use a form of “explicit thunk” to register the propagator graph relations at the right time (not too early/strict, not too late)

### And that’s all?

The actual implementation in `rec-def` has a few more moving parts.

In particular, it tries to support different value types (not just sets), possibly with different implementations, and even mixing them (e.g. in `member :: Ord a => a -> RSet a -> RBool`), so the generic code is in `Data.Propagator.Purify`, and supports various propagators underneath. The type `RSet` is then just a newtype around that, defined in `Data.Recursive.Internal` to maintain the safety of the abstraction,

I went back and forth on a few variants of the design here, including one where there was a generic `R` type constructor (`R (Set a)`, `R Bool` etc.), but then monomorphic interface seems simpler.

### Does it really work?

The big remaining question is certainly: Is this really safe and pure? Does it still behave like Haskell?

The answer to these questions certainly depends on the underlying propagator implementation. But it also depends on what we actually mean by “safe and pure”? For example, do we expect the Static Argument Transformation be semantics preserving? Or is it allowed to turn undefined values into defined ones (as it does here)?

I am unsure myself yet, so I’ll defer this discussion to a separate blog post, after I hopefully had good discussions about this here at ICFP 2022 in Ljubljana. If you are around and want to discuss, please hit me up!