Joachim Breitner

Blog

rec-def: Dominators case study

Published 2022-09-15 in sections English, Haskell.

More ICFP-inspired experiments using the rec-def library: In Norman Ramsey’s very nice talk about his Functional Pearl “Beyond Relooper: Recursive Translation of Unstructured Control Flow to Structured Control Flow”, he had the following slide showing the equation for the dominators of a node in a graph:

Norman Ramsey shows a formula

He said “it’s ICFP and I wanted to say the dominance relation has a beautiful set of equations … you can read all these algorithms how to compute this, but the concept is simple”.

This made me wonder: If the concept is simple and this formula is beautiful – shouldn’t this be sufficient for the Haskell programmer to obtain the dominator relation, without reading all those algorithms?

Before we start, we have to clarify the formula a bit: If a node is an entry node (no predecessors) then the big intersection is over the empty set, and that is not a well-defined concept. For these nodes, we need that big intersection to return the empty set, as entry nodes are not dominated by any other node. (Let’s assume that the entry nodes are exactly those with no predecessors.)

Let’s try, first using plain Haskell data structures. We begin by implementing this big intersection operator on Data.Set, and also a function to find the predecessors of a node in a graph:

import qualified Data.Set as S
import qualified Data.Map as M

intersections :: [S.Set Int] -> S.Set Int
intersections [] = S.empty
intersections xs = foldl1 S.intersection xs

preds :: [(Int,Int)] -> M.Map Int [Int]
preds edges = M.fromListWith (<>) $
    [ (v1, [])   | (v1, _) <- edges ] ++ -- to make the map total
    [ (v2, [v1]) | (v1, v2) <- edges ]

Now we can write down the formula that Norman gave, quite elegantly:

dominators1 :: [(Int,Int)] -> M.Map Int [Int]
dominators1 edges = fmap S.toList doms
  where
    doms :: M.Map Int (S.Set Int)
    doms = M.mapWithKey
        (\v vs -> S.insert v (intersections [ doms M.! v' | v' <- vs]))
        (preds edges)

Does this work? It seems it does:

ghci> dominators1 []
fromList []
ghci> dominators1 [(1,2)]
fromList [(1,[1]),(2,[1,2])]
ghci> dominators1 [(1,2),(1,3),(2,4),(3,4)]
fromList [(1,[1]),(2,[1,2]),(3,[1,3]),(4,[1,4])]

But – not surprising if you have read my previous blog posts – it falls over once we have recursion:

ghci> dominators1 [(1,2),(1,3),(2,4),(3,4),(4,3)]
fromList [(1,[1]),(2,[1,2]),(3,^CInterrupted.

So let us reimplement it with Data.Recursive.Set.

import qualified Data.Recursive.Set as RS

intersections :: [RS.RSet Int] -> RS.RSet Int
intersections [] = RS.empty
intersections xs = foldl1 RS.intersection xs

dominators2 :: [(Int,Int)] -> M.Map Int [Int]
dominators2 edges = fmap (S.toList . RS.get) doms
  where
    doms :: M.Map Int (RS.RSet Int)
    doms = M.mapWithKey
        (\v vs -> RS.insert v (intersections [ doms M.! v' | v' <- vs]))
        (preds edges)

The hope is that we can simply replace the operations, and that now it can suddenly handle cyclic graphs as well. Let’s see:

ghci> dominators2 [(1,2),(1,3),(2,4),(3,4),(4,3)]
fromList [(1,[1]),(2,[1,2]),(3,[3]),(4,[4])]

It does! Well, it does return a result… but it looks strange. Clearly nodes 3 and 4 are also dominated by 1, but the result does not reflect that.

But the result is a solution to Norman’s equation. Was the equation wrong? No, but we failed to notice that the desired solution is the largest, not the smallest. And Data.Recursive.Set calculates, as documented, the least fixed point.
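
Indeed, we can check by hand that this is a solution. For node 3, with predecessors 1 and 4 and their computed dominator sets {1} and {4}, the equation holds:

ghci> S.insert 3 (S.fromList [1] `S.intersection` S.fromList [4])
fromList [3]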

What now? Until the library has code for RDualSet a, we can work around this by using the dual formula to calculate the non-dominators. To do this, we

  • use union instead of intersection,
  • use delete instead of insert,
  • instead of S.empty, use the set of all nodes (which requires some extra plumbing), and
  • subtract the result from the set of all nodes to get the dominators,

and thus the code turns into:

unions' :: S.Set Int -> [RS.RSet Int] -> RS.RSet Int
unions' univ [] = RS.mk univ
unions' _ xs = foldl1 RS.union xs

dominators3 :: [(Int,Int)] -> M.Map Int [Int]
dominators3 edges = fmap (S.toList . S.difference nodes . RS.get) nonDoms
  where
    nodes = S.fromList [v | (v1,v2) <- edges, v <- [v1,v2]]
    nonDoms :: M.Map Int (RS.RSet Int)
    nonDoms = M.mapWithKey
        (\v vs -> RS.delete v (unions' nodes [ nonDoms M.! v' | v' <- vs]))
        (preds edges)

And with this, now we do get the correct result:

ghci> dominators3 [(1,2),(1,3),(2,4),(3,4),(4,3)]
fromList [(1,[1]),(2,[1,2]),(3,[1,3]),(4,[1,4])]

We worked a little bit on how to express the “beautiful formula” in Haskell, but at no point did we have to think about how to solve it. To me, this is the essence of declarative programming.

rec-def: Program analysis case study

Published 2022-09-14 in sections English, Haskell.

At this week’s International Conference on Functional Programming I showed my rec-def Haskell library to a few people. As this crowd appreciates writing compilers, an example from the realm of program analysis seemed quite compelling.

To Throw or not to throw

Here is our little toy language to analyze: It has variables, lambdas and applications, non-recursive (lazy) let bindings and, so that we have something to analyze, a way to throw and to catch exceptions:

type Var = String

data Exp
  = Var Var
  | Throw
  | Catch Exp
  | Lam Var Exp
  | App Exp Exp
  | Let Var Exp Exp

Given such an expression, we would like to know whether it might throw an exception. Such an analysis is easy to write: We traverse the syntax tree, remembering in the environment env which of the variables may throw an exception:

canThrow1 :: Exp -> Bool
canThrow1 = go M.empty
  where
    go :: M.Map Var Bool -> Exp -> Bool
    go env (Var v)       = env M.! v
    go env Throw         = True
    go env (Catch e)     = False
    go env (Lam v e)     = go (M.insert v False env) e
    go env (App e1 e2)   = go env e1 || go env e2
    go env (Let v e1 e2) = go env' e2
      where
        env_bind = M.singleton v (go env e1)
        env' = M.union env_bind env

The most interesting case is the one for Let, where we extend the environment env with the information about the additional variable env_bind, which is calculated from analyzing the right-hand side e1.

So far so good:

ghci> someVal = Lam "y" (Var "y")
ghci> canThrow1 $ Throw
True
ghci> canThrow1 $ Let "x" Throw someVal
False
ghci> canThrow1 $ Let "x" Throw (App (Var "x") someVal)
True

Let it rec

To spice things up, let us add a recursive let to the language:

data Exp
  …
  | LetRec [(Var, Exp)] Exp

How can we support this new constructor in canThrow1? Let us naively follow the pattern used for Let: Calculate the analysis information for the variables in env_bind, extend the environment with that, and pass it down:

    go env (LetRec binds e) = go env' e
      where
        env_bind = M.fromList [ (v, go env' e) | (v,e) <- binds ]
        env' = M.union env_bind env

Note that, crucially, we use env', and not just env, when analyzing the right-hand sides. It has to be that way, as all the variables are in scope in all the right-hand sides.

In a strict language, such a mutually recursive definition, where env_bind uses env', which in turn uses env_bind, is basically unthinkable. But in a lazy language like Haskell, it might just work.

Unfortunately, it works only as long as the recursive bindings are not actually recursive, or, if they are recursive, they are not used:

ghci> canThrow1 $ LetRec [("x", Throw)] (Var "x")
True
ghci> canThrow1 $ LetRec [("x", App (Var "y") someVal), ("y", Throw)] (Var "x")
True
ghci> canThrow1 $ LetRec [("x", App (Var "x") someVal), ("y", Throw)] (Var "y")
True

But with genuine recursion, it does not work, and simply goes into a recursive cycle:

ghci> canThrow1 $ LetRec [("x", App (Var "x") someVal), ("y", Throw)] (Var "x")
^CInterrupted.

That is disappointing! Do we really have to toss that code and somehow do an explicit fixed-point calculation here? Obscuring our nice declarative code? And possibly having to repeat work (such as traversing the syntax tree) many times that we should only have to do once?

rec-def to the rescue

Not with rec-def! Using RBool from Data.Recursive.Bool instead of Bool, we can write the exact same code, as follows:

import qualified Data.Recursive.Bool as RB

canThrow2 :: Exp -> Bool
canThrow2 = RB.get . go M.empty
  where
    go :: M.Map Var RBool -> Exp -> RBool
    go env (Var v)       = env M.! v
    go env Throw         = RB.true
    go env (Catch e)     = RB.false
    go env (Lam v e)     = go (M.insert v RB.false env) e
    go env (App e1 e2)   = go env e1 RB.|| go env e2
    go env (Let v e1 e2) = go env' e2
      where
        env_bind = M.singleton v (go env e1)
        env' = M.union env_bind env
    go env (LetRec binds e) = go env' e
      where
        env_bind = M.fromList [ (v, go env' e) | (v,e) <- binds ]
        env' = M.union env_bind env

And it works!

ghci> canThrow2 $ LetRec [("x", App (Var "x") someVal), ("y", Throw)] (Var "x")
False
ghci> canThrow2 $ LetRec [("x", App (Var "x") Throw), ("y", Throw)] (Var "x")
True

I find this much more pleasing than the explicit naive fix-pointing you might do otherwise, where you stabilize the result at each LetRec independently: Not only is all that extra work hidden from the programmer, but now also a single traversal of the syntax tree creates, thanks to the laziness, a graph of RBool values, which are then solved “under the hood”.

The issue with x=x

There is one downside worth mentioning: canThrow2 fails to produce a result in case we hit x=x:

ghci> canThrow2 $ LetRec [("x", Var "x")] (Var "x")
^CInterrupted.

This is, after all the syntax tree has been processed and all the map lookups have been resolved, equivalent to

ghci> let x = x in RB.get (x :: RBool)
^CInterrupted.

which also does not work. The rec-def machinery can only kick in if at least one of its functions is used on any such cycle, even if it is just a form of identity (which I have since added to the library):

ghci> idR x = RB.false RB.|| x
ghci> let x = idR x in RB.get (x :: RBool)
False

And indeed, if I insert a call to idR in the line

        env_bind = M.fromList [ (v, idR (go env' e)) | (v,e) <- binds ]

then our analyzer will no longer stumble over these nasty recursive equations:

ghci> canThrow2 $ LetRec [("x", Var "x")] (Var "x")
False

It is a bit disappointing to have to do that, but I do not see a better way yet. I guess the rec-def library expects the programmer to have a similar level of sophistication as with other tie-the-knot tricks using laziness (where you also have to ensure that your definitions are productive and that the sharing is not accidentally lost).

rec-def: Behind the scenes

Published 2022-09-10 in sections English, Haskell.

A week ago I wrote about the rec-def Haskell library, which allows you to write more recursive definitions, such as in this small example:

let s1 = RS.insert 23 s2
    s2 = RS.insert 42 s1
in RS.get s1

This will not loop (as it would if you’d just used Data.Set), but rather correctly return the set S.fromList [23,42]. See the previous blog post for more examples and discussion of the user-facing side of this.

For quick reference, these are the types of the functions involved here:

insert :: Ord a => a -> RSet a -> RSet a
get :: RSet a -> Set a

The type of s1 and s2 above is not Set Int, but rather RSet Int, and in this post I’ll explain how RSet works internally.

Propagators, in general

The conceptual model behind a recursive equation like the one above is:

  • There are multiple cells that can hold values of an underlying type (here Set)
  • These cells have relations that explain how the values in the cells should relate to each other
  • After registering all the relations, some form of solving happens.
  • If the solving succeeds, we can read off the values from the cells, and they should satisfy the registered relations.

This is sometimes called a propagator network. It is a quite general model that can support different kinds of relations (e.g. equalities, inequalities, functions), there can be various solving strategies (iterative fixed-points, algebraic solution, unification, etc.), and information can flow along the edges (and hyper-edges), possibly in multiple directions.

For our purposes, we only care about propagator networks where all relations are functional, so they have a single output cell that is declared to be a function of multiple (possibly zero) input cells, without affecting these input cells. Furthermore, every cell is the output of exactly one such relation.

IO-infested propagator interfaces

This suggests that an implementation of such a propagator network could provide an interface with the following three operations:

  • Functions to declare cells
  • Functions to declare relations
  • Functions to read values off cells

This is clearly an imperative interface, so we’ll see monads, and we’ll simply use IO. So concretely for our small example above, we might expect

data Cell a
newCell :: IO (Cell a)
-- declareInsert x c1 c2 declares that the value of c1 ought to be S.insert x c2
declareInsert :: Ord a => a -> Cell a -> Cell a -> IO ()
getCell :: Cell a -> IO (Set a)

There is no need for an explicit “solve” function: solving can happen when declareInsert or getCell is called – as a user I do not care about that.
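
To make that concrete, here is how the s1/s2 example from the top could be expressed directly against this IO-based interface (a sketch; example is a name I made up, the three operations are the ones just declared):

example :: IO (Set Int)
example = do
  c1 <- newCell
  c2 <- newCell
  declareInsert 23 c1 c2  -- c1 ought to be S.insert 23 (value of c2)
  declareInsert 42 c2 c1  -- c2 ought to be S.insert 42 (value of c1)
  getCell c1              -- expected: fromList [23,42]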

You might be curious about the implementation of newCell, declareInsert and getCell, but I have to disappoint you: That is not the topic of this article. Instead, I want to discuss how to turn this IO-infested interface into the pure interface seen above.

Pure, but too strict

Obviously, we have to get rid of the IO somehow, and for that we have to use unsafePerformIO :: IO a -> a. This dangerous function creates a pure-looking value that, when used for the first time, will run the IO action and turn into that action’s result.

So maybe we can simply write the following:

type RSet a = Cell a
insert x c2 = unsafePerformIO $ do
  c1 <- newCell
  declareInsert x c1 c2
  return c1
get c = unsafePerformIO $
  getCell c

Indeed, the types line up, but if we try to use that code, nothing will happen. Our insert is too strict to be used recursively: It requires the value of c2 (as it is passed to declareInsert, which we assume to be strict in its arguments) before it can return c1, so the recursive example at the top of this post will not make any progress.

Pure, lazy, but forgetful

To work around this, maybe it suffices if we do not run declareInsert right away, but just remember that we have to do it eventually? So let’s introduce a new data type for RSet a that contains not just the cell (Cell a), but also an action that we still have to run before getting a value:

data RSet a = RSet (Cell a) (IO ())
insert x r2 = unsafePerformIO $ do
  c1 <- newCell
  let todo = do
        let (RSet c2 _) = r2
        declareInsert x c1 c2
  return (RSet c1 todo)
get (RSet c todo) = unsafePerformIO $ do
  todo
  getCell c

This is better: insert is now lazy in its arguments (for this it is crucial to pattern-match on RSet only inside the todo code, not in the pattern of insert!). This means that our recursive code above does not get stuck right away.

Pure, lazy, but runs in circles

But it is still pretty bad: Note that we do not run get s2 in the example above, so that cell’s todo action, which would run declareInsert 42, will never run. This cannot work! We have to (eventually) run the declaration code from all involved cells before we can use getCell!

We can try to run the todo action of all the dependencies as part of a cell’s todo action:

data RSet a = RSet (Cell a) (IO ())
insert x r2 = unsafePerformIO $ do
  c1 <- newCell
  let todo = do
        let (RSet c2 todo2) = r2
        declareInsert x c1 c2
        todo2
  return (RSet c1 todo)
get (RSet c todo) = unsafePerformIO $ do
  todo
  getCell c

Now we certainly won’t forget to run the second cell’s todo action, so that is good. But that cell’s todo action will run the first cell’s todo action, and that again the second cell’s, and so on.

Pure, lazy, terminating, but not thread safe

This is silly: We only need (and should!) run that code once! So let’s keep track of whether we ran it already:

data RSet a = RSet (Cell a) (IO ()) (IORef Bool)
insert x r2 = unsafePerformIO $ do
  c1 <- newCell
  done <- newIORef False
  let todo = do
        is_done <- readIORef done
        unless is_done $ do
          writeIORef done True
          let (RSet c2 todo2 _) = r2
          declareInsert x c1 c2
          todo2
  return (RSet c1 todo done)
get (RSet c todo _) = unsafePerformIO $ do
  todo
  getCell c

Ah, much better: It works! Our call to get c1 will trigger the first cell’s todo action, which will mark it as done before calling the second cell’s todo action. When that now invokes the first cell’s todo action, it is already marked done and we break the cycle, and by the time we reach getCell, all relations have been correctly registered.

In a single-threaded world, this would be all good and fine, but we have to worry about multiple threads running get concurrently, on the same or on different cells.

In fact, because we use unsafePerformIO, we have to worry about this even when the program is not using threads.

And the above code has problems. Imagine a second call to get c1 while the first one has already marked it as done, but has not finished processing all the dependencies yet: It will call getCell before all relations are registered, which is bad.

Recursive do-once IO actions

Making this thread-safe seems to be possible, but would clutter both the code and this blog post. So let’s hide that problem behind a nice and clean interface. Maybe there will be a separate blog post about its implementation (let me know if you are curious), or you can inspect the code in the System.IO.RecThunk module yourself. The interface is simply

data Thunk
thunk :: IO [Thunk] -> IO Thunk
force :: Thunk -> IO ()

and the idea is that thunk act will defer the action act until the thunk is passed to force for the first time, and that force will not return until the action has been performed (possibly waiting if another thread is doing that at the moment), and also until the actions of all the thunks returned by act have been performed, recursively, without running into cycles.
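
To illustrate the intended semantics, here is a hypothetical usage sketch (example and the printed strings are made up; the RecursiveDo extension ties the cyclic knot between the two thunks):

{-# LANGUAGE RecursiveDo #-}

example :: IO ()
example = mdo
  t1 <- thunk $ do putStrLn "registering cell 1"; return [t2]
  t2 <- thunk $ do putStrLn "registering cell 2"; return [t1]
  force t1
  -- prints both lines exactly once, despite the cycle;
  -- a second force t1 (or force t2) would do nothing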

We can use this in our definition of RSet and get to the final, working solution:

data RSet a = RSet (Cell a) Thunk
insert x r2 = unsafePerformIO $ do
  c1 <- newCell
  t1 <- thunk $ do
    let (RSet c2 t2) = r2
    declareInsert x c1 c2
    return [t2]
  return (RSet c1 t1)
get (RSet c t) = unsafePerformIO $ do
  force t
  getCell c

This snippet captures the essential ideas behind rec-def:

  • Use laziness to allow recursive definitions to describe the propagator graph naturally
  • Use a form of “explicit thunk” to register the propagator graph relations at the right time (not too early/strict, not too late)

And that’s all?

The actual implementation in rec-def has a few more moving parts.

In particular, it tries to support different value types (not just sets), possibly with different implementations, and even mixing them (e.g. in member :: Ord a => a -> RSet a -> RBool), so the generic code lives in Data.Propagator.Purify, and supports various propagators underneath. The type RSet is then just a newtype around that, defined in Data.Recursive.Internal to maintain the safety of the abstraction.

I went back and forth on a few variants of the design here, including one where there was a generic R type constructor (R (Set a), R Bool etc.), but in the end the monomorphic interface seemed simpler.

Does it really work?

The big remaining question is certainly: Is this really safe and pure? Does it still behave like Haskell?

The answer to these questions certainly depends on the underlying propagator implementation. But it also depends on what we actually mean by “safe and pure”. For example, do we expect the Static Argument Transformation to be semantics-preserving? Or is it allowed to turn undefined values into defined ones (as it does here)?

I am unsure myself yet, so I’ll defer this discussion to a separate blog post, after I hopefully had good discussions about this here at ICFP 2022 in Ljubljana. If you are around and want to discuss, please hit me up!

More recursive definitions

Published 2022-09-03 in sections English, Haskell.

Haskell is a pure and lazy programming language, and the laziness allows us to write some algorithms very elegantly, by recursively referring to already calculated values. A typical example is the following definition of the Fibonacci numbers, as an infinite stream:

fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
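
A quick check in GHCi shows the stream unfolding as expected:

ghci> take 10 fibs
[0,1,1,2,3,5,8,13,21,34]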

Elegant graph traversals

A maybe more practical example is the following calculation of the transitive closure of a graph:

import qualified Data.Set as S
import qualified Data.Map as M

type Graph = M.Map Int [Int]

transitive1 :: Graph -> Graph
transitive1 g = M.map S.toList sets
  where
    sets :: M.Map Int (S.Set Int)
    sets = M.mapWithKey (\v vs -> S.insert v (S.unions [ sets M.! v' | v' <- vs ])) g

We represent graphs as maps from vertices to their successors, and define the resulting map sets recursively: The set of vertices reachable from a vertex v is v itself, plus those reachable from its successors vs, for which we query sets.

And, despite this apparent self-referential recursion, it works!

ghci> transitive1 $ M.fromList [(1,[3]),(2,[1,3]),(3,[])]
fromList [(1,[1,3]),(2,[1,2,3]),(3,[3])]

Cyclic graphs ruin it all

These tricks can be very impressive … until someone tries to use them on a cyclic graph and the program just hangs until we abort it:

ghci> transitive1 $ M.fromList [(1,[2,3]),(2,[1,3]),(3,[])]
fromList [(1,fromList ^CInterrupted.

At this point we are thrown back to implement a more pedestrian graph traversal, typically keeping explicit track of vertices that we have seen already:

transitive2 :: Graph -> Graph
transitive2 g = M.fromList [ (v, S.toList (go S.empty [v])) | v <- M.keys g ]
  where
    go :: S.Set Int -> [Int] -> S.Set Int
    go seen [] = seen
    go seen (v:vs) | v `S.member` seen = go seen vs
    go seen (v:vs) = go (S.insert v seen) (g M.! v ++ vs)

I have written that seen/todo recursion idiom so often in the past, I can almost write it blindly. And indeed, this code handles cyclic graphs just fine:

ghci> transitive2 $ M.fromList [(1,[2,3]),(2,[1,3]),(3,[])]
fromList [(1,[1,2,3]),(2,[1,2,3]),(3,[3])]

But this is a bit anticlimactic – Haskell is supposed to be a declarative language, and transitive1 declares my intent just fine!

We can have it all

It seems there actually is a way to write essentially the code in transitive1, and still get the right result in all cases, and I have just published a possible implementation as rec-def. In the module Data.Recursive.Set we find an API that resembles that of Set, with a type RSet a, and in addition to conversion functions from and to sets, we find the two operations that we needed in transitive1:

insert :: Ord a => a -> RSet a -> RSet a
unions :: Ord a => [RSet a] -> RSet a
get :: RSet a -> Set a

Let’s try that:

import qualified Data.Recursive.Set as RS

transitive3 :: Graph -> Graph
transitive3 g = M.map (S.toList . RS.get) sets
  where
    sets :: M.Map Int (RS.RSet Int)
    sets = M.mapWithKey (\v vs -> RS.insert v (RS.unions [ sets M.! v' | v' <- vs ])) g

And indeed it works! Magic!

ghci> transitive3 $ M.fromList [(1,[3]),(2,[1,3]),(3,[])]
fromList [(1,[1,3]),(2,[1,2,3]),(3,[3])]
ghci> transitive3 $ M.fromList [(1,[2,3]),(2,[1,3]),(3,[])]
fromList [(1,[1,2,3]),(2,[1,2,3]),(3,[3])]

To show off some more, here are two smaller examples:

ghci> let s = RS.insert 42 s in RS.get s
fromList [42]
ghci> :{
  let s1 = RS.insert 23 s2
      s2 = RS.insert 42 s1
  in RS.get s1
 :}
fromList [23,42]

How is that possible? Is it still Haskell?

The internal workings of the RSet a type will be the topic of a future blog post; let me just briefly mention that it uses unsafe features under the hood, and just keeps applying the equations you gave until a fixed point is reached. Because it starts with the empty set and all operations provided by Data.Recursive.Set are monotone (e.g. there is no difference operation), it will eventually find the least fixed point.
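
To illustrate the idea (and emphatically not the library’s actual implementation), here is a hand-rolled sketch of such a least fixed-point iteration for the s1/s2 example above, starting from empty sets and applying the equations until nothing changes:

step :: (S.Set Int, S.Set Int) -> (S.Set Int, S.Set Int)
step (s1, s2) = (S.insert 23 s2, S.insert 42 s1)

solve :: (S.Set Int, S.Set Int)
solve = go (S.empty, S.empty)
  where go x = let x' = step x in if x == x' then x else go x'
-- fst solve == S.fromList [23,42]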

Despite the unsafe machinery under the hood, I claim that Data.Recursive.Set is itself nicely safe, and does not destroy Haskell’s nice properties like purity, referential transparency and equational reasoning. If you disagree, I’d like to hear about it (here, on Twitter, Reddit or Discourse)! There is a brief discussion at the end of the tutorial in Data.Recursive.Example.

More than sets

The library also provides Data.Recursive.Bool for recursive equations with booleans (preferring False) and Data.Recursive.DualBool (preferring True), and some operations like member :: Ord a => a -> RSet a -> RBool can actually connect different types. I plan to add other data types (natural numbers, maps, Maybe, with suitable orders) as demand arises and as I come across nice small example use-cases for the documentation (e.g. finding shortest paths in a graph).
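
For example, one can ask whether a recursively defined set contains an element; a hypothetical GHCi session (assuming member is exported from Data.Recursive.Set, with RB.get reading off the resulting RBool):

ghci> import qualified Data.Recursive.Bool as RB
ghci> let s = RS.insert 42 s in RB.get (RS.member 42 s)
True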

I believe this idiom is practically useful in a wide range of applications (which of course all have some underlying graph structure – but then almost everything in Computer Science is a graph). My original motivation was program analysis. Imagine you want to find out from where in your program you can run into a division by zero. As long as your program does not have recursion, you can simply traverse the program, keeping track of a mapping from function names to whether they can divide by zero – all nice and elegant. But once you allow mutually recursive functions, things become tricky. Unless you use RBool! Simply use laziness, pass the analysis results down when analyzing the functions’ right-hand sides, and it just works!

The Via Alpina red trail through Slovenia

Published 2022-07-31 in sections English.

This July my girlfriend and I hiked the Slovenian part of the Red Trail of the Via Alpina, from the edge of the Julian Alps to Trieste, and I’d like to share some observations and tips that we would have found useful before our trip.

Our favorite camp spot

Getting there

As we traveled with complete camping gear and wanted to stay in our tent, we avoided the high alpine parts of the trail and started just where the trail came down from the Alps and entered the Karst. A great way to get there is to take the night train from Zurich or Munich towards Ljubljana, get off at Jesenice, have breakfast, take the local train to Podbrdo and you can start your tour at 9:15am. From there you can reach the trail at Pedrovo Brdo within 1½h.

Finding the way

We did not use any paper maps, and instead relied on the OpenStreetMap data, which is very good, as well as the official(?) GPX tracks on Komoot, which are linked from the official route descriptions. We used OsmAnd.

The trails are generally very well marked (red circle with white center, and frequent signs), but the signs rarely tell you which way the Via Alpina goes, so the GPS was needed.

Sometimes the OpenStreetMap trail and the Komoot trail disagreed on short segments. We sometimes followed one and other times the other.

Variants

We diverged from the trail in a few places:

  • We did not care too much about the horses in Lipica, and at least on the map it looked like a longish, boringish and sun-exposed detour, so we cut the loop and hiked from Prelože pri Lokvi up onto the peak of the Veliko Gradišče (which unfortunately is too overgrown to provide a good view).

  • When we finally reached the top of Mali Kras and had a view across the bay of Trieste, it seemed silly to walk down to Dolina, so instead we followed the ridge through Socerb, essentially along the Alpe Adria Trail.

  • Not really a variant, but after arriving in Muggia, if one has to get to Trieste, the ferry is probably a nicer way to finish a trek than the bus.

Pitching a tent

We used our tent almost every night; only in Idrija did we get a room (and a shower…). It was not trivial to find good camp spots, because most of the trail is on hills with slopes, and the flat spots tend to have houses built on them, but it is certainly possible. Sometimes we hid in the forest, other times we found nice small and freshly mowed meadows within the forest.

Water

Since this is Karst land, there is very little in terms of streams or lakes along the way, which is a pity.

The Idrijca river just south of Idrija was very tempting for a plunge. Unfortunately, we passed there early in the day, and we wanted to cover some ground first, so we refrained.

As for drinking water, we used the taps at the bathrooms of the various touristic sites, a few (but rare) public fountains, and finally resorted to just ringing random doorbells and asking for water, which always worked.

Paths

A few stages lead you through very pleasant, narrow forest paths with views, but not all: on some days you find yourself plodding along wide graveled or even paved forest roads.

Landscape and sights

The view from Nanos is amazing and, with this high peak jutting out over a wide plain, rather unique. It may seem odd that the trail goes up and down that mountain on the same day when it could go around, but it is certainly worth it.

The Karst is mostly a cultivated landscape, with lots of forestry. It is very hilly and green, which is pretty, but some might miss a bit of craggedness. It’s not the high alps, after all, but at least those are in sight half the time.

But the upside is that there are a few sights along the way that are worth visiting, in particular the Franja Partisan Hospital hidden in a very narrow gorge, the Predjama Castle, and the Škocjan Caves.

Telegram bots in Python made easy

Published 2022-01-03 in sections English, Digital World.

A while ago I set out to get some teenagers interested in programming, and thought about a good way to achieve that: a way that allows them to get started with very little friction, build something that’s relevant to their current life quickly, and avoid frustration.

They were old enough to have their own smartphone, and they were already happily chatting with their friends, using the Telegram messenger. I had already experimented a bit with writing bots for Telegram (e.g. @Umklappbot or @Kaleidogen), and it occurred to me that this might be a good starting point: Chat bot interactions have a very simple data model: message in, response out, all simple text. Much simpler than anything graphical or even web programming. In a way it combines the simplicity of the typical initial programming exercises on the command line with the impact and relevance of web programming.

But of course “real” bot programming is still too hard – installing a programming environment, setting up a server, deploying, dealing with access tokens, understanding the Telegram Bot API and mapping it to your programming language.

The IDE

So I built a browser-based Python programming environment for Telegram bots that takes care of all of that. You simply write a single Python function, click the “Deploy” button, and the bot is live. That’s it!

This environment provides a much simpler “API” for the bots: Define a function like the following:

  def private_message(sender, text):
     return "Hello!"

This gets called upon a message, and if it returns a string, that’s the response. That’s it! It is not enough to build every kind of Telegram bot, but it is sufficient for many fun applications.

A chatbot

In fact, my nephew and niece used this to build a simple interactive fiction game, where the player says where they are going (“house”, “forest”, “lake”) and thus explores the story, and in the end kills the dragon. And my girlfriend created a shopping list bot that we are using “productively”.

If you are curious, you can follow the instructions to create your own bot. There you can also find the source code and instructions for hosting your own instance (on Amazon Web Services).

Help with the project (e.g. improving the sandbox for running untrustworthy python code; making the front-end work better) is of course highly appreciated, too. The frontend is written in PureScript, and the backend in Python, building on Amazon lambda and Amazon DynamoDB.

Zero-downtime upgrades of Internet Computer canisters

Published 2021-11-28 in sections English, Internet Computer.

TL;DR: Zero-downtime upgrades are possible if you stick to the basic actor model.

Background

DFINITY’s Internet Computer provides a kind of serverless compute platform, where the services are WebAssembly programs called “canisters”. These services run without stopping (or at least that’s what it feels like from the service’s perspective; this is called “orthogonal persistence”), and process one message after another. Messages not only come from the outside (“ingress” calls), but are also exchanged between canisters.

On top of these uni-directional messages, the system provides the concept of “inter-canister calls”, which associates a response message with the outgoing message, and guarantees that a response will come. This RPC-like interface allows canister developers to program in the popular async/await model, where these inter-canister calls look almost like normal function calls, and the subsequent code is suspended until the response comes back.

The problem

This is all very well, until you try to upgrade your canister, i.e. install new code to fix a bug or add a feature. Because if you used the await pattern, there may still be suspended computations waiting for the response. If you swap out the program now, the code of that suspended computation will no longer be present, and the response cannot be handled! Worse, because of an infelicity with the current system’s API, when the response comes back, it may actually corrupt your service’s state.

That is why upgrading a canister requires stopping it first, which means waiting for all outstanding calls to come back. During this time, your canister is not available for new calls (so there is downtime), and worse, the length of the downtime is at the whims of the canisters you called – they could withhold the response ad infinitum, rendering your canister unupgradeable.

Clearly, this is not acceptable for any serious application. In this post, I’ll explore some of the ways to mitigate this problem, and how to create canisters that are safely and instantaneously (no downtime) upgradeable.

It’s a spectrum

Some canisters are trivially upgradeable, for others all hope is lost; it depends on what the canister does and how. As an overview, here is the spectrum:

  1. A canister that never performs inter-canister calls can always be upgraded without stopping.
  2. A canister that only does one-way calls, and does them in a particular way (see below), can always be upgraded without stopping.
  3. A canister that performs calls, and where it is acceptable to simply drop outstanding responses, can always be upgraded without stopping, once the System API has been improved and your Canister Development Kit (CDK; Motoko or Rust) has adapted.
  4. A canister that performs calls, but uses explicit continuations to handle responses instead of the await convenience, based on an eventually fixed System API, can be upgraded without stopping, and will even handle responses afterwards.
  5. A canister that uses await to do inter-canister calls cannot be upgraded without stopping.

In this post I will explain variant 2, which is possible now, in more detail. Variants 3 and 4 only become reality if and when the System API has improved.

One-way calls

A one-way call is a call where you don’t care about the response; neither the replied data, nor possible failure conditions.

Since you don’t care about the response, you can pass an invalid continuation to the system (technical detail: a Wasm table index of -1). Because it is invalid for any (realistic) Wasm module, it will stay invalid even after an upgrade, and the problem of silent corruption mentioned above is avoided. And otherwise it’s fine for this to be invalid: it means the canister “traps” once the response comes back, which is harmless (and possibly even cheaper than a do-nothing computation).

This requires your CDK to support this kind of call. Mostly incidentally, Motoko (and Candid) actually have the concept of a one-way call in their type system, namely shared functions with return type () instead of async ... (Motoko is actually older than the system, and not every prediction about what the system would provide has proven successful). So, pending the release of this PR, Motoko will implement one-way calls in this way. On Rust, you have to use the System API directly or wait for cdk-rs to provide this ability (patches welcome, happy to advise).

You might wonder: How are calls useful if I don’t get to look at the response? Of course, this is a set-back – calls with responses are useful, and await is convenient. And if you have to integrate with an existing service that only provides normal calls, you are out of luck.

But if you get to design the canister and all called canisters together, it may be possible to use only one-way messages. You’d be programming in the plain actor model now, with all its advantages (simple concurrency, easy to upgrade, general robustness).

Consider for example a token ledger canister, not unlike the ICP ledger canister. For the most part, it doesn’t have to do any outgoing calls (and would thus be trivially upgradeable). But say we need to add notify functionality, where the ledger canister tells other canisters about a transaction. This is a good example for a one-way call: Maybe the ledger canister doesn’t care if that notification was received? The ICP ledger does care (once the notification comes back successfully, that particular notification cannot be sent again), but maybe your ledger can do it differently: let the other canister confirm the receipt via another one-way call, instead of via the reply; or simply charge for each notification and do not worry about repeated notifications.

Maybe you want to add archiving functionality, where the ledger canister streams its data to an archive canister. There, again, instead of using successful responses to confirm receipt, the archive canister can ping the ledger canister with the latest received index directly.

Yes, it changes the programming model a bit, and all involved parties have to play together, but the gain (zero-downtime upgrades) is quite valuable, and removes a fair number of other sources of issues.

And in the future?

The above is possible with today’s Internet Computer. If the System API improves the way I hope it will, there is a possible middle ground: You still don’t get to use await and instead have to write your response handlers as separate functions, but this way you can call any canister again, and you get the system’s assistance in mapping responses to calls. With this in place, any canister can be rewritten to a form that supports zero-downtime upgrades, without affecting its interface or what the canister can do.

How to audit an Internet Computer canister

Published 2021-11-09 in sections English, Internet Computer.

I was recently called upon by Origyn to audit the source code of some of their Internet Computer canisters (“canisters” are services or smart contracts on the Internet Computer), which were written in the Motoko programming language. Both the application model of the Internet Computer as well as Motoko bring with them their own particular pitfalls and possible sources for bugs. So given that I was involved in the creation of both, they reached out to me.

In the course of that audit work I collected a list of things to watch out for, and general advice around them. Origyn generously allowed me to share that list here, in the hope that it will be helpful to the wider community.

Inter-canister calls

The Internet Computer system provides inter-canister communication that follows the actor model: Inter-canister calls are implemented via two asynchronous messages, one to initiate the call, and one to return the response. Canisters process messages atomically (and roll back upon certain error conditions), but not complete calls. This makes programming with inter-canister calls error-prone. Possible common sources for bugs, vulnerabilities or simply unexpected behavior are:

  • Reading global state before issuing an inter-canister call, and assuming it to still hold when the call comes back.

  • Changing global state before issuing an inter-canister call, changing it again in the response handler, but assuming nothing else changes the state in between (reentrancy).

  • Changing global state before issuing an inter-canister call, and not handling failures correctly, e.g. when the code handling the callback rolls back.

If you find such patterns in your code, you should analyze whether a malicious party can trigger them, and assess the severity of the effect.

These issues apply to all canisters, and are not Motoko-specific.

Rollbacks

Even in the absence of inter-canister calls the behavior of rollbacks can be surprising. In particular, rejecting (i.e. throw) does not roll back state changes done before, while trapping (e.g. Debug.trap, assert …, out-of-cycle conditions) does.

Therefore, one should check all public update call entry points for unwanted state changes or unwanted rollbacks. In particular, look for methods (or rather, messages, i.e. the code between commit points) where a state change is followed by a throw.

These issues apply to all canisters, and are not Motoko-specific, although other CDKs may not turn exceptions into rejects (which do not roll back).

Talking to malicious canisters

Talking to untrustworthy canisters can be risky, for the following (likely incomplete) reasons:

  • The other canister can withhold a response. Although the bidirectional messaging paradigm of the Internet Computer was designed to guarantee a response eventually, the other party can busy-loop for as long as they are willing to pay for before responding. Worse, there are ways to deadlock a canister.

  • The other canister can respond with invalidly encoded Candid. This will cause a Motoko-implemented canister to trap in the reply handler, with no easy way to recover. Other CDKs may give you better ways to handle invalid Candid, but even then you will have to worry about Candid cycle bombs that will cause your reply handler to trap.

Many canisters do not even do inter-canister calls, or only call other trustworthy canisters. For the others, the impact of this needs to be carefully assessed.

Canister upgrade: overview

For most services it is crucial that canisters can be upgraded reliably. This can be broken down into the following aspects:

  1. Can the canister be upgraded at all?
  2. Will the canister upgrade retain all data?
  3. Can the canister be upgraded promptly?
  4. Is there a recovery plan for when upgrading is not possible?

Canister upgradeability

A canister that traps, for whatever reason, in its canister_preupgrade system method is no longer upgradeable. This is a major risk. The canister_preupgrade method of a Motoko canister consists of the developer-written code in any system func preupgrade() block, followed by the system-generated code that serializes the content of any stable var into a binary format, and then copies that to stable memory.

Since the Motoko-internal serialization code will first serialize into a scratch space in the main heap, and then copy that to stable memory, canisters with more than 2GB of live data will likely be unupgradeable. But this is unlikely to be the first limit:

The system imposes an instruction limit on upgrading a canister (spanning both canister_preupgrade and canister_postupgrade). This limit is a subnet configuration value, separate from (and likely higher than) the normal per-message limit, and not easily determined. If the canister’s live data becomes too large to be serialized within this limit, the canister becomes non-upgradeable.

This risk cannot be eliminated completely, as long as Motoko and Stable Variables are used. It can be mitigated by appropriate load testing:

Install a canister, fill it up with live data, and exercise the upgrade. If this succeeds with a live data set exceeding the expected amount of data by a margin, this risk is probably acceptable. Bonus points for adding functionality that will prevent the canister’s live data to increase above a certain size.

If this testing is to be done on a local replica, extra care needs to be taken to make sure the local replica actually performs instruction counting and has the same resource limits as the production subnet.

An alternative mitigation is to avoid canister_preupgrade as much as possible. This means no use of stable var (or restricting it to small, fixed-size configuration data). All other data could be

  • mirrored off the canister (possibly off chain), and manually re-hydrated after an upgrade.
  • stored in stable memory manually, during each update call, using the ExperimentalStableMemory API. While this matches what high-assurance Rust canisters (e.g. the Internet Identity) do, this requires manual binary encoding of the data, and is marked experimental, so I cannot recommend this at the moment.
  • not put into a Motoko canister until Motoko has a scalable solution for stable variable (for example keeping them in stable memory permanently, with smart caching in main memory, and thus obliterating the need for pre-upgrade code.)

Data retention on upgrades

Obviously, all live data ought to be retained during upgrades. Motoko automatically ensures this for stable var data. But often canisters want to work with their data in a different format (e.g. in objects that are not shared and thus cannot be put in stable vars, such as HashMap or Buffer objects), and thus may follow the following idiom:

stable var fooStable = …;
var foo = fooFromStable(fooStable);
system func preupgrade() { fooStable := fooToStable(foo); };
system func postupgrade() { fooStable := (empty); };

In this case, it is important to check that

  • All non-stable global vars, or global lets with mutable values, have a stable companion.
  • The assignments to foo and fooStable are not forgotten.
  • The functions fooToStable and fooFromStable form a bijection (i.e. are mutually inverse).

An example would be HashMaps stored as arrays via Iter.toArray(….entries()) and HashMap.fromIter(….vals()).

It is worth pointing out that a code review will only look at a single version of the code, and cannot check whether code changes will preserve data on upgrade. This can easily go wrong if the names and types of stable variables are changed in an incompatible way. The upgrade may fail loudly in these cases, but in bad cases, the upgrade may even succeed, losing data along the way. This risk needs to be mitigated by thorough testing, and possibly backups (see below).

Prompt upgrades

Motoko and Rust canisters cannot be safely upgraded when they are still waiting for responses to inter-canister calls (the callback would eventually reach the new instance, and because of infelicities of the IC’s System API, could possibly call arbitrary internal functions). Therefore, the canister needs to be stopped before upgrading, and started again afterwards. If the inter-canister calls take a long time, this means that upgrading may take a long time, which may be undesirable. Again, this risk is reduced if all calls are made to trustworthy canisters, and elevated when possibly untrustworthy canisters are called, directly or indirectly.

Backup and recovery

Because of the above risk around upgrades it is advisable to have a disaster recovery strategy. This could involve off-chain backups of all relevant data, so that it is possible to reinstall (not upgrade) the canister and re-upload all data.

Note that reinstall has the same issue as upgrade described above in “prompt upgrades”: It ought to be stopped first to be safe.

Note that the instruction limit for messages, as well as the message size limit, limit the amount of data returned. If the canister needs to hold more data than that, the backup query method might have to return chunks or deltas, with all the extra complexity that entails, e.g. state changes between downloading chunks.

If large-data load testing is performed (as I recommend anyway, to test upgradeability), one can test whether the backup query method works within the resource limits.

Time is not strictly monotonic

The timestamps for “current time” that the Internet Computer provides to its canisters are guaranteed to be monotonic, but not strictly monotonic. The same value can be returned, even for different messages, as long as they are processed in the same block. Timestamps should therefore not be used to detect “happens-before” relations.

Instead of using and comparing timestamps to check whether Y has been performed after X happened last, introduce an explicit var y_done : Bool state, which is set to false by X and then to true by Y. When things become more complex, it will be easier to model that state via an enumeration with descriptive tag names, and to update this “state machine” along the way.

Another solution to this problem is to introduce a var v : Nat counter that you bump in every update method, and after each await. Now v is your canister’s state counter, and can be used like a timestamp in many ways.

While we are talking about time: The system time (typically) changes across an await. So if you do let now = Time.now() and then await, the value in now may no longer be what you want.

Wrapping arithmetic

The Nat64 data type, and the other fixed-width numeric types, provide opt-in wrapping arithmetic (e.g. +%, fromIntWrap). Unless explicitly required by the current application, this should be avoided, as usually a too-large or negative value is a serious, unrecoverable logic error, and trapping is the best one can do.

Cycle balance drain attacks

Because of the IC’s “canister pays” model, all canisters are prone to DoS attacks by draining their cycle balance, and this risk needs to be taken into account.

The most elementary mitigation strategy is to monitor the cycle balance of canisters and keep it far from the (configurable) freezing threshold.

On the raw IC-level, further mitigation strategies are possible:

  • If all update calls are authenticated, perform this authentication as quickly as possible, possibly before decoding the caller’s argument. This way, a cycle drain attack by an unauthenticated attacker is less effective (but still possible).

  • Additionally, implementing the canister_inspect_message system method allows the above checks to be performed before the message even is accepted by the Internet Computer. But it does not defend against inter-canister messages and is therefore not a complete solution.

  • If an attack from an authenticated user (e.g. a stakeholder) is to be expected, the above methods are not effective, and an effective defense might require relatively involved additional program logic (e.g. per-caller statistics) to detect such an attack, and react (e.g. rate-limiting).

  • Such defenses are pointless if there is only a single method where they do not apply (e.g. an unauthenticated user registration method). If the application is inherently attackable this way, it is not worth the bother to raise defenses for other methods.

    Related: A justification of why the Internet Identity does not use canister_inspect_message.

A Motoko-implemented canister currently cannot perform most of these defenses: Argument decoding happens unconditionally before any user code that may reject a message based on the caller, and canister_inspect_message is not supported. Furthermore, Candid decoding is not very cycle-defensive, and one should assume that it is possible to construct Candid messages that require many instructions to decode, even for “simple” argument type signatures.

The conclusion for the audited canisters was to rely on monitoring to keep the cycle balance up, even during an attack, if the expense can be borne, and maybe pray for IC-level DoS protections to kick in.

Large data attacks

Another DoS attack vector exists if public methods allow untrustworthy users to send data of unlimited size that is persisted in the canister memory. Because of the translation of async-await code into multiple message handlers, this applies not only to data that is obviously stored in global state, but also local data that is live across an await point.

The effectiveness of such attacks is limited by the Internet Computer’s message size limit, which is in the order of a few megabytes, but many of those also add up.

The problem becomes much worse if a method has an argument type that allows a Candid space bomb: It is possible to encode very large vectors with all values null in Candid, so if any method has an argument of type [Null] or [?t], a small message will expand to a large value in the Motoko heap.

Other types to watch out for:

  • Nat and Int: These are unbounded numbers, and thus can be arbitrarily large. The Motoko representation will however not be much larger than the Candid encoding (so this does not qualify as a space bomb).

    It is still advisable to check if the number is reasonable in size before storing it or doing an await. For example, when it denotes an index in an array, throw early if it exceeds the size of the array; if it denotes a token amount to transfer, check it against the available balance, if it denotes time, check it against reasonable bounds.

  • Principal: A Principal is effectively a Blob. The interface specification says that principals are at most 29 bytes in length, but the Motoko Candid decoder does not currently check that (fixed in the next version of Motoko). Until then, a Principal passed as an argument can be large (the principal in msg.caller is system-provided and thus safe). If you cannot wait for the fix to reach you, manually check the size of the principal (via Principal.toBlob) before doing the await.

Shadowing of msg or caller

Don’t use the same name for the “message context” of the enclosing actor and the methods of the canister: It is dangerous to write shared(msg) actor, because now msg is in scope across all public methods. As long as these also use public shared(msg) func …, and thus shadow the outer msg, it is correct, but if one accidentally omits or mistypes the inner msg, no compiler error would occur, and suddenly msg.caller would be the original controller, likely defeating an important authorization step.

Instead, write shared(init_msg) actor or shared({caller = controller}) actor to avoid using msg.

Conclusion

If you write a “serious” canister, whether in Motoko or not, it is worth going through the code and watching out for these patterns. Or better, have someone else review your code, as it may be hard to spot issues in your own code.

Unfortunately, such a list is never complete, and there are surely more ways to screw up your code – in addition to all the non-IC-specific ways in which code can be wrong. Still, things get done out there, so best of luck!

A mostly allocation-free optional type

Published 2021-10-31 in sections English, Internet Computer.

The Motoko programming language has a built-in data type for optional values, named ?t with values null and ?v (for v : t); this is the equivalent of Haskell’s Maybe, Ocaml’s option or Rust’s Option. In this post, I explain how Motoko represents such optional values (almost) without allocation.

I neither claim nor expect that any of this is novel; I just hope it’s interesting.

Uniform representation

The Motoko programming language, designed by Andreas Rossberg and implemented by a pretty cool team at DFINITY is a high-level language with strict semantics and a strong, structural, equi-recursive type system that compiles down to WebAssembly.

Because the type system supports polymorphism, it represents all values uniformly. Simplified for the purpose of this blog post, values are always pointers to a heap object, where the first word of the heap object, the heap tag, contains information about the value:

┌─────┬───┄
│ tag │ …
└─────┴───┄

The tag is something like array, int64, blob, variant, record, …, and it has two purposes:

  • The garbage collector uses it to understand what kind of object it is looking at, so that it can move it around and follow pointers therein. Variable size objects such as arrays include the object size in a subsequent word of the heap object.

  • Some types have values that may have different shapes on the heap. For example, the ropes used in our text representation can either be a plain blob, or a concatenation node of two blobs. For these types, the tag of the heap object is inspected.

The optional type, naively

The optional type (?t) is another example of such a type: Its values can either be null, or ?v for some value v of type t, and the primitive operations on this type are the two introduction forms, an analysis function, and a projection for non-null values:

null : () -> ?t
some : t -> ?t
is_some : ?t -> bool
project : ?t -> t     // must only be used if is_some holds

It is natural to use the heap tag to distinguish the two kinds of values:

  • The null value is a simple one-word heap object with just a tag that says that this is null:

    ┌──────┐
    │ null │
    └──────┘
  • The other values are represented by a two-word object: The tag some, indicating that it is a ?v, and then the payload, which is the pointer that represents the value v:

    ┌──────┬─────────┐
    │ some │ payload │
    └──────┴─────────┘

With this, the four operations can be implemented as follows:

def null():
  ptr <- alloc(1)
  ptr[0] = NULL_TAG
  return ptr

def some(v):
  ptr <- alloc(2)
  ptr[0] = SOME_TAG
  ptr[1] = v
  return ptr

def is_some(p):
  return p[0] == SOME_TAG

def project(p):
  return p[1]

The problem with this implementation is that null() and some(v) both allocate memory. This is bad if they are used very often and very liberally, and that is the case in Motoko: For example, the iterators used for the for (x in e) construct have type

type Iter<T> = { next : () -> ?T }

and would unavoidably allocate a few words on each iteration. Can we avoid this?

Static values

It is quite simple to avoid this for null: Just statically create a single null value and use it every time:

static_null = [NULL_TAG]

def null():
  return static_null

This way, at least null() doesn’t allocate. But we gain more: Now every null value is represented by the same pointer, and since the pointer points to static memory, it does not change even with a moving garbage collector, so we can speed up is_some:

def is_some(p):
  return p != static_null

This is not a very surprising change so far, and most compilers out there can and will do at least the static allocation of such singleton constructors.

For example, in Haskell, there is only a single empty list ([]) and a single Nothing value in your program, as you can see in my videos exploring the Haskell heap.

But can we get rid of the allocation in some(v) too?

Unwrapped optional values

If we don’t want to allocate in some(v), what can we do? How about simply

def some(v):
  return v

That does not allocate! But it is also broken: At type ??Int, the values null, ?null and ??null must all be distinct, yet with this definition of some they would all be represented by the same pointer.

Or, more formally, the following laws should hold for our four primitive operations:

  1. is_some(null()) = false
  2. ∀ v. is_some(some(v)) = true
  3. ∀ p. project(some(p)) = p

But with the new definition of some, we’d get is_some(some(null())) = false. Not good!

But note that we only have a problem if we are wrapping a value that is null or some(v). So maybe take the shortcut only then, and write the following:

def some(v):
  if v == static_null || v[0] == SOME_TAG:
    ptr <- alloc(2)
    ptr[0] = SOME_TAG
    ptr[1] = v
    return ptr
  else:
    return v

The definition of is_some can stay as it is: It is still the case that all null values are represented by static_null. But the some values are now no longer all of the same shape, so we have to change project():

def project(p):
  if p[0] == SOME_TAG:
    return p[1]
  else:
    return p

Now things fall into place: A value ?v can, in many cases, be represented the same way as v, and no allocation is needed. Only when v is null or ?null or ??null or ???null etc. do we need to use the some heap object, and thus have to allocate.

In fact, it doesn’t cost much to avoid allocation even for ?null:

static_some_null = [SOME_TAG, static_null]
def some(v):
  if v == static_null:
    return static_some_null
  else if v[0] == SOME_TAG:
    ptr <- alloc(2)
    ptr[0] = SOME_TAG
    ptr[1] = v
    return ptr
  else:
    return v

So unless one nests the ? type two levels deep, there is no allocation whatsoever, and the only cost is a bit more branching in some and project.

That wasn’t hard, but quite rewarding, as one can now use idioms like the iterator shown above with greater ease.
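
If you want to convince yourself that the three laws hold for this final scheme, here is a small executable model in Python (a toy sketch of the scheme described above, not the actual Motoko runtime code). Heap objects are modelled as Python lists, and a plain Python object stands in for a pointer to any non-optional value:

# Toy model of the final scheme. Heap objects are Python lists; anything
# that is not a list stands in for a pointer to a non-optional heap value.
NULL_TAG = "null"
SOME_TAG = "some"

static_null = [NULL_TAG]
static_some_null = [SOME_TAG, static_null]

def null():
    return static_null

def some(v):
    if v is static_null:
        return static_some_null        # ?null: statically allocated
    if isinstance(v, list) and v[0] == SOME_TAG:
        return [SOME_TAG, v]           # only towers ?ⁿnull allocate
    return v                           # everything else: no allocation

def is_some(p):
    return p is not static_null        # all nulls are the static object

def project(p):
    if isinstance(p, list) and p[0] == SOME_TAG:
        return p[1]
    return p

# The three laws from above:
scalar = object()                      # stands for, say, a boxed Int
assert not is_some(null())
for v in [scalar, null(), some(null()), some(some(null()))]:
    assert is_some(some(v))
    assert project(some(v)) is v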

Examples

The following table shows the representation of various values before and after. Here […] is a pointed-to dynamically allocated heap object, {…} a statically allocated heap object, N = NULL_TAG and S = SOME_TAG.

type     value    before          after
Null     null     {N}             {N}
?Int     null     {N}             {N}
?Int     ?23      [S,23]          23
??Int    null     {N}             {N}
??Int    ?null    [S,{N}]         {S,{N}}
??Int    ??23     [S,[S,23]]      23
???Int   null     {N}             {N}
???Int   ?null    [S,{N}]         {S,{N}}
???Int   ??null   [S,[S,{N}]]     [S,{S,{N}}]
???Int   ???23    [S,[S,[S,23]]]  23

Concluding remarks

  • If you know what parametric polymorphism is, and wonder how this encoding can work in a language that has it, note that this representation lives at the low level of the untyped run-time value representation: We don’t need to know the type of v in some(v), merely its heap representation.

  • The uniform representation in Motoko is a bit more involved: The pointers are tagged (by subtracting 1) and some scalar values are represented directly in place (shifted left by 1 bit). But this is luckily orthogonal to what I showed here.

  • Could Haskell do this for its Maybe type? Not so easily:

    • The Maybe type is not built-in, but rather a standard-library-defined algebraic data type. But the compiler could feasibly detect that it is option-like?

    • Haskell is lazy, so at runtime, a value of type Maybe a could be Nothing, or Just v, or, and this is crucial, a yet-to-be-evaluated expression, also called a thunk. And one definitely needs to distinguish between a thunk t :: Maybe a that may evaluate to Nothing, and a value Just t :: Maybe a that definitely is Just, but contains a value that may itself be a thunk.

    Something like this may work for a strict Maybe type or unlifted sums like (# (# #) | a #), but may clash with other tricks in GHC, such as pointer tagging.

  • As I said above, I don’t expect this to be novel, and I am happy to add references to prior art here.

  • Given that a heap object with tag SOME_TAG now always encodes a tower ?ⁿnull for n>0, one could try to optimize that even more by just storing the n:

    ┌──────┬─────┐
    │ some │  n  │
    └──────┴─────┘

    But that seems inadvisable: It is only a win if you have deep towers, which are probably rare. Worse, the project function would now have to return such a heap object with n decremented, so projection might have to allocate, which goes against the cost model expected by the programmer.

  • If you’d rather see code than blog posts, feel free to check out Motoko PR #2115.

  • Does this kind of stuff excite you? Motoko is open source, so your contributions may be welcome!

A Candid explainer: Quirks

Published 2021-09-14 in sections English, Internet Computer.

This is the fifth and final post in a series about the interface description language Candid.

If you made it this far, you now have a good understanding of what Candid is, what it is for and how it is used. For this final post, I’ll put the spotlight on specific aspects of Candid that are maybe surprising, or odd, or quirky. This section will be quite opinionated, and could maybe be called “what I’d do differently if I’d re-do the whole thing”.

Note that these quirks are not serious problems, and they don’t invalidate the overall design. I am writing this up not to discourage the use of Candid, but merely to help interested parties understand it better.

References in the wire format

When the work on Candid began at DFINITY, the Internet Computer was still far away from being a thing, and many fundamental aspects of it were still up in the air. In particular, there was still talk about embracing capabilities as a core feature of the application model, which would be implemented as opaque references on the system level, likely building on WebAssembly’s host reference type proposal (which only landed recently), and could be used to model access permissions, custom tokens and many other things.

So Candid is designed with that in mind, and you’ll find that its wire format is not just a type table and a value table, but actually

a triple (T, M, R), where T (“type”) and M (“memory”) are sequences of bytes and R (“references”) is a sequence of references.

Also, the wire format for values of function and service types has an extra byte to distinguish between “public references” (represented by a principal and possibly a method name in the data part) and these opaque references.

Alas, references never made it into the Internet Computer, so all Candid implementations simply ignore that part of the specification. But it’s still in the spec, and if it confused you before, now you know why.

Hashed field names

Candid record and variant types look like they have textual field names:

type T = record { syndactyle : nat; trustbuster: bool }

But that is actually only true superficially. The wire format for Candid only stores hashes of field names. So the above is actually equivalent to

type T = record { 4260381820 : nat; 3504418361 : bool }

or, for that matter, to

type T = record { destroys : bool; rectum : nat }

(Yes, I used an English word list to find these hash collisions. There aren’t that many, actually.)
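
If you want to check these numbers yourself: the field hash, as given in the Candid spec, folds the UTF-8 bytes of the name with h ↦ h·223 + byte, modulo 2^32. A few lines of Python reproduce the collisions above:

# The field-name hash from the Candid spec: fold the UTF-8 bytes of the
# name with h ↦ h·223 + byte, modulo 2^32.
def candid_hash(name):
    h = 0
    for byte in name.encode("utf-8"):
        h = (h * 223 + byte) % 2**32
    return h

assert candid_hash("syndactyle") == 4260381820 == candid_hash("rectum")
assert candid_hash("trustbuster") == 3504418361 == candid_hash("destroys")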

The benefit of such hashing is that the messages are a bit smaller in most (not all) cases, but it is a big annoyance for dynamic uses of Candid. It’s the reason why tools like dfx, if they don’t know the Candid interface of a service, will print the result with just the numerical hash, letting you guess which field is which.

It also complicates host languages that derive Candid types from the host language, like Motoko: some records (e.g. record { trustbuster : bool; destroys : int }) have field name hash collisions and cannot be represented in Candid, so either the host language’s type system needs to be Candid-aware (as is the case in Motoko), or serialization/deserialization will fail at runtime, or odd bugs can happen.

(More discussion of this issue).

Tuples

Many languages have a built-in notion of a tuple type (e.g. (Int, Bool)), but Candid does not have such a type. The only first-class product type is the record.

This means that tuples have to be encoded as records somehow. Conveniently(?), record fields are just numbers after all, so the type (Int, Bool) would be mapped to the type

record { 0 : int; 1 : bool }

So tuples can be expressed. But from my experience implementing the type mappings for Motoko and Haskell, this causes headaches: To give a good experience when importing from Candid, the tools have to try to reconstruct which records may originally have been tuples, and turn those back into tuples.
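
Here is the kind of heuristic such an importing tool might use (a hypothetical sketch in Python, not the code of any particular tool): treat a record as tuple-like exactly when its field hashes are 0 through n-1.

# Hypothetical importer heuristic: a record "was a tuple" if its field
# hashes are exactly 0, 1, ..., n-1.
def looks_like_tuple(field_hashes):
    return sorted(field_hashes) == list(range(len(field_hashes)))

assert looks_like_tuple([0, 1])      # record { 0 : int; 1 : bool } becomes a pair
assert not looks_like_tuple([0, 2])  # gap in the numbering: keep it a record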

The main argument for the status quo is that Candid types should be canonical, and there should not be more than one product type, and records are fine, and there needs to be no anonymous product type. But that has never quite resonated with me, given the practical reality of tuple types in many host languages.

Argument sequences

Did I say that Candid does not have tuple types? Turns out it does, sort of. There is no first class anonymous product, but since functions take sequences of arguments and results, there is a tuple type right there:

func foo : (bool, int) -> (int, bool)

Again, I found that ergonomic interaction with host languages becomes relatively unwieldy when functions are required to take and return sequences of values. This is especially true for languages where functions take exactly one argument value or return exactly one result type (the latter being very common). Here, sequences of length one are turned into that type directly, longer sequences turn into the host language’s tuple type, and empty sequences turn into the idiomatic unit type. But this means that the types (int, bool) -> () and (record { 0 : int; 1 : bool }) -> () may be mapped to the same host language type, which causes problems when you hope to encode all necessary Candid type information in the host language.
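
As a sketch of such a mapping, and of the ambiguity it creates (hypothetical Python; Candid and host types are just strings here):

# Hypothetical mapping of an argument sequence to a host language type:
# empty sequence to unit, singleton to the type itself, longer to a tuple.
def host_type(arg_types):
    if len(arg_types) == 0:
        return "()"
    if len(arg_types) == 1:
        return arg_types[0]
    return "(" + ", ".join(arg_types) + ")"

# A two-element argument sequence and a single tuple-like record argument
# land on the same host type:
assert host_type(["int", "bool"]) == "(int, bool)" == host_type(["(int, bool)"])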

Another oddity with argument and result sequences is that you can give names to the entries, e.g. write

func hello : (last_name : text, first_name : text) -> ()

but these names are completely ignored! So while this looks like you can, for example, add new optional arguments in the middle, such as

func hello : (last_name : text, middle_name : opt text, first_name : text) -> ()

without breaking clients, this does not have the effect you think it has and will likely break.

My suggestion is to never put names on function arguments and result values in Candid interfaces, and for anything that might be extended with new fields or where you want to name the arguments, use a single record type as the only argument:

func hello : (record { last_name : text; first_name : text}) -> ()

This allows you to add and remove arguments more easily and reliably.

Type “shorthands”

The Candid specification defines a system of types, and then adds a number of “syntactic short-hands”. For example, if you write blob in a Candid type description, it ought to mean the same as vec nat8.

My qualm with that is that it doesn’t always mean the same. A Candid type description is interpreted by a number of, say, “consumers”. Two such consumers are part of the Candid specification:

  • The specification that defines the wire format for that type
  • The upgrading (subtyping) rules

But there are more! For every host language, there is some mapping from Candid types to host language types, and generic tools like Candid UI are also consumers of the type algebra. If these were to take the Candid specification as gospel, they would be forced to treat blob and vec nat8 the same, but that would be quite unergonomic and might cause performance regressions (most languages try to map blob to some compact binary data type, while vec t tends to turn into some form of array structure).

So they need to be pragmatic and treat blob and vec nat8 differently. But then, for all practical purposes, blob is not just a short-hand for vec nat8. They are different types that just happen to have the same wire representation and subtyping relations.

This affects not just blob, but also “tuples” (record { blob; int; bool }) and field “names”, as discussed above.

The value text format

For a while after defining Candid, the only implementation was in Motoko, and all the plumbing was automatic, so there was never a need for users to explicitly handle Candid values, as all values were Motoko values. Still, for debugging and testing and such things, we eventually needed a way to print out Candid values, so the text format was defined (“To enable convenient debugging, the following grammar specifies a text format for values…”).

But soon the dfx tool learned to talk to canisters, and now users needed to enter Candid values on the command line, possibly even when talking to canisters whose interface was not known to dfx. And, sure enough, the textual format defined in the Candid spec was used.

Unfortunately, since it was not designed for that use case, it is rather unwieldy:

  • It is quite verbose. You have to write record { … }, not just { … }. Vectors are written vec { …; … } instead of some conventional syntax like […, …]. Variants are written as variant { error = "…" } with braces that don’t add any value here; something like #error "…" might have worked as well.

    With a bit more care, a more concise and ergonomic syntax might have been possible.

  • It wasn’t designed to be sufficient to create a Candid value from it. If you write 5 it’s unclear whether that’s a nat or an int16 or what (and all of these have different wire representations). Type annotations were later added, but are relatively unwieldy, and don’t cover all cases (e.g. a service reference with a recursive type cannot be represented in the textual format at the moment).

  • Not really the fault of the textual format, but some useful information about the types is not reflected in the type description that is part of the wire format: in particular not the field names, nor whether a value was intended to be binary data (blob) or a list of small numbers (vec nat8), so pretty-printing such values requires guesswork. The Haskell library even tries to brute-force the hash to guess the field name, if it is short or in an English word list!

In hindsight I think it was too optimistic to assume that correct static type information is always available, and instead of actively trying to discourage dynamic use, Candid might have turned out better if we had taken these (unavoidable?) use cases into account.

Custom wire format

At the beginning of this post, I have a “Candid is …” list. The list is relatively long, and the actual wire format is just one bullet point. Yes, defining a wire format that works isn’t rocket science, and it was easiest to just make one up. But since most of the interesting meat of Candid is in other aspects (subtyping rules, host language integration), I wonder if it would have been better to use an existing, generic wire format, such as CBOR, and build Candid as a layer on top.

This would give us plenty of tools and libraries to begin with. And maybe it would have reduced the barrier to entry for developers, a barrier that has now led to the very strange situation that DFINITY advocates the universal use of Candid on the Internet Computer, so that all services can smoothly interact, while two of the most important services on the Internet Computer (the registry and the ledger) use Protobuf as their primary interface format, with Candid interfaces missing or an afterthought.

The type constructor opcodes

A small quirk of the wire format: It defines op-codes for every type constructor, both primitive (bool, …) and composite (vec, …), and the op-codes for primitive and composite types share a common namespace. This is a left-over from earlier iterations of the design, where primitive and composite types may have been encoded together. But eventually the “type table” was introduced: composite types are now always referenced via their index in the type table, and the type table contains only composite types. In other words: in any given place, one deals either with primitive types and table indices, or with composite types. I think it might have been clearer and more honest if primitive and composite types were numbered independently.

Sideways Interface Evolution

This is not a quirk of Candid itself, but rather an idiom of how you can use Candid that emerged from our solution for record extensions and upgrades.

Consider our example from before, a service with interface

service { add_user : (record { login : text; name : text }) -> () }

where you want to add an age field, which should be a number.

The “official” way of doing that is to add that field with an optional type:

service { add_user : (record { login : text; name : text; age : opt nat }) -> () }

As explained above, this will not break old clients, as the decoder will treat the missing record field as null. So far so good.

But often when adding such a field you don’t want to bother new clients with the fact that this age was, at some point in the past, not there yet. And you can avoid that! The trick is to distinguish between the interface you publish and the interface you implement. You can (for example in your documentation) state that the interface is

service { add_user : (record { login : text; name : text; age : nat }) -> () }

which is not a subtype of the old type, but it is the interface you want new clients to work with. And then your implementation uses the type with opt nat. Calls from old clients will come through as null, and calls from new clients will come through as opt 42.
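
To make this concrete, here is a toy model in Python of what the implementation sees in both cases (a sketch of the behaviour described above, not dfx or any real Candid decoder):

# Toy model: the implemented type has age : opt nat, so a field that an
# old client does not send decodes to null (None here).
def decode_add_user(record):
    return (record["login"], record["name"], record.get("age"))

# old client, no age field: age comes through as null
assert decode_add_user({"login": "jb", "name": "Joachim"}) == ("jb", "Joachim", None)
# new client: age comes through as opt 42
assert decode_add_user({"login": "jb", "name": "Joachim", "age": 42}) == ("jb", "Joachim", 42)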

We can see this idiom used in the Management Canister of the Internet Computer. The current documented interface only mentions a controllers : vec principal field in the settings, but the implementation can still handle both the old controller : principal and the new controllers field.

It’s probably advisable to let your CI system check that new versions of your service continue to implement all published interfaces, including past interfaces. But as long as the actual implementation’s interface is a subtype of all interfaces ever published, this works fine.

This pattern is related to when your service implements, say, http_request (so its implemented interface is a subtype of that common interface), but does not include that method in the published documentation (because clients of your service don’t need to call it).

Self-describing Services

As you have noticed, Candid was very much designed assuming that all parties always have the service type of the services they want to interact with. But the Candid specification does not define how one can obtain the interface of a given service, and there isn’t really an official way to do that on the Internet Computer.

That is unfortunate, because many interesting features depend on it: Such as writing import C "ic:7e6iv-biaaa-aaaaf-aaada-cai" in your Motoko program, and having its type right there. Or tools like ic.rocks that allow you to interact with any canister right there.

One reason why we don’t really have that feature yet is because of disagreements about how dynamic that feature should be. Should you be able to just ask the canister for its interface (and allow the canister to vary the response, for example if it can change its functionality over time, even without changing the actual wasm code)? Or is the interface a static property of the code, so that one should be able to query the system for that data, without the canister’s active involvement? Or, completely different: should interfaces be distributed out of band, maybe together with API documentation, or in some canister registry somewhere else?

I always leaned towards the first of these options, but not convincingly enough. The second option requires system assistance, so more components to change, more teams to be involved that maybe intrinsically don’t care a lot about this feature. And the third might have emerged as the community matures and builds such infrastructure, but that has not happened yet.

In the end I sneaked an implementation of the first into Motoko, arguing that even if we don’t know yet how this feature will eventually be implemented, we all want the feature to exist somehow, and we really, really want to unblock all the interesting applications it enables (e.g. Candid UI). That’s why every Motoko canister, and some Rust canisters too, implements a method

__get_candid_interface_tmp_hack : () -> (text)

that one can use to get the Candid interface file.

The name was chosen to signal that this may not be the final interface, but like all good provisional solutions, it may last longer than intended. If that’s the case, I’m not sorry.

This concludes my blog post series about Candid, for now. If you want to know more, feel free to post your question on the DFINITY developer forum, and I’ll probably answer.