Joachim Breitner

A Candid explainer: Opt is special

Published 2021-09-07 in sections English, Digital World.

This is the third post in a series about the interface description language Candid.

The record extension problem

Initially, the upgrade rules of Candid were a straight-forward application of the canonical subtyping rules. This worked and was sound, but it forbid one very commonly requested use case: Extending records in argument position.

The subtyping rule for records say that

   record { old_field : nat; new_field : int }
<: record { old_field : nat }

or, in words, that subtypes can have more field than supertypes. This makes intuitive sense: A user of a such a record value that only expects old_field to be there is not bothered if suddenly a new_field appears. But a user who expects new_field to be there is thrown off if it suddenly isn’t anymore. Therefore it is ok to extend the records returned by a service with new fields, but not to extend the record in your methods’s argument types.

But users want to extend their record types over time, also in argument position!

In fact, they often want to extend them in both positions. Consider a service with the following interface, where the CUser record appears both in argument and result position:

type CUser = record { login : text; name : text };
service C : {
  register_user : (CUser) -> ();
  get_user_data : (text) -> (CUser);
}

It seems quite natural to want to extend the record with a new field (say, the age of the user). But simply changing the definition of CUser to

type CUser = record { login : text; name : text; age : nat }

is not ok, because now register_user requires the age field, but old clients don’t provide it.

So how can we allow developers to make changes like this (because they really really want that), while keeping the soundness guarantees made by Candid (because we really really want that)? This question has bothered us for over two years, and we even created a meta issue that records the dozen approached we considered, discussed and eventually ditched.

Dynamic subtyping checks in opt

I will spare you the history lesson, though, and explain the solution we eventually came up with.

In the example above it seems unreasonable to let the developer add a field age of type nat. Since there may be old clients around, the service unavoidably has to deal with records that don’t have an age field. If the code expects an age value, what should it do in that case?

But one can argue that changing the record type to

type CUser = record { login : text; name : text; age : opt nat }

could work: If the age field is present, use the value. If the field is absent, inject a null value during decoding.

In the first-order case, this rather obvious solution would work just fine, and we’d be done. But Candid aims to solve the higher order case, and I said things get tricky here, didn’t I?

Consider another, independent service D with the following interface:

type DUser = record { login : text; name : text };
service D : {
  on_user_added : (func (DUser) -> ()) -> ()
}

This service has a method that takes a method reference, presumably with the intention of calling it whenever a new user was added to service D. And because the types line up nicely, we can compose these two services, by once calling D.on_user_added(C.register_user), so that from now on D calls C.register_user(…). These kind of service mesh-ups are central to the vision of the Internet Computer!

But what happens if the services now evolve their types in different ways, e.g. towards

type CUser = record { login : text; name : text; age : opt nat }

and

type DUser = record { login : text; name : text; age : opt variant { under_age; adult }}

Individually, these are type changes that we want to allow. But now the call from D to C may transmit a value of record { …; age = opt variant { under_age } } when C expects a natural number! This is precisely what we want to prevent by introducing types, and it seems we have lost soundness.

The best solution we could find is to make the opt type somewhat special, and apply these extra rules when decoding at an expected type of opt t.

  • If this was a record field, and the provided record value doesn’t even have a field of that name, don’t fail but instead treat this as null. This handles the first-order example above.

  • If there is a value, look at its type (which, in Candid, is part of the message).

    • If it is opt t' and t' <: t, then decode as normal.

    • If it is opt t' but t' <: t does not hold, then treat that as null.

      This should only happen in these relatively obscure higher-order cases where services evolved differently and incompatibly, and makes sure that the calls that worked before the upgrades will continue to work afterwards.

      It is not advisable to actually use that rule when changing your service’s interface. Tools that assist the developer with an upgrade check should prevent or warn about the use of this rule.

  • Not strictly required here, but since we are making opt special anyways:

    If its type t' is not of the form opt …, pretend it was opt t', and also pretend that the given value was wrapped in opt.

    This allows services to make record field in arguments that were required in the old version optional in a new version, e.g. as a way to deprecated them. So it is mildly useful, although I can report that it makes the meta-theory and implementation rather complex, in particular together with equirecursion and beasts like type O = opt O. See this discussion for a glimpse of that.

In the above I stress that we look at the type of the provided value, and not just the value. For example, if the sender sends the value opt vec {} at type opt vec nat, and the recipient expects opt vec bool, then this will decode as null, even though one could argue that the value opt vec {} could easily be understood at type opt vec bool. We initially had that, but then noticed that this still breaks soundness when there are references inside, and we have to do a full subtyping check in the decoder. This is very unfortunate, because it makes writing a Candid decoder a noticeably harder task that requires complicated graph algorithms with memoization (which I must admit Motoko has not implemented yet), but it’s the least bad solution we could find so far.

Isn’t that just dynamic typing?

You might have noticed that with these rules, t <: opt t' holds for all types t and t'. In other words, every opt … is a top type (like reserved), thus all optional types are equivalent. One could argue that we are throwing all of the benefits of static typing over board this way. But it’s not quite as bad: It’s true that decoding Candid values now involves a dynamic check inserting null values in certain situations, but as long as everyone plays by the rules (i.e. upgrades their services according to the Candid safe upgrade rules, and heeds the warning alluded to above), these certain situations will only happen in the obscurest of cases.

It is, by the way, not material to this solution that the special subtyping behavior is applied to “the” opt type, and a neighboring point in the design space would be a dedicated upgraded t type constructor. That would allow developers to use the canonical opt type with the usual subtyping rules and communicate their intent (“clients should consider this field optional” vs. “old clients don’t know about this field, but new clients should use it”) more cleanly – at the expense of having more “non-canonical” types in the type system.

To see why it is convenient if Candid has mostly just the “normal” types, read the next post, which will describe how Candid can be integrated into a host language.

Comments

Have something to say? You can post a comment by sending an e-Mail to me at <mail@joachim-breitner.de>, and I will include it here.