Joachim Breitner

Finding bugs in Haskell code by proving it

Published 2017-12-05 in sections English, Coq.

Last week, I wrote a small nifty tool called bisect-binary, which semi-automates answering the question “To what extent can I fill this file up with zeroes and still have it working”. I wrote it it in Haskell, and part of the Haskell code, in the Intervals.hs module, is a data structure for “subsets of a file” represented as a sorted list of intervals:

data Interval = I { from :: Offset, to :: Offset }
newtype Intervals = Intervals [Interval]

The code is the kind of Haskell code that I like to write: A small local recursive function, a few guards to case analysis, and I am done:

intersect :: Intervals -> Intervals -> Intervals
intersect (Intervals is1) (Intervals is2) = Intervals $ go is1 is2
  where
    go _ [] = []
    go [] _ = []
    go (i1:is1) (i2:is2)
        -- reorder for symmetry
        | to i1 < to i2 = go (i2:is2) (i1:is1)
        -- disjoint
        | from i1 >= to i2 = go (i1:is1) is2
        -- subset
        | to i1 == to i2 = I f' (to i2) : go is1 is2
        -- overlapping
        | otherwise = I f' (to i2) : go (i1 { from = to i2} : is1) is2
      where f' = max (from i1) (from i2)

But clearly, the code is already complicated enough so that it is easy to make a mistake. I could have put in some QuickCheck properties to test the code, I was in proving mood…

Now available: Formal Verification for Haskell

Ten months ago I complained that there was no good way to verify Haskell code (and created the nifty hack ghc-proofs). But things have changed since then, as a group at UPenn (mostly Antal Spector-Zabusky, Stephanie Weirich and myself) has created hs-to-coq: a translator from Haskell to the theorem prover Coq.

We have used hs-to-coq on various examples, as described in our CPP’18 paper, but it is high-time to use it for real. The easiest way to use hs-to-coq at the moment is to clone the repository, copy one of the example directories (e.g. examples/successors), place the Haskell file to be verified there and put the right module name into the Makefile. I also commented out parts of the Haskell file that would drag in non-base dependencies.

Massaging the translation

Often, hs-to-coq translates Haskell code without a hitch, but sometimes, a bit of help is needed. In this case, I had to specify three so-called edits:

  • The Haskell code uses Intervals both as a name for a type and for a value (the constructor). This is fine in Haskell, which has separate value and type namespaces, but not for Coq. The line

      rename value Intervals.Intervals = ival

    changes the constructor name to ival.

  • I use the Int64 type in the Haskell code. The Coq version of Haskell’s base library that comes with hs-to-coq does not support that yet, so I change that via

      rename type GHC.Int.Int64 = GHC.Num.Int

    to the normal Int type, which itself is mapped to Coq’s Z type. This is not a perfect fit, and my verification would not catch problems that arise due to the boundedness of Int64. Since none of my code does arithmetic, only comparisons, I am fine with that.

  • The biggest hurdle is the recursion of the local go functions. Coq requires all recursive functions to be obviously (i.e. structurally) terminating, and the go above is not. For example, in the first case, the arguments to go are simply swapped. It is very much not obvious why this is not an infinite loop.

    I can specify a termination measure, i.e. a function that takes the arguments xs and ys and returns a “size” of type nat that decreases in every call: Add the lengths of xs and ys, multiply by two and add one if the the first interval in xs ends before the first interval in ys.

    If the problematic function were a top-level function I could tell hs-to-coq about this termination measure and it would use this information to define the function using Program Fixpoint.

    Unfortunately, go is a local function, so this mechanism is not available to us. If I care more about the verification than about preserving the exact Haskell code, I could easily change the Haskell code to make go a top-level function, but in this case I did not want to change the Haskell code.

    Another way out offered by hs-to-coq is to translate the recursive function using an axiom unsafeFix : forall a, (a -> a) -> a. This looks scary, but as I explain in the previous blog post, this axiom can be used in a safe way.

    I should point out it is my dissenting opinion to consider this a valid verification approach. The official stand of the hs-to-coq author team is that using unsafeFix in the verification can only be a temporary state, and eventually you’d be expected to fix (heh) this, for example by moving the functions to the top-level and using hs-to-coq’s the support for Program Fixpoint.

With these edits in place, hs-to-coq splits out a faithful Coq copy of my Haskell code.

Time to prove things

The rest of the work is mostly straight-forward use of Coq. I define the invariant I expect to hold for these lists of intervals, namely that they are sorted, non-empty, disjoint and non-adjacent:

Fixpoint goodLIs (is : list Interval) (lb : Z) : Prop :=
  match is with
    | [] => True
    | (I f t :: is) => (lb <= f)%Z /\ (f < t)%Z /\ goodLIs is t
   end.

Definition good is := match is with
  ival is => exists n, goodLIs is n end.

and I give them meaning as Coq type for sets, Ensemble:

Definition range (f t : Z) : Ensemble Z :=
  (fun z => (f <= z)%Z /\ (z < t)%Z).

Definition semI (i : Interval) : Ensemble Z :=
  match i with I f t => range f t end.

Fixpoint semLIs (is : list Interval) : Ensemble Z :=
  match is with
    | [] => Empty_set Z
    | (i :: is) => Union Z (semI i) (semLIs is)
end.

Definition sem is := match is with
  ival is => semLIs is end.

Now I prove for every function that it preserves the invariant and that it corresponds to the, well, corresponding function, e.g.:

Lemma intersect_good : forall (is1 is2 : Intervals),
  good is1 -> good is2 -> good (intersect is1 is2).
Proof. … Qed.

Lemma intersection_spec : forall (is1 is2 : Intervals),
  good is1 -> good is2 ->
  sem (intersect is1 is2) = Intersection Z (sem is1) (sem is2).
Proof. … Qed.

Even though I punted on the question of termination while defining the functions, I do not get around that while verifying this, so I formalize the termination argument above

Definition needs_reorder (is1 is2 : list Interval) : bool :=
  match is1, is2 with
    | (I f1 t1 :: _), (I f2 t2 :: _) => (t1 <? t2)%Z
    | _, _ => false
  end.

Definition size2 (is1 is2 : list Interval) : nat :=
  (if needs_reorder is1 is2 then 1 else 0) + 2 * length is1 + 2 * length is2.

and use it in my inductive proofs.

As I intend this to be a write-once proof, I happily copy’n’pasted proof scripts and did not do any cleanup. Thus, the resulting Proof file is big, ugly and repetitive. I am confident that judicious use of Coq tactics could greatly condense this proof.

Using Program Fixpoint after the fact?

This proofs are also an experiment of how I can actually do induction over a locally defined recursive function without too ugly proof goals (hence the line match goal with [ |- context [unsafeFix ?f _ _] ] => set (u := f) end.). One could improve upon this approach by following these steps:

  1. Define copies (say, intersect_go_witness) of the local go using Program Fixpoint with the above termination measure. The termination argument needs to be made only once, here.

  2. Use this function to prove that the argument f in go = unsafeFix f actually has a fixed point:

    Lemma intersect_go_sound:

    f intersect_go_witness = intersect_go_witness

    (This requires functional extensionality). This lemma indicates that my use of the axioms unsafeFix and unsafeFix_eq are actually sound, as discussed in the previous blog post.

  3. Still prove the desired properties for the go that uses unsafeFix, as before, but using the functional induction scheme for intersect_go! This way, the actual proofs are free from any noisy termination arguments.

    (The trick to define a recursive function just to throw away the function and only use its induction rule is one I learned in Isabelle, and is very useful to separate the meat from the red tape in complex proofs. Note that the induction rule for a function does not actually mention the function!)

Maybe I will get to this later.

Update: I experimented a bit in that direction, and it does not quite work as expected. In step 2 I am stuck because Program Fixpoint does not create a fixpoint-unrolling lemma, and in step 3 I do not get the induction scheme that I was hoping for. Both problems would not exist if I use the Function command, although that needs some tickery to support a termination measure on multiple arguments. The induction lemma is not quite as polished as I was hoping for, so he resulting proof is still somewhat ugly, and it requires copying code, which does not scale well.

Efforts and gains

I spent exactly 7 hours working on these proofs, according to arbtt. I am sure that writing these functions took me much less time, but I cannot calculate that easily, as they were originally in the Main.hs file of bisect-binary.

I did find and fix three bugs:

  • The intersect function would not always retain the invariant that the intervals would be non-empty.
  • The subtract function would prematurely advance through the list intervals in the second argument, which can lead to a genuinely wrong result. (This occurred twice.)

Conclusion: Verification of Haskell code using Coq is now practically possible!

Final rant: Why is the Coq standard library so incomplete (compared to, say, Isabelle’s) and requires me to prove so many lemmas about basic functions on Ensembles?

Comments

Why is the Coq standard library so incomplete (compared to, say, Isabelle’s) and requires me to prove so many lemmas about basic functions on Ensembles?

Shameless plug: Have you looked at std++? See https://gitlab.mpi-sws.org/robbertkrebbers/coq-stdpp/. ;)

#1 Ralf Jung am 2017-12-05

The limitation wrt local recursion: is that due to the current state of hs-to-coq (if implemented differently, this would work) or Coq itself (we can’t build the theory machinery in Coq needed to support local recursive bindings in the source language)?

#2 Andrew Adams-Moran am 2017-12-06

I’d say it’s Coq itself. The built in fix construct allows local recusion, and hs-to-coq can use it, but the more advanced variants like Program Fixpoint or Function all only support top-level functions. But it is a tooling question – Coq’s theory would allow that.

#3 Joachim Breitner am 2017-12-06

Equations supports the definition of local well-founded fixpoints (at any « where » node, see the RoseTree example on the webpage), indeed this is just a tooling issue. We’re working on providing also a Coq-term-to-splitting-tree translation (the internal representation used by Equations) that would make any Coq definition benefit from the improved generation of functional elimination principles in Equations from Coq definitions (I expect for hs-to-coq programs it could always generate it). You could also consider targeting Equations frontend directly, as it could handle also some subset of Haskell’s GADT pattern-matching directly (don’t know exactly when/how it’s « compiled away » to management of equalities in GHC though). Anyway, this is neat!

#4 Matthieu Sozeau am 2017-12-07

Have something to say? You can post a comment by sending an e-Mail to me at <mail@joachim-breitner.de>, and I will include it here.