\ProvideTextCommandDefault{\textalpha}{\ensuremath\alpha}
\ProvideTextCommandDefault{\textGamma}{\ensuremath\Gamma}
\ProvideTextCommandDefault{\textDelta}{\ensuremath\Delta}

\tikzset{%
    every node/.style={
	inner sep=1pt
    },
    every loop/.style={}
}

\lettrine[nindent=0ex]T{he} previous chapter introduced a new analysis and transformation for an optimising compiler, and analyses it in the usual detail: A somewhat formal description of the transformation is accompanied by empirical evidence of its usefulness based on benchmark results.

That none of the benchmarks exhibited reduced performance, at least when measured by the number of allocations, suggests that the transformation is indeed a \emph{safe} optimisation, i.e.\@ does not  make matters worse. Nevertheless, I found this unsatisfying: The benchmark suite only contains a small number of example programs -- how can I be sure that my transformation really never makes matters worse?

To that end, I want to actually prove that my transformation is safe, and I want to do it in a machine-verified way, in order to attain the highest level of assurance that the proof is correct.

Therefore, I set out to go all the way: I took the Call Arity analysis, formalised it in the interactive theorem prover Isabelle and created a machine-checked proof not only of functional correctness, but also that the performance of the transformed program is not worse than that of the original program. This chapter, parts of which I have presented at the Haskell Symposium 2015 \cite{call-arity-safe}, describes this endeavour.

Recall that it is only safe to eta-expand a thunk if the thunk is called at most once. So an arity analysis requires a cardinality analysis, which determines how often a function or a thunk is called, in order to be able to eta-expand a thunk. If the cardinality analysis were wrong and we would eta-expand a thunk that is called multiple times, we would lose the benefits of sharing and suddenly repeat work.

A correctness proof with regard to a standard denotational semantics would not rule that out!
A more detailed semantics is required instead. I use the abstract machine introduced in \cref{ch:launchbury}, where the explicit heap allows me to prove that the number of heap allocations does not increase by transforming the program. This is a suitable criterion for safety in this context, as explained shortly.

% \medskip
% \noindent Our contributions are:
% \begin{compactitem}
% \item We provide a rigorous formal description of Call Arity and prove that it is safe, i.e.\@ the transformed program does not perform more allocations than the original program.
% \item Our proof is modular. We cleanly separate the arity analysis part (\cref{sec:arityanal}) from the cardinality part, and divide the cardinality part into a three-stage refinement proof (\cref{sec:cardanal}). This gives greater insight into their interaction, and provides reusable components for similar proofs.
% \item We introduce infinite \emph{trace trees} (\cref{sec:tracetrees}) as a suitable domain for an abstract cardinality analysis. % used in one of the refinements.
% \item We formalized a suitable semantics akin to Sestoft’s mark~1 abstract machine, the Call Arity analysis, the transformation and the safety proof in the theorem prover Isabelle. This gives us very high assurance of the correctness of this work, but also provides a data point on the question of how feasible machine-checked proofs of compiler transformations currently are (\cref{sec:formalisation}).
% \item Finally, and of more general interest,
% we critically discuss the formalisation gap left by our formalisation and find that the gap is not always bridgeable by meta-arguments. In particular, we explain how the interaction of Call Arity with other components of the compiler effected a serious and intricate bug, despite the proof of correctness
% (\cref{sec:formalisationgap}).
% \end{compactitem}

\section{Proof outline}

In my introduction of the Call Arity analysis in \cref{ch:call-arity}, I explain and motivate the various details of the definition. These might be convincing points that Call Arity might indeed be safe, but are far from a general, rigorous proof. How can I go about producing such a proof?

First, I need to find a suitable semantics. The elegant standard denotational semantics for functional programs, such as the one in \cref{sec:denotational_semantics}, are unfortunately too abstract and admit no observation of program performance. Therefore, I use a standard small-step operational semantics similar to Sestoft's mark~1 abstract machine, introduced in \cref{sec:sestoftssemantics}.

With that semantics, I could have followed Sands \cite{Moran:Sands:Need} and measured performance by counting evaluation steps. But that is too finegrained: The eta-expansion transformation causes additional beta-reductions to be performed during evaluation, and without subsequent simplification -- which does happen in a real compiler, but which I do not want to include in the proof -- these increase the number of steps in my semantics.

\label{sec:whyallocations}
Therefore, I measure the performance by counting the number of allocations performed during the evaluation. This is sufficient to detect accidental duplication of work, as shown by this gedankenexperiment: Consider a program $e_1$, which is transformed to $e_2$, and a subexpression $e$ of $e_1$ that also occurs in $e_2$. By replacing $e$ with \mbox{\keyword{let} \hi{x1} = \hi{x1},\ldots, \hi{xn} = \hi{xn} \keyword{in} $e$}, where the variables are fresh, we can force each evaluation of $e$ to perform at least $n$ allocations, for an arbitrary large choice of $n$. So if $e_2$ happens to evaluate $e$ more often than $e_1$, we can choose $n$ large enough to make $e_2$ allocate more than $e_1$. Conversely, if our criterion holds, we can conclude that the transformation does not duplicate work.

This measure is also realistic: When working on GHC, the number of bytes allocated by a benchmark or a test case is the prime measure that developers observe to detect improvements and regressions, as in practice, it correlates very well with execution time and memory usage, while being more stable across differing environments.

A transformation is \emph{safe} in this sense if the transformed program performs no more allocations than the original program.

The arity transformation eta-expands expressions, so in order to prove it safe, I identify conditions when eta-expansion itself is safe, and ensure that these conditions are always met.

A sufficient condition for the safety of an $n$-fold eta-expansion of an expression $e$ is that whenever $e$ is evaluated, the top $n$ elements on the stack are arguments, and neither continuations of a case expression, as eta-expansion would introduce a type error, nor update markers for thunks, as eta-expansion would prevent the sharing.  This is stated as \cref{lem:eta_exp_safe}.

The safety proof for the arity analysis (\cref{lem:arity-safety}) keeps track of some invariants during the evaluation which ensure that we can apply \cref{lem:eta_exp_safe} whenever an eta-expanded expression is about to be evaluated.

I perform the proof first for a naive arity analysis without a cardinality analysis, i.e.\@ one that needs to be conservative in this case. This corresponds to previous work on arity analysis, and furthermore, this is sufficient to prove that the transformation is semantics preserving (\cref{thm:call-arity-correct}).

I then formally introduce the concept of a cardinality analysis in \cref{sec:cardanal}. I do not simply prove safety of the co-call graph based analysis directly, but split it up into a series of increasingly concrete proofs, each building on the result of the previous, for two reasons:
\begin{itemize}
\item It is nice to separate various aspects of the proof (i.e.\@ the interaction of the arity analysis with the cardinality analysis; the gap between the steps of the semantics and the structurally recursive nature of the analysis; different treatments of recursive and non-recursive bindings) into individual steps, but more importantly\goodbreak
\item while the co-call graph data structure is sufficiently expressive to implement the analysis, it is an unsuitable abstraction for the safety proof.
There, we need to describe the possibly complex recursion patterns of the heap as a whole with sufficient detail to identify and make use that some expressions call each other in a nice and linear fashion.
\end{itemize}

In the first refinement, the cardinality analysis is completely abstract: Its input is the whole configuration and its result is simply which variables on the heap are going to be called more than once.
% In our example, after \hi{t1}, \hi{t2} and \hi g have been put on the heap, this analysis would find out that \hi{t2} and \hi g are called more than once, but not \hi{t1}.
We give conditions (\cref{def:cardspec}) when an arity analysis using such a cardinality analysis is safe (\cref{lem:card-arity-safety}).

The next refinement assumes a cardinality analysis that now looks just at expressions, not whole configurations, and returns a much richer analysis result: A \emph{trace tree} which is a (possibly) infinite tree where each path corresponds to one possible execution and the edges are labelled by the variables called during that evaluation.

% In our example, the tree corresponding to the right-hand-side of \hi g, namely
% \begin{center}
% \begin{tikzpicture}[grow=right, sibling distance=12, level distance=18, every node/.style={inner sep=0pt}]
% \coordinate[circle, fill, inner sep=1]
%  child { coordinate child { coordinate child { coordinate
%  edge from parent node[above] {\hi g} } edge from parent node[below] {\hi{t2}} } child { coordinate edge from parent node[above] {\hi{t1}} } edge from parent node[above] {\hi p} }
%  ;
% \end{tikzpicture}
% \end{center}
% can be combined with the very simple tree \singletontree{\hi g} from the body of the inner \keyword{let} to form the infinite tree
% \begin{center}
% \begin{tikzpicture}[grow=right, sibling distance=12, level distance=18, every node/.style={inner sep=0pt}]
% \coordinate[circle, fill, inner sep=1]
%  child {
%    coordinate
% %
%  child { coordinate child { coordinate child { coordinate
%  child { coordinate child { coordinate child { coordinate
%  child { coordinate child { coordinate child { coordinate
% %
% child {coordinate edge from parent [dotted]}
% %
%  edge from parent node[above] {\hi g} } edge from parent node[below] {\hi{t2}} } child { coordinate edge from parent node[above] {\hi{t1}} } edge from parent node[above] {\hi p} }
%  edge from parent node[above] {\hi g} } edge from parent node[below] {\hi{t2}} } child { coordinate edge from parent node[above] {\hi{t1}} } edge from parent node[above] {\hi p} }
%  edge from parent node[above] {\hi g} } edge from parent node[below] {\hi{t2}} } child { coordinate edge from parent node[above] {\hi{t1}} } edge from parent node[above] {\hi p} }
% %
%    edge from parent node[above] {\hi g}
%  }
%  ;
% \end{tikzpicture}
% \end{center}
% which describes the overall sequence of calls. Clearly, on every possible path, \hi{t1} is called at most once.

Given such a trace tree analysis, an abstract analysis as described in the first refinement can be implemented: The trees describing the expressions in a configuration
(on the heap, as the control or in the stack) can be combined to a tree describing the behaviour of the whole configuration. This calculation, named $s$ in \cref{sec:function-s}, is quite natural for trace trees, but would be hard to define on co-call graphs only. From that tree I can determine the cardinalities of the individual variables.
I specify conditions on the trace tree analysis (\cref{def:treespec}) and in \cref{lem:tree-safety} show them to be sufficient to fulfil the specification of the first refinement.


The third and final refinement assumes an analysis that returns a co-call graph for each expression. Co-call graphs can be seen as compact approximations of trace trees, with edges between variables that can occur on the same path in the tree. The specification in \cref{def:cocallspec} is shown in \cref{lem:cocall-safety} to be sufficient to fulfil the specification of the second refinement.

Eventually, I give the definition of the real Call Arity analysis in \cref{sec:callarityconcretely}, and as it fulfils the specification of the final refinement, the desired safety theorem (\cref{thm:cocallsafety}) follows.

%The following three technical sections necessarily omit some detail, especially in the proofs. But since the machine-checked formalisation exists, such omissions needn't cause worry. The full Isabelle code is available at \cite{arity-afp}; the proof document contains a table that maps the definitions and lemmas of this paper to the corresponding entities in the Isabelle development.

\section{Arity analyses}
\label{sec:arityanal}

In general, an arity analysis is a function that, given a binding $(\Gamma, e)$, consisting of variable names bound to right-hand-sides in $\Gamma $ and the body $e$, determines the arity of each of the bound expressions. It depends on the number $\alpha $ of arguments passed to $e$ and may return $\bot$ for a name that is not called at all:%
\index{aheap@$\aheap\idxdummy\idxdummy\idxdummy$\idxexpl{arity analysis of a binding}}
\[
\aheap{\Gamma }{e}{\alpha } \colon \sVar \rightarrow \mathbb N_\bot.
\]


Given such an analysis, we can run it over a program and transform it accordingly. The transformation function traverses the syntax tree, keeping track of the number of arguments passed along the way:%
\index{trans@$\trans\idxdummy\idxdummy$\idxexpl{arity transformation of an expression}}
\begin{align*}
\trans{x}{\alpha } &= x \\
\trans{\sApp e x}{\alpha } &= \sApp{\trans e {\alpha +1}} x \\
\trans{\sLam x e}{\alpha } &= (\sLam x {\trans e {\alpha -1}})\\
\trans{\sCon b}{\alpha } &= \sCon b \quad\quad \text{for } b\in\{\bT,\bF\}\\
\trans{\sITE e {e_\bT} {e_\bF}}{\alpha } &= \sITE {\trans e 0} {\trans {e_\bT} {\alpha }} {\trans {e_\bF}{\alpha }} \\
\trans{\sLet {\Gamma } e}{\alpha } &= \sLet {\transheap{\Gamma }{\aheap{\Gamma }e{\alpha }}} {\trans{e}{\alpha }}
\end{align*}

The actual transformation happens at a binding, where it eta-expands bound expressions according to the result of the arity analysis, using the $n$-fold eta expansion operator introduced in \cref{sec:arity}. If the analysis determines that a binding is never called, it simply leaves it alone:%
\index{transheap@$\transheap\idxdummy\idxdummy$\idxexpl{arity transformation of a heap}}
\begin{align*}
\transheap{\Gamma }{\bar {\alpha }} &= 
\Big[
x \mapsto
\bigg\{
\begin{aligned}
&e                       &&\text{if } {\bar {\alpha }}(x) = \bot \\
&\etaex{\trans{e}{\alpha}}{\alpha} &&\text{if } {\bar {\alpha }}(x) = \alpha
\end{aligned}
\bigg\}
\Big\lvert
(x\mapsto e)\in \Gamma 
\Big].
\end{align*}


As motivated earlier, I consider an arity analysis $\mathcal A$ to be safe if the transformed program does not perform more allocations than the original program.  A -- technical -- benefit of this measure is that the number of allocations always equals the size of the heap plus the number of update markers on the stack, as no garbage collector is modelled in the semantics:

\begin{definition}[Safe transformation]
\label{def:safe}
A program transformation $\mathsf T$ is \emph{safe} if for every execution
\[ \steps {([],e,[])} {(\Gamma, v, [])} \]
with $\isVal v$, there is an execution
\[ \steps{([],\mathsf T(e),[])}{(\Gamma',v', [])} \]
with $\isVal {v'}$ and $|\dom \Gamma'| \le |\dom \Gamma|$.
\index{safe transformation}

An arity analysis $\mathcal A$ is safe if the transformation $\mathsf T$ is safe.
\end{definition}

This formulation of safety works nicely as the semantics is deterministic up to the choice of variable names. For a genuinely non-deterministic semantics, a definition of safety would have to distinguish different executions and ensure that for each of them, a corresponding execution of the transformed expression exists and does not allocate more.

Note that this definition does not entail functional, i.e.\@ semantic, correctness, which is discussed and proved in \cref{sec:arity-correct}.

\subsubsection{Specification}

I begin by stating sufficient conditions for an arity analysis to be safe.  In order to phrase the conditions, I also need to know the arities an expression $e$ calls its free variables with, assuming it is itself called with $\alpha$ arguments:
\[
\aexp{e}{\alpha } \colon \sVar \rightarrow \mathbb N_\bot
\]
For notational simplicity, I define $\aexp{e}{\bot} \coloneqq \bot$.%
\index{aexp@$\aexp\idxdummy\idxdummy$\idxexpl{arity analysis}}

The specification consists of a few naming hygiene conditions and an
inequality for most syntactical constructs:

\begin{definition}[Arity analysis specification]%
\defdisplayunskip
\begin{flalign*}
\noprecondition
\dom \aexp{e}{\alpha} &\subseteq \fv e \tag{A-dom} \\
\noprecondition
\dom \aheap{\Gamma}{e}{\alpha} &\subseteq \dom \Gamma \tag{Ah-dom} \\
\shortprecondition{z \notin \{x,y\}}
\phantom{z \notin \{x,y\} \implies}
\aexp{\subst{e}{x}{y}}{\alpha}~z &= \aexp{e}{\alpha}~z \tag{A-subst} \\
\precondition{x,y \notin \dom\Gamma}
\aheap{\subst{\Gamma}{x}{y}}{\subst{e}{x}{y}}{\alpha} &= \aheap{\Gamma}{e}{\alpha} \tag{Ah-subst} \\
\noprecondition
[x \mapsto \alpha] &\sqsubseteq \aexp{x}{\alpha} \tag{A-Var} \\
\noprecondition
\aexp{e}{\alpha+1} \sqcup [x \mapsto 0] &\sqsubseteq \aexp{\sApp e x}{\alpha} \tag{A-App} \\
\noprecondition
\aexp{e}{\alpha-1} \setminus \{x\} &\sqsubseteq \aexp{\sLam x  e}{\alpha} \tag{A-Lam} \\
\noprecondition
\aexp{e}{0} \sqcup \aexp{e_\bT}{\alpha } \sqcup \aexp{e_\bF}{\alpha }  &\sqsubseteq \aexp{\sITE e {e_\bT}{e_\bF}}{\alpha} \tag{A-If} \\
\noprecondition
\abinds{\Gamma}{\aheap{\Gamma}e\alpha} \sqcup \aexp{e}\alpha
 &\sqsubseteq \aheap{\Gamma}e\alpha \sqcup \aexp{\sLet \Gamma e}\alpha \tag{A-Let}
%\\[-3em]
\end{flalign*}
where
\label{def:arityspec}
\[
\abinds{\Gamma}{\overline\alpha} \coloneqq \bigsqcup\big\{ \aexp{e}{(\overline \alpha ~ x)} \big| (x \mapsto e) \in \Gamma \big\}.
\]
\end{definition}


These conditions come quite naturally: An expression should not report calls to names that it does not know about. Replacing one variable by another should not affect the arity of other variables. A variable, evaluated with a certain arity, should report (at most) that arity.

In the rules for application and lambda abstraction we keep track of the number of arguments. As this models a forward analysis which looks at bodies before right-hand-sides, we get no useful information on how the argument $x$ in an application $\sApp e x$ is called by $e$.

In rule (A-If), the scrutinee is evaluated without arguments, hence it is analysed with arity $0$.

The rule (A-Let) is a concise way to capture a few requirements. Note that, by (A-dom) and (Ah-dom), the domains of $\aheap{\Gamma}e\alpha$ and $\aexp{\sLet \Gamma e}\alpha$ are disjoint, i.e.\@ $\aheap{\Gamma}e\alpha$ contains the information on how the names of the current binding are called, while $\aexp{\sLet \Gamma e}\alpha$ informs us about the free variables. The left-hand side contains all possible calls, both from the body of the binding and from each bound expression. These are analysed with the arity reported by $\aheap{\Gamma}e\alpha$.
The occurrence of $\aheap{\Gamma}e\alpha$ on both sides of the inequality anticipates the fixed-point iteration in the implementation of the analysis.

\cref{def:arityspec} suffices to prove functional correctness (\Cref{sec:arity-correct} contains a proof of that) but not safety, as the issue of thunks is not touched upon yet. The simplest way to handle thunks -- and the only way without the aid of a cardinality analysis -- is to simply give up when encountering a thunk:

\begin{definition}[No-cardinality analysis specification]%
\label{def:nocardspec}%
\defdisplayunskip
\begin{align*}
x \in \thunks \Gamma \implies \aheap{\Gamma}e\alpha~x = 0 \tag{Ah-thunk}
\end{align*}
\end{definition}

\subsubsection{Safety}
\label{sec:arityproof}

The safety of an eta-expanding transformation rests on the simple observation that, given enough arguments on the stack, an eta-expanded expression evaluates straight to the original expression:

\begin{lemma}[Safety of eta-expansion]
\lemdisplayunskip
\[
\steps {(\Gamma, \etaex{e}{\alpha}, \cons{\arg {x_1} \cdots\arg {x_\alpha}}S)}
       {(\Gamma, e, \cons{\arg {x_1} \cdots\arg {x_\alpha}}S)}
\label{lem:eta_exp_safe}
\]
\end{lemma}
\begin{proof}%
\proofdisplayunskip
\begin{conteq}[headline]
(\Gamma, \etaex{e}{\alpha}, \cons{\arg {x_1}\cdots\arg {x_\alpha}}S)\\
= (\Gamma, (\sLam {z_1\ldots z_\alpha}{\sApp{e}{z_1\ldots z_\alpha}}), \cons{\arg {x_1}\cdots\arg {x_\alpha}}S)\\
\mathop{\Rightarrow^\ast} (\Gamma, \sApp{e}{x_1\ldots x_\alpha}, S) & by \sRule{app$_2$} \\
\mathop{\Rightarrow^\ast}(\Gamma, e, \cons{\arg {x_1}\cdots\arg {x_\alpha}}S)& by \sRule{app$_1$}
\end{conteq}
\end{proof}

So the safety proof for the whole transformation now just has to make sure that whenever we evaluate an eta-expanded value, there are enough arguments on top of the stack. Let $\topargs S$ denote the number of arguments on top of the stack.
\index{args@$\topargs\idxdummy$\idxexpl{number of arguments on top of the stack}}

While tracking the evaluation of the original program in the proof, we need to construct the corresponding configurations in the evaluation of the transformed program. Therefore, we need to keep track of the arity argument to each of the expressions that occurs in a configuration: those on the heap, the control and those in alternatives on the stack. Together, these arguments form an  \emph{arity annotation} written $(\bar {\alpha }, \alpha , \dot \alpha)$. Given such an annotation, we can transform a configuration:%
\index{arity annotation}
\index{.C3arityannot@$(\bar {\alpha }, \alpha , \dot \alpha)$\idxexpl{arity annotation}}
\index{transconf@$\transconf\idxdummy\idxdummy$\idxexpl{arity transformation of a configuration}}
\index{transstack@$\transstack\idxdummy\idxdummy$\idxexpl{arity transformation of a stack}}
\begin{align*}
\transconf{(\Gamma, e, S)}{(\bar \alpha, \alpha, \dot \alpha)} &= (\transheap{\Gamma}{\bar\alpha}, \trans{e}{\alpha}, \transstack{S}{\dot \alpha})
\intertext{where the stack is transformed by}
\transstack{\cons{\alts{e_\bT}{e_\bF}}{S}}{\cons{\alpha}{\dot\alpha}} &= \cons{\alts{\trans{e_\bT}\alpha}{\trans{e_\bF}\alpha}}{\transstack S {\dot\alpha}} \\
\transstack{\cons{\arg x}{S}}{{\dot\alpha}} &= \cons{\arg x}{\transstack S {\dot\alpha}} \\
\transstack{\cons{\upd x}{S}}{{\dot\alpha}} &= \cons{\upd x}{\transstack S {\dot\alpha}}\\
\transstack{[]}{\dot\alpha} &= [].
\end{align*}

\newcommand{\consistent}[2]{#1 \mathrel\triangleright #2}
\newcommand{\consistentNC}[2]{#1 \mathrel{\triangleright_{\text{\tiny\textsf{N}}}} #2}
\newcommand{\consistentC}[2]{#1 \mathrel{\triangleright_{\text{\tiny\textsf{C}}}} #2}

While carrying the arity annotation through the evaluation of our programs, we need to ensure that it stays \emph{consistent} with the current configuration.
\index{consitent!arity annotation}
\index{.R2consistent@$\consistent\idxdummy\idxdummy$\idxexpl{arity annotation consistency}}
\index{.R2consistentNC@$\consistentNC\idxdummy\idxdummy$\idxexpl{arity annotation consistency (without cardinality information)}}
\goodbreak

\begin{definition}[Arity annotation consistency]
An arity annotation is consistent with a configuration, written $\consistent{(\bar \alpha, \alpha, \dot \alpha)}{(\Gamma, e, S)}$, if
\begin{compactitem}
\item $\dom \bar\alpha \subseteq \dom \Gamma \cup \upd S$,
\item $\topargs S \sqsubseteq \alpha$,
\item $\restr{\big(
  \abinds{\Gamma}{\overline \alpha} \sqcup \aexp{e}{\alpha} \sqcup \astack S {\dot \alpha}
  \big)}{\dom \Gamma \cup \upd S}
\sqsubseteq \overline \alpha$,
where
\begin{align*}
\astack{[]}{[]} &\coloneqq \bot \\
\astack{\cons{\alts{e_\bT}{e_\bF}}{S}}{\cons{\alpha}{\dot\alpha}} &\coloneqq \aexp{e_\bT}\alpha \sqcup \aexp{e_\bF}\alpha \sqcup \astack S{\dot\alpha} \\
\astack{\cons{\arg x}{S}}{{\dot\alpha}} &\coloneqq [x \mapsto 0] \sqcup \astack S{\dot\alpha} \\
\astack{\cons{\upd x}{S}}{{\dot\alpha}} &\coloneqq [x \mapsto 0] \sqcup \astack S{\dot\alpha}\text{, and}
\end{align*}
\item $\consistent {\dot\alpha} S$, defined as
\begin{alignat*}{2}
\consistent{[]}{{}&[]} \\
\consistent{\cons\alpha{\dot\alpha}}{{}&\cons{\alts{e_\bT}{e_\bF}}{S}} &&\iff \consistent{\dot\alpha}{S} \wedge \topargs S \sqsubseteq \alpha \\
\consistent{{\dot\alpha}}{{}&\cons{\arg x }{S}} &&\iff \consistent{\dot\alpha}{S} \\
\consistent{{\dot\alpha}}{{}&\cons{\upd x }{S}} &&\iff \consistent{\dot\alpha}{S}.
\end{alignat*}
\end{compactitem}
\end{definition}

As this definition does not consider the issue of thunks, I extend it by one additional requirement:

\begin{definition}[No-cardinality arity annotation consistency]
An arity annotation is \emph{no-cardinality consistent} with a configuration, written
$\consistentNC{(\bar \alpha, \alpha, \dot \alpha)}{(\Gamma, e, S)}$,
iff
$\consistent{(\bar \alpha, \alpha, \dot \alpha)}{(\Gamma, e, S)}$ and $\bar \alpha~x = 0$ for all $x\in \thunks \Gamma$.
\end{definition}
I do not include this requirement in the definition of $\triangleright$ as I will extend it differently when I add a cardinality analysis in \cref{def:card-cons}.


Clearly $(\bot, 0, [])$ is a consistent annotation for an initial configuration $([], e, [])$. The rules take consistently annotated configurations to consistently annotated configurations during the evaluation -- with one exception which causes a minor technical overhead: Upon evaluation of a variable $x$, its binding $x\mapsto e$ is always taken off the heap first, even when it is already evaluated, i.e.\@ $\isVal e$:
\begin{conteq}[oneline]
(\Gamma[x \mapsto e], x, S) \\
\Rightarrow (\Gamma, e, \cons {\upd x} S) \\
\Rightarrow (\Gamma[x \mapsto e], e,S)
\end{conteq}
I would not be able to prove consistency in the intermediate state. To work around this issue, assume that rule \sRule{var$_1$} has an additional constraint $\neg\isVal e$ and that the rule
\begin{equation*}
(x\mapsto e) \in \Gamma,\ \isVal e \implies 
\step{(\Gamma, x, S)}{(\Gamma, e, S)} \tag{\sRule{var$_1'$}}
\end{equation*}
is added. This modification makes the semantics skip over one step, which is fine (and closer to what happens in reality).

\begin{lemma}%[Safety of an arity transformation]
\label{lem:arity-safety}%
Assume $\mathcal A$ fulfils \cref{def:arityspec,def:nocardspec}.

If we have $\steps{(\Gamma, e, S)}{(\Gamma', e', S')}$ and  
$\consistentNC{(\bar \alpha, \alpha, \dot \alpha)}{(\Gamma, e, S)}$, then there exists an arity annotation $(\bar \alpha', \alpha', \dot \alpha')$
with
$\consistentNC{(\bar \alpha', \alpha', \dot \alpha')}{(\Gamma', e', S')}$,
and
$\steps{\transconf{(\Gamma, e, S)}{(\bar \alpha, \alpha, \dot \alpha)}}{\transconf{(\Gamma', e', S')}{(\bar \alpha', \alpha', \dot \alpha')}}$.
\end{lemma}

\begin{proof}
by the individual steps of $\Rightarrow^\ast$.
%
For \sRule{app$_1$} we have
\begin{conteq}
\aexp{e}{\alpha + 1} \sqcup \astack {\cons{\arg x}S}{\dot \alpha} \\
= \aexp{e}{\alpha + 1} \sqcup [x \mapsto 0] \sqcup \astack S {\dot \alpha} \\
\sqsubseteq \aexp{\sApp e x}{\alpha} \sqcup \astack S {\dot \alpha}
\end{conteq}
using (A-App) and the definition of $\dot{\mathcal A}$. So with
$\consistentNC{(\bar \alpha, \alpha, \dot \alpha)}{(\Gamma, \sApp e x, S)}$
we have
$\consistentNC{(\bar \alpha, \alpha+1, \dot \alpha)}{(\Gamma, e, \cons{\arg x} S)}$.
Furthermore
\begin{conteq}
\transconf{(\Gamma, \sApp e x, S)}{(\bar \alpha, \alpha, \dot \alpha)} \\
= (\transheap{\Gamma}{\bar\alpha}, \sApp{(\trans{e}{\alpha+1})} x, \transstack{S}{\dot \alpha}) \\
\Rightarrow (\transheap{\Gamma}{\bar\alpha}, \trans{e}{\alpha+1}, \cons{\arg x}{\transstack{S}{\dot \alpha}}) \\
= \transconf{(\Gamma, e, \cons{\arg x} S)}{(\bar \alpha, \alpha + 1, \dot \alpha)}
\end{conteq}
by rule \sRule{app$_1$}.

The other cases follow this pattern, where the inequalities in \cref{def:arityspec} ensure the preservation of consistency.

In case \sRule{var$_1$} the variable $x$ is bound to a thunk. From consistency we obtain $\bar\alpha~x=0$, so we can use $\etaex{\trans e0}{0} = \trans e0$. Similarly, $\alpha = \bar\alpha~x= 0$ holds in case \sRule{var$_2$}.

The actual eta-expansion is handled in case \sRule{var$_1'$}:
We have
\[
\topargs{\transstack{S}{\dot\alpha}} = \topargs S \sqsubseteq  \alpha \sqsubseteq \aexp{x}{\alpha}~x \sqsubseteq \bar\alpha~x,
\]
from consistency and (A-Var) and hence % $\bar\alpha~x \le \topargs{S} = \topargs{\transstack{S}{\dot\alpha}}$, so
\begin{flalign*}
\transconf{(\Gamma, x, S)}{(\bar \alpha, \alpha, \dot \alpha)} 
&\Rightarrow (\transheap{\Gamma}{\bar\alpha}, \etaex{\trans{e}{\bar\alpha~x}}{\bar\alpha~x}, \transstack{S}{\dot\alpha}) & \text{\{ \sRule{var$_1'$} \}}\\
&\mathop{\Rightarrow^\ast} (\transheap{\Gamma}{\bar\alpha}, \trans{e}{\bar\alpha~x}, \transstack{S}{\dot\alpha}) & \mathllap{\text{\{ by \cref{lem:eta_exp_safe} \}}} \\
&= \transconf{(\Gamma, e, S)}{(\bar \alpha, \bar\alpha~ x, \dot \alpha)}.
\end{flalign*}

\case{\sRule{let$_1$}}
The new variables in $\Delta$ are fresh with regard to $\Gamma$ and $S$, hence also with regard to $\bar\alpha$ according to the naming hygiene conditions in $\consistentNC{(\bar \alpha, \alpha, \dot \alpha)}{(\Gamma, \sLet \Delta e, S)}$. So in order to have $\consistent{(\aheap{\Delta}e\alpha \sqcup \bar \alpha, \alpha, \dot \alpha)}{(\Delta\cdot\Gamma,e, S)}$, it suffices to show
\[
\restr{(\abinds{\Delta}{\aheap{\Delta}e\alpha} \sqcup \aexp{e}{\alpha})}{\dom\Delta\cup\dom\Gamma\cup\upd S} \sqsubseteq \aheap{\Delta}e\alpha \sqcup \bar \alpha,
\]
which follows from (A-Let) and $\restr{\aexp{\sLet{\Delta}e}{\alpha}}{\dom\Gamma\cup\upd S} \sqsubseteq \bar\alpha$. The requirement $\aheap\Delta e \alpha ~ x = 0$ for $x \in \thunks\Delta$ holds by (Ah-thunk).
\end{proof}

The main take-away of this lemma is the following corollary, which states that the transformed program performs the same number of allocations as the original program.

\begin{corollary}
\label{cor:arity-safe}
The arity analysis is safe (in the sense of \cref{def:safe}): If $\steps{([],e,[])}{(\Gamma, v, [])}$, then there exists $\Gamma'$ and $v'$ such that $\steps{([],\trans{e}0,[])}{(\Gamma', v', [])}$ where $\Gamma$ and $\Gamma'$ contain the same number of bindings.
\end{corollary}

\begin{proof}
We have $\consistentNC{(\bot,0,[])}{([], e, [])}$. \cref{lem:arity-safety} gives us $\bar\alpha$, $\alpha$ and $\dot\alpha$ so that
%$\consistentNC{(\bar\alpha, \alpha, \dot\alpha)}{(\Gamma, v, [])}$ and
$\steps{\transconf{([],e,[])}{(\bot, 0, [])}}{\transconf{(\Gamma, v, [])}{(\bar\alpha, \alpha, \dot\alpha)}}$ and $\transheap{\Gamma}{\bar\alpha}$ binds the same names as $\Gamma$.
\end{proof}

\subsection{A concrete arity analysis}

So far, we have a specification for an arity analysis and a proof that every analysis that fulfils the specification is safe.

One possible implementation is the trivial arity analysis which does not do anything useful and simply returns the most pessimistic result: $\aexp{e}{\alpha} \coloneqq [x \mapsto 0 \mid x \in \fv e]$ and $\aheap{\Gamma}e\alpha \coloneqq [x \mapsto 0 \mid x \in \dom \Gamma]$.

A more realistic arity analysis is defined by
\begin{align*}
\aexp{x}{\alpha} &\coloneqq [x \mapsto \alpha]\\
\aexp{\sApp e x}{\alpha} &\coloneqq \aexp{e}{\alpha+1} \sqcup [x \mapsto 0] \\
\aexp{\sLam x  e}{\alpha} &\coloneqq \aexp{e}{\alpha-1} \setminus \{x\} \\
\aexp{\sITE e {e_\bT}{e_\bF}}{\alpha} &\coloneqq \aexp{e}{0} \sqcup \aexp{e_\bT}{\alpha } \sqcup \aexp{e_\bF}{\alpha } \\
\aexp{\sCon b}{\alpha} &\coloneqq \bot  \quad\quad \text{for } b\in\{\bT,\bF\}\\
\aexp{\sLet \Gamma e}\alpha 
&\coloneqq
%\\ %\span
(\mu \bar\alpha.~\abinds{\Gamma}{\bar\alpha} \sqcup \aexp{e}\alpha \sqcup [x\mapsto 0\mid x \in \thunks\Gamma]) \setminus \dom\Gamma
\end{align*}
and
\begin{multline*}
\aheap{\Gamma}e\alpha \coloneqq \restr{(\mu \bar\alpha.~\abinds{\Gamma}{\bar\alpha} \sqcup \aexp{e}\alpha \sqcup [x\mapsto 0\mid x \in\thunks\Gamma])}{\dom\Gamma}
\end{multline*}
where $(\mu \bar\alpha. \ldots)$ denotes the least fixed point, which exists as the involved operations are continuous and monotone in $\bar\alpha$. Moreover, the fixed point can be found in a finite number of steps by iterating from $\bot$, as the carrier of $\bar\alpha$ is bounded by the finite set $\fv\Gamma\cup\fv e$, and the pointwise partial order on arities has no infinite ascending chains. As this ignores the issue of thunks, it corresponds to the analysis described in \cite{gill}.

This implementation fulfils the specifications in \cref{def:arityspec} and \cref{def:nocardspec}, so by \cref{cor:arity-safe}, it is safe.

\subsection{Functional correctness}
\label{sec:arity-correct}

This section on the functional correctness of the transformation is a slight detour in this chapter, which is mainly about the safety of the transformation. I include it here not only because functional correctness, i.e.\@ the preservation of semantics, is an important property, but also to demonstrate that it holds independent of the correctness of a cardinality analysis.

For this section, we expect the analysis to fulfil the specification in \cref{def:arityspec}, but do not require any specific behaviour with regard to thunks, i.e.\@ \cref{def:nocardspec} does not need to hold. 

\begin{theorem}[Functional correctness of Call Arity]
For all expressions $e$, we have
\label{thm:call-arity-correct}
\[
\llbracket \trans e 0 \rrbracket = \llbracket e \rrbracket.
\]
\end{theorem}

As usual, in order to prove this, I need to generalise the statement. In this case, the statement needs to hold for arbitrary incoming arities, instead of just $0$, and furthermore for arbitrary environments instead of just $\bot$.

But in the general case, I would not be able to prove plain equality: Consider the expression $e =\sLet{x=x}x$: When analysed with an incoming arity of 1, we have
\[
\trans e 1 = \sLet{x=\sLam {y}{\sApp x  y}}x
\]
but
\[
\dsem{e}\bot = \bot \ne \sFn{\lambda \_.\, \bot} = \dsem{\trans e 1}\bot.
\]
Therefore, I need to generalise the notion of equality as well, to a weaker notion that only demands equality when applied to enough arguments:

\begin{definition}[Equality up to eta-expansion]
For every $\alpha \in \mathbb N$ and $v_1, v_2 \in \sValue$ let $e_1 \etaeq\alpha e_2$ denote that 
\[
\sFnProj{\sFnProj{\sFnProj{v_1}{z_1}}{\ldots}}{z_\alpha} =
\sFnProj{\sFnProj{\sFnProj{v_2}{z_1}}{\ldots}}{z_\alpha}.
\]
for all $z_1,\ldots,z_\alpha\in \sValue$

Furthermore, for every arity environment $\bar\alpha \in \sVar \to \mathbb N_\bot$ and environments $\rho_1, \rho_2 \in \sVar \to \sValue$, let
\[
\rho_1 \etaeqenv{\bar\alpha} \rho_2 \coloneqq \forall x \in \dom{\bar\alpha}.\,
(\rho_1\,x) \etaeq{\bar\alpha\, x} (\rho_2\,x).
\]
\end{definition}

Note that $\etaeq{0}$ coincides with plain equality.
%
If we have $v_1 \etaeq{\alpha+1} v_2$, then also $\sFnProj{v_1}{z} \etaeq{\alpha} \sFnProj{v_2}z$ for all $z\in \sValue$. Conversely, $v_1[v] \etaeq{\alpha-1} v_2[v]$ for all $v\in\sValue$ implies $\sFn{\lambda v.\, v_1[v]} \etaeq{\alpha} \sFn{\lambda v.\, v_2[v]}$.

The relation is monotone in the sense that for $\alpha\sqsubseteq \alpha'$, $\etaeq{\alpha'}$ implies $\etaeq{\alpha}$, and analogously for the relation on environments.

The main motivation for this definition is that it does not see eta-expansion:
\begin{lemma}
$\dsem{\etaex{e}{\alpha}}\rho \etaeq\alpha \dsem e\rho$
\label{lem:etaeq-etaex}
\end{lemma}
\begin{proof}
by induction of $\alpha$.
\end{proof}

I proceed by proving soundness of the analysis and then the correctness of the transformation, in its general form.

\begin{lemma}
If $\rho_1 \etaeqenv{\aexp e \alpha} \rho_2$, then $\dsem e {\rho_1} \etaeq\alpha\dsem e {\rho_2}$.
\label{lem:aexp-correct}
\end{lemma}

\begin{proof}
by induction over the expression $e$.

\case{$e = x$}
By (A-Var), we have $\alpha \sqsubseteq \aexp e \alpha$, so the assumption of the lemma implies $(\rho_1\,x) \etaeq{\aexp e \alpha\, x}(\rho_2\,x)$, which in turn provides $(\rho_1\,x) \etaeq{\alpha}(\rho_2\,x)$ as required.

\case{$e = \sApp {e'} x$}
By (A-App), the assumption of the lemma implies both $\rho_1 \etaeqenv{\aexp {e'}{\alpha+1}} \rho_2$ as well as $(\rho_1\,x)\etaeq{0}(\rho_2,x)$. By induction, the former yields $\dsem{e'}{\rho_1} \etaeq{\alpha+1} \dsem{e'}{\rho_2}$.
Therefore
\begin{conteq}
\dsem {\sApp {e'} x}{\rho_1} \\
= \sFnProj{\dsem{e'}{\rho_1}}{\rho_1\, x} \\
= \sFnProj{\dsem{e'}{\rho_1}}{\rho_2\, x} \\
\etaeq{\alpha} \sFnProj{\dsem{e'}{\rho_2}}{\rho_2\, x} \\
= \dsem {\sApp {e'} x}{\rho_2}.
\end{conteq}

\case{$e = \sLam x {e'}$}
By (A-Lam) and the assumption we have $\rho_1 \etaeqenv{\aexp {e'}{\alpha-1}\setminus\{x\}} \rho_2$, which, for any $v\in\sValue$, yields $(\rho_1\sqcup[x\mapsto v]) \etaeqenv{\aexp {e'}{\alpha-1}} (\rho_2\sqcup[x\mapsto v])$. By the induction hypothesis, this implies $\dsem{e'}{\rho_2\sqcup[x\mapsto v]} \etaeq{\alpha-1} \dsem{e'}{\rho_2\sqcup[x\mapsto v]}$, and thus
\begin{conteq}
\dsem {\sLam x {e'}}{\rho_1} \\
= \sFn{\lambda v.\, \dsem{e'}{\rho_1 \sqcup[x\mapsto v]}} \\
\etaeq{\alpha} \sFn{\lambda v.\, \dsem{e'}{\rho_2 \sqcup[x\mapsto v]}} \\
= \dsem {\sLam x {e'}}{\rho_2}.
\end{conteq}

\case{$e = \sLet{\Gamma}e'$}
This follows immediately from the denotation of let-expressions and the inductive hypothesis, once we have $(\esem{\Gamma}{\rho_1}) \etaeqenv{\aexp e \alpha} (\esem{\Gamma}{\rho_2})$. By (A-Let), this can be generalised to
$(\esem{\Gamma}{\rho_1}) \etaeqenv{\bar\alpha} (\esem{\Gamma}{\rho_2})$ where $\bar\alpha = \aheap{\Gamma}e\alpha \sqcup \aexp{\sLet \Gamma e}\alpha$.

We prove this by parallel fixed-point induction. The base case is trivial, so we assume we have
\[
\rho_1' \etaeqenv{\bar\alpha} \rho_2'
\]
for some $\rho_1', \rho_2'$, and we need to show
\[
(\rho_1 \addon{\dom\Gamma} \dsem{\Gamma}{\rho_1'}) \etaeqenv{\bar\alpha} (\rho_2 \addon{\dom\Gamma} \dsem{\Gamma}{\rho_2'})
\]
which we do point-wise. Let $\alpha' = \bar\alpha\, x$.

For $x\mapsto e''\in\Gamma$, we need to show $(\dsem{e''}{\rho_1'}) \etaeq{\alpha'} (\dsem{e''}{\rho_1'})$. By the induction hypothesis, this requires $\rho_1' \etaeqenv{\aexp{e''}{\alpha'}} \rho_2'$, which in turn follows from $\rho_1' \etaeqenv{\bar\alpha} \rho_2'$ and (A-Let).

For $x\notin\dom\Gamma$, we need to show $(\rho_1\,x) \etaeq{\alpha'} (\rho_2\,x)$. This follows from $\bar\alpha\,x = \aexp{\sLet \Gamma e}\alpha\,x$ and the assumption of the lemma, namely $\rho_1 \etaeqenv{\aexp{\sLet \Gamma e}\alpha} \rho_2$.

\case{$e = \sCon b$} Trivial.

\case{$e = \sITE{e'}{e_\bT}{e_\bF}$}
By the assumption, (A-If) and the monotonicity of $\etaeqenv{\bar\alpha}$, we can invoke all three induction hypotheses and obtain $\dsem{e'}{\rho_1}\etaeq0\dsem{e'}{\rho_2}$, $\dsem{e_\bT}{\rho_1}\etaeq\alpha\dsem{e_\bT}{\rho_2}$ and $\dsem{e_\bF}{\rho_1}\etaeq\alpha\dsem{e_\bF}{\rho_2}$. From this, $\dsem{\sITE{e'}{e_\bT}{e_\bF}}{\rho_1}\etaeq\alpha\dsem{\sITE{e'}{e_\bT}{e_\bF}}{\rho_2}$ follows by a case analysis on $\dsem{e'}{\rho_1}$.
\end{proof}

With this in place, we can prove that the transformation is semantics-preserving:
\begin{lemma}
$
\dsem{\trans e \alpha}\rho \etaeq\alpha \dsem{e}\rho.
$
\label{lem:call-arity-correct}
\end{lemma}

\begin{proof}
Again, by induction on $e$, for arbitrary $\alpha$ and $\rho$.

\case{$e = x$}
trivial.

\case{$e = \sApp {e'} x$}
By the induction hypothesis, we have $\dsem{\trans {e'} {\alpha+1}}\rho \etaeq{\alpha+1} \dsem{e'}\rho$, so
\begin{conteq}
\dsem{\trans {\sApp {e'} x} \alpha}\rho\\
= \dsem{\sApp {\trans {e'} {\alpha+1}} x}\rho\\
= \sFnProj{\dsem{\trans {e'} {\alpha+1}}\rho}{\rho\, x}\\
\etaeq{\alpha} \sFnProj{\dsem{e'}\rho}{\rho\, x}\\
= \dsem{\sApp {e'} x}\rho.
\end{conteq}

\case{$e = \sLam x {e'}$}
By the induction hypothesis, we have $\dsem{\trans {e'} {\alpha-1}}\rho' \etaeq{\alpha-1} \dsem{e'}\rho'$ for any $\rho'$, so
\begin{conteq}
\dsem{\trans {\sLam x {e'}} \alpha}\rho\\
= \dsem{\sLam x {\trans {e'} {\alpha-1}}}\rho\\
= \sFn{\lambda v.\, \dsem{\trans {e'} {\alpha+1}}{\rho \sqcup [x\mapsto v]}}\\
\etaeq{\alpha}\sFn{\lambda v.\, \dsem{e'}{\rho \sqcup [x\mapsto v]}}\\
= \dsem{\sLam x {e'}}\rho.
\end{conteq}

\case{$e = \sLet{\Gamma}e'$}
We first need to prove
\begin{equation}
\esem{\transheap \Gamma {\aheap \Gamma e \alpha}}\rho \etaeqenv{\aexp e\alpha} \esem{\Gamma}\rho.
\tag{$\ast$}
\end{equation}
which, using (A-let), follows from
\[
\esem{\transheap \Gamma {\aheap\Gamma e \alpha}}\rho \etaeqenv{\bar\alpha} \esem{\Gamma}\rho.
\]
with $\bar\alpha = \aheap{\Gamma}e\alpha \sqcup \aexp{\sLet \Gamma e}\alpha$.

Similar to above, we can prove this using parallel fixedpoint induction. Again, the base case is trivial, so let $\rho_1, \rho_2$ be environments for which
\[
\rho_1 \etaeqenv{\bar\alpha} \rho_2
\]
holds. We need to show
\[
\dsem{\transheap \Gamma {\aheap\Gamma e \alpha}}{\rho_1} \etaeqenv{\bar\alpha} \dsem{\Gamma}{\rho_2}.
\]
which we verify point-wise. Let $x\mapsto e'' \in\Gamma$ and  $\alpha' =  \aheap\Gamma e \alpha\, x = \bar\alpha\, x$. We have
\begin{conteq}
\dsem{\transheap \Gamma {\aheap\Gamma e \alpha}}{\rho_1} \,x \\
= \dsem{\etaex {\trans {e''}{\alpha'}}{\alpha'}}{\rho_1} \\ 
\etaeq{\alpha'} \dsem{\trans {e''}{\alpha'}}{\rho_1} & \cref{lem:etaeq-etaex} \\ 
\etaeq{\alpha'} \dsem{e''}{\rho_1} & ind.\@ hypothesis \\
\etaeq{\alpha'} \dsem{e''}{\rho_2} & \cref{lem:aexp-correct} \\
= \dsem{\Gamma}{\rho_2} \,x
\end{conteq}
where in order to invoke \cref{lem:aexp-correct}, we need $\rho_1 \etaeqenv{\aexp {e''}{\alpha'}} \rho_2$, which follows from $\rho_1 \etaeqenv{\bar\alpha} \rho_2$ and (A-Let).

Now we can calculate
\begin{conteq}
\dsem{\trans{\sLet\Gamma{e'}}\alpha}\rho \\
= \dsem{\sLet{\transheap\Gamma{\aheap\Gamma{e'}\alpha}}{\trans{e'}\alpha}}\rho \\
= \dsem{\trans{e'}\alpha}{\esem{\transheap\Gamma{\aheap\Gamma{e'}\alpha}}\rho} \\
\etaeq{\alpha} \dsem{e'}{\esem{\transheap\Gamma{\aheap\Gamma{e'}\alpha}}\rho} & ind.\@ hypothesis \\
\etaeq{\alpha} \dsem{e'}{\esem{\Gamma}\rho} & \cref{lem:aexp-correct} and ($\ast$) \\
= \dsem{\sLet\Gamma{e'}}\rho.
\end{conteq}

\case{$e = \sCon b$} Trivial.

\case{$e = \sITE{e'}{e_\bT}{e_\bF}$}
By induction we have $\dsem{\trans{e}0}\rho \etaeq{0} \dsem e\rho$, so $\dsem{\trans{e}0}\rho = \dsem e\rho$, and by case analysis on this value, this follows from the induction hypotheses.
\end{proof}

\begin{proof}[of \cref{thm:call-arity-correct}]
This follows from \cref{lem:call-arity-correct} with $\rho=\bot$, as $\etaeq{0}$ coincides with regular equality.
\end{proof}


\section{Cardinality analyses}
\label{sec:cardanal}

The previous section proved the safety of a straight-forward arity analysis. But it was severely limited by not being able to eta-expand thunks, which is desirable in practice.

\subsection{Abstract cardinality analysis}
\label{sec:abstractcard}
So the arity analysis needs an accompanying \emph{cardinality analysis} which prognoses how often a bound variable is going to be evaluated: I model this as a function%
\index{cheap@$\cheap\idxdummy\idxdummy\idxdummy$\idxexpl{cardinality analysis of a binding}}
\[
\cheap \Gamma e \alpha \colon \sVar \to \Card
\]
where $\Card$ is the three element lattice%
\index{Card@$\Card$\idxexpl{three-element lattice}}
\index{.C0one@$\Co$\idxexpl{called at most once, member of $\Card$}}
\index{.C0infty@$\Cm$\idxexpl{top element of $\Card$}}
\[
\bot \sqsubset \Co \sqsubset \Cm,
\]
corresponding to “not called”, “called at most once” and “no information”, respectively.  We use $\gamma$ for an element of $\Card$ and $\bar\gamma$ for a mapping $\sVar\to\Card$.

The expression $\bar\gamma - x$, which subtracts one call from the prognosis, is defined as%
\index{.O2minus@$\idxdummy-\idxdummy$!\idxexpl{subtracting a call from a cardinality environment}}
\[
(\bar\gamma - x)~ y =
\begin{cases}
\bot &\text{if } y = x \text{ and } \bar\gamma~y = \Co \\
\bar\gamma~y &\text{otherwise.}
\end{cases}
\]

\subsubsection{Specification}

I start with a very abstract specification for a safe cardinality analysis and prove that an arity transformation that makes use of it it is still safe. I stay oblivious in how the analysis works and defer that to the next refinement step in \cref{sec:tracetreecard}.

For the specification we not only need the local view on one binding, as provided by $\cheap \Gamma e \alpha$, but also a prognosis on how often each variable is going to be called in the further execution of a complete and arity-annotated configuration:%
\index{cconf@$\prognosis\idxdummy\idxdummy$\idxexpl{cardinality analysis of a configuration}}
\[
\prognosis{(\Gamma, e, S)}{(\bar\alpha, \alpha, \dot\alpha)} \colon \sVar \to \Card
\]

\begin{definition}[Cardinality analysis specification]
\label{def:cardspec}
The cardinality prognosis and cardinality analysis fulfil some obvious naming hygiene conditions:
\begin{flalign*}
\noprecondition
\dom \cheap\Delta e \alpha &= \dom\aheap\Delta e \alpha \tag{Ch-dom}\\
\noprecondition
\dom \prognosis{(\Gamma, e, S)}{(\bar\alpha, \alpha, \dot\alpha)} &\subseteq \fv(\Gamma, e, S) \tag{C-dom} \displaybreak[3] \\
\precondition{\restr{\bar\alpha}{\dom\Gamma} = \restr{\bar\alpha'}{\dom\Gamma}}
\prognosis{(\Gamma, e, S)}{(\bar\alpha, \alpha, \dot\alpha)} &= \prognosis{(\Gamma, e, S)}{(\bar\alpha', \alpha, \dot\alpha)} \tag{C-cong} \\
\shortprecondition{\bar\alpha~x = \bot}
\prognosis{(\Gamma, e, S)}{(\bar\alpha, \alpha, \dot\alpha)} &= 
\prognosis{(\Gamma \setminus \{x\}, e, S)}{(\bar\alpha, \alpha, \dot\alpha)} \tag{C-not-called} \\ 
\intertext{%
Furthermore, the cardinality analysis is likewise a forward analysis and has to be conservative about function arguments:
}
\shortprecondition{\arg x \in S}
[x\mapsto \Cm] &\sqsubseteq \prognosis{(\Gamma, e, S)}{(\bar\alpha, \alpha, \dot\alpha)}\tag{C-args}\\
\intertext{%
The prognosis may ignore update markers on the stack:
}
\noprecondition
\prognosis{(\Gamma, e, \cons{\upd x}S)}{(\bar\alpha, \alpha, \dot\alpha)} &\sqsubseteq \prognosis{(\Gamma, e, S)}{(\bar\alpha, \alpha, {\dot\alpha)}} \tag{C-upd} \\
\intertext{%
An imminent call better be prognosed:
}
\noprecondition
[x\mapsto \Co] &\sqsubseteq \prognosis{(\Gamma, x, S)}{(\bar\alpha, \alpha, \dot\alpha)} \tag{C-call} \\
\intertext{%
Evaluation improves the prognosis:
Note that in (C-Var$_1$) and (C-Var$_1'$),
we account for the call to $x$ with the $-$ operator.
}
\noprecondition
\prognosis{(\Gamma, e, \cons{\arg x}S)}{(\bar\alpha, \alpha+1, \dot\alpha)}
&\sqsubseteq 
\prognosis{(\Gamma, \sApp e x, S)}{(\bar\alpha, \alpha, \dot\alpha)} \tag{C-App}\\
\noprecondition
\prognosis{(\Gamma, \subst e y x, S)}{(\bar\alpha, \alpha-1, \dot\alpha)}
&\sqsubseteq 
\prognosis{(\Gamma, \sLam y e, \cons{\arg x}S)}{(\bar\alpha, \alpha, \dot\alpha)} \tag{C-Lam}\\
\precondition{(x\mapsto e)\in\Gamma,\, \neg \isVal e}
\prognosis{(\Gamma\setminus\{x\}, e, \cons{\upd x}S)}{(\bar\alpha, \bar\alpha~x, \dot\alpha)}
&\sqsubseteq 
\prognosis{(\Gamma, x, S)}{(\bar\alpha, \alpha, \dot\alpha)} - x \tag{C-Var$_1$}\\
\precondition{(x\mapsto e)\in\Gamma,\,  \isVal e}
\prognosis{(\Gamma, e, S)}{(\bar\alpha, \bar\alpha~x, \dot\alpha)}
&\sqsubseteq 
\prognosis{(\Gamma, x, S)}{(\bar\alpha, \alpha, \dot\alpha)} - x \tag{C-Var$_1'$}\\
\precondition{\isVal e}
\prognosis{(\Gamma[x\mapsto e], e, S)}{(\bar\alpha, 0, \dot\alpha)}
&\sqsubseteq 
\prognosis{(\Gamma, e, \cons{\upd x}S)}{(\bar\alpha, 0, \dot\alpha)} \tag{C-Var$_2$}\\
\noprecondition
\prognosis{(\Gamma, e, \cons{\alts{e_\bT}{e_\bF}}S)}{(\bar\alpha, 0, \cons{\alpha}{\dot\alpha})}
&\sqsubseteq 
\prognosis{(\Gamma, \sITE e{e_\bT}{e_\bF}, S)}{(\bar\alpha, \alpha, \dot\alpha)} \tag{C-If$_1$}\\
\shortprecondition{b\in\{\bT,\bF\}}
\prognosis{(\Gamma, e_b, S)}{(\bar\alpha, \alpha, \dot\alpha)}
&\sqsubseteq \prognosis{(\Gamma, \sCon b, \cons{\alts{e_\bT}{e_\bF}}S)}{(\bar\alpha, 0, \cons{\alpha}{\dot\alpha})}  \tag{C-If$_2$}
\end{flalign*}%
The specification for the \keyword{let}-bindings connects the arity analysis, the cardinality analysis and the cardinality prognosis:
\begin{flalign*}
\precondition{\dom\Delta \cap \fv(\Gamma , S) = \{\},\, \dom \bar\alpha \subseteq \dom\Gamma \cup \upd S}
\prognosis{(\Delta\cdot\Gamma, e, S)}{(\aheap \Delta e \alpha \sqcup \bar\alpha, \alpha, \dot \alpha)}
&\sqsubseteq
\cheap\Delta e \alpha \sqcup \prognosis{(\Gamma, \sLet \Delta e, S)}{(\bar\alpha, \alpha, \dot \alpha)}\tag{C-Let}
\end{flalign*}
Finally, we need to ensure that the analysis returns the top element of the lattice for thunks that might be called more than once. In contrast to the corresponding \cref{def:nocardspec}, this can now make use of the cardinality analysis:
\begin{align*}
x \in \thunks \Gamma,\,  \cheap\Gamma e \alpha~ x = \Cm \implies 
\aheap{\Gamma}e\alpha~x = 0 \tag{Ah-$\Cm$-thunk}
\end{align*}
\end{definition}

\vspace{-1em}

\subsubsection{Safety}
\label{sec:cardproof}

The safety proof proceeds similarly to the one for \cref{lem:arity-safety}. But now we are allowed to eta-expand thunks that are called at most once. This has considerable technical implications for the proof:
\begin{itemize}
\item An eta-expanded expression is a value, so in the transformed program, \sRule{var$_2$} occurs immediately after \sRule{var$_1$}. In the original program, however, an update marker stays on the stack until the expression is evaluated to a value, and then \sRule{var$_2$} fires without a correspondence in the evaluation of the transformed program. In particular, the update marker can interfere with uses of \cref{lem:eta_exp_safe}.
\item Because the eta-expanded expression is a value, it stays on the heap as it is, whereas in the original program, it is first evaluated. Evaluation can reduce the number of free variables of the expression, so subsequent choices of fresh variables in \sRule{let$_1$} in the original evaluation might not be suitable in the evaluation of the transformed program.
\end{itemize}
A more complicated variant of \cref{lem:eta_exp_safe} and carrying a variable renaming around throughout the proof might solve these problems, but would complicate it too much. I therefore apply a small trick and simply allow unwanted update markers to disappear, by defining a variant of the semantics:
\begin{definition}[Forgetful semantics]
The relation $\fste$ is defined by%
\index{.R2sestf@$\fstep\idxdummy\idxdummy$\idxexpl{forgetful small-step semantics}}
\begin{gather*}
\step{(\Gamma, e, S)}{(\Gamma', e', S')} \implies 
\fstep{(\Gamma, e, S)}{(\Gamma', e', S')}
\shortintertext{and}
\fstep{(\Gamma, e, \cons{\upd x}S)}{(\Gamma, e, S)}. \tag*{\sRule{dropUpd}}
\end{gather*}
\end{definition}

This way, a one-shot binding can disappear completely after it has been  called, making it easier to relate the original program to the transformed program. Because $\fste$ contains $\Rightarrow$, \cref{lem:eta_exp_safe} holds here as well.
Afterwards, and outside the scope of the safety proof, I will recover the original semantics from the forgetful semantics.

In the proof I keep track of the set of removed bindings (named $r$), and write $(\Gamma, e, S) -r \coloneqq (\Gamma \setminus r, e, S - r)$ for the configuration with bindings from the set $r$ removed. The stack $(S -r)$ is $S$ without update markers $\upd x$ where $x \in r$.
\index{.O2minus@$\idxdummy-\idxdummy$!\idxexpl{removing a set of bindings from a configuration}}

I also keep track of $\bar\gamma \colon \sVar\to\Card$, the current cardinalities of the variables on the heap:

\begin{definition}[Cardinality arity annotation consistency]
\label{def:card-cons}
I write $\consistentC{(\bar \alpha, \alpha, \dot \alpha, \bar\gamma, r)}{(\Gamma, e, S)}$,
iff%
\index{.R2consistentC@$\consistentC\idxdummy\idxdummy$\idxexpl{arity annotation consistency (with cardinality information)}}
\begin{compactitem}
\item the arity information is consistent, $\consistent{(\bar \alpha, \alpha, \dot \alpha)}{(\Gamma, e, S) - r}$,
\item $\dom \bar\alpha = \dom \bar\gamma$,
\item the cardinality information is correct, \mbox{$\prognosis{(\Gamma, e,S)}{(\bar\alpha, \alpha, \dot\alpha)} \sqsubseteq \bar\gamma$,}
\item many-called thunks are not going to be eta-expanded, i.e.\@ $\bar\alpha~x=0$ for $x \in \thunks \Gamma$ with $\bar\gamma~x = \Cm$ and
\item only bindings that are not going to be called ($\bar\gamma~x=\bot$) are removed, i.e.\@
$r \subseteq (\dom \Gamma \cup \upd S) - \dom \bar\gamma$.
\end{compactitem}
\end{definition}


\begin{lemma}
\label{lem:card-arity-safety}
Assume $\mathcal A$ and $\mathcal C$ fulfil the specifications in \cref{def:arityspec,def:cardspec}.

If $\steps{(\Gamma, e, S)}{(\Gamma', e', S')}$ and  $\consistentC{(\bar \alpha, \alpha, \dot \alpha, \bar\gamma, r)}{(\Gamma, e, S)}$ , then there exists an arity annotation $(\bar \alpha', \alpha', \dot \alpha', \bar\gamma', r')$ such that
$\consistentC{(\bar \alpha', \alpha', \dot \alpha', \bar\gamma', r')}{(\Gamma', e', S')}$,
and
$\fsteps{\transconf{(\Gamma, e, S) - r}{(\bar \alpha, \alpha, \dot \alpha)}}{\transconf{(\Gamma', e', S') - r'}{(\bar \alpha', \alpha', \dot \alpha')}}$.
\end{lemma}

The lemma is an analogue to \cref{lem:arity-safety}. The main difference, besides the extra data to keep track of, is that we produce an evaluation in the forgetful semantics, with some bindings removed.

\begin{proof}
by the individual steps of $\Rightarrow^\ast$. The preservation of the arity annotation consistency in the proof of \cref{lem:arity-safety} can be used here as well. Note that both the arity annotation requirement and the transformation are applied to $(\Gamma, e, S) -r$, so this goes well together. The correctness of the cardinality information (the second condition in \cref{def:card-cons}) follows easily from the inequalities in \cref{def:cardspec}.

I elaborate only on the interesting cases:

\case{\sRule{var$_1$}}
We cannot have $\bar\gamma~x = \bot$ because of (C-call).

If $\bar\gamma~x = \Cm$ we get $\bar\alpha~x= 0$, as before, and nothing surprising happens.

If $\bar\gamma~x= \Co$, we know that this is the only call to $x$, so we set $r' = r \cup \{x\}$, $\bar\gamma' = \bar\gamma - x$ and use $\sRule{dropUpd}$ to get rid of the mention of $\upd x$ on the stack.

\case{\sRule{var$_2$}}
If $x \notin r$, proceed as before.
If $x\in r$, then the transformed configurations are identical and the $\fste^\ast$ judgement follows from reflexivity.
\end{proof}

%\case{\sRule{let$_1$}}
%Besides the usual calculations about scoping and freshness, we use the specification rules (C-Let) and (Ah-$\Cm$-thunk) to ensure that $\consistentC{(\Delta\cdot\Gamma, e, s)}{(\aheap \Delta e \alpha \sqcup \bar\alpha, \alpha, \dot\alpha, \cheap \Delta e \alpha \sqcup \bar\gamma, r)}$ holds.

%In order to show the safety of the cardinality based arity analysis, we need to recover an evaluation in the original semantics. We skim over this technical issue by simply stating
%\begin{lemma}[Recovery of the original semantics]
%\label{lem:sem-recovery}
%If $\fv (\Gamma, e, S) \subseteq \dom\Gamma \cup \upd S$ and $\fsteps{(\Gamma, e, S) - r}{(\Delta, v, T)}$, where $r \subseteq \dom \Gamma \cup \upd S$, then there is $\Delta', T'$ and $r'$, so that $\Delta = \Delta' \setminus r$, $T = T'-r$ and $r \subseteq \dom\Delta' \cup \upd T'$.
%\end{lemma}
%\begin{proof}
%by the steps of $\fste$ and by induction on the number of update markers removed from the stop of the stack.
%\end{proof}

\begin{corollary}%[Safety of a cardinality based arity transformation]
\label{cor:card-arity-safe}
The cardinality based arity analysis is safe for closed expressions, i.e.\@ if $\fv e = \{\}$ and $\steps{([],e,[])}{(\Gamma, v, [])}$ then there exists $\Gamma'$ and $v'$ such that $\steps{([],\trans{e}0,[])}{(\Gamma', v', [])}$ where $\Gamma$ and $\Gamma'$ contain the same number of bindings.
\end{corollary}

\begin{proof}
We need $\fv e=\{\}$ to have $\prognosis{([],e,[])}{\bot,0,[]} = \bot$, so that $\consistentC{(\bot,0,[],\bot,[])}{([], e, [])}$ holds.
%
Now according to \cref{lem:card-arity-safety} there are $\bar\alpha$, $\alpha$, $\dot\alpha$
%, $\bar\gamma$
and $r$ so that
%$\consistentC{(\bar\alpha, \alpha, \dot\alpha, \bar\gamma, r)}{(\Gamma, v, [])}$ and
$\fsteps{\transconf{([],e,[])}{(\bot, 0, [])}}{\transconf{(\Gamma, v, [])-r}{(\bar\alpha, \alpha, \dot\alpha)}}$.

As the forgetful semantics only drops unused bindings, but does not otherwise behave any different than the real semantics, a technical lemma allows us to recover $\steps{\transconf{([],e,[])}{(\bot, 0, [])}}{\transconf{(\Gamma', v, [])}{(\bar\alpha, \alpha, \dot\alpha)}}$ for a $\Gamma'$ where $\transheap{\Gamma}{\bar\alpha} -r = \Gamma' - r'$. As $r \subseteq \Gamma$ and $r'\subseteq \Gamma'$, 
this concludes the proof of the corollary: $\Gamma$,  $\transheap{\Gamma}{\bar\alpha}$ and  $\Gamma'$ all bind the same variables.
\end{proof}

\subsection{Trace tree cardinality analysis}
\label{sec:tracetreecard}
\label{sec:tracetrees}

In the second refinement, I look -- still quite abstractly -- at the implementation of the cardinality analysis.
For the arity information, the type of the result required for the transformation ($\sVar \to \mathbb N_\bot$) was sufficiently rich to be used in the analysis as well. This is unfortunately not the case for the cardinality analysis: Even if we know that an expression calls $x$ and $y$ each at most once, this does not tell us whether these calls can occur together (as in $\sApp{\sApp e x} y$) or whether they are exclusive (as in $\sITE e x y$).

So I need a richer type that captures the future calls of an expression, that can distinguish different code paths and that maps easily to the type $\sVar \to \Card$:%
\index{TTree@$\sTTree$\idxexpl{trace trees}}
This is the type $\sTTree$ of (possibly infinite) trees, where each edge is labelled with a variable name, and a node has at most one outgoing edge for each variable name. The paths in the tree correspond to the possible executions  and the labels on the edges record each occurring variable call. I use $t$ for values of type $\sTTree$.

There are other, equivalent ways to interpret this type: Each $\sTTree$ corresponds to a non-empty set of (finite) lists of variable names that is prefixed-closed (i.e.\@ for every list in the set, its prefixes are also in the set). Each such list corresponds to a (finite) path in the tree. The function
\[
\paths \colon \sTTree \to 2^{[\sVar]}
\]
implements this correspondence.%
\index{paths@$\paths\idxdummy$\idxexpl{the set of paths of a trace tree}}

Yet another view is given by the function%
\index{next@$\nxt\idxdummy\idxdummy$\idxexpl{a specific immediate subtree of a trace tree}}
\[
\nxt{}{} \colon \sVar \to \sTTree \to \sTTree_\bot,
\]
where $\nxt x t = t'$ iff the root of $t$ has an edge labelled $x$ leading to $t'$, and $\nxt x t=\bot$ if the root of $t$ has no edge labelled $x$. In that sense, $\sTTree$ represents automata with labelled transitions, and we can actually define a trace trees $t$ by specifying $\nxt{x}{t}$ for all $x\in \sVar$.

\index{.O2oplus@$\idxdummy\oplus\idxdummy$\idxexpl{union of trace trees}}%
\index{.O2otimes@$\idxdummy\otimes\idxdummy$\idxexpl{interleaving of trace trees}}%
The basic operations on trees are $\oplus$, given by $\paths{t\oplus t'} = \paths t \cup \paths{t'}$, and $\otimes$, where $\paths{t\otimes t'}$ is the set of all interleavings of lists from $\paths t$ with lists from $\paths {t'}$. I write $\many t$ for $t \otimes t \otimes t \otimes \cdots$. A tree is called \emph{repeatable} if $t = t \otimes t = \many t$.

\index{.R2sqsubset@$\idxdummy\sqsubseteq\idxdummy$!on $\sTTree$}%
The partial order used on $\sTTree$ is
\[
t \sqsubseteq t' \iff \paths t \subseteq \paths {{t'}}.
\]

\index{.C0emptytree@$\emptytree$\idxexpl{empty trace tree}}%
\index{.C1singleton@$\singletontree\idxdummy$\idxexpl{singleton trace tree}}%
\index{.O2setminus@$\idxdummy\setminus\idxdummy$!on trace trees}%
\index{.O2restr@$\idxdummy"|_\idxdummy$!on trace trees}%
I write $\emptytree$ for the tree with no edges and simply $x$ for \singletontree{$x$}, the tree with exactly one edge labelled $x$.
The tree $t\setminus V$ is $t$ with all edges with labels in $V$ contracted, $\restr{t}{V}$ is $t$ with all edges but those labelled with variables in $V$ contracted.

\begin{example}
Consider the two trees
\[
t_1 =
\begin{tikzpicture}[grow=right, sibling distance=-12, level distance=18, every node/.style={inner sep=2pt}, baseline={([yshift=-2.3pt]O)}]
 \coordinate[circle, fill, inner sep=1]
  (O)
  child {
    coordinate
    edge from parent node[above] {$x$}
  }
  child {
    coordinate
    edge from parent node[below] {$y$}
  }
  ;
\end{tikzpicture}
\quad\text{and}\quad t_2 =
\begin{tikzpicture}[grow=right, sibling distance=-12, level distance=18, every node/.style={inner sep=2pt}, baseline={([yshift=-2.3pt]O)}]
 \coordinate[circle, fill, inner sep=1]
  (O)
  child {
    coordinate
    child {
      coordinate
      edge from parent node[above] {$z$}
    }
    edge from parent node[above] {$x$}
  }
  ;
\end{tikzpicture}.
\]
Then we have:
\vspace*{-2em}
\begin{gather*}
t_1 \oplus t_2 =
\begin{tikzpicture}[grow=right, sibling distance=-12, level distance=18, every node/.style={inner sep=2pt}, baseline={([yshift=-2.3pt]O)}]
 \coordinate[circle, fill, inner sep=1]
  (O)
  child {
    coordinate
    child {
      coordinate
      edge from parent node[above] {$z$}
    }
    edge from parent node[above] {$x$}
  }
  child {
    coordinate
    edge from parent node[below] {$y$}
  }
  ;
\end{tikzpicture}\quad
t_1 \otimes t_2 =
\begin{tikzpicture}[grow=right, level 1/.style={sibling distance=-28}, level 2/.style={sibling distance=-18}, sibling distance=-14, level distance=18, every node/.style={inner sep=2pt}, baseline={([yshift=-2.3pt]O)}]
 \coordinate[circle, fill, inner sep=1]
  (O)
  child {
    coordinate
    child {
      coordinate
      child {
        coordinate
        edge from parent node[above] {$x$}
      }
      child {
        coordinate
        edge from parent node[below] {$y$}
      }
      edge from parent node[above] {$z$}
    }
    child {
      coordinate
      child {
        coordinate
        edge from parent node[below] {$z$}
      }
      edge from parent node[below right] {$y$}
    }
    child {
      coordinate
      child {
        coordinate
        edge from parent node[below] {$z$}
      }
      edge from parent node[below] {$x$}
    }
    edge from parent node[above] {$x$}
  }
  child {
    coordinate
    child {
      coordinate
      child {
        coordinate
        edge from parent node[below] {$z$}
      }
      edge from parent node[below] {$x$}
    }
    edge from parent node[below] {$y$}
  }
  ;
\end{tikzpicture}\quad
t_1^* =
\begin{tikzpicture}[grow=right, level 1/.style={sibling distance=-28}, level 2/.style={sibling distance=-12}, sibling distance=-5, level distance=18, every node/.style={inner sep=2pt}, baseline={([yshift=-2.3pt]O)}]
 \coordinate[circle, fill, inner sep=1]
  (O)
  child {
    coordinate
    child {
      coordinate
      child {
        coordinate
        edge from parent[dotted]
      }
      child {
        coordinate
        edge from parent[dotted]
      }
      edge from parent node[above] {$x$}
    }
    child {
      coordinate
      child {
        coordinate
        edge from parent[dotted]
      }
      child {
        coordinate
        edge from parent[dotted]
      }
      edge from parent node[below] {$y$}
    }
    edge from parent node[above] {$x$}
  }
  child {
    coordinate
    child {
      coordinate
      child {
        coordinate
        edge from parent[dotted]
      }
      child {
        coordinate
        edge from parent[dotted]
      }
      edge from parent node[above] {$x$}
    }
    child {
      coordinate
      child {
        coordinate
        edge from parent[dotted]
      }
      child {
        coordinate
        edge from parent[dotted]
      }
      edge from parent node[below] {$y$}
    }
    edge from parent node[below] {$y$}
  }
  ;
\end{tikzpicture}\\
(t_1 \otimes t_2)\setminus\{x\} = \restr{(t_1 \otimes t_2)}{\{y,z\}} =
\begin{tikzpicture}[grow=right, sibling distance=-12, level distance=18, every node/.style={inner sep=2pt}, baseline={([yshift=-2.3pt]O)}]
 \coordinate[circle, fill, inner sep=1]
  (O)
  child {
    coordinate
    child {
      coordinate
      edge from parent node[above] {$y$}
    }
    edge from parent node[above] {$z$}
  }
  child {
    coordinate
    child {
      coordinate
      edge from parent node[below] {$z$}
    }
    edge from parent node[below] {$y$}
  }
  ;
\end{tikzpicture}
\end{gather*}

\end{example}

%\subsubsection{The $s$ function}
\label{sec:function-s}
\index{s@$s$\idxexpl{combining the trace trees of a binding}}%
Given a binding $(\Gamma, e)$ where we have a $\sTTree$ describing the calls done by $e$, and also one $\sTTree$ for each expression bound in $\Gamma$, how can we combine that information into one tree describing the behaviour of the whole binding?

A first attempt might be a function
\[
s \colon (\sVar \to \sTTree) \to \sTTree \to \sTTree
\]
defined by
\[
\nxt x {s~\bar t~t}\coloneqq
\begin{cases}
\bot &\text{if } \nxt x t  =\bot \\
s~\bar t~(t' \otimes \bar t~x) & \text{if } \nxt x t = t',
\end{cases}
\]
that traverses the tree $t$ and upon every call interleaves the tree of the called name, $\bar t~x$, with the remainder of $t$.

\begin{example}
Let $\bar t~x = \singletontree{y}$, $\bar t~y = \singletontree{z}$, $\bar t~x = \emptytree$ and $t =
\begin{tikzpicture}[grow=right, sibling distance=-12, level distance=18, every node/.style={inner sep=2pt}, baseline={([yshift=-2.3pt]O)}]
 \coordinate[circle, fill, inner sep=1]
  (O)
  child {
    coordinate
    child {
      coordinate
      edge from parent node[above] {$y$}
    }
    edge from parent node[above] {$x$}
  }
  ;
\end{tikzpicture}$. Then
\[
s~\bar t~t =
\begin{tikzpicture}[grow=right, sibling distance=-12, level distance=18, every node/.style={inner sep=2pt}, baseline={([yshift=-2.3pt]O)}]
 \coordinate[circle, fill, inner sep=1]
  (O)
  child { coordinate
    child { coordinate
      child {
        coordinate
        child {
          coordinate
          child {
            coordinate
            edge from parent node[above] {$z$}
          }
          edge from parent node[above] {$y$}
        }
        edge from parent node[above] {$z$}
      }
      child {
        coordinate
        child {
          coordinate
          child {
            coordinate
            edge from parent node[below] {$z$}
          }
          edge from parent node[below] {$z$}
        }
        edge from parent node[below] {$y$}
      }
      edge from parent node[above] {$y$}
    }
    edge from parent node[above] {$x$}
  }
  ;
\end{tikzpicture}.
\]
\end{example}


This is a good start, but it does not cater for thunks, where the first call behaves differently than later calls. Therefore, we have to tell $s$ which variables are bound to thunks, and give them special treatment: After a variable $x$ referring to a thunk is evaluated, we pass on a modified map where $\bar t~x = \emptytree$.

Hence I extend the signature to
\[
s \colon 2^\sVar \to (\sVar \to \sTTree) \to \sTTree \to \sTTree
\]
and the definition is now\label{function:s}
\begin{multline*}
\nxt x {s_T~\bar t~t}
\coloneqq 
\begin{cases}
\bot &\text{if } \nxt x t  =\bot \\
s_T~\bar t~(t' \otimes \bar t~x) & \text{if } \nxt x t = t' \text{, } x \notin T \\
s_T~(\bar t[x\mapsto \emptytree])~(t' \otimes \bar t~x) & \text{if } \nxt x t = t' \text{, } x \in T.
\end{cases}
\end{multline*}

The ability to define this function (relatively) easily is the main advantage of working with trace trees instead of co-call graphs at this stage.

As $s$ is defined in terms of monotone operations, it is itself monotone in its arguments $\bar t$ and $t$.

\begin{example}
With the same arguments as above, for $T=\{y\}$ the effect of calling $y$, i.e.\@ a call to $z$, happens only once, and we have
\[
s_T~\bar t~t =
\begin{tikzpicture}[grow=right, sibling distance=-12, level distance=18, every node/.style={inner sep=2pt}, baseline={([yshift=-2.3pt]O)}]
 \coordinate[circle, fill, inner sep=1]
  (O)
  child { coordinate
    child { coordinate
      child {
        coordinate
        child {
          coordinate
          edge from parent node[above] {$y$}
        }
        edge from parent node[above] {$z$}
      }
      child {
        coordinate
        child {
          coordinate
          edge from parent node[below] {$z$}
        }
        edge from parent node[below] {$y$}
      }
      edge from parent node[above] {$y$}
    }
    edge from parent node[above] {$x$}
  }
  ;
\end{tikzpicture}.
\]
\end{example}

\medskip

\index{c@$c\,\idxdummy$\idxexpl{projection from $\sTTree$ to $(\sVar\to\Card)$}}%
We project a $\sTTree$ to a value of type $(\sVar \to \Card)$, as required for a cardinality analysis, using $c \colon \sTTree \to (\sVar \to \Card)$ defined by

\[
c(t)~x \coloneqq
\begin{cases}
\bot, &\text{if $x$ does not occur in $t$} \\
\Co, &\text{{if on each path in $t$, $x$ occurs at most once}} \\
\Cm, &\text{otherwise.}
\end{cases}
\]

From this definition it follows
\begin{lemma}
$c(\nxt x t) \sqsubseteq c(t) - x.$\label{lem:c-nxt}
\end{lemma}

\subsubsection{Specification}
\label{sec:treespec}

\index{texp@$\fexp\idxdummy\idxdummy$\idxexpl{trace tree analysis}}%
A tree cardinality analysis determines for every expression $e$ and arity $\alpha $ the tree $\fexp e {\alpha}$ of calls to free variables of $e$ which are performed by evaluating $e$ with $\alpha$ arguments and using the result in any way.
As the resulting value might be passed to unknown code or stored in a data structure, we cannot assume anything about how often the resulting value is used. This justifies the arity parameter: We expect $\fexp{\sLam x y}0 = \many y$ but $\fexp{\sLam x y}1 = y$.

\index{tbinds@$\fbinds\idxdummy\idxdummy$\idxexpl{trace tree analysis on heaps}}%
I write $\fbinds{\Gamma}{\bar\alpha}$ for the analysis lifted to bindings, returning $\bot$ for variables not bound in $\Gamma$ or mapped to $\bot$ in $\bar\alpha$.

\index{theap@$\fheap\idxdummy\idxdummy\idxdummy$\idxexpl{trace tree analysis on bindings}}%
I also need a variant $\fheap \Gamma e \alpha$ that, given bindings $\Gamma$, an expression $e$ and an arity $\alpha$, reports the calls on $\dom \Gamma$ performed by $e$ and $\Gamma$ with these bindings in scope.

I can now identify conditions on $\mathcal T$ that allow to satisfy the specifications in \cref{def:cardspec}.

\begin{definition}[Tree cardinality analysis specification]
\label{def:treespec}
I expect the cardinality analysis to agree with the arity analysis on which variables are called at all:
\begin{flalign*}
\noprecondition
\dom \fexp e \alpha &= \dom\aexp e \alpha \tag{T-dom}\\
\noprecondition
\dom \fheap \Gamma e \alpha &= \dom\aheap \Gamma e \alpha \tag{Th-dom}
\intertext{%
Inequalities for the syntactic constructs:
}
\noprecondition
\many x \otimes \fexp e {\alpha+1} &\sqsubseteq \fexp {\sApp e x} \alpha \tag{T-App}\\
\noprecondition
(\fexp e {\alpha-1})\setminus\{x\} &\sqsubseteq \fexp {\sLam x e} \alpha \tag{T-Lam}\\
\noprecondition
\fexp {\subst e y x} {\alpha} &\sqsubseteq \many x \otimes (\fexp {e} \alpha)\setminus\{y\} \tag{T-subst}\\
\noprecondition
x &\sqsubseteq \fexp {x} \alpha \tag{T-Var}\\
\noprecondition
\fexp{e}0 \otimes (\fexp{e_\bT}\alpha \oplus \fexp{e_\bF}\alpha) 
&\sqsubseteq \fexp {\sITE e {e_\bT}{e_\bF}} \alpha \tag{T-If}\\
\noprecondition
(s_{\thunks \Gamma}~(\fbinds \Gamma {\aheap \Gamma e \alpha})~(\fexp e \alpha)) \setminus \dom \Gamma
&\sqsubseteq \fexp {\sLet \Gamma e}\alpha
\tag{T-Let}
\end{flalign*}
For values, analysed without arguments, the analysis is expected to return a repeatable tree:
\begin{equation*}
\isVal e \implies \fexp{e}{0}\text{ is repeatable} \tag{T-value}
\end{equation*}
The specification for $\aheap{\Gamma}e\alpha$ is closely related to (T-Let):
\begin{equation*}
\restr{(s_{\thunks \Gamma}~(\fbinds \Gamma {\aheap \Gamma e \alpha})~(\fexp e \alpha))}{\dom \Gamma}
\sqsubseteq \fheap \Gamma e \alpha \tag{Th-s}\\
\end{equation*}
And finally, the connection to the arity analysis:
\begin{equation*}
x \in \thunks \Gamma,\, c(\fheap{\Gamma}e\alpha)~x = \Cm\implies
(\aheap \Gamma e \alpha)~x = 0 \tag{Th-$\Cm$-thunk}
\end{equation*}
\end{definition}

\subsubsection{Safety}
\label{sec:treeproof}

Given a tree cardinality analysis, I can define a cardinality analysis in the sense of the previous section. The definition for $\cheap\Gamma e \alpha$ is straight forward:
\[
\cheap \Gamma e \alpha \coloneqq c(\fheap \Gamma e \alpha).
\]
In order to define $\prognosis{(\Gamma,e,S)}{(\bar\alpha, \alpha, \dot\alpha)}$ I need to fold the tree cardinality analysis over the stack:
\begin{align*}
\fstack{[]}{\_} &\coloneqq \bot \\
\fstack{\cons{\alts{e_\bT}{e_\bF}}S}{\alpha \cdot \dot\alpha} &\coloneqq \fstack S {\dot\alpha} \otimes (\fexp{e_\bT}\alpha \oplus \fexp{e_\bF}\alpha)  \\
\fstack{\cons{\arg x}S}{\dot\alpha} &\coloneqq \fstack S {\dot\alpha} \otimes \many x \\
\fstack{\cons{\upd x}S}{\dot\alpha} &\coloneqq \fstack S {\dot\alpha}.
\end{align*}
With this I can define
\[
\prognosis{(\Gamma,e,S)}{(\bar\alpha, \alpha, \dot\alpha)}
\coloneqq c\big(s_{\thunks \Gamma}~(\fbinds \Gamma {\bar\alpha})~(\fexp e \alpha \otimes \fstack S {\dot\alpha})\big),
\]
and set out to prove

\begin{lemma}%[Safety of a tree cardinality analysis]
\label{lem:tree-safety}
Given a tree cardinality analysis satisfying \cref{def:treespec}, together with an arity analysis satisfying \cref{def:arityspec}, the derived cardinality analysis satisfies \cref{def:cardspec}.
\end{lemma}


\begin{proof}
The conditions (C-dom) and (Ch-dom) follow directly from (T-dom) and (Th-dom) with (A-dom) and (Ah-dom).

The conditions (C-cong), (C-not-called) and (C-upd) follow directly from the definitions of $\overline{\mathcal T}$ and $\dot{\mathcal T}$

We have $\many x \sqsubseteq \fstack{S}{\dot\alpha}$ for $\arg x \in S$, so (C-args) follows from
\[
[x\mapsto \Cm] = c(\many x)\sqsubseteq c(\fstack S{\dot\alpha})\sqsubseteq (\prognosis{(\Gamma,e,S)}{(\bar\alpha, \alpha, \dot\alpha)}).
\]

Similar calculations prove (C-call) using (T-Var), (C-App) using \mbox{(T-App)}, (C-Lam) using (T-subst) and (T-Lam), (C-If$_1$) using (T-If).

Condition (C-If$_2$) is where the precision comes from, as we retain the knowledge that the two code paths are mutually exclusive. The proof is a direct consequence of $t \sqsubseteq t \oplus t'$.

The variable cases are interesting, as these interact with the heap, and hence with the $s$ function.

We first show that (C-Var$_1'$) is fulfilled.  Abbreviate $T\coloneqq \thunks{\Gamma}$ and note  that $x\notin T$. We have
\begin{conteq}[onecolumnr]
\prognosis{(\Gamma, e, S)}{(\bar\alpha, \bar\alpha~x, \dot\alpha)} \\
%= c\big(s_{T}~(\fbinds{\Gamma}{\bar\alpha})~(\fexp{e}{\bar\alpha~x} \otimes \fstack {\cons{\upd x}S}{\dot\alpha})\big) \\
= c\big(s_T~(\fbinds{\Gamma}{\bar\alpha})~(\fexp{e}{\bar\alpha~x} \otimes \fstack {S}{\dot\alpha})\big) \\
= c\big(s_T~(\fbinds{\Gamma}{\bar\alpha})~(\nxt x x \otimes \fexp{e}{\bar\alpha~x} \otimes \fstack {S}{\dot\alpha})\big) 
& as $\nxt x x=\emptytree$\\
\sqsubseteq c\big(s_T~(\fbinds{\Gamma}{\bar\alpha})~(\nxt x {x \otimes \fstack {S}{\dot\alpha}}) \otimes \fexp{e}{\bar\alpha~x})\big) 
& using $(\nxt x t) \otimes t' \sqsubseteq \nxt x {t\otimes t'}$
\\
= c\big(\nxt x {s_T~(\fbinds{\Gamma}{\bar\alpha})~(x \otimes \fstack {S}{\dot\alpha})}\big)
& by the definition of $s$ \\
\sqsubseteq c\big(s_T~(\fbinds{\Gamma}{\bar\alpha})~(x \otimes \fstack {S}{\dot\alpha})\big) - x & by \cref{lem:c-nxt}\\
\sqsubseteq c\big(s_T~(\fbinds{\Gamma}{\bar\alpha})~(\fexp x\alpha \otimes \fstack {S}{\dot\alpha})\big) - x
& by (T-Var) and $s$ monotone \\
= \prognosis{(\Gamma, x, S)}{(\bar\alpha, \alpha, \dot\alpha)} - x.
\end{conteq}

Condition (C-Var$_1$) represents the evaluation of a thunk. The proof is analogue, using $\fbinds{\Gamma}{\bar\alpha}[x\mapsto\emptytree]= \fbinds{\Gamma'}{\bar\alpha}$ in the step where the definition of $s$ is unfolded.

For (C-Var$_2$) abbreviate $T \coloneqq \thunks\Gamma = \thunks{\Gamma[x\mapsto e]}$. 
We know $\isVal e$, so $\fexp{e}0$ is repeatable, by (T-value). If a repeatable tree $t$ is already contained in the second argument to $s$, then we can remove it from the range of the first argument:
\[
s_T~(\bar t[x\mapsto t])~(t \otimes t') = s_T~\bar t~(t \otimes t')
\]
Altogether, we show
\begin{conteq}[onecolumnr]
\prognosis{(\Gamma[x\mapsto e], e, S)}{(\bar\alpha, 0,\dot\alpha)} \\
= c\big( s_T~(\fbinds{\Gamma[x\mapsto e]}{\bar\alpha})~(\fexp e {0} \otimes \fstack S{\dot\alpha}) \big) \\
= c\big( s_T~(\fbinds{\Gamma}{\bar\alpha}[x\mapsto \fexp{e}{\bar\alpha~x}])~(\fexp e {0} \otimes \fstack S{\dot\alpha}) \big) \\
\sqsubseteq c\big( s_T~(\fbinds{\Gamma}{\bar\alpha}[x\mapsto \fexp{e}{0}])~(\fexp e {0} \otimes \fstack S{\dot\alpha}) \big) \\
= c\big( s_T~(\fbinds{\Gamma}{\bar\alpha})~(\fexp e {0} \otimes \fstack S{\dot\alpha}) \big) & by $\fexp{e}0$ repeatable \\
= \prognosis{(\Gamma, e, \cons{\upd x}S)}{(\bar\alpha, 0, \dot\alpha)}.
\end{conteq}

Proving condition (C-Let) is for the most part a tedious calculation involving freshness of variables. We use that if the domain of $\bar t'$ is disjoint from the variables occurring in $\bar t$ (i.e.\@ $\forall y.\, \forall x \in \bar t~y.\, \bar t'~x=\emptytree$), then
\[
s_T~(\bar t\sqcup \bar t')~t = s_T~\bar t~(s_T~\bar t'~t).
\]

\noindent Abbreviating $T\coloneqq \thunks \Gamma$ and $T' \coloneqq \thunks \Delta$, we show:
%, which holds for $\bar t = \fbinds{\Gamma}{\bar\alpha}$ and $\bar t'= \fbinds{\Delta}{\aheap\Delta e \alpha}$.
\begin{conteq}[onecolumnr]
\prognosis{(\Delta\cdot\Gamma, e, S)}{(\aheap\Delta e \alpha \sqcup \bar\alpha, \alpha, \dot\alpha)}\\
= c\big( s_{T\cup T'}~(\fbinds{\Gamma\cdot\Delta}{\aheap\Delta e \alpha \sqcup \bar\alpha})~(\fexp{e}\alpha \otimes \fstack S{\dot\alpha}) \big)\\
%= c\big( s_{T\cup T'}~(\fbinds{\Gamma}{\bar\alpha} \sqcup \fbinds{\Delta}{\aheap\Delta e \alpha})~(\fexp{e}\alpha \otimes \fstack S{\dot\alpha}) \big)\\
= c\big( s_{T\cup T'}~(\fbinds{\Gamma}{\bar\alpha})~(s_{T\cup T'}~(\fbinds{\Delta}{\aheap\Delta e \alpha})~(\fexp{e}\alpha \otimes \fstack S{\dot\alpha})) \big) & by the above equation \\
= c\big( s_T~(\fbinds{\Gamma}{\bar\alpha})~(s_{T'}~(\fbinds{\Delta}{\aheap\Delta e \alpha})~(\fexp{e}\alpha \otimes \fstack S{\dot\alpha})) \big) \\
= c\big( s_T~(\fbinds{\Gamma}{\bar\alpha})~(s_{T'}~(\fbinds{\Delta}{\aheap\Delta e \alpha})~(\fexp{e}\alpha) \otimes \fstack S{\dot\alpha}) \big) & as $\dom\Delta$ is fresh with regard to $S$\\
= \restr{c\big( s_T~(\fbinds{\Gamma}{\bar\alpha})~(s_{T'}~(\fbinds{\Delta}{\aheap\Delta e \alpha})~(\fexp{e}\alpha) \otimes \fstack S{\dot\alpha}) \big)}{\dom\Delta} \sqcup \\
\phantom{{}={}} c\big( s_T~(\fbinds{\Gamma}{\bar\alpha})~(s_{T'}~(\fbinds{\Delta}{\aheap\Delta e \alpha})~(\fexp{e}\alpha) \otimes \fstack S{\dot\alpha}) \big)\setminus{\dom\Delta}  \\
= c\big( \restr{s_{T'}~(\fbinds{\Delta}{\aheap\Delta e \alpha})~(\fexp{e}\alpha)}{\dom\Delta}\big) \sqcup \\
\phantom{{}={}} c\big( s_T~(\fbinds{\Gamma}{\bar\alpha})~(s_{T'}~(\fbinds{\Delta}{\aheap\Delta e \alpha})~(\fexp{e}\alpha)\setminus{\dom\Delta}  \otimes \fstack S{\dot\alpha}) \big)\label{proofline:ttree_rest_substitute2} \\
\sqsubseteq c\big(\fheap{\Delta}e\alpha\big) \sqcup c\big( s_T~(\fbinds{\Gamma}{\bar\alpha})~(\fexp{\sLet \Delta e}\alpha \otimes \fstack S{\dot\alpha}) \big) & by (Th-s) and (T-Let) \\
= \cheap{\Delta}e\alpha \sqcup \prognosis{(\Gamma, \sLet \Delta e, S)}{(\bar\alpha, \alpha, \dot\alpha)}.
\end{conteq}

Finally, (Ah-$\Cm$-thunk) follows directly from  (Th-$\Cm$-thunk).
\end{proof}

\subsection{Co-call cardinality analysis}
\label{sec:cocallcard}

The preceding section provides a framework for a cardinality analysis, but the infinite nature of the $\sTTree$ data type prevents an implementation on that level. Therefore, the concrete implementation uses a practically implementable data type which can serve as an approximation to trace trees: The co-call graphs introduced in \cref{sec:cocallgraphs}.


\index{t@$t\,\idxdummy$\idxexpl{the trace tree represented by a $\sGraph$}}%
We can convert such a graph to a $\sTTree$, using the function $t \colon \sGraph \to \sTTree$  given by
\begin{multline*}
\paths{t(G)} \coloneqq \{x_1\cdots x_n \mid \forall i.\, x_i \in \dom G \wedge \forall j\ne i.\, \edge{x_i}{x_j} \in G\}.
\end{multline*}

\index{g@$g\,\idxdummy$\idxexpl{the $\sGraph$ approximation of a trace tree}}%
Conversely, we can approximate a $\sTTree$ by a $\sGraph$, as implemented by the function $g \colon \sTTree \to \sGraph$ where
\[
g(t) \coloneqq \bigsqcup\{\dot g (\dot x) \mid \dot x \in \paths t\}
\]
which uses $\dot g \colon [\sVar] \to \sGraph$ given by
\begin{align*}
\dom{\dot g(x_1\cdots x_n)} &= \{x_1,\ldots,x_n\} \\
\dot g(x_1\cdots x_n) &\coloneqq \{\edge {x_i}{x_j} \mid i\ne j \le n\}.
\end{align*}

The mappings $t$ and $g$ form a monotone Galois connection:
\[
g(t) \sqsubseteq G \iff t \sqsubseteq t(G).
\]
It even is a Galois insertion, as $g(t(G)) = G$.

\begin{example}
For
\begin{align*}
t &=
\begin{tikzpicture}[grow=right, sibling distance=-12, level distance=18, every node/.style={inner sep=2pt}, baseline={([yshift=-2.3pt]O)}]
 \coordinate[circle, fill, inner sep=1]
  (O)
  child {
    coordinate
    edge from parent node[above] {$x$}
  }
  child {
    coordinate
    child {
      coordinate
      child {
        coordinate
        edge from parent node[below] {$z$}
      }
      edge from parent node[below] {$z$}
    }
    edge from parent node[below] {$y$}
  }
  ;
\end{tikzpicture}\\
\shortintertext{we have}
g(t) &=
\begin{tikzpicture}[baseline={(0.base)}]
\node (0) {$x$};
\node at (0.6,0.3) (1) {$y$};
\node at (0.6,-0.3) (2) {$z$};
\draw (1) -- (2);
\draw (2) edge[loop right] (2);
\end{tikzpicture}
\shortintertext{and}
t(g(t)) &=
\begin{tikzpicture}[grow=right, level 1/.style={sibling distance=-18}, level 2/.style={sibling distance=-18}, level 3/.style={sibling distance=-18}, sibling distance=-5, level distance=18, every node/.style={inner sep=2pt}, baseline={([yshift=-2.3pt]O)}]
 \coordinate[circle, fill, inner sep=1]
  (O)
  child {
    coordinate
    edge from parent node[above] {$x$}
  }
  child {
    coordinate
    child {
      coordinate
      child {
        coordinate
        child {
          coordinate
          edge from parent[dotted]
        }
        edge from parent node[below] {$z$}
      }
      child {
        coordinate
        child {
          coordinate
          child {
            coordinate
            child {
              coordinate
              edge from parent[dotted]
            }
            edge from parent node[below] {$z$}
          }
          edge from parent node[below] {$z$}
        }
        edge from parent node[below] {$y$}
      }
      edge from parent node[below] {$z$}
    }
    child {
      coordinate
      child {
        coordinate
        child {
          coordinate
          child {
            coordinate
            edge from parent[dotted]
          }
          edge from parent node[below] {$z$}
        }
        edge from parent node[below] {$z$}
      }
      edge from parent node[below] {$y$}
    }
    edge from parent node[below] {$z$}
  }
  child {
    coordinate
    child {
      coordinate
      child {
        coordinate
        child {
          coordinate
          edge from parent[dotted]
        }
        edge from parent node[below] {$z$}
      }
      edge from parent node[below] {$z$}
    }
    edge from parent node[below] {$y$}
  }
  ;
\end{tikzpicture}
\end{align*}
which shows that converting from trees to graphs and back loses information, in particular about whether something is called twice or more often, but the resulting tree still contains all the paths of the original tree.
\end{example}

\subsubsection{Specification}

I proceed in the usual scheme, by giving a specification for a safe co-call cardinality analysis, connecting it to the tree cardinality analysis, and eventually proving that our implementation fulfils the specification.

\index{gexp@$\ccexp\idxdummy\idxdummy$\idxexpl{co-call analysis}}%
\index{gheap@$\ccheap\idxdummy\idxdummy\idxdummy$\idxexpl{co-call analysis of a binding}}%
A co-call cardinality analysis determines for each expression $e$ and incoming arity $\alpha$ its co-call graph $\ccexp e \alpha$. As before, there is also a variant that analyses bindings, written $\ccheap \Gamma e \alpha$. The conditions in the following definition are obviously designed to connect to \cref{def:treespec}.

\begin{definition}[Co-call cardinality analysis specification]
\label{def:cocallspec}
We want the co-call graph analysis to agree with the arity analysis on what is called at all:
\begin{flalign*}
\noprecondition
\dom \ccexp e \alpha &= \dom \aexp e \alpha \tag{G-dom}\\
% \noprecondition
% \dom \ccheap\Gamma e \alpha &= \dom \aheap \Gamma e \alpha \tag{Ch-dom}
\intertext{%
As usual, we have inequalities for the syntactic constructs:
}
\noprecondition
\ccexp e {\alpha +1} \sqcup (\{x\}\times \fv (\sApp e x)) &\sqsubseteq \ccexp{\sApp e x}\alpha \tag{G-App} \\
\noprecondition
\ccexp e {\alpha -1} \setminus \{x\} &\sqsubseteq \ccexp{\sLam x e}\alpha \tag{G-Lam} \\
\noprecondition
\ccexp {\subst e y x}{\alpha} \setminus \{x,y\} &\sqsubseteq \ccexp e{\alpha}\setminus \{x,y\} \tag{G-subst} \\
\longlineleft {\ccexp e 0 \sqcup \ccexp {e_\bT} \alpha \sqcup \ccexp {e_\bF} \alpha \sqcup 
               (\dom \aexp e 0 \times (\dom \aexp {e_\bT} \alpha \cup \dom \aexp {e_\bF} \alpha))}
\noprecondition
&\sqsubseteq \ccexp {\sITE e {e_\bT} {e_\bF}} \alpha \tag{G-If}\\
\noprecondition
\ccheap \Gamma e \alpha \setminus \dom \Gamma &\sqsubseteq \ccexp {\sLet \Gamma e} \alpha \tag{G-Let} \\
\shortprecondition{\isVal e}
(\fv e)^2 &\sqsubseteq \ccexp e 0 \tag{G-value}
\intertext{%
The following conditions concern $\ccheap \Gamma e \alpha$, which has to cater for the calls originating in $e$,
}
\noprecondition
\ccexp e \alpha &\sqsubseteq \ccheap \Gamma e \alpha, \tag{Gh-body}\\
\intertext{%
the calls originating in the right-hand-sides,
}
\shortprecondition{(x\mapsto e') \in \Gamma}
\ccexp {e'} {\aheap \Gamma e \alpha~x} &\sqsubseteq \ccheap \Gamma e \alpha, \tag{Gh-heap}\\
\intertext{%
and finally the extra edges between what is called from the right-hand-side of a variable and whatever the variable is called with:
}
\precondition{(x\mapsto e') \in \Gamma,\, \isVal {e'}}
(\fv e') \times \neighbors x {\ccheap \gamma e a} &\sqsubseteq \ccheap \Gamma e \alpha. \tag{Gh-extra}\\
\intertext{%
For thunks, we can be slightly more precise: Only one call to them matters, so we can ignore  a possible edge $\edge x x$:
}
\precondition{(x\mapsto e') \in \Gamma,\, \neg \isVal {e'}}
(\fv e') \times (\neighbors x {\ccheap \gamma e a} \setminus \{x\}) &\sqsubseteq \ccheap \Gamma e \alpha \tag{Gh-extra'}
\intertext{
Finally, we need to ensure that the cardinality analysis is actually used by the arity analysis when dealing with thunks. For recursive bindings, we never eta-expand thunks:
}
\precondition{\rec \Gamma,\, x \in \thunks \Gamma,\,  x\in \dom \aheap \Gamma e \alpha}
\aheap \Gamma e\alpha &= 0& \tag{Rec-$\Cm$-thunk}
\intertext{
But for a non-recursive thunk, we only have to worry about thunks which are possibly called multiple times:
}
\precondition{x \notin \fv e',\, \neg \isVal {e'},\,  \edge x x \in \ccheap \Gamma e \alpha}
\aheap {[x\mapsto e']} e\alpha &= 0 \tag{Nonrec-$\Cm$-thunk}
\end{flalign*}

\end{definition}

\subsubsection{Safety}
\label{sec:cocallproof}

From a co-call analysis fulfilling \cref{def:cocallspec} we can derive a tree cardinality analysis fulfilling \cref{def:treespec}, using
\begin{align*}
\fexp e \alpha &\coloneqq t(\ccexp e \alpha).
\end{align*}
The definition of $\fheap \Gamma e \alpha$ 
differs for non-recursive and recursive bindings.
\begin{compactitem}
\item For a non-recursive binding $\Gamma=[x\mapsto e']$ we have $\fheap \Gamma e \alpha \coloneqq \restr{t(\ccexp e \alpha)}{\dom \Gamma}$ and
\item for recursive $\Gamma$ we define $\fheap \Gamma e \alpha \coloneqq t((\dom {\aheap \Gamma e \alpha})^2)$, i.e.\@ the bound variables may call each other in any way.
\end{compactitem}
%
%\begin{align*}
%\fheap \Gamma e \alpha &\coloneqq
%\begin{cases}
%\bot    &\text{if } (\aheap {[x\mapsto e']}e \alpha)~x = \bot \\
%x       &\text{if } (\aheap {[x\mapsto e']}e \alpha)~x \ne \bot,\, \edge x x \notin \ccexp e \alpha\\
%\many x &\text{if } (\aheap {[x\mapsto e']}e \alpha)~x \ne \bot,\, \edge x x \in \ccexp e \alpha
%\end{cases}
%\end{align*}
%
%Note that this definition also holds for all non-recursive cases but the second.

\begin{lemma}%[Safety of a co-call cardinality analysis]
\label{lem:cocall-safety}
Given a co-call cardinality analysis satisfying \cref{def:cocallspec}, together with an arity analysis satisfying \cref{def:arityspec}, the derived cardinality analysis satisfies \cref{def:treespec}.
\end{lemma}

\begin{proof}
Most conditions of \cref{def:treespec} follow by simple calculation from their counterpart in \cref{def:cocallspec} using the Galois connection
\begin{align*}
t \sqsubseteq t(G) \iff g(t) \sqsubseteq G
\end{align*}
and identities such as
$
g(t \oplus t') = g(t) \sqcup g(t')
$
and
$
g(t \otimes t') = g(t) \sqcup g(t') \sqcup (\dom t \times \dom t').
$

For (T-Let), we use (G-Let) with the following \cref{lem:ttree-subst}, instantiated with $T=\thunks\Gamma$, $\bar t = \fbinds \Gamma {\aheap \Gamma e \alpha}$, $t = \fexp e \alpha$ and $S = \dom \Gamma$.
\end{proof}

\begin{lemma}
\label{lem:ttree-subst}
Given
\begin{compactitem}
\item $g(t) \sqsubseteq G$,
\item $\forall x \notin S.\, \bar t~x = \bot$,
\item $\forall x \in S.\, g(\bar t~x) \sqsubseteq G$,
\item $\forall x \in S,\, x \notin T.\, \dom(\bar t~x) \times N_x(G) \sqsubseteq G$ and
\item $\forall x \in S,\, x \in T.\, \dom(\bar t~x) \times (N_x(G)\setminus\{x\}) \sqsubseteq G$
\end{compactitem}
we have $g({(s_T~\bar t~t)}\setminus{S}) \sqsubseteq G$.
\end{lemma}

Intuitively, this lemma describes how we can approximate the trace tree that is the result of integrating trace trees representing the bound expressions in a recursive binding with the trace tree describing the calls from the body. The conditions specify that
\begin{compactitem}
\item the body is approximated by the graph $G$,
\item the set $S$ encompasses all bound variables,
\item the effect of each individual bound trace tree is approximated by $G$,
\item for a non-thunk $x$, for every edge $\edge x y\in G$, there is also an edge from $y$ to anything that is called by $x$ and
\item similarly for a thunk $x$, but disregarding a possible loop $\edge x x \in G$.
\end{compactitem}
The absence in $G$ of an edge $\edge x y$ for $x, y\in S$ does not indicate that  calls to $x$ are $y$ are exclusive (they are mutually recursive, so typically both will be called, many times), but rather that in an infinite unwrapping of the recursive let, as explained on \cpageref{infinite-unwrapping}, the recursion proceeds with either $x$ or $y$. Therefore, the inequality in the conclusion of the lemma disregards variables from $S$.

\begin{proof}
In order to prove $g({(s_T~\bar t~t)}\setminus{S}) \sqsubseteq G$, we have to show that $\dot g (\dot x\setminus S) \sqsubseteq G$ for every path $\dot x\in\paths{s_T~\bar t~t}$. I write $\dot x \setminus S$ for the list $\dot x$ with all elements in $S$ filtered out.

The behaviour of the function $s$ can also be described as follows: In order to produce the path $\dot x \in \paths{s_T~\bar t~t}$, $s$ picks a specific path $\dot y \in \paths t$ (and disregards $t$ otherwise). Going through each entry $x$ on $\dot y$, the function chooses a a specific path from $\bar t~x$ and interleaves it into $\dot y$ after the entry $x$. This procedure then continues with the interleaved path, at the position following $x$.

In order to do a proof by induction, I strengthen the proposition, and keep track of two sets: The set $V$ of variables not in $S$ that have been called so far, and the set $V_T$ that keeps track of the thunks that have been called. The assumptions of the strengthened proposition are
\begin{compactitem}
\item $\dot x$ is a path produced by interleaving the trees from $\bar t$ into $\dot y$, as described above,
\item $\dot g(\dot y \setminus V_T) \sqsubseteq G$,
\item $V \times (\dot y \setminus V_T) \sqsubseteq G$,
\item $V \cap S = \{\}$,
\item $V_T \subseteq S$,
\item $\forall x \in V_T,\, \bar t~x = \bot$,
\end{compactitem}
and we show not only $g(\dot x \setminus{S}) \sqsubseteq G$, but also $V \times (\dot x \setminus V_T) \sqsubseteq G$.

With $V = V_T = \{\}$, the assumptions are fulfilled, so by proving this proposition, we conclude the lemma.

The statement is trivial for $\dot y = \dot x = []$. Otherwise, $\dot x$ and $\dot y$ are necessarily headed by the same variable, so let $\dot y = x\cdot \dot y'$ and $\dot x = x \cdot \dot x'$, where $\dot x'$ is a path produced by interleaving the trees from $\bar t'$ into an interleaving of $\dot y'$ and $\dot z$, where $\dot z$ is a path in the tree $\bar t~x$ and $\bar t'$ is $\bar t$ if $x \notin T$ and $\bar t[x\mapsto \emptytree]$ otherwise (cf.\ the definition of $s$ on \cpageref{function:s}).

There are three cases to consider:
\begin{itemize}
\item If $x$ refers to a thunk that we have seen before (i.e.\@ $x \in V_T$), then $\bar t ~ x$ is the empty tree. We invoke the inductive hypothesis with the same $V$ and $V_T$ and obtain $g(\dot x' \setminus{S}) \sqsubseteq G$ and $V \times (\dot x' \setminus V_T) \sqsubseteq G$. The assumptions are fulfilled, as $x \in V_T$ and $\dot z = []$, and the conclusion immediately implies the desired $g(\dot x \setminus{S}) \sqsubseteq G$ and $V \times (\dot x \setminus V_T) \sqsubseteq G$, as $x\in V_T$ and $x\in S$ due to $V_T \subseteq S$.

\item Otherwise, if $x$ is a recursive call (i.e.\@ $x \in S$), we again invoke the inductive hypothesis, this time extending $V_T$ by $x$ if $x\in T$. To establish the assumptions, we first decompose what we have given:
\begin{align*}
\dot g((x\cdot\dot y') \setminus V_T) &= \{x\} \times (\dot y' \setminus V_T) \sqcup \dot g(\dot y' \setminus V_T).
\intertext{and}
V\times ((x\cdot \dot y')\setminus V_T) &= V \times \{x\} \sqcup V \times (\dot y'\setminus V_T).
\end{align*}
Furthermore, we note that $g(\dot z \setminus V_T) \sqsubseteq G$.

It remains to show that $V \times (\dot z \setminus V_T) \sqsubseteq G$.
Above decomposition provides $V \times \{x\} \sqsubseteq G$, so all calls seen so far are adjacent to $x$ in $G$, and thus $V \subseteq \neighbors x G\setminus\{x\}$. Together with $\dot z\subseteq \dom{(\bar t~x)}$ the assumption of the lemma provides the desired inequality.

We then obtain $g(\dot x' \setminus{S}) \sqsubseteq G$ and $V \times (\dot x' \setminus V_T) \sqsubseteq G$, which, together with $V \times \{x\} \sqsubseteq G$, implies the desired result.

\item The remaining case is that of a call to something not in $S$, i.e.\@ $x\notin S$.

In this case, $\dot z=[]$, so the above decompositions suffice to establish the assumptions of the inductive hypothesis. This time, we extend $V$ with $\{x\}$ and obtain $g(\dot x' \setminus{S}) \sqsubseteq G$ and $(V\cup\{x\}) \times (\dot x' \setminus V_T) \sqsubseteq G$.

This implies the desired result together with $V \times \{x\} \sqsubseteq G$ and $\{x\}\times (\dot x'\setminus S) \sqsubseteq G$, which follows from the second conclusion of the inductive hypothesis due to $V_T \subseteq S$.
\end{itemize}
\end{proof}

\subsection{Call Arity, concretely}
\label{sec:callarityconcretely}

At last I can give the complete and concrete co-call analysis corresponding to GHC’s Call Arity, and establish its safety via our chain of refinements, simply by checking the conditions in \cref{def:cocallspec}. It is a slightly more concise reformulation of the specification given in \cref{sec:spec}, and adjusted to the restricted syntax where application arguments are always variables.

The arity analysis is:
\begin{align*}
\aexp{x}{\alpha} &\coloneqq [x \mapsto \alpha]\\
\aexp{\sApp e x}{\alpha} &\coloneqq \aexp{e}{\alpha+1} \sqcup [x \mapsto 0] \\
\aexp{\sLam x  e}{\alpha} &\coloneqq \aexp{e}{\alpha-1} \setminus \{x\} \\
\aexp{\sITE e {e_\bT}{e_\bF}}{\alpha} &\coloneqq \aexp{e}{0} \sqcup \aexp{e_\bT}{\alpha } \sqcup \aexp{e_\bF}{\alpha } \\
\aexp{\sCon b}{\alpha} &\coloneqq \bot \quad\quad \text{for } b\in\{\bT,\bF\}
\end{align*}

The analysis of a let expression $\aexp{\sLet\Gamma e} \alpha$ as well as the analysis of a binding $\aheap \Gamma e \alpha$ are defined differently for recursive and non-recursive bindings.

For a recursive $\Gamma$, we have $\aexp{\sLet\Gamma e}\alpha \coloneqq \bar\alpha \setminus \dom\Gamma$  and $\aheap \Gamma e \alpha \coloneqq \restr{\bar\alpha}{\dom\Gamma}$  where $\bar\alpha$ is the least fixed point defined by the equation%
\footnote{The initial implementation of Call Arity did not include the third term in this equation, and thunks would erroneously be eta-expanded if they happened to be part of a linearly recursive bindings. Working towards this formalisation uncovered the bug (see GHC commit 306d255).}
\[
\bar\alpha = \abinds \Gamma {\bar\alpha} \sqcup \aexp e \alpha \sqcup  [x \mapsto 0 \mid x \in \thunks\Gamma].
\]

For a non-recursive binding $\Gamma = [x \mapsto e']$ we have $\aexp{\sLet\Gamma e}\alpha \coloneqq (\aexp{e'}{\alpha'} \sqcup \aexp e\alpha) \setminus \dom\Gamma$  and $\aheap \Gamma e \alpha \coloneqq [x \mapsto \alpha']$  where 
\begin{align*}
\alpha' &\coloneqq
\begin{cases}
0                   &\text{if } \neg \isVal {e'} \text{ and }\edge x x \in \ccexp e \alpha\\
\aexp e \alpha~x  &\text{otherwise.}
\end{cases}
\end{align*}


We have $\dom \ccexp e \alpha = \dom \aexp e \alpha$ and
\begin{align*}
\ccexp x \alpha &\coloneqq \{\} \\
\ccexp {\sApp e x} \alpha &\coloneqq \ccexp e {\alpha +1} \sqcup (\{x\}\times \fv (\sApp e x)) \\
\ccexp {\sLam x e} 0 &\coloneqq (\fv e)^2\ \setminus \{x\} \\
\ccexp {\sLam x e} {\alpha+1} &\coloneqq \ccexp e \alpha \setminus \{x\} \\
\ccexp {\sITE e {e_\bT} {e_\bF}} \alpha &\coloneqq 
\!\begin{aligned}[t]
&\ccexp e 0 \sqcup \ccexp {e_\bT} \alpha \sqcup \ccexp {e_\bF} \alpha \sqcup{}\\
&(\dom \aexp e 0 \times (\dom \aexp {e_\bT} \alpha \cup \dom \aexp {e_\bF} \alpha))
\end{aligned}\\
\ccexp {\sCon b} \alpha &\coloneqq \{\}\quad\quad \text{for } b\in\{\bT,\bF\}\\
\ccexp{\sLet \Gamma e}\alpha &\coloneqq \ccheap \Gamma e \alpha \setminus \dom\Gamma
\end{align*}

The analysis result for bindings is different for recursive and non-recursive bindings and uses the auxiliary function
\begin{align*}
\ccbind x {e'} {\bar \alpha} G \coloneqq
\begin{cases}
(\fv {e'})^2 &\text {if }\isVal {e'} \wedge \edge x x \in G \\
\ccexp {e'}{\bar\alpha~x} &\text{otherwise,}
\end{cases}
\end{align*}
which calculates the co-calls of an individual binding, adding the extra edges between multiple invocations of a bound variable, unless it is bound to a thunk and hence shared.


For  recursive $\Gamma$ we define $\ccheap \Gamma e \alpha$ as the least fixed point fulfilling
\begin{align*}
\ccheap \Gamma e \alpha = 
\begin{alignedat}[t]{2}
\ccexp e \alpha 
& \sqcup \bigsqcup_{(x\mapsto e')\in\Gamma} \ccbind x {e'}{\aheap \Gamma e \alpha}{\ccheap \Gamma e \alpha} \\
& \sqcup \bigsqcup_{(x\mapsto e')\in\Gamma} (\fv {e'} \times N_x(\ccheap \Gamma e \alpha)).
\end{alignedat}
\end{align*}

For a non-recursive $\Gamma = [x \mapsto e']$, we have
\begin{align*}
\ccheap \Gamma e \alpha =
\begin{alignedat}[t]{2}
\ccexp e \alpha
&\sqcup  \ccbind x {e'}{\aheap \Gamma e \alpha}{\ccexp e \alpha} \\
&\sqcup
\begin{cases}
\fv {e'} \times (N_x(\ccexp e \alpha) \setminus\{x\}) & \text{ if } \neg \isVal {e'} \\
\fv {e'} \times N_x(\ccexp e \alpha)                  & \text{ if } \isVal {e'}. \\
\end{cases}
\end{alignedat}
\end{align*}

\begin{theorem}
Call Arity is safe (in the sense of \cref{def:safe}).
\label{thm:cocallsafety}
\end{theorem}

\begin{proof}
By straightforward calculation (and simple induction for (G-subst)), we can show that the analyses fulfil \cref{def:arityspec,def:cocallspec}. So by \cref{lem:cocall-safety,lem:tree-safety,lem:card-arity-safety,cor:arity-safe}, Call Arity is safe.
\end{proof}

\section{The Isabelle formalisation}
\label{sec:formalisation}

On their own, the proofs presented in the previous sections are involved, but otherwise rather standard. What sets them apart from similar work is that these proofs have been carried out in the interactive theorem prover Isabelle \cite{isabelle}. This provides a level of assurance that is hard to reach using pen-and-paper-proofs.

\subsection{Size and effort}

But it also greatly increases the effort involved in obtaining a result like \cref{thm:cocallsafety}. The Isabelle development corresponding to this chapter, including the definitions of the syntax and the semantics (but excluding unrelated results from \cref{ch:launchbury}, such as correctness and adequacy of the semantics), contains roughly 12,000 lines of code with 1,200 lemmas (many small, some large) in 75 theories, created over the course of 9 months \cite{arity-afp}. 
%Large parts of it, however, can be re-used for other developments: The syntax and semantics, of course, but also the newly created data types like the trace trees and the co-call graphs.
%
Much of the complexity is owed to the problem of bindings and to the handling of monotonicity and continuity of the analysis. See \cref{sec:launchbury-formalisation} for more information on the formalisation, especially how using Nominal logic, as discussed (\cref{sec:nominal,sec:using-nominal}) and the HOLCF package (\cref{sec:domaintheory}) has helped here.

%Another cause of overhead is ensuring that all analyses and the operators used by them are monotone and continuous, so that the fixed points are actually well-defined. Here, I use the HOLCF package by Brian Huffman \cite{holcf} with good results, but again not without an extra cost compared to handwaving over such issues in pen-and-paper proofs.

So while the actual result shown here might not have warranted that effort on its own -- after all, performance regressions due to bugs in the Call Arity analysis do not have very serious consequences -- it lays ground towards formalising more and more parts of the core data structures and algorithms in our compilers.

\subsection{Structure}

The separation into individual theories (Isabelle's equivalent to Haskell's modules) as well as the use of \emph{locales} (\cite{locales}, Isabelle's approximation to a module system) helps to gain insight into the structure of an otherwise very large proof, by ensuring a separation of concerns.
%For example, the definitions required to merely \emph{define} the analysis and transformation are separate from the correctness proofs: Only the theory with the correctness proof will import the theories with the semantics, ensuring that the transformation is indeed independent of the semantics.
For example, the proof of $\llbracket \trans e 0 \rrbracket = \llbracket e \rrbracket$ has only the conditions from \cref{def:arityspec} available, which shows that the cardinality analysis is irrelevant for functional correctness.

%Another benefit of having a machine-checked proof is that one can be a bit more liberal in the write-up for human consumption, i.e.\@ this paper, and skip over uninteresting technical details with a much better conscience, knowing there really is nothing lurking in the skipped parts. We hope that this made the paper a bit more pleasant to read than if it were completely rigorous.

\subsection{The trace tree type implementation}
\label{sec:trace-tree-impl}

Isabelle/HOL is a logic of total functions, and it is not a surprise that it has good support for inductive data types via the \isa{\isacommand{datatype}} command, where each element of the type has a finite size. But the type of trace trees introduced in \cref{sec:tracetrees} is not of that kind: To model program behaviour with recursion I need infinite trees as well.

Fortunately, it is possible to define such types in Isabelle/HOL, and there are actually a few options available:

\begin{itemize}
\item The HOLCF package can construct domains with an infinitely deep structure from a domain equation like
\[
\sTTree = \sVar \to \sTTree_\bot
\]
which can be implemented as
\isainputanonym{ttree-holcf}

\item The \isa{\isacommand{codatatype}} command provides support for co-inductive data types, which can create the appropriate type directly:
\isainputanonym{ttree-codatatype}
\item The type can be constructed “by hand” using the plain \isa{\isacommand{typedef}} command.
\end{itemize}

I have experimented with all three variants and found that the first two approaches would not provide me with the right tools (e.g.\ definition tools, induction principles) that allow me to efficiently define the required operation and prove the required lemmas, so I turned to the “by hand” construction.

To that end, I defined the notion of sets of lists where for every list, all its prefixes are in the set as well. These are the downward-closed sets under the prefix-order on lists:
\goodbreak

\isainput{downset}
Any non-empty downward-closed set is then such a trace tree:
\isainput{ttree}
A \isa{\isacommand{typedef}} is only admissible if the given set is not empty; this is the proof obligation solved automatically using \isa{\isacommand{by} auto}; the set $\{[]\}$, corresponding to the tree $\emptytree$, is one possible witness of that.

In \cref{sec:tracetrees} I describe two concrete interpretation of trace trees: As sets of traces and as automata with labelled transitions. The actual type definition is based on the first, so the function $\paths{}$ returns nothing but the representation of a tree based on that type definition. Using the helpful machinery of the lifting package \cite{lifting}, this function is thus defined to be the identity  function, once the abstraction is removed:
\isainput{paths-definition}


The automata view is realised by the two functions \isa{possible} and \isa{nxt} which indicate what labels are present on the root’s edges, and the corresponding child node.
\isainput{automata-view}

In order to make \isa{nxt} a total function I add the empty list to the result, which only matters if there was no edge with the requested label in the given tree. The proof obligation following the definition ensures that the result of \isa{nxt} is a valid tree, i.e.\@ non-empty and downward closed.

The important operations on trees, $\oplus$ and $\otimes$, are defined in terms of set union and list interleavings. As the Isabelle theory for list interleavings already uses the latter symbol, I use $\oplus\oplus$ resp.\ $\otimes\otimes$ in my formalisation for the operations on trees:
\isainput{either-definition}
\isainput{both-definition}

Further operations include, for example, the function \isa{without} that is defined in terms of \isa{filter}:
\isainput{without-definition}
Operations like this and the similar \isa{ttree{\isacharunderscore}restr} are partly the reason why using the existing infrastructure for co-inductive definitions failed, as filtering is a notoriously difficult problem here: The paper \cite{coinductive-filter} discusses that problem in depth.

Most of the code in the Isabelle theory on trace trees is concerned with the $s$ function (see \cpageref{function:s}), which is defined as
\begin{multline*}
\nxt x {s_T~\bar t~t}
\coloneqq 
\begin{cases}
\bot &\text{if } \nxt x t  =\bot \\
s_T~\bar t~(t' \otimes \bar t~x) & \text{if } \nxt x t = t' \text{, } x \notin T \\
s_T~(\bar t[x\mapsto \emptytree])~(t' \otimes \bar t~x) & \text{if } \nxt x t = t' \text{, } x \in T.
\end{cases}
\end{multline*}

In the Isabelle formalisation, I first define a related predicate on traces (\isa{substitute'}), by recursion on the trace, as I need to prove that predicate to be downward-closed before I can lift it to form the real \isa{substitute} on trace trees:
\isainput{substitute-prime-definition}
\isainput{substitute-definition}

Depending on the proposition that one wants to show, a different definition of \isa{substitute} provides a more useful induction scheme. In this alternative definition, it is emphasised that every path in $s_T~\bar t~t$ comes from a specific path in $t$. I formalised this using the following inductive definition and equivalence proof:
\isainput{substitute-prime-prime}

This alternative definition is used in one direction of the following lemma, which states that $\restr{s_T~\bar t~t}S = s_T~\bar t~(\restr t S)$ holds if all of $\bar t$, i.e.\@ both its domain as well as all variables in its range, are in the set $S$.
\isainput{ttree-rest-substitute2}
This lemma, which I use implicitly without much ado in the proof on \cpageref{proofline:ttree_rest_substitute2}, is a good example how much intuitivity and proof difficulty can differ:

It is quite obviously true, as $s$ only acts on and adds edges with labels in $S$, which is completely ignored by $\restr \_ S$.

Nevertheless, the proof consists of more than 120 lines of Isar, not counting supporting definitions such as the alternative definition of $s$. The complexity is owed to the fact that on both sides of the equality, we have a filtered trees. Induction over a path of filtered tree is not very useful, as in one step of the induction, still many (filtered) edges can be traversed by $s$. So I need to get hold of the original, unfiltered tree, which in a way hides behind an existential quantifier, and working in Isabelle with such existentially quantified statement tends to require more explicit steps.

This also confirms the observation that combining filter-like operations and co-inductive definitions is tricky.


\section{The formalisation gap}
\label{sec:formalisationgap}

Every formalisation -- whether hand-written or machine-checked -- has a formalisation gap, i.e.\@ a difference to the formalised artefact that is not (and often cannot) be formally bridged. Despite the effort that went into this formalisation, the gap is not very narrow, and has been wide enough to fall into.

\subsection{Core vs.\@ my syntax}
\label{sec:formalisationgap-syntax}

The most obvious difference between formalisation and reality is the syntax of the language: GHC’s Core (\cref{fig:core}) is defined with 15 constructors, while the small lambda calculus that I use to represent Core has only six (\cref{fig:launchbury_syntax} plus the two in \cref{sec:addingbooleans}). I argue that it is still a reasonable representation of Core: As I explain in \cref{sec:implementation}, the additional syntactic constructs are irrelevant to our analysis: It either returns the trivial result (literals, types, coercions), or transparently looks through them (casts, ticks, type abstraction and applications).

A similar difference is that Core is typed.  Nevertheless it is ok not to include types in the formalisation, as the Call Arity analysis ignores them anyways. One might now worry that the transformation might break type safety. This does not happen, as the type-ignorant Call Arity code does not actually transform the code: It merely annotates it, and the existing general-purpose simplifier then actually eta-expands it, if the types allow this. If they do not (which may be the case with type families), it will simply refuse to do so. This does not affect the safety results, as in my proofs I always only require a lower bound on the analysis result (i.e.\@ an upper bound on the reported arity), so being less aggressive is always safe.

\subsection{Core vs.\@ my semantics}
\label{sec:formalisationgap-semantics}

As mentioned in \cref{sec:ghc-core}, there is no official operational semantics for GHC Core that captures its evaluation behaviour, besides the actual implementation. It is folklore and my own understanding of GHC inner workings that justify that I use Launchbury’s semantics to model Core’s evaluation behaviour.

This is not without pitfalls: Core allows arbitrary expressions as arguments, while my syntax follows Launchbury and restricts that to variables. The transformation is rather simple: Just replace $\sApp{e_1}{e_2}$ with $\sLet{z = e_2}{\sApp{e_1} z}$ – this is essentially the first step of the Core-to-STG transformation as performed by GHC. But this is not the whole story: If the argument already is a variable, i.e.\@ $\sApp{e_1}x$, then no extra let-binding is introduced.

This subtly changes the evaluation behaviour: While the evaluation of $\sApp{e_1}{(\sApp{x}{y})}$ calls $x$ at most once, the evaluation of $\sApp{e_1}{x}$ can call $x$ multiple times. Therefore, the equation for $\ccexp{\sApp{e_1}{e_2}}\alpha$ in \cref{fig:eqn1} has two cases.

Originally, the Call Arity code did not take that into account and would return a wrong result in such instances.  Interestingly, this could not be observed: As the argument of an application $\sApp{e_1}x$ is always analysed with an incoming arity of 0, this would thus be the call arity of $x$, and no eta-expansion would happen. Furthermore, when concluding the analysis of the let-expression where $x$ is bound, $\alpha_x$ is definitely 0 and then, according to the equations in \cref{fig:eqn2} resp.\@ \cref{fig:eqn3}, it does not matter whether $x$ is called multiple times.

It would make a noticeable difference if the cardinality result is to be used elsewhere as well, e.g.\@ to avoid the updating of thunks that are used at most once anyways (\cref{sec:update-avoidance}): Preliminary testing confirms that this leads to a huge increase of allocations and program run time if this corner case had not been taken care of.

In order to prove such update-avoidance based on my (or any) analysis to be correct, one could use a semantics that models these update flags explicitly. In \cite{biernacki} Pirog \& Biernacki describe a big-step semantics for STG which is operationally very close to the GHC runtime, as from it they derive -- formally verified(!) -- a virtual machine equivalent to the STG machine (cf.\@ \cref{sec:stg}).

But as Call Arity is a Core-to-Core analysis, this clearly shows that there is demand for a precise formal semantics for GHC’s Core with enough detail to describe its evaluation and sharing behaviour. It would help to avoid such mistakes and can serve as a reference for developers coming up and implementing Core-to-Core transformations and other parts of the compiler.

\subsection{Core’s annotations}

Identifiers in GHC's core are annotated with a wealth of additional information -- inlining information, occurrence information, strictness signatures, demand information. As later phases rely on these information, they have to be considered part of the language, and should be included in a formal semantics.

This actually caused a nasty bug\footnote{\url{https://ghc.haskell.org/trac/ghc/ticket/10176}} that appeared in the third release candidate of GHC 7.10. The symptoms were weird: The program would skip over a call to \textsf{error} and simply carry on with the rest of the code. With Call Arity disabled, nothing surprising happened and the exception was raised as expression. What went wrong?

It boiled down to a function
\begin{hcode}
\> f :: a → b\\
\> f x = error "\ldots"
\end{hcode}
which the strictness analyser annotates with \texttt{<B,A>b}, indicating that once \hi f is called with one argument, the result is definitely bottom.

In the code at hand, every call to \hi f passes two arguments, i.e.\@  \hi{!case f x y !of \{\ldots\}}. Therefore, Call Arity determines \hi f's external arity to be 2, and changes the definition to
\begin{hcode}
\> f x y = error "\ldots" y
\end{hcode}

The strictness annotation on \hi f, however, is still present, allowing the simplifier to change the code that contains the call to \hi f to \hi{!case f x !of \{\}}, as passing one argument is enough to cause the exception to be raised. It also removes all alternatives from the \keyword{case}, as the control flow will not return.

On their own, each transformation is correct; together, havoc is created: Due to the eta-expansion, the evaluation of \hi f \hi x does \emph{not} raise an exception. Because the \hi{!case} expression has no alternatives any more, the execution of the final program continues at some other, undefined part of the program.

One way to fix this would be to completely remove annotations that might no longer be true after eta-expanding a function definition, losing the benefit that these annotations provide. The actual fix was more careful and capped the reported arity at the number of arguments with which, according to the strictness signature, the function is definitely bottom.


\subsection{Implementation vs.\@ formalisation}

Clearly I have formalised and verified the algorithm behind Call Arity, but not the implementation. For example, the Isabelle code simply uses \isa{fix} to calculate the least fixedpoint during the analysis of a recursive \hi{!let}, where the implementation has to explicitly iterate the analysis result, starting from bottom and stopping when a fixed point is reached. Termination of this implementation is not handled formally, and neither the correctness of the implementation with regard to its formal description here or in the Isabelle theories.

I currently do not see a practical way to do so: There are a few Haskell-specific formal methods, e.g.\@  refinement types in the form of Liquid Haskell \cite{refinement-types}, but they are not powerful enough to do such a full-fledged correctness proof with regard to a formal specification. Another approach would be to employ Isabelle to verify the implementation: As Haftmann writes \cite{haskell-formalisation}, this requires either a conversion of the Haskell implementation into Isabelle using the tool Haskabelle or to go the other way and using Isabelle’s code generation features \cite{codegen} to produce the implementation from the Isabelle definition. Either direction would require formalising large parts of the existing GHC codebase itself – a daunting prospect.

\medskip

Furthermore, GHC works with Haskell code that is structured in modules and packages; this naturally affects the implementation, which will, for example, not collect arity and co-call information for external identifiers, as they cannot be used anyways (see \cref{sec:interesting-variables}). This implementation short-cut is ignored here.

\subsection{Performance and safety in the larger context}

Call Arity is but one transformation in a large number of analyses and transformations performed by GHC, and the safety result established in this chapter does not immediately carry over to the whole thing. It is quite possible that a subsequent transformation works better on the original code $e$ than on the transformed code $\fexp e 0$, and that this difference outweighs the improvement due to Call Arity. In that sense, the safety theorem is not composable.

The property that I would like to be able to assume about the other transformations is monotonicity: If $e_2$ is better than $e_1$ before the transformation, then the transformed $e_2$ is better than the transformed $e_1$, where “better” refers to the abstract performance measure used – in my case, the number of allocations. Then it would follow from my safety theorem that the insertion of Call Arity in the sequence of transformations will not make the end result perform worse.

In practice, this assumption is certainly “somewhat true”, i.e.\@ holds in common cases, as also shown by the empirical results. But it is unlikely true in a complete and rigorous sense, i.e.\@ I expect that one can construct corner cases where it does not hold.

\medskip

Finally, my formal notion of performance is of course just an approximation for real performance, justified by little more than the empirically observed good correlation between allocations and execution time. Formally capturing the actual runtime of a program on modern hardware with multiple cores, long instruction pipelines, branch prediction and complex caches is currently way out of reach.

\section{Related work}

This work connects arity and cardinality analyses with operational safety properties, using an interactive theorem prover to verify its claims; as such this is a first.

%\medskip

However, this is not the first compiler transformation proven correct in an interactive theorem prover. After all there is CompCert (e.g.\@ \cite{compcert2,compcert}), a complete verified optimising compiler for C implemented in Coq. Furthermore, a verified Java to Java bytecode compiler \cite{jinjathreadscompiler} was written using Isabelle's code generation facilities, and the CakeML project has produced, among other things, a verified compiler from CakeML to CakeML bytecode, implemented in the HOL4 theorem prover \cite{cakeml}.
The Vellvm project provides formal semantics for LLVM’s intermediate representation and uses it for formal verification of compiler transformations
\cite{vellvm}.
These projects address functional correctness, though, and not (yet) performance.
%But compareably little of that kind has happened in the area of pure lazy functional programing languages.

Using a resource aware program logic for a subset of Java bytecode, which they have implemented in Isabelle, Aspinall, Beringer and Mo\-mi\-gli\-ano validate local optimisations \cite{Aspinall} to be indeed optimisations with regard to a variety of resource algebras. Following the precedent set by transformation validation, in this work the safety of the transformation is not proved once and for all, but rather for every conrete program a proof is constructed. The Isabelle formalisations of the proofs seem to be lost.

In the realm of functional programming languages, a number of formal treatments of compiler transformations exist, e.g.\@ verification of the CPS transformation in Coq (e.g.\@ \cite{ltamer,cpscoq}), Twelf (e.g.\ \cite{cpstwelf}) or Isabelle (e.g.\ \cite{cpsisabelle}). As their focus lies on finding proper techniques for handling naming, their semantics do not express heap usage and sharing.

An approach that necessarly produces safe program transformations are generated transformations: Here, a large number of possible program transformations of a specific form are mechanically enumerated and checked if they are both semantics preserving and performance improving; these checks often build on SMT solvers. The selected concrete transformations are then safe by construction. Examples of this line of research include \cite{denali},  \cite{bansal} and \cite{optgen}.

\medskip

Sand's \emph{improvement theory} \cite{Sands:Skye} provides a general, inequational algebra to describe the effect of program transformations on performance. Its notion of improvement is similar to my notion of safety, while the more general notion of weak improvement allows performance regressions up to a constant factor. This theory was adapted for lazy languages, both for improvement of time \cite{Moran:Sands:Need} and space \cite{Sands:Foundation,sands}.

Recently, Hackett and Hutten \cite{wwmif} took up on Sands' work and built a general framework to prove \emph{worker/wrapper transformations} time improving. And while neither that nor Sands's work have yet been machine-checked, at least the semantic correctness of Hutton's worker/wrapper framework has been verified using Isabelle \cite{WorkerWrapper-AFP}.\goodbreak

Could I have built my results on theirs, especially as \cite{wwmif} uses almost the same abstract machine?  Indeed, eta-expansion can be phrased as an instance of the worker/wrapper transformation, with abstraction and representation contexts $\mathsf{Abs} = []$ and $\mathsf{Rep} = (\sLam {z_1 \ldots z_n} {(\sApp{\sApp{[]}{z_1}\ldots} {z_n}}))$. Unfortunately, the assumptions of the worker/wrapper improvement theorem are not satisfied, and this is to be expected: Sands' notion of improvement -- and hence Hackett and Hutton's theorems -- guarantee improvement in \emph{all} contexts, while in my case the eta-expansion is justified by an analysis of the actual context, and is generally unsafe in other contexts.

So in the current form, improvement theory is tailored to local transformations and, as Sands points out in \cite{sands}, would require the introduction of context information to apply to whole-program transformations such as Call Arity. Such a grand unified improvement theory for call-by-need would be a tremendously useful thing to have.


%Nothing in that particular direction has been carried out in a machine-checked manner. We hope that this work, together with our formalisation of Launchbury's natural semantics \cite{launchbury-isa}, will enable more use of interactive theorem proving in this field.