Joachim Breitner

Faster Winter: Statistics (the making-of)

Published 2019-11-24 in sections English, Haskell, Internet Computer.

(This is an appendix to the “faster winter” series; please see the series itself for background information.)

Did you like the graph and the stats that I produced? Just for completeness, I am including the various scripts I used. Nothing super exciting to see here, but maybe someone finds this useful.

This little shell one-liner collects the run-time statistics for each commit in the interesting range (line-wrapped for your convenience):

mkdir -p stats
for h in $(git log 1cea7652f48fad348af914cb6a56b39f8dd99c6a^..5406efd9e057aebdcf94d14b1bc6b5469454faf3 --format=%H)
do
  echo -n "$h"
  git checkout -q "$h"
  cabal new-build -v0
  echo -n ":"
  rm -f "stats/$h.txt"
  # five runs per commit; +RTS -t prints one summary line per run on stderr
  for i in $(seq 1 5)
  do
    cabal -v0 new-run exe:wasm-invoke -- -w loop.wasm  -f canister_init +RTS -t >/dev/null 2>> "stats/$h.txt"
    echo -n .
  done
  echo
done

A small Perl script takes the minimum of each measurement across the five runs and prints the result as one semicolon-separated CSV row:


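#!/usr/bin/perl

use strict;
use warnings;
use List::Util qw(min);

my @alloc;
my @in_use;
my @time;

while (<>) {
  # one "+RTS -t" summary line per run; captures: bytes allocated,
  # memory in use (MB), and the elapsed INIT, MUT and GC times
  m!<<ghc: (\d+) bytes, \d+ GCs, \d+/\d+ avg/max bytes residency \(\d+ samples\), (\d+)M in use, [\d.]+ INIT \(([\d.]+) elapsed\), [\d.]+ MUT \(([\d.]+) elapsed\), [\d.]+ GC \(([\d.]+) elapsed\) :ghc>>! or die "unexpected input line: $_";
  push @alloc, 0+$1;
  push @in_use, $2;
  push @time, $3+$4+$5;
}

# best (minimum) value of each measurement across all runs
printf "%d;%d;%f\n", min(@alloc), min(@in_use), min(@time);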
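
To make the "minimum for each measurement" step concrete: the minimum is taken column by column, so the resulting row may combine values from different runs. The following is only a minimal shell sketch of that idea, not the script I actually used; the helper name `colmin` and the three input rows are made up for illustration.

```shell
# colmin: column-wise minimum over semicolon-separated rows read from stdin.
# (hypothetical helper, for illustration only)
colmin() {
  LC_ALL=C awk -F';' '
    NR == 1 { for (i = 1; i <= NF; i++) min[i] = $i; n = NF; next }
    { for (i = 1; i <= n; i++) if ($i + 0 < min[i] + 0) min[i] = $i }
    END { for (i = 1; i <= n; i++) printf "%s%s", min[i], (i < n ? ";" : "\n") }
  '
}

# Three made-up runs: allocations;memory;time
printf '%s\n' '100;5;0.30' '90;6;0.25' '95;4;0.28' | colmin
```

Here the smallest allocation and time come from the second run, while the smallest memory figure comes from the third, so the output row is `90;4;0.25`.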

To build the full CSV file for all the commits in the range that have a stats file, I used this bash one-liner (again line-wrapped for your convenience):

echo 'commit;allocations;memory;time' > stats.csv
for h in $(git log 1cea7652f48fad348af914cb6a56b39f8dd99c6a^..5406efd9e057aebdcf94d14b1bc6b5469454faf3 --format=%H|tac)
do
  git log -n 1 --oneline "$h"
  # SCRIPT.pl is a placeholder: the file name of the Perl script above is missing here
  test -f "stats/$h.txt" && echo "$(echo "$h"|cut -c-7);$(./SCRIPT.pl < "stats/$h.txt")" | tee -a stats.csv
done

The stats can be turned into the graph using pgfplots by compiling this LaTeX file:


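\documentclass{standalone}
\usepackage{pgfplots}
\usepackage{pgfplotstable}
\pgfplotsset{compat=newest}
% skyblue1 is not predefined; this definition (Tango "Sky Blue 1") is an assumption
\definecolor{skyblue1}{RGB}{114,159,207}

% From
% (the body of this key is missing here; the three lines inside are reconstructed
% from the commonly shared pgfplots snippet of the same name and are an assumption)
\makeatletter
\pgfplotsset{
    /pgfplots/flexible xticklabels from table/.code n args={3}{%
        \pgfplotstableread[#3]{#1}\coordinate@table
        \pgfplotstablegetcolumn{#2}\of{\coordinate@table}\to\pgfplots@xticklabels
        \let\pgfplots@xticklabel=\pgfplots@user@ticklabel@list@x
    }
}
\makeatother

\begin{document}
\begin{tikzpicture}

\pgfplotsset{every axis/.style={ymin=0}}
\begin{axis}[
  scale only axis,
  axis y line*=left,
  ylabel=Allocation (bytes),
  flexible xticklabels from table={stats.csv}{[index]0}{col sep=semicolon},
  xticklabel style={rotate=90, anchor=east, text height=1.5ex, font=\ttfamily, color=black},
  xtick=data, % assumed: one tick per CSV row, so that the labels line up
]
\addplot[const plot mark mid, color=skyblue1]
  table [x expr=\coordindex+1, y index=1, col sep=semicolon] {stats.csv};
\end{axis}

\begin{axis}[
  scale only axis,
  axis y line*=right,
  ylabel=Memory (MB),
  x tick style={draw=none},
]
\addplot[const plot mark mid, color=green]
  table [x expr=\coordindex+1, y index=2, col sep=semicolon] {stats.csv};
\end{axis}

\begin{axis}[
  scale only axis,
  axis y line*=right,
  ylabel=Time (seconds),
  x tick style={draw=none},
]
\pgfplotsset{every outer y axis line/.style={xshift=2cm}, every tick/.style={xshift=2cm}, every y tick label/.style={xshift=2cm} }
\addplot[const plot mark mid, color=red]
  table [x expr=\coordindex+1, y index=3, col sep=semicolon] {stats.csv};
\end{axis}

\end{tikzpicture}
\end{document}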

And finally this Perl script allows me to paste any two lines from the CSV file and produces appropriate Markdown for the “improvement” lines in my posts:


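#!/usr/bin/perl

use strict;
use warnings;

my $first = 1;

my $commit;
my $alloc;
my $in_use;
my $time;

while (<>) {
  # one CSV row: commit;allocations;memory;time
  /(.*);(.*);(.*);(.*)/ or die "unexpected input line: $_";
  unless ($first) {
    # relative change of this row against the previous one, as a signed percentage
    # (the link target inside the trailing "()" is missing in this copy of the script)
    printf "**Improvement**: Allocations: %+.2f%%  Memory: %+.2f%%  Time: %+.2f%% (Commit [%s...%s]())\n",
      (100 * ($2/$alloc - 1)),
      (100 * ($3/$in_use - 1)),
      (100 * ($4/$time - 1)),
      $commit,
      $1;
  }
  $first = 0;
  $commit = $1;
  $alloc = $2;
  $in_use = $3;
  $time = $4;
}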
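
The percentages in these “improvement” lines are plain relative changes: 100 · (new / old − 1), so a negative number means the newer commit uses less. As a rough illustration only, here is a small shell sketch of that arithmetic; the function name `improvement` and the sample values are made up and not taken from the real stats.csv.

```shell
# improvement OLD NEW: relative change from OLD to NEW as a signed percentage.
# (hypothetical helper, for illustration only)
improvement() {
  LC_ALL=C awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%+.2f%%\n", 100 * (after / before - 1) }'
}

# Made-up values:
improvement 200 150     # allocations went down
improvement 8 8         # memory unchanged
improvement 0.40 0.50   # time went up
```

With these inputs the three calls print `-25.00%`, `+0.00%` and `+25.00%`.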


