Joachim Breitner

Faster Winter: Statistics (the making-of)

Published 2019-11-24 in sections English, Haskell.

(This is an appendix to the “faster winter” series; please see those posts for background information.)

Did you like the graph and the stats that I produced? Just for completeness, I am including the various scripts I used. Nothing super exciting to see here, but maybe someone will find it useful.

This little shell one-liner builds each commit in the interesting range, runs the benchmark five times and collects the runtime statistics that GHC prints with +RTS -t (line-wrapped for your convenience):

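mkdir -p stats
for h in $(git log 1cea7652f48fad348af914cb6a56b39f8dd99c6a^..5406efd9e057aebdcf94d14b1bc6b5469454faf3 --format=%H)
do
  echo -n "$h"
  git checkout -q "$h"
  # build first, so that the measured runs below do not include compilation
  cabal new-build -v0
  echo -n ":"
  # the runs below append to this file, so start from scratch
  rm -f "stats/$h.txt"
  for i in $(seq 1 5)
  do
    # +RTS -t writes a one-line summary to stderr, which is collected here
    cabal -v0 new-run exe:wasm-invoke -- -w loop.wasm -f canister_init +RTS -t >/dev/null 2>> "stats/$h.txt"
    echo -n .
  done
  echo
done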
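The inner loop follows a simple pattern: empty the per-commit file once, then let every run append its one-line RTS summary (which +RTS -t writes to stderr) to it; hence the rm -f before the 2>> redirection. As a sketch of that pattern in isolation, here it is as a small POSIX shell function. The name repeat_stderr is made up for this illustration and is not part of the original scripts:

```shell
# repeat_stderr N FILE CMD [ARGS...]
# Hypothetical helper: runs CMD N times, discards its stdout and collects
# the stderr output of all runs in FILE, which is truncated first.
repeat_stderr() {
  n=$1
  file=$2
  shift 2
  : > "$file"               # start with an empty file (like rm -f above)
  i=1
  while [ "$i" -le "$n" ]; do
    "$@" >/dev/null 2>> "$file"
    printf '.'              # progress indicator, as in the original loop
    i=$((i + 1))
  done
  echo
}
```

With it, the inner loop would shrink to repeat_stderr 5 "stats/$h.txt" cabal -v0 new-run exe:wasm-invoke -- -w loop.wasm -f canister_init +RTS -t.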

A small Perl script, tally.pl, takes the minimum of each measurement across the five runs and prints the three minima as one semicolon-separated CSV line:

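#!/usr/bin/perl
# tally.pl: reads the +RTS -t summary lines of several runs from stdin
# and prints the minimum of each measurement as "allocations;memory;time".

use strict;
use warnings;
use List::Util qw(min);

my @alloc;   # total bytes allocated
my @in_use;  # memory in use (MB)
my @time;    # INIT + MUT + GC elapsed (wall-clock) seconds

while (<>) {
  m!<<ghc: (\d+) bytes, \d+ GCs, \d+/\d+ avg/max bytes residency \(\d+ samples\), (\d+)M in use, [\d.]+ INIT \(([\d.]+) elapsed\), [\d.]+ MUT \(([\d.]+) elapsed\), [\d.]+ GC \(([\d.]+) elapsed\) :ghc>>! or die "unexpected line: $_";
  push @alloc, 0+$1;
  push @in_use, $2;
  push @time, $3+$4+$5;
}

printf "%d;%d;%f\n", min(@alloc), min(@in_use), min(@time);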
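The same reduction can also be sketched without Perl, using sed to pull the five numbers out of each +RTS -t line and awk to keep the per-column minima. This is a swapped-in alternative, not the script used for the posts; the function name tally_sh is hypothetical, and it assumes exactly the line format that the Perl regex above matches. Unlike tally.pl, it silently skips lines in any other format instead of dying:

```shell
# tally_sh (hypothetical): reads +RTS -t summary lines on stdin and prints
# "allocations;memory;time", each the minimum over all lines, where
# time = INIT + MUT + GC elapsed seconds (as in tally.pl).
tally_sh() {
  # keep only matching lines, reduced to "bytes MB init_el mut_el gc_el"
  sed -nE 's/^<<ghc: ([0-9]+) bytes, .* ([0-9]+)M in use, .*\(([0-9.]+) elapsed\), .*\(([0-9.]+) elapsed\), .*\(([0-9.]+) elapsed\) :ghc>>$/\1 \2 \3 \4 \5/p' |
  awk '
    { t = $3 + $4 + $5 }
    NR == 1 || $1 + 0 < a { a = $1 + 0 }
    NR == 1 || $2 + 0 < m { m = $2 + 0 }
    NR == 1 || t < mt     { mt = t }
    END { printf "%.0f;%.0f;%f\n", a, m, mt }'
}
```

As with the Perl version, the three minima may come from different runs. Using %.0f rather than %d avoids overflow for large allocation counts in older awk implementations.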

To assemble a single CSV file for all commits in the range that have a stats file, I used this bash one-liner (again line-wrapped for your convenience):

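echo 'commit;allocations;memory;time' > stats.csv
# tac reverses git log's newest-first order, so the rows come out oldest first
for h in $(git log 1cea7652f48fad348af914cb6a56b39f8dd99c6a^..5406efd9e057aebdcf94d14b1bc6b5469454faf3 --format=%H|tac)
do
  # show which commit is being processed
  git log -n 1 --oneline "$h"
  # only commits with measurements get a row: abbreviated hash, then the
  # output of tally.pl (the Perl script above, made executable)
  test -f "stats/$h.txt" && echo "$(echo "$h"|cut -c-7);$(./tally.pl < "stats/$h.txt")" | tee -a stats.csv
done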
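Since this loop skips commits without a stats file, it can be handy to list those commits explicitly. A minimal sketch, assuming the stats/<hash>.txt layout used above; missing_stats is a hypothetical helper name, not part of the original scripts:

```shell
# missing_stats (hypothetical): reads full commit hashes, one per line, on
# stdin and prints those that have no stats/<hash>.txt in the current
# directory, i.e. the commits that get no row in stats.csv.
missing_stats() {
  while IFS= read -r h; do
    [ -f "stats/$h.txt" ] || printf '%s\n' "$h"
  done
}
```

Piping the output of the same git log ... --format=%H command into missing_stats then prints the hashes of the skipped commits.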

The stats can be turned into the graph using pgfplots by compiling this LaTeX file:

\documentclass[class=minimal]{standalone}
\usepackage{mathpazo}
\usepackage{pgfplots}
\definecolor{skyblue1}{rgb}{0.447,0.624,0.812}
\definecolor{scarletred1}{rgb}{0.937,0.161,0.161}
\pgfplotsset{width=12cm,compat=newest}

% From https://tex.stackexchange.com/a/63340/15107: read the x tick labels from a table column
\makeatletter
\pgfplotsset{
    /pgfplots/flexible xticklabels from table/.code n args={3}{%
        \pgfplotstableread[#3]{#1}\coordinate@table
        \pgfplotstablegetcolumn{#2}\of{\coordinate@table}\to\pgfplots@xticklabels
        \let\pgfplots@xticklabel=\pgfplots@user@ticklabel@list@x
    }
}
\makeatother

\begin{document}
\begin{tikzpicture}

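% All three axes plot one point per row of stats.csv, each with its own logarithmic y axis
\pgfplotsset{every axis/.style={ymin=0}}
% Left-hand axis: allocations (column 1), with the commit hashes (column 0) as x tick labels
\begin{semilogyaxis}[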
  skyblue1,
  scale only axis,
  axis y line*=left,
  ylabel=Allocation (bytes),
  flexible xticklabels from table={stats.csv}{[index]0}{col sep=semicolon},
  xticklabel style={rotate=90, anchor=east, text height=1.5ex, font=\ttfamily, color=black},
  xtick=data,
  ]
\addplot[const plot mark mid, color=skyblue1]
  table [x expr=\coordindex+1, y index=1, col sep=semicolon] {stats.csv};
\end{semilogyaxis}

% Right-hand axis: memory in use (column 2), without its own x tick labels
\begin{semilogyaxis}[
  green,
  scale only axis,
  axis y line*=right,
  ylabel=Memory (MB),
  x tick style={draw=none},
  xtick=\empty,
  ]
\addplot[const plot mark mid, color=green]
  table [x expr=\coordindex+1, y index=2, col sep=semicolon] {stats.csv};
\end{semilogyaxis}

% Second right-hand axis: total elapsed time (column 3)
\begin{semilogyaxis}[
  red,
  scale only axis,
  axis y line*=right,
  ylabel=Time (seconds),
  x tick style={draw=none},
  xtick=\empty,
  ]
% move this axis line, its ticks and its tick labels 2cm to the right
\pgfplotsset{every outer y axis line/.style={xshift=2cm}, every tick/.style={xshift=2cm}, every y tick label/.style={xshift=2cm} }
\addplot[const plot mark mid, color=red]
  table [x expr=\coordindex+1, y index=3, col sep=semicolon] {stats.csv};
\end{semilogyaxis}
\end{tikzpicture}
\end{document}

And finally, this Perl script takes any two lines pasted from the CSV file and produces the appropriate Markdown for the “improvement” lines in my posts:

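#!/usr/bin/perl
# Reads rows of stats.csv (commit;allocations;memory;time) from stdin and,
# for each row after the first, prints a Markdown line with the relative
# change of each measurement compared to the previous row.

use strict;
use warnings;

my $first = 1;

my $commit;
my $alloc;
my $in_use;
my $time;

while (<>) {
  /(.*);(.*);(.*);(.*)/ or die "unexpected line: $_";
  unless ($first) {
    # change in percent, 100 * (new/old - 1); negative values mean the newer commit uses less
    printf "**Improvement**: Allocations: %+.2f%%  Memory: %+.2f%%  Time: %+.2f%% (Commit [%s...%s](http://github.com/dfinity/winter/compare/%s...%s))\n",
      (100 * ($2/$alloc - 1)),
      (100 * ($3/$in_use - 1)),
      (100 * ($4/$time - 1)),
      $commit,
      $1,
      $commit,
      $1;
  }
  # remember this row as the baseline for the next one
  $first = 0;
  $commit = $1;
  $alloc = $2;
  $in_use = $3;
  $time = $4;
}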
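Each percentage is computed as 100 * (new/old - 1), so halving the time prints -50.00%, and an unchanged value prints +0.00%. The following awk-based sketch reproduces the Perl script's output format in plain shell, as an illustration only; the function name improvement_sh is hypothetical, and it expects data rows only, since the header row would lead to a division by zero:

```shell
# improvement_sh (hypothetical): reads stats.csv data rows
# (commit;allocations;memory;time) on stdin and, for every row after the
# first, prints the Markdown line comparing it with the previous row.
improvement_sh() {
  awk -F';' '
    NR > 1 {
      # relative change in percent: 100 * (new / old - 1)
      printf "**Improvement**: Allocations: %+.2f%%  Memory: %+.2f%%  Time: %+.2f%% (Commit [%s...%s](http://github.com/dfinity/winter/compare/%s...%s))\n",
        100 * ($2 / alloc - 1), 100 * ($3 / mem - 1), 100 * ($4 / secs - 1),
        commit, $1, commit, $1
    }
    # remember this row as the baseline for the next one
    { commit = $1; alloc = $2; mem = $3; secs = $4 }'
}
```

For two hypothetical rows aaaaaaa;1000;10;2.000000 and bbbbbbb;750;10;1.000000, it reports Allocations: -25.00%, Memory: +0.00% and Time: -50.00%.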
