Swirly Mein Kopf

Sunday, February 28. 2010

Exploiting sharing in arbtt

Haskell

My automatic rule-based time tracker (arbtt), which is written in Haskell, collects every minute a data sample consisting mainly of the list of currently open windows (window title and program name). Naturally, this log grows rather large. Since October of last year, I collected 70,000 samples. I already went from a text-based file format to a binary format using Data.Binary, which gave a big performance boost.

But by now, I was afraid that this is not enough. My log file is now 30MB large. Looking at the memory graph of gnome-panel, it is taking up more than half of my memory. When running arbtt-stats, the Haskell run time system reports 569 MB total memory in use and the command finishes after 28.5 seconds.

Naturally, the log file is highly redundant: Compressing it with bzip2 shrinks it to 1.6MB. But as I would like to preserve the ability to just append samples at the end, without having to read the file, I chose not just to add bzip2 or gzip compression. Rather, I am now exploiting a very obvious redundancy: Two adjacent samples usually list exactly the same windows, and a focus change only changes a flag. So now, when storing a string that is part of a sample, it will check if this string was already present in the previous sample and, in this case, just store the number of that string (one byte). Only if the string was not present it will write a zero byte and then the string. When reading the sample, the process is reversed.

This greatly reduces the file size: It is down to 6.2MB. It also improves the memory consumption, due to Haskell’s abilities with regard to sharing: When a reference to a string in a previous sample is read, then only one instance of this string is in memory, even if it occurs several times in the log. This brings the memory consumption down to 264 MB and the runtime to 17 seconds.

I released the changes as version 0.4.5.1 to Hackage, Debian and as a Windows installer. The log file is not automatically converted, but new samples will be written in the compressed format. If you want to convert your whole file, you have to stop arbtt-capture, run arbtt-recover, and then move the hopefully noticeable smaller ~/.arbtt/capture.log.recovered  to ~/.arbtt/capture.log.

The required code changes were not too big. I somewhat isolated the relevant code in the Data.Binary.StringRef module. Unfortunately, I have to use OverlappingInstances to be able to provide the special instance for String – is there a cleaner way (besides the trick used for the Show class)?

Thursday, February 11. 2010

Diploma Thesis Finished

Mathe

Earlier today, I went to a local copy shop and had my diploma thesis printed. This afternoon, I will hand it in. The title is “Loop subgroups of Fr and the images of their stabilizer subgroups in GLr(ℤ)” and discusses a group-theoretical result. I assume that very few readers care about the content of the thesis, but maybe some are interested in a few assorted LaTeX hints. I’m also publishing the full TeX source code, maybe someone can make use of it.

Less chatty varioref

I’m using the varioref package, in conjunction with the cleveref package. This provides a command \vref{fig:S3S4l} which will expand to, for example, to “Figure 1 on page 11”. But if the referenced figure is actually on the current page, the next page, the previous page or the facing page (in two-side layouts), it will say so: “Figure 1 on this page.”

This is very nice, but I assume that the reader of my thesis is able to find Figure 1 when it is visible, i.e. on the current or facing page. One can remove the referencing texts with the commands \def\reftextfaceafter{}, \def\reftextfacebefore{} and \def\reftextcurrent{}. But because varioref puts a space between “Figure 1” and this text, we will get a superfluous space – even before punctuation.

The remedy is the command \unskip, which removes this space again. So I use in my preamble:

\def\reftextfaceafter {\unskip}%
\def\reftextfacebefore{\unskip}%
\def\reftextcurrent   {\unskip}%

Palatino and extra leading

I chose the Palatino font for my thesis, using the mathpazo package. Various sources (such as the KOMA-Script manual) suggest to use 5% extra leading:

\linespread{1.05}

Counting figures independently from chapters

I don’t have too many figures and tables in my thesis, and I want them to be numbered simple 1, 2, ... By default, LaTeX would say 1.1, 1.2, 2.1, ... This can be fixed using the remreset package and these commands:

\makeatletter
\@removefromreset{figure}{chapter}
\renewcommand{\thefigure}{\arabic{figure}}
\@removefromreset{table}{chapter}
\renewcommand{\thetable}{\arabic{table}}
\makeatother

No widows and club lines

LaTeX already avoids these, but I wanted to get rid of them completely. This can be done with:

% Disable single lines at the start of a paragraph (Schusterjungen)
\clubpenalty = 10000
% Disable single lines at the end of a paragraph (Hurenkinder)
\widowpenalty = 10000 \displaywidowpenalty = 1000

Struck table lines

I had to typeset tables with some lines struck, and I could not find a ready command for that. I used the following definition, based on the code for \hline. Note that it probably does not adjust well to other font sizes and needs to be adjusted manually:

\makeatletter
\def\stline{%
  \noalign{\vskip-.7em\vskip-\arrayrulewidth\hrule \@height \arrayrulewidth\vskip.7em}}
\makeatother

Title page in one-sided layout

According to the KOMA manual, the title page as set by LaTeX is not meant to be the cover of a publication, and therefore has to be set with the margins of a right page – i.e. a larger right margin and a smaller left margin. But when printing cheaply, one often just put a transparent sheet on top of the print, so the title page is the cover. You can convince KOMA that you are right by using

\KOMAoptions{twoside=false}
\begin{titlepage}
...
\end{titlepage}
\KOMAoptions{twoside=true}

Not flushing the page for chapter heads

LaTeX would put the list of algorithms on a new right page. I found this a waste of paper for my few algorithms, and preferred to put the list right after the table of contents. You can override the LaTeX behavior using:

\tableofcontents
{
\let\cleardoublepage\relax  % book
\let\clearpage\relax        % report
\let\chapter\section
\listofalgorithms
}

This code also reduces the size of the heading to that of a section. The same trick also works with \chapter.

Math in headings vs. PDF bookmarks

LaTeX with the hyperref package creates nice PDF bookmarks from your chapter and section titles. Unfortunately, PDF bookmark names can only be plain strings, while the titles in the document might contain some math symbols. You can make both happy with \texorpdfstring:

\section{Stabilizer subgroups in \texorpdfstring{$\GL_r(\Z/2\Z)$}{GL\_r(Z/2Z)}}

Setting lines for the signature

The diploma thesis contains a small note which I have to sign, saying that I created it on my own etc. Below that, I put two labeled lines for date and signature, using the tabbing environment:

\begin{tabbing}
\rule{4cm}{.4pt}\hspace{1cm} \= \rule{7cm}{.4pt} \\
Ort, Datum \> Unterschrift
\end{tabbing}

Sunday, February 7. 2010

If I were a caricaturist

Politik

I’d draw a caricature involving a Toyota car, representing capitalism, with a stuck gas petal and Barack Obama trying to fix it. But as I cannot draw very well, especially recognizable people, I created this collage:

The photo of Obama was created by Beth Rankin.

Thursday, February 4. 2010

FontForge-Article in the German Linux-Magazin

Digital World

Yesterday, I found the 3/10-issue of the German “Linux-Magazin” in my mailbox. (I don’t dare to call it the March issue – they are a bit off schedule...) On page 62, you can find my 3½ page article about creating a symbol font with FontForge. I briefly covered the topic on my blog and later thought that it would made a nice article, even though I’m not an expert on this area. The article will be freely available in about three years.This is already my third publication, after my article on the Cross-Site-Authentication attack that was published in the same magazine (circulation ~63.000) and in its international counterpart in 2005 and my recent article in the “freeX” magazine (circulation ~15.000). Looks like I’ll have to add a  “Publications” section to my website soon...

(Page 1 of 1, totaling 4 entries)
Nach oben