Swirly Mein Kopf

Sunday, February 28. 2010

Exploiting sharing in arbtt

Haskell

My automatic rule-based time tracker (arbtt), which is written in Haskell, collects every minute a data sample consisting mainly of the list of currently open windows (window title and program name). Naturally, this log grows rather large. Since October of last year, I collected 70,000 samples. I already went from a text-based file format to a binary format using Data.Binary, which gave a big performance boost.

But by now, I was afraid that this is not enough. My log file is now 30MB large. Looking at the memory graph of gnome-panel, it is taking up more than half of my memory. When running arbtt-stats, the Haskell run time system reports 569 MB total memory in use and the command finishes after 28.5 seconds.

Naturally, the log file is highly redundant: Compressing it with bzip2 shrinks it to 1.6MB. But as I would like to preserve the ability to just append samples at the end, without having to read the file, I chose not just to add bzip2 or gzip compression. Rather, I am now exploiting a very obvious redundancy: Two adjacent samples usually list exactly the same windows, and a focus change only changes a flag. So now, when storing a string that is part of a sample, it will check if this string was already present in the previous sample and, in this case, just store the number of that string (one byte). Only if the string was not present it will write a zero byte and then the string. When reading the sample, the process is reversed.

This greatly reduces the file size: It is down to 6.2MB. It also improves the memory consumption, due to Haskell’s abilities with regard to sharing: When a reference to a string in a previous sample is read, then only one instance of this string is in memory, even if it occurs several times in the log. This brings the memory consumption down to 264 MB and the runtime to 17 seconds.

I released the changes as version 0.4.5.1 to Hackage, Debian and as a Windows installer. The log file is not automatically converted, but new samples will be written in the compressed format. If you want to convert your whole file, you have to stop arbtt-capture, run arbtt-recover, and then move the hopefully noticeable smaller ~/.arbtt/capture.log.recovered  to ~/.arbtt/capture.log.

The required code changes were not too big. I somewhat isolated the relevant code in the Data.Binary.StringRef module. Unfortunately, I have to use OverlappingInstances to be able to provide the special instance for String – is there a cleaner way (besides the trick used for the Show class)?

Thursday, February 4. 2010

FontForge-Article in the German Linux-Magazin

Digital World

Yesterday, I found the 3/10-issue of the German “Linux-Magazin” in my mailbox. (I don’t dare to call it the March issue – they are a bit off schedule...) On page 62, you can find my 3½ page article about creating a symbol font with FontForge. I briefly covered the topic on my blog and later thought that it would made a nice article, even though I’m not an expert on this area. The article will be freely available in about three years.This is already my third publication, after my article on the Cross-Site-Authentication attack that was published in the same magazine (circulation ~63.000) and in its international counterpart in 2005 and my recent article in the “freeX” magazine (circulation ~15.000). Looks like I’ll have to add a  “Publications” section to my website soon...

(Page 1 of 1, totaling 2 entries)
Nach oben