Joachim Breitner

GHC 7.4.1 speeds up arbtt by a factor of 22

Published 2012-02-26 in sections English, Haskell.

More than two years ago I wrote arbtt, a tool that silently records which programs you are using and lets you compute statistics on that data later, based on rules that you define afterwards; hence the name: automatic, rule-based time tracker. I have not worked on it much recently (the last release was half a year ago), but it has nevertheless been running on my machine and by now has tracked a total time span of 248 days in 350,000 records.
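The "record now, classify later" idea can be illustrated with a small Haskell sketch. This is not arbtt's actual code or data format: the `Sample` type, the two rules and the `tally` function are hypothetical, and real arbtt rules are written in its own `categorize.cfg` language rather than as Haskell functions. The point is only that a rule is a function from a stored record to an optional tag, so it can be written long after the data was captured.

```haskell
module Main (main) where

import Data.List (isInfixOf)
import qualified Data.Map.Strict as Map
import Data.Maybe (mapMaybe)

-- Hypothetical, heavily simplified sample; arbtt's real data format differs.
data Sample = Sample
  { program :: String -- program owning the active window
  , title   :: String -- title of the active window
  }

-- A rule looks at one recorded sample and may assign it a tag.
type Rule = Sample -> Maybe String

-- Hypothetical rule: anything whose window title mentions a .tex file.
latexRule :: Rule
latexRule s
  | ".tex" `isInfixOf` title s = Just "Project:latex-doc"
  | otherwise = Nothing

-- Hypothetical rule: time spent in the mail client.
mailRule :: Rule
mailRule s
  | program s == "thunderbird" = Just "Mail"
  | otherwise = Nothing

-- Apply every rule to every sample and count the samples per tag.
tally :: [Rule] -> [Sample] -> Map.Map String Int
tally rules samples =
  Map.fromListWith (+) [(tag, 1) | s <- samples, tag <- mapMaybe ($ s) rules]

main :: IO ()
main = do
  let samples =
        [ Sample "gvim" "thesis.tex - GVim"
        , Sample "gvim" "thesis.tex - GVim"
        , Sample "thunderbird" "Inbox - Mozilla Thunderbird"
        , Sample "firefox" "Haskell - Mozilla Firefox"
        ]
  -- prints [("Mail",1),("Project:latex-doc",2)]
  print (Map.toList (tally [latexRule, mailRule] samples))
```

If each sample stands for a fixed time interval, the counts per tag translate directly into time. Note that in this sketch classification happens only when statistics are requested, so every run evaluates every rule against every stored record; with hundreds of thousands of records, the cost of a single rule check matters.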

Yesterday I had a use for it again: measuring the time I spent creating a certain document with LaTeX. So I added a rule to my categorize.cfg and ran arbtt-stats. I knew that it was not very fast, and that my data set had grown considerably since I last used it. And indeed, it took more than 6 minutes to process the data and spit out the result.

Since I’m currently working on the GHC 7.4.1 transition in Debian anyway, I decided to check what happens if I compile the code with that version of the Haskell compiler instead of the previous version, 7.0.4. And behold: the whole process took merely 17.3 seconds to complete! At first I did not believe it, but the result was identical, and both binaries were built with the same options, i.e. without profiling or anything like that enabled. Wouldn’t you also like to get such speedups for free, just by waiting for someone else to improve their work?

I tried to find out the reason for the speedup and created profiling output from both the old and the new binary. The old binary spends 83% of its time in Categorize.checkRegex, which basically just calls Text.Regex.PCRE.Light.match. Since the version of pcre-light is the same in both binaries, I conclude that the Foreign Function Interface that GHC provides for calling C libraries (libpcre in this case) has become much faster, although I could not find any mention of this in the release notes. And even if I leave out the 83% of the time spent in checkRegex, the code from the new compiler is still 2.7 times faster. Thanks, GHC devs, great work!
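For readers who have not used the FFI, here is a minimal sketch of what a foreign call from Haskell into C looks like. It deliberately binds the C standard library's `strstr(3)` instead of libpcre, so it is neither pcre-light's nor arbtt's code; the helper `containsC` and the window titles are hypothetical. It shows the steps each call involves: marshalling the Haskell `String`s into temporary C strings, crossing into C, and inspecting the returned pointer.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Foreign.C.String (CString, withCString)
import Foreign.Ptr (nullPtr)

-- strstr(3): returns a pointer to the first occurrence of the needle in the
-- haystack, or NULL if there is none. "unsafe" marks a short call that does
-- not call back into Haskell.
foreign import ccall unsafe "string.h strstr"
  c_strstr :: CString -> CString -> IO CString

-- Hypothetical helper: substring test done in C. Each call copies both
-- Strings into temporary NUL-terminated C buffers before crossing into C.
containsC :: String -> String -> IO Bool
containsC haystack needle =
  withCString haystack $ \h ->
    withCString needle $ \n -> do
      r <- c_strstr h n
      return (r /= nullPtr)

main :: IO ()
main = do
  -- Hypothetical window titles; one foreign call per title.
  let titles = ["thesis.tex - GVim", "Inbox - Mozilla Thunderbird"]
  hits <- mapM (`containsC` ".tex") titles
  print hits -- prints [True,False]
```

When a rule like this is checked against every one of several hundred thousand records, whatever fixed overhead a single foreign call carries is paid that many times, which is why an improvement in that path could plausibly show up so strongly in a profile dominated by one regex-matching function.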

Comments

Thank you for getting fresh GHCs into Debian! Will give arbtt a try.
#1 Astro (Homepage) on 2012-02-26
Compiling ghc is a PITA though.

First I have to compile ghc6 6.8 with ghc6 6.6 (started that about two days ago), then ghc6 6.12 with that, and then I can hopefully use that to build ghc 7.4… and hscolour, which has an indirect build-dependency on itself (luckily, the version in m68k is just barely new enough to satisfy it).

hugs98 built faster, but it apparently cannot be used to build ghc…

(Not that I even speak Haskell, but it has featured prominently in Debian recently, so I figured I had better try to help it keep up.)
#2 mirabilos (Homepage) on 2012-02-27
You can bootstrap ghc without hscolour; the docs will just be less useful.

Also, when bootstrapping happy and alex (I think), you’ll find that the upstream tarball contains the generated files required to bootstrap them, but be careful: debian/rules clean removes them. Send d-haskell a mail if you need help.
#3 Joachim Breitner (Homepage) on 2012-02-27
Last September I reported a strange phenomenon with GHC, namely that my parser combinator library actually ran faster when profiling was switched on.

Simon Marlow looked into the issue and found that 99.5% of the time was spent in the garbage collector, which he subsequently changed. The change gave a tremendous speedup (a factor of ~35), and I would not be surprised if you are profiting from the same change.

See:

http://hackage.haskell.org/trac/ghc/ticket/5505
#4 Doaitse Swierstra on 2012-02-27
Interesting case, but it is unlikely to be the cause here. With the old code, 25% of the time was spent in the GC; with the new code it was 36%. Also, the memory statistics were comparable.
#5 Joachim Breitner (Homepage) on 2012-02-27

Have something to say? You can post a comment by sending an e-mail to me at <mail@joachim-breitner.de>, and I will include it here.