Swirly Mein Kopf

Sunday, February 26, 2012

GHC 7.4.1 speeds up arbtt by a factor of 22

Haskell

More than two years ago I wrote arbtt, a tool that silently records which programs you are using and lets you compute statistics on that data later, based on rules that you define afterwards, hence the name automatic rule-based time tracker. I haven’t done much with it recently (the last release was half a year ago), but it has nevertheless been running on my machine and by now has tracked a total time span of 248 days in 350000 records.

Yesterday, I had a use for it again: measuring the time spent creating a certain document with LaTeX. So I added a rule to my categorize.cfg and ran arbtt-stats. I knew that it was not very fast, and that my data set had grown considerably since I last used it. And indeed, it took more than 6 minutes to process the data and spit out the result.

Since I’m currently working on the GHC 7.4.1 transition in Debian anyway, I decided to check what happens if I compile the code with that version of the Haskell compiler instead of the previous version, 7.0.4. And behold: the whole process took merely 17.3 seconds to complete! At first I did not believe it, but the result was identical, and both binaries were built with the same options, i.e. no profiling enabled or anything like that. Wouldn’t you also like to have such speed-ups for free, just by waiting for someone else to improve their work?

I tried to find out the reason for the speed-up and created profiling output from both the old and the new binary. The old binary spends 83% of its time in Categorize.checkRegex, which basically just calls Text.Regex.PCRE.Light.match. Since the version of pcre-light is the same in both binaries, I conclude that the Foreign Function Interface that GHC provides to interact with C libraries (libpcre in this case) is much faster now, although I could not find any mention of this in the release notes. And even if I do not count the 83% of the time spent in checkRegex, the code from the new compiler is still 2.7 times faster. Thanks, GHC devs, great work!
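For readers who have not used the Foreign Function Interface, here is a minimal sketch of what such a call into C looks like from Haskell. It is not code from arbtt or pcre-light: it binds the libc function strlen as a stand-in for a libpcre call, and the names c_strlen and titleLength are made up for illustration. The point is only that every such call has to marshal a Haskell value into a C one and cross the language boundary, which is the overhead I suspect has shrunk.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CSize(..))

-- Bind the C function strlen from libc.
-- This is only a stand-in for a libpcre call; c_strlen is a hypothetical name.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- Marshal a Haskell String into a temporary NUL-terminated C string,
-- call into C, and convert the result back to an Int.
-- titleLength is a hypothetical helper, not part of arbtt.
titleLength :: String -> IO Int
titleLength s = withCString s (fmap fromIntegral . c_strlen)

main :: IO ()
main = do
  n <- titleLength "thesis.tex - Emacs"
  print n
```

A tool like arbtt makes a call of this kind for every regular expression check on every record, so even a small per-call cost adds up over a few hundred thousand records.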


Comments


*Thank you for getting fresh GHCs into Debian! Will give arbtt a try.
#1 Astro on 2012-02-26 19:35
*Compiling ghc is a PITA though.

First I have to compile ghc6 6.8 with ghc6 6.6 (started that about two days ago), then ghc6 6.12 with that, then I hopefully can use that to build ghc 7.4… and hscolour, which has an indirect B-D on itself (luckily, the version in m68k is barely the minimum needed to satisfy it).

hugs98 built quicker, but apparently cannot be used to build ghc…

(Not that I even speak Haskell, but it features prominently in Debian recently, so I figured I better try to have it keep up.)
#2 mirabilos on 2012-02-27 10:32
*You can bootstrap ghc without hscolour, just the docs will be less useful.

Also when bootstrapping happy and alex (I think), you’ll find that the upstream tarball contains the generated files required to bootstrap them, but be careful: debian/rules clean removes them. Send d-haskell a mail if you need help.
#2.1 Joachim Breitner on 2012-02-27 11:40
*Last September I reported a strange phenomenon with GHC: my parser combinator library actually ran faster when profiling was switched on.

Simon Marlow looked into the issue and found that 99.5% of the time was spent in the garbage collector, which he subsequently changed. This gave a tremendous speed-up (~35), and I would not be surprised if you are profiting from the same change.

See:

http://hackage.haskell.org/trac/ghc/ticket/5505
#3 Doaitse Swierstra on 2012-02-27 18:23
*Interesting case, but unlikely to be the cause here. With the old code, 25% of the time was spent in the GC; with the new code it was 36%. Also, the memory statistics were comparable.
#3.1 Joachim Breitner on 2012-02-27 19:15
