Swirly Mein Kopf

Saturday, May 11. 2013

How to play Rock-Paper-Scissors online?

Digital World

There was an interesting question by ‘Fool’ recently on the StackExchange site for Boardgames: How do you play a game like Rock-Paper-Scissors with friends online? Or any other game where players have to simultaneously submit their moves (e.g. Diplomacy, or Rock-Paper-Scissors-Lizzard-Spock), which, as I just learned, are simultaneous action selection games. While there are websites dedicated to playing specific games, such as webdiplomacy.net, we could not find a generic one that you can use if you, for example, invent your own variants of a game.

So I created one: At you-say-first.nomeata.de you can enter rooms and share the URL with your friends. On the one hand, you have a regular chat room there. But there is also the possibility to enter moves (whatever a move may be to you) and only when all players have done that and marked the move as final, it is shown to everyone. If you want to try it out: There is an integrated, not very fancy Rock-Paper-Scissors-playing bot. Just enter a room, join and say „I want to play!“

Note that this site can be used for more than just for games. Have you ever observed that persons would often want other to express their preference (e.g. where to dine) first to not reveal their own preference, so that they can (or pretend to) change their mind if they would contradict? In such situations simultaneous action selection can be a fairer method.

A technical note: I created this web app using meteor (a JavaScript framework building on node.js and MongoDB that allows for reactive programming), and it is also hosted on meteor.com. I chose Meteor after someone mentioned Firebase to me, which looked very slick, but was not Free Software, so I looked for alternatives. I did not do any cross-browser-testing, and the UI design could be improved, so if you want to help out (or just complain), please use the GitHub code repository and issue tracker.

Wednesday, April 24. 2013

The carbondioxide footprint of Debian's Haskell packages

Debian Haskell

By now, Debian ships quite a lot of Haskell packages (~600). Because of GHC's ABI volatility, whenever we upload a new version of a library, we have to rebuild all libraries that depend on that. In particular, if we upload a new version of the compiler itself, we have to rebuild all Haskell library packages. So we have to rebuild stuff a lot. Luckily, Debian has a decent autobuilding setup so that I just need to tell it what to rebuild, and the rest happens automatically (including figuring out the actual order to build things).

I was curious how much we use the buildd system compared to other packages, and also how long the builders are busy building Haskell packages. All the data is in a postgresql database on buildd.debian.org, so with some python and javascript code, I can visualize this. The graphs show the number of all uploads by autobuilder on the amd64 architecture, with haskell uploads specially marked, and the second graph does the same for the build time. You can select time ranges and get aggregate statistics for that time span.

During the last four days a complete rebuild was happening, due to the upload of GHC 7.6.3. During these 2 days and 18 hours building 537 packages took 48 hours of build time and produced 15kg of CO2. That is 94% of all uploads and 91% the total build time. The numbers are lower for the whole of last year: 52% of uploads, 31% of build time and 57kg of CO2. (The CO2 numbers are very rough estimates.)

Note that amd64 is a bit special, as most packages are uploaded on this architecture by the developers, so no automatic builds are happening. On other architectures have, every upload of a (arch:any) package is built, so the share of Haskell packages will be lower. Unfortunately, at the moment the database does not provide me with a table across all architectures (and I was too lazy to make it configurable yet).

Wednesday, February 6. 2013

Evaluation-State Assertions in Haskell

Haskell

I have just uploaded a new version of ghc-heap-view to Hackage that provides “Evaluation state assertions” in the module GHC.AssertNF.

Imagine you are writing a web application in Haskell that sports a global number-of-visitors counter in an IORef Int. For every request, you call modifyIORef (+1). Eventually, you notice your very popular web site to hog more and more memory. So you browse to the internal page that shows the counter, and you have to wait for a long time until you eventually see the result (or get a stack overflow). The reason: The applications of  (+1) were not performed until you looked at the number; instead, a long chain of such computation first filled your heap and then your stack.

So you have learned the hard way that you might want to avoid space leaks, and want calculations to be done during the request that caused them, and want the IORef to always contain fully evaluated data. So you stumble about modifyIORef' in Data.IORef and indeed, this fixes your problem.

Later, you notice that you want to count POST and GET requests separately. You change the type to IORef (Int, Int) and call modifyIORef' (first (+1)) or modifyIORef' (second (+1)). And suddenly, the space leak is back (which you only notice after the next push to the real site, because your local tests never caused enough requests to make it noticeable). So you not only want to fix it, you also want to ensure that it does not break again.

In other words, you want to ensure the policy that values stored in an IORef are always in normal form. You achieve this with the following alternative to modifyIORef':

modifyIORef'Assert :: IORef a -> (a -> a) -> IO ()
modifyIORef'Assert ref f = do
    x <- readIORef ref
    let x' = f x
    x' `seq` return ()
    assertNF x'
    writeIORef ref x'

Using this instead of modifyIORef' will print this warning to standard error output right the first time you call modifyIORef'Assert (first (+1)):

Parameter not in normal form: 2 thunks found:
let x1 = (S# 1,S# 1)
in _bh (_thunk x1 (_bco (S# 1)),_sel x1)

(Otherwise, the program runs as usual.) So obviously, you need to use a strict variant of first (or strict pairs):

first' :: (a -> b) -> (a, c) -> (b, c)
first' f (x,y) = let { x' = f x; r = (x', y) } in x' `seq` r `seq` r

With this, the warning goes away. Whenever you now change the type of the IORef or modify it in a too-lazy-way, you can be sure that you’ll be warned about it, before the space leak itself becomes noticeable.

In the production code, you might want to disable the check. For that, simply put disableAssertNF somewhere in your main function.

Why is this better than just calling deepseq in modifyIORef'Assert? Because this way, the code still creates unwanted thunks that are then evaluated before storing them in the IORef, whereas with assertNF you are told about the thunks and can prevent them from being created in the first place. Also, assertNF does not add a type class constraint.

This is just one example application for assertNF (and its variants assertNFNamed, which includes a name in the warning to better spot the cause, and $assertNFHere, which uses Template Haskell to include the current source code position in the warning), and I hope that there are more. If you happen to make use of it, I’d like to hear your story.

Thursday, January 31. 2013

Going to FOSDEM after all

Debian

Earlier this week, things were looking different, and I did have to cancel a Haskell talk in Freiburg on Monday due to illness. But I’m back on track and will be travelling to Brussels tomorrow.

I’ll be holding a talk on how we package Haskell in Debian, on Suday at 15:30. I hope it will be useful to Debian users (who will better understand the packaging), other Debian Developers (who’ll learn about the peculiarities of Haskell and the implications for the Debian infrastructure), other distro’s maintainers (to compare best practices) and Haskell developers (to learn about the needs and worries of downstream packages). The talk will be based on my DebConf 11 talk on the same topic. I’m also happy to answer questions about Haskell, Haskell in Debian or any other topic that you want to hear my opinion about, so just talk to me during FOSDEM.

In related news: GHC 7.6.2 was uploaded to Debian experimental the day it was released; the rebuilding of all libraries is still in progress (~370 of ~570 done).

Thursday, January 10. 2013

“Haskell Bytes” again

Haskell

Janis Voigtländer has invited me again to give a lecture in his Advanced Functional Programming course at the University of Bonn. Like last time, when I talked about performance analysis in Haskell, I chose a topic that is more on implementationishy side to contrast the theoretic content of the rest of the course. I recycled my talk on the memory representation of GHC that I have held at last year’s Meta Main-Rhein Chaos Days. As usual I provide a transcript of the talk as intended (not as held). It is slightly revised over the old version and I have added a section on unboxed values in constructors, which I have nevertheless skipped today.

Thursday, December 20. 2012

GHCi integration for GHC.HeapView

Digital World

Given the very positive feedback for Dennis Felsing’s tool ghc-vis, which visualizes the heap representation of a Haskell value, including all the gory details such as thunks, values retained by thunks indirections, sharing etc, I saw the need to provide this information also directly in GHCi, without having to load a graphics library or opening extra libraries. So I added the required features (traversing the heap and pretty-printing the results) to my ghc-heap-view package and, also following ghc-vis’s lead, added a ghci file that, when loaded, provides you with a :printHeap command. Here you can see it in action:

A plain value

Prelude> :script /home/jojo/.cabal/share/ghc-heap-view-0.4.0.0/ghci
Prelude> let x = [1..10]
Prelude> x
[1,2,3,4,5,6,7,8,9,10]
Prelude> :printHeap x
_bh [S# 1,S# 2,S# 3,S# 4,S# 5,S# 6,S# 7,S# 8,S# 9,S# 10]

Note that the tools shows us that the list is a list of S# constructors, and also that it is still hidden behind a blackhole. After running System.Mem.performGC, this disappears.

A thunk

Prelude> let x = Just (1 + 1)
Prelude> :printHeap x
Just _bco
Prelude> x
Just 2
Prelude> System.Mem.performGC
Prelude> :printHeap x
Just (S# 2)

Here, we see how the calculation was deferred until forced by showing the value of x. The name _bco stands for a bytecode object as used by the interpreter. Getting useful information from them is a bit harder than for compiled thunks, so for more accurate results put the code in a Haskell source file, compile it and use the GHC.HeapView API to print the interesting parts.

A partial application

Prelude> let a = "hi"
Prelude> let partial = (a ++)
Prelude> partial ""
"hi"
Prelude> System.Mem.performGC
Prelude> let x = (a, partial)
Prelude> :printHeap x
let x1 = "hi"
in (x1,_fun x1)

The information which function is called there (++ in this case) is lost at runtime, but we still see that the second element of the tuple is a partial application of some value to the first element.

A cyclic structure

Prelude> let s = "ho"
Prelude> let x = cycle s
Prelude> length (take 100 (show x))
100
Prelude> System.Mem.performGC
Prelude> :printHeap x
let x0 = C# 'h' : C# 'o' : x0
in x0
Prelude> let y = map Data.Char.toUpper x
Prelude> length (take 100 (show y))
100
Prelude> :printHeap y
C# 'H' : C# 'O' : C# 'H' : C# 'O' : C# 'H' : C# 'O' : C# 'H' : C# 'O' : C# 'H' : C# 'O' : _bh (C# 'H' : C# 'O' : C# 'H' : C# 'O' : C# 'H' : C# 'O' : C# 'H' : C# 'O' : ... : ...)

The cyclic, tying-the-knot structure of cycle is very visible. But can also see how easily it is broken, in this case by mapping a function over the list.

Mutual recursion

Prelude> let {x = 'H' : y ; y = 'o' : x }
Prelude> length (show (take 10 x, take 10 y)) `seq` return ()
Prelude> System.Mem.performGC
Prelude> :printHeap (x,y)
let x1 = C# 'H' : x3
    x3 = C# 'o' : x1
in (x1,x3)

If you want to look at multiple variables at once, just pass a tuple to printHeap

In the hope that this will be a useful tool for you, I uploaded version 0.4.0.0 of ghc-heap-view to hackage.

Sunday, December 16. 2012

Circle Packing

Haskell

For an upcoming introductory Haskell talk of mine (next Tuesday in Karlsruhe – please come and join if you will) I was looking into data visualization as a example application. With libraries like gloss, getting some nice vector graphics up and running within one talk is quite possible.

For some reason, I want to arrange multiple circles of varying size so that they do not overlap and use as little space as possible. My first approach was to use general purpose optimizers such as cmaes, Numeric.GSL.Minimization from the hmatrix package and force-layout. I especially liked the interface of cmaes, which is able to minimize a function of, say, type [(String, Double, Double)] -> Double by automatically finding the values of type Double in the input, as described by Takayuki Muranushi. That would have looked great in the talk. Unfortunately, none of these libraries gave sufficient good results in a reasonable amount of time for this problem.

I then reviewed some of the academic literature on the circle packing problem and there were some algorithms proposed, but no simple one and none of the papers had a concise pseudo-code description that I could transfer to Haskell.

So eventually I set out to implement a relatively dump and slow greedy algorithm myself: I place one circle after another, starting with the largest, and among the valid positions where a new circle touches two already placed circles, I choose the positions closest to the center of mass. The result, available in the circle-packing package, looks surprisingly well, here visualized using the diagrams library (See the documentation of Optimisation.CirclePacking for the code used to generate that image):

Now showing just this image is not a very good way to demonstrate my code. A few years ago, I might have created a CGI script that would dynamically generate images based on your input. But not in 2012: Since my code is pure Haskell without fancy features like type classes, I can use the fay compiler to generate JavaScript code from it. Add a little Haskell code to interact with the HTML5 canvas and now you can interactively try out circle-packing in your browser.

Oh, and while I am talking about neat tricks: You can put vector graphics in your haddock documentation, as I have done for Optimisation.CirclePacking, using the syntax <<<data:image/svg+xml;base64,PD94bWwgdmV...c3ZnPg==>>.

Friday, December 14. 2012

Plätzchentetris

Digital World

What happens if the same person (in this case, my girlfriend) is both a bit geeky and likes to bake? She has ideas leading to this:

Tetris Biscuits

And what is the obvious thing to do with such biscuit (or are they cookies?). Play tetris:

(Also on YouTube if your browser does not play the video above.)

Oh, and to refute common stereotypes about me: The baking was done by me, but she helped me glazing the biscuits.

Friday, December 7. 2012

Calculating the internal rate of return with hledger

Haskell

For more than ten years now, I am a regular and happy user of GnuCash, the free accountancy program. Especially the integrated online banking support for HBCI that allows me not only to fetch my transactions, but also to transfer money using a cryptographic makes it a great tool. Nevertheless, it has shortcomings, especially when it comes to reports. And for some reason I never felt the urge to hack on that C/Guile code (although I recently hacked a bit on the UI).

Joey Hess’ blog post about Simon Michael’s hledger made me look at that command line based accountancy tool. It certainly won’t replace Gnucash for me, but luckily ledger, of which hledger is a re-implementation, can read GnuCash’s file format and convert it to a ledger file that hledger can read as well. So now I can do analysis on my financial data in Haskell – what would you want more?

One report that I was missing was the internal rate of investment of some account. This is the annual rate of a hypothetical fixed interest-bearing savings account that would, given the same deposits and payouts, result in the same final value. I now implemented this as the hledger-irr command and – as you would expect – you can find it on hackage. See the package description for documentation and example output.

It is very fresh, hardly tested code, so don’t base important decisions on it yet. Also, it needs work to work correctly with multiple currencies or things with varying prices, and I am not yet sure what to do there.

Update: Simon asked me to elaborate the difference to hledger-interest by Peter Simons (on which my code was initial based, but eventually I rewrote everything but the option parsing code). The difference is that hledger-interest answers the question „I have this investment with this yearly interest rate (which may even change), how large are the interest payments going to be.“ I imagine a usecase is that you have borrowed some money to a friend at a certain (surely amicable) rate and need to figure out what he owes you, even when you gave him the money mid year and he payed you some back shortly after easter. hledger-irr serves a different purpose: Here you already know your investment payments, gains and losses and you want to have a number describing “how good” your investment was. One usecase might be that, at the end of a live of paying into a pension fund, you ask yourself: “Given all the feeds and commissions, was this a good idea or should I just have invested the money myself?” Then the internal rate of investment tells you how lucrative an alternative fixed-rate investment would have had to been to match your pension fund investment. (Or course this is not the only aspect that you should look at when comparing investments, as it totally ignores risk and possible insurance elements of the investment.)

Saturday, November 24. 2012

screen-message is now scriptable more easily

Digital World

Motivated by a user’s mail, I worked a little bit on screen-message aka sm.This tool, which I wrote 6 years ago and is in Debian since 2007, displays a (possibly multi-line) text as large as possible on your screen and can be used in many, mostly spontaneous ways.

It has always been used in a scripted setting as well, e.g. as make-shift slides for a lightning talk. This is now possible without having to quit sm (and hence flickering): Since version 0.20, you can continuously feed text to "sm -" and it will display everything until the next form feed character (or the end of the file). So this command produces a make-shift digital wall clock, if you happen to need one:

       (while sleep 1; do date +%T; echo -e '\f'; done) | sm -

Since Debian is currently in a freeze to eventually release wheezy (and this feature is hardly release critical) you will have to fetch the new version from experimental.

By the way, screen message is also available as a web version on http://sm.nomeata.de/  and as a Mozilla WebApp (although the usability on smart phone can still be improved, considering that I do not own such a device).

Wednesday, October 24. 2012

c't features Haskell

Haskell

This week’s issue of c't, one of Germany’s largest and most influential computer magazine, has a six page introductory article about Haskell (c't 2012, Issue 23, pages 182-187). The author Harad Bögeholz demonstrates its features and strength by implementing a Slitherlink puzzle solver. As I have helped him a bit by checking the code and article for untypical Haskell code and general advice, I happen to be named at the end of the article – nice.

Saturday, October 13. 2012

Creating a Debian source package without unpacking the source

Debian

There may be a situation where you want to create a Debian source package without having to unpack the upstream source. One case is where you store your debian/ directory in a version control system of its own and to build your package, you pass the .dsc file to a tool like pbuilder or some remote builder. Now a modern debian source package is just a text file (the .dsc file) that references two tarballs, one being the upstream source and the other being the Debian directory. Thanks to Niels Thykier on #debian, I found the magic invocation of dpkg-source that allows you to create the .dsc directly from the two tarballs, the control file and the changelog:

> apt-get source --download-only -t experimental haskell-mtl
Reading package lists... Done
Building dependency tree       
Reading state information... Done
NOTICE: 'haskell-mtl' packaging is maintained in the 'Darcs' version control system at:
http://darcs.debian.org/darcs/pkg-haskell/haskell-mtl
Need to get 18.8 kB of source archives.
Get:1 http://http.debian.net/debian/ experimental/main haskell-mtl 2.1.2-1 (dsc) [1616 B]
Get:2 http://http.debian.net/debian/ experimental/main haskell-mtl 2.1.2-1 (tar) [13.7 kB]
Get:3 http://http.debian.net/debian/ experimental/main haskell-mtl 2.1.2-1 (diff) [3509 B]
Fetched 18.8 kB in 0s (22.3 kB/s)    
Download complete and in download only mode
> cp ./haskell-mtl_2.1.2-1.dsc ./haskell-mtl_2.1.2-1.dsc-orig
> tar xzf haskell-mtl_2.1.2-1.debian.tar.gz
> dpkg-source -cdebian/control -ldebian/changelog --format="3.0 (custom)" --target-format="3.0 (quilt)" -b / haskell-mtl_2.1.2.orig.tar.gz haskell-mtl_2.1.2-1.debian.tar.gz
dpkg-source: info: using source format `3.0 (custom)'
dpkg-source: info: building haskell-mtl in haskell-mtl_2.1.2-1.dsc
> diff  -u haskell-mtl_2.1.2-1.dsc haskell-mtl_2.1.2-1.dsc-orig
--- haskell-mtl_2.1.2-1.dsc	2012-10-13 15:09:00.031695856 +0200
+++ haskell-mtl_2.1.2-1.dsc-orig	2012-10-13 15:08:47.267696329 +0200
@@ -1,3 +1,6 @@
+-----BEGIN PGP SIGNED MESSAGE-----
+Hash: SHA1
+
 Format: 3.0 (quilt)
 Source: haskell-mtl
 Binary: libghc-mtl-dev, libghc-mtl-prof, libghc-mtl-doc
@@ -23,3 +26,11 @@
 Files: 
  943c110524d96126bfa0e61f7df1ebcd 13723 haskell-mtl_2.1.2.orig.tar.gz
  dda5ded58a8d009ecddeed68058ca787 3509 haskell-mtl_2.1.2-1.debian.tar.gz
+
+-----BEGIN PGP SIGNATURE-----
+Version: GnuPG v1.4.12 (GNU/Linux)
+
+iEYEARECAAYFAlBzRQIACgkQ9ijrk0dDIGwymwCfW87jwCv5ZvAXPBwh0LbvYLP9
+n+wAn3TCJwPRuBDEWRYmcv3Opn1Kb9+F
+=/Llc
+-----END PGP SIGNATURE-----

As you can see, the resulting .dsc file is identical to the original one, so this approach works. The parameter to -b does not matter as long as it is some existing directory. Of course you can easily shoot yourself and others in the foot, almost as easily as hand-editing the .dsc file... but if you know what you are doing, then this can be quite handy.

Next to do: Write a tool that takes an unpacked Debian directory as an argument, packs it, finds the original tarball using uscan and creates the .dsc file. Would this be something that people would want in the devscripts package?

Tuesday, October 2. 2012

Dennis Felsing’s thesis on ghc-vis submitted

Haskell

Last week, my first student ever submitted his bachelor thesis „Visualization of Lazy Evaluation and Sharing“ about a tool he created that allows you to investigate the heap of a running Haskell program, including unevaluated values and their sharing properties, in a visual and interactive manner. You can see it in action at the end of the  video of my lightning talk about ghc-dup at the Haskell Implementors Workshop this year, and read more about it on Dennis’ ghc-vis web page. I’m quite happy about the result and I think I can get used to have other people implement my ideas and needs :-)

Dennis said he is willing to continue the maintenance of the tool, so give it a shot and bombard him with bug reports and feature requests!

Friday, September 28. 2012

The might applicative left fold

Haskell

Just about three years ago, I created the “automatic rule-based time tracker” (arbtt), a tool that quietly runs in the background, records your current windows and their titles, and allows you to later process this data using, well, rules. By now, I have almost half a million records, spanning 317 days of me working at the computer. Processing this amount of data has become very memory-hungry: It takes 1G of RAM to process my data. Other users, with a presumably higher sampling rate, report even worse performance, which finally made me approach the problem again.

Naturally, one wonders why it is such a memory hog, after all it takes just one run over the data to produce a report. The data itself is a lazy list so the garbage collector should the be able to drop the seen elements immediately.

But arbtt is able to print several reports in one run; there is even an option (--each-category) that creates a number of reports based on the data itself. The code was structured that first a number of values that are possibly used by several reports are first created one and then re-used by the reports. And then there is the issue of the filters: The user specifies a what predicates are interesting (e.g., those where he was active) and some reports need to know not only which samples were matched, but also distinguish the consecutive runs of selected samples.

Lots of reasons that made it hard to not use the list of of samples several times, which causes it to be completely retained in memory. The idea of completely rewriting everything in a much more imperative way was not appealing. But luckily Haskell’s good abstraction possibilities can help here.

What is the “correct” way to traverse a lazy list and calculate some information about it? It is a strict left fold, i.e. an algorithm that can be represented as arguments to

foldl' :: (s -> x -> s) -> s -> [x] -> s

For those who are new to folds: The first argument is a function that processes one element from the list, returning the new state of the processing. The second argument is the initial state. And foldl' can apply such an algorithm to a list to return a final state.

So obviously I had to express my reports as left folds. To think clearer about it, I gave them a name. I also added a finishing step to the mix, which is handy:

data LeftFold x a = forall s. LeftFold {
    start :: s,
    process :: s -> x -> s,
    finish :: s -> a
    }

A value of type LeftFold x a is then an algorithm that traverses a list of xes and returns something of type a. The forall s. means that the user of such a processor must not make any assumptions about the type of the state. I also have a function that runs such a list processor, simply by invoking foldl' with the arguments stored in the LeftFold constructor:

runLeftFold :: LeftFold x a -> [x] -> a
runLeftFold (LeftFold st1 p1 f1) xs = f1 (foldl' p1 st1 xs)

Now, as a Haskell programmer, I am searching for monads everywhere, and it really would have been nice if this were a monad, as they are very easy to compose. But unfortunately, they do not form a monad. If they did then I could decide after running one processor and looking at the result how I would want to process the list now, but this obviously contradicts with the desire to traverse the list only once.

But there is the monad’s smaller ugly sibling, the applicative functor. One can still somewhat nicely compose them to form more complex algorithms, just with more restrictions. Here is my instance:

instance Functor (LeftFold x) where
    fmap f (LeftFold st1 p1 f2) = LeftFold st1 p1 (f . f2)

instance Applicative (LeftFold x) where
    pure x = LeftFold () const (const x)
    LeftFold st1 p1 f1 <*> LeftFold st2 p2 f2 = LeftFold {
        start   =                   st1 :!: st2,
        process = \(s1 :!: s2) x -> p1 s1 x :!: p2 s2 x,
        finish  = \(s1 :!: s2)   -> f1 s1 (f2 s2)
        }

The :!: operator is a strict tuple constructor. This is important, as otherwise the fold would be come a lazy one, I would be building up huge numbers of thunks and performance would be very bad. For the same reason every list processor should make sure that its state does not build thunks once it is in weak head normal form.

Now I am at a abstraction level that I feel comfortable working in, by writing functions that combine the list processors in various ways. Probably the most important one for me is already present in the libraries (here with a specialized type):

sequenceA :: [LeftFold x a] -> LeftFold x [a]

This is the whole trick: Now I can, based on the user input, run any number and selection of reports in one single list traversal. I find it very elegant that “under the hood”, the strict tuples are nested at precisely the right depth depending on the length of the list.

As hinted at before, I do some interesting stuff that was previously represented as lists of lists of values. Expressing these as left folds was a bit tricky, but again, the complexity is contained in the combinators that I wrote (although it still gets confusing, check out processIntervalReport in Stats.hs):

monoidFold :: Monoid m => LeftFold m m
mapElems :: LeftFold y a -> (x -> y) -> LeftFold x a 
filterWith :: (x -> Bool) -> LeftFold (Bool :!: x) a -> LeftFold x a
onSelected :: LeftFold x a -> LeftFold (Bool :!: x) a
onJusts :: LeftFold x a -> LeftFold (Maybe x) a
onAll :: LeftFold x a -> LeftFold (Bool :!: x) a
runOnGroups :: (x -> x -> Bool) -> LeftFold x y -> LeftFold y z -> LeftFold x z
runOnIntervals :: LeftFold x y -> LeftFold y z -> LeftFold (Bool :!: x) z
lfLength :: LeftFold x Int
lfFirst :: LeftFold x (Maybe x)
lfLast :: LeftFold x (Maybe x)
toList :: LeftFold x [x]
concatFold :: LeftFold [x] [x]

As a result, the program now only requires 10MB and runs 18% faster.

If you wonder why, in the definition of runLeftFold, I do not apply the finish function strictly: That is intentional; I am actually passing the result of a processor that calculates some often required data (e.g. total time recorded) as an argument to all the list processors. (Near the end of the main function in stats-main.hs) This “loop” works fine as long as I am using these only in the finish functions.

There is one downside to this approach: I cannot easily calculate stuff on demand and share it between two reports. Previously, I could just bind this to a variable, pass it to all reports, only if at least one is interested it would be evaluated and if several are interested, the result would be shared. I do not have a good solution for this yet.

The code above is in the self-contained module LeftFold.hs in the arbtt sources.

Several People have had this idea before, here is one blog post by Max Rabkin.

Wednesday, September 12. 2012

“Haskell Bytes”

Haskell

Last Saturday I have held a talk at the Meta Main-Rhein Chaosdays 2012 in Darmstadt, an event of the local Chaos Computer Club groups and not unsimilar to Karlsruhe’s yearly GPN. I was asked to give an talk on a advanced, Haskell related topic, so I took my audience on a guided tour through the memory of a running Haskell program, explained the different types of closures and their layout, and showed how info tables are used to allow the garbage collector to uniformly treat the objects. We also used that knowledge to roughly predict the memory footprint of a program processing a large file using the String data type.

Besides showing raw bytes (using my ghc-heap-view package and a small utility function) I included two visualizations: One is the video of a copying garbage collector at work that I blogged about last week, and the other is a nice tool that a student of mine creates for his bachelor thesis: It allows you to visualize the heap structure of a (potentially partially evaluated) value from GHCi and interactively evaluate parts of the structure – if you want, even thunks that are “hidden” behind other thunks, something that you cannot do from Haskell code. The following picture is a screen shot of ghc-vis visualizing the evaluation of the famous two-line primes definition:

ghc-vis at work

The slides of my talk do not cover everything that was included, but the transcript does. It is in German, so as before, if you want me to translate it, then (convince your professor or employer to) invite me to hold the talk again. The ghc-vis tool can be found on Hackage, more information about it is on the author’s website.

(Page 1 of 19, totaling 279 entries) » next page
Nach oben