Swirly Mein Kopf

Sunday, February 28. 2010

Exploiting sharing in arbtt

Haskell

My automatic rule-based time tracker (arbtt), which is written in Haskell, collects every minute a data sample consisting mainly of the list of currently open windows (window title and program name). Naturally, this log grows rather large. Since October of last year, I collected 70,000 samples. I already went from a text-based file format to a binary format using Data.Binary, which gave a big performance boost.

But by now, I was afraid that this is not enough. My log file is now 30MB large. Looking at the memory graph of gnome-panel, it is taking up more than half of my memory. When running arbtt-stats, the Haskell run time system reports 569 MB total memory in use and the command finishes after 28.5 seconds.

Naturally, the log file is highly redundant: Compressing it with bzip2 shrinks it to 1.6MB. But as I would like to preserve the ability to just append samples at the end, without having to read the file, I chose not just to add bzip2 or gzip compression. Rather, I am now exploiting a very obvious redundancy: Two adjacent samples usually list exactly the same windows, and a focus change only changes a flag. So now, when storing a string that is part of a sample, it will check if this string was already present in the previous sample and, in this case, just store the number of that string (one byte). Only if the string was not present it will write a zero byte and then the string. When reading the sample, the process is reversed.

This greatly reduces the file size: It is down to 6.2MB. It also improves the memory consumption, due to Haskell’s abilities with regard to sharing: When a reference to a string in a previous sample is read, then only one instance of this string is in memory, even if it occurs several times in the log. This brings the memory consumption down to 264 MB and the runtime to 17 seconds.

I released the changes as version 0.4.5.1 to Hackage, Debian and as a Windows installer. The log file is not automatically converted, but new samples will be written in the compressed format. If you want to convert your whole file, you have to stop arbtt-capture, run arbtt-recover, and then move the hopefully noticeable smaller ~/.arbtt/capture.log.recovered  to ~/.arbtt/capture.log.

The required code changes were not too big. I somewhat isolated the relevant code in the Data.Binary.StringRef module. Unfortunately, I have to use OverlappingInstances to be able to provide the special instance for String – is there a cleaner way (besides the trick used for the Show class)?

Friday, December 25. 2009

Building arbtt for Windows

Haskell

A friend of mine is interested in trying out the Automatic Rule Based Time-Tracker arbtt which I programmed. Unfortunately, he is using Windows and up to now, arbtt only worked on Linux. But as I wanted to check out Haskell’s cross-platform abilities for a while, this was a good opportunity to do so. I don’t have Windows installed myself (and did not plan to do so), so I did all this under WINE, the Windows compatibility layer, which works very well: It takes only a few minutes to install the Haskell Platform for Windows and then I was able to run wine ghc --make and cabal install.

I played around with some simple programs and was surprised by these timings:

$ rm *.o *.hi; ghc --make fourfours.hs ; time ./fourfours > /dev/null
[1 of 1] Compiling Main             ( fourfours.hs, fourfours.o )
Linking fourfours ...

real	0m1.909s
user	0m1.692s
sys	0m0.208s
$ rm *.o *.hi; wine ghc --make fourfours.hs ; time wine ./fourfours.exe > /dev/null
[1 of 1] Compiling Main             ( fourfours.hs, fourfours.o )
Linking fourfours.exe ...

real	0m1.631s
user	0m1.376s
sys	0m0.092s

So it is faster to run a compiled Haskell program on top of a compatibility layer than directly on Linux! The world is in order again, though, if optimization is enabled:

$ rm *.o *.hi; ghc -O --make fourfours.hs ; time ./fourfours > /dev/null
[1 of 1] Compiling Main             ( fourfours.hs, fourfours.o )
Linking fourfours ...

real	0m0.981s
user	0m0.876s
sys	0m0.108s
$ rm *.o *.hi; wine ghc -O --make fourfours.hs ; time wine ./fourfours.exe > /dev/null
[1 of 1] Compiling Main             ( fourfours.hs, fourfours.o )
Linking fourfours.exe ...

real	0m1.270s
user	0m1.036s
sys	0m0.072s

Funny. Anyways, I wanted to port arbtt. The only platform-dependent part is the capture module that gathers the list of open Windows. The Win32 package that comes with the Haskell Platform did not cover all the functions needed to do so, but creating additional function bindings is really easy with Haskell, as can be seen in the Graphics.Win32.Window.Extra module. I also replaced the locking code that prevents two instances of arbtt-capture to run at the same time by equivalent code using Windows mutexes (module System.Win32.Mutex). With these small changes and some CPP conditionals to make the code compile for either platform, the porting was done! Even accessing the files in ~/.arbtt works correctly on Windows, where it will look in the Application Data folder, without changing the code, thanks to System.Directory.getAppUserDataDirectory.

But Windows users won’t like compiling software on their own. They won’t even like installing software by copying various files to certain directories. Therefore, I also had to create a Windows Installer. I picked Inno Setup, because it’s Free Software and seems to be simpler than NSIS. The installer puts the compiled .exe files, the example categorize.cfg and the HTML documentation in the right spot, adds icons to the Start Menu (“Edit categorize.cfg”, which fires up wordpad, a link to the documentation and the uninstaller), puts arbtt-capture in the Autorun folder, puts the path to arbtt-stats in the PATH variable and starts arbtt-capture at the end (the last three points being optional). Of course it undoes all this when removing the program again. I integrated the call to the Inno Setup installer into the usual ./Setup build process of Haskell packages. Some more details of how to create the Windows installer are mentioned in the README file.

Now all this does not magically add a graphical user interface to arbtt, so users will still have to work with arbtt-stats on the command line – even on Windows. If this is not a problem for you then you can fetch the latest installer from the arbtt homepage. And if you happen to become a serious user of arbtt on Windows and want to help maintaining the Windows port, I’ll gladly share some responsibilities.

I’m very satisfied with the process and the result and I’m happy to know that I can offer some of my programs also to Windows users in the future.It is also a big plus for Haskell – with Python, shipping a program for Windows users is likely more difficult. The next step will be providing gtk-based graphical Haskell applications for Windows, including a nice installer that ideally includes all dependencies (gtk etc.).

Sunday, November 15. 2009

Darcs Hacking Sprint: Mission Complete

Haskell

The darcs hacking sprint is slowly nearing its end. As planned, I have worked on integrating DarcsWatch and bugs.darcs.net, and I am satisfied so far. From now on, if someone submits a Darcs patch to patches@darcs.net, the patch will also be tracked by DarcsWatch. DarcsWatch will display a link to the entry on bugs.darcs.net, and also add a comment to the bugtracker with a link to the patch on DarcsWatch. And eventually, if the patch is included in the darcs.net repository, DarcsWatch will change the state of the ticket to accepted, removing one step of work for the Darcs maintainers. Currently, it checks the state of the repository three times per hour, so expect a delay after you applied the patch to the repository before the state is updated.

Saturday, November 14. 2009

Arrived at the Darcs hacking sprint

Haskell

Today, my alarm clock was set to 4:30, as I was going to Vienna, to attend the Darcs hacking sprint. I’ll be working on DarcsWatch, making it a bit more modular and hopefully integegrate it better into bug tracking systems (especially roundup, as that’s used by the Darcs team). On Monday, I’ll be a tourist until I leave in the evening. If any Debianers or Haskellers want to meet for keysigning or sightseeing, just drop me a mail!

Sunday, October 11. 2009

arbtt: Now with Documentation

Haskell

Yesterday I did what Free Software authors supposedly don’t do. I wrote documentation. In fact, I had a relatively detailed README already, but I thought this would be a good opportunity to create a more elaborate documentation, using the ubiquitous DocBook. You can read the HTML documentation for arbtt online, where it’s automatically updated when I push to the darcs repository. You can see, I use the same CSS file that most Haskell-related DocBook documentation seems to use.

One motivation to use DocBook was that I can extract manpages from it, which should be present if I package arbtt for Debian. I was about to complain that references from the manpages to other part of the documentation can not work sensibly, but using the “profiling” feature of docbook-xsl, I can replace them by a textual reference to the user’s manual if the file is processed for manpage output. Have a look at my changes for that if you want to know how it works.

For some reason, the <refentry>-tags that make up the manpages are split to separate files when generating chunked HTML output, but they do not appear in the table of contents. You have to find a spot in the text where they are linked, as they are now in the section “Program references”, to find them. This is unfortunate, I expect that a few readers might miss this important part of the documentation. Am I doing something wrong?

For the configuration file, I put an EBNF-style grammar description in the documentation. There is a special DocBook module for that, but I’m not satisfied with it. On the right hand side of production rules, it has only special support for nonterminals, but no tags to semantically mark up choice, repetition etc. I followed the lead found at the YAML specification and put <quote>-Tags around literal parts of the grammar, but this is not valid according to the DocBook DTD. Speaking of the DTD: I guess since the EBNF-stuff is just a module, the tag <productionset> is not a valid content of <figure>. So, if I choose to use the EBNF module in a sensible way, I will not have a valid DocBook file.

All in all, I am satisified with the result. Especially that nobody can say any more “I like your program, but I can’t contribute, because I don’t know Haskell”. Just improve the docs! :-)

Sunday, October 4. 2009

arbtt goes Binary

Haskell

Three weeks ago, I announced the automatic rule-based time tracker here, and it seems that there are actually users out there :-). Since then, it has recorded more than 240 hours of my computer’s uptime in about 15000 samples. Until now, this data was stored in a simple text file, one line per entry, and relying on Haskell’s Show/Read instances to do the serialization. Although not quite unexpected, this turned out to be a severe bottleneck: Already, it takes more than 25 seconds to parse the log file on my computer.

Following an advice given to me on the #haskell IRC channel, I switched to a binary representation of the data, using the very nice Data.Binary module. The capture program will automatically detect if the log file is in the old format, move it away and convert the entries to binary data. And voilà, the statistics program evalutes my data in about two seconds! This should be fast enough for quite a while, I hope.

In my binary instances, which you can find in Data.hs, I prepended a version tag to each entry. This hopefully allows me to add more fields to the log file later, while still being able to parse the old data directly and without conversion. To still be able to manually inspect the recorded data, the program arbtt-dump was added to the package. The new version is uploaded to hackage as 0.3.0.

One thing still worries me: With the old format, I could easily throw away unparseable lines in the log file (e.g. from a partial write) and still read the rest of it. With the new code, there is no such robustness: If the file breaks somewhere, it will be unreadable in its whole, or at least to the broken point. I’m not sure what to do about that problem, but given the very low number of bytes written each time, I hope that it will just not happen.

Saturday, June 27. 2009

GPN8: 50% complete

Haskell

Die achte GulaschProgrammierNacht des Karlsruher Entropia e.V. ist hat so langsam Halbzeit, und ich will ein wenig berichten:

Gestern Abend war der erste Vortrag meine Einführung in das Programmierspiel L-seed (mit in Kürze zusammengeschriebene Folien). Ich hatte ja eigentlich keine großen Hoffnungen, dass das Spiel großen Zuspruch erfährt, sind die Möglichkeiten darin ja eher beschränkt und die Ausgabe nicht besonders schick, aber da habe ich mich getäuscht: Seitdem kurz nach dem Vortrag das Netzwerk lief und der Server erreichbar war, wurden ununterbrochen und die Nacht durch Pflanzen programmiert!  Etwa 30 Pflanzen laufen gerade, von vermutlich fast so vielen Spielern. Vor lauter Vorschlägen, Wünsche und Fragen kam ich kaum mehr zum weiterprogrammieren. Immerhin konnte ich die schlimmsten Fehler noch ausbügeln (etwa, dass man mit sehr kurzen Pflanzen sich beliebig schnell vermehren konnte, oder dass vertikale Pflanzen zu stark sind).

Inzwischen wird auch regelmäßig der Zustand des Spiels als Bilddatei rausgeschrieben (natürlich nur während der GPN erreichbar).

Die ersten kämpfen sich jetzt durch die Installation der Haskell-Toolchain, unter Debian momentan wegen Paketen in NEW stark erschwert . Vielleicht kommen bald auch noch Patches!

Wenn Ihr momentan nicht auf den Lseed-Server zugreifen könnt, dann kann das am „Routing-Loch“ liegen: Teile des Internets sind von hier aus nicht erreichbar, weil wohl manche Routingtabellen nicht aktualisiert wurden. Leider auch mein Server, so dass dies hier nur über Umwege schreiben kann...

Saturday, June 13. 2009

Introducing L-seed

Haskell

In two weeks, the eighth „Gulasch-Programmier-Nacht“ will be held in Karlsruhe, a yearly geek event  by the Entropia e.V, which is the local CCC club. It will, as usually, offer a lot of interesting talks and events. One of my personal highlights have always been the programming games: Games, where you write your own code to compete against others, while the playing field is projected in the hacking area. The last few years, dividuum has done a great job providing these (as regular readers of my blog might remember).

This year, I’m trying to follow in his footsteps and will provide the programming game, called „L-seed“. This blog post is an introduction (and a call for contribution, at the bottom of the post :-))

The idea

The participants will write code (the „genome“) that describes how plants (the biological type, not the industrial) will grow. The plants will grow simultaneously on the screen (the „garden“), will compete for light and will multiply. The players can not change the code of a growing plant, but they do have the chance to update their code for the next generation – when a plant drops a seed, it will run the newest code. All in all, the game aims to be slowly paced and relaxing, something to just watch for a while and something that does not need constant attention by the players. The score is based on the total amount of biomass produced, but I expect (and hope) that some players will aim for the most beautiful or weirdest shapes.

The plant code

In contrast to the previous years, this year’s game will not allow player to use a full-fledged Turing-complete programming language, but a rather minimalistic rule based language to describe the plant’s growth. Especially, it will be hard to coordinate different branches of the same plant: Information mostly flows from the leaves to the root, and not the other direction.

The simplest plant is based on this code:

// This is the trivial plant, which just grows and grows
RULE "Very simple Rule"
GROW BY 1

You can see that each rule has a name (which is purely informational), and an action which tells the current branch to, well, grow by one. The syntax allows for Java-style comments, whitespace and newlines are insignificant and the reserved words are case-insensitive. The result will be a plant that just grows straight up, for ever and ever. A more complex rule might be this:

RULE "Start"
WHEN Length <= 0
GROW BY 1
SET TAG = "Root1"

RULE "Story 1"
WHEN TAG = "Root1"
// No Percentage means 100%
BRANCH ANGLE = 70°, LENGTH = 2, Tag = ""
ANGLE = -70°, LENGTH = 2, Tag = ""
ANGLE = 0°, LENGTH = 1, TAG = "Root2"
SET TAG = ""

RULE "Story 2"
WHEN TAG = "Root2"
BRANCH AT 100% ANGLE = 70°, LENGTH = 1.5, Tag = ""
ANGLE = -70°, LENGTH = 1.5, Tag = ""
ANGLE = 0°, LENGTH = 1, TAG = "Root3"
SET TAG = ""

RULE "Story 3"
WHEN TAG = "Root3"
BRANCH AT 100% ANGLE = 70°, LENGTH = 1, Tag = ""
ANGLE = -70°, LENGTH = 1, Tag = ""
ANGLE = 0°, LENGTH = 1, TAG = "Root4"
SET TAG = ""

RULE "Story 4"
WHEN TAG = "Root4"
BRANCH AT 100% ANGLE = 70°, LENGTH = 0.5, Tag = ""
ANGLE = -70°, LENGTH = 0.5, Tag = ""
ANGLE = 0°, LENGTH = 0.5, Tag = "Tip"
SET TAG = ""

RULE "Star"
WHEN TAG = "Tip"
Blossom

I added a picture with the resulting tree. The yellow blob at the top is a not-yet-polished rendering of a blossom. At the right, there is already the first offspring of the plant. One thing to keep in mind while writing a genome is that rules are applied to single branches, and not the whole plant. The program will, for each branch individually, check which rules apply and choose one.  I’ll skip a detailed description of the syntax here, eventually you will find proper documentation on the entropia wiki page. You can find more examples in the source repository.

The gameplay

The players will register at a website providing the usual CRUD functionality for their code, with integrated syntax checking. They can have more than one code at the same time, but only one can be marked as „active.“ The program actually serving the projector will regularly fetch the active code and run a around (called „season“) of the game. Whenever a new seed grows, the program will get the possibly updated active code of that user and use that. A season will probably last for a fixed amount of time, and at the end the total biomass accumulated by each player is added up and written back to the database.

The game code

You can fetch the source code from my git repository and browse the haddock documentation. Unsurprisingly, it is written in Haskell. To compile it yourself, you will need the GHC Haskell compiler, parsec version 3 and for the visualization the gtk2hs package, all of which are packaged in Debian unstable. The main.hs is the interesting program. You pass it one or more plants as an argument, and it will start the simulation. If it’s too slow for test runs, then reduce the dayLength variable in Lseed/Constants.hs. If you have trouble getting it to run, just talk to me.

The call for help

As you can see in the picture above, the graphical output is not very aesthetic. I am no artist, and I don’t pretend to be one. So, if you think you have the right touch, maybe know OpenGL and a bit of Haskell, I’d be very grateful if you make it look better. The UI interface is quite simple: You need to have a module that returns an Observer value, which contains a few callbacks for various situations. The code in Lseed/Renderer/Cairo.hs can of course be used as a guideline. I’m suggesting OpenGL because my code is not only ugly, it is also too slow very quickly. If you need any help, just contact me by mail or jabber.

I’m also interested in comments about the game balance, and the expressiveness of the programming language. If you play around with the code and discover that there are missing features in the language, or that your plants grow too fast or too slow, or when you discover bugs, please also tell me.

Thanks

L-seed is based on an old idea of mine, advanced together with Cupe, Sven Hecht is programming the web interface, and Lay is testing the game and bugs me about it to keep the motivation going.

Update: I uploaded the package to hackage, to encourage contributions.

Friday, June 5. 2009

Third place in AI programming contest

Haskell

My contribution to the programming contest held by the German „FreiesMagazin“ got a third place out of 13 submissions. This is quite good, considering that I only wrote a small wrapper around the generic game-tree Haskell library by Colin Adams, and hardly gave any serious thought into the problem.

All entires are available for download. I have annotated the table containing the results with the line count as given by ohcount:

W D L Points Language Code lines
1. Kroschinsky 904 242 54 2954 Python 589
2. Schulz 858 263 79 2837 Python 544
3. Breitner 837 281 82 2792 Haskell 264
4. Jackermeier 754 306 140 2568 Perl 183
5. Roth 574 338 288 2060 C++ 1731
6. Eitel 567 355 278 2056 Ruby 352
7. Reichel 342 328 530 1354 Python 266
8. Zimmermann 303 400 497 1309 Java 1070
9. Apensiv 190 353 657 923 Perl 410
10. Maraun 150 300 750 750 C++ 690
11. Golemo 131 319 750 712 Python 104
12. Ziegelwanger 120 337 743 697 C++ 868
13. Fuest 32 254 914 350 Python 645

Note that the line count for my haskell program includes the game-tree library, which I bundled in my submission. Without it, it’s 156 lines of code I had to write, which is second best in the code golf category.

If you look at the timing statistics, you will see that my program took the longest. When the contest was started, the timelimit was one minute per round – which I of course tried to use as much as possibly, by increasing the search tree depth. Later into the contest, the rules were changed to limit it to one minute for a whole game, and that long-running programs will get points deducted. I did some minor changes based on a profiling run, but did otherwise not care too much about performance. I would have tried to improve the runtime by using Haskell’s good ability for parallelization. But when I asked on what kind of machine the code will be run, but they would not tell me. They said that this is a hobby programmer’s contest where allowing for parallelization were not fair, so I did not work in that direction.

All in all it was a positive experience, showing of Haskell’s qualities as a language that you can quickly get good results with.

Wednesday, January 21. 2009

darcswatch uploaded to hackage

Haskell

I just made my first upload to hackage (not counting uploads I did during my work in Dresden). Don Steward repeatedly told me to package and upload darcswatch, so I just did that. Thanks to Gwern Branwen to contribute the first cabal file.

There is little documentation on how to set up darcswatch yourself, so if you actually want to try it out, you most likely will have to get in touch with me. Note that you can just use the installation on http://darcswatch.nomeata.de/.

If you, for some reason, feel like hacking on darcswatch, I’d be interested in memory reduction. I currently process 916 mails containing 1867 patches and 47MB, as well as 13 repositories, some of which are rather large, and the numbers are increasing.

Wednesday, December 31. 2008

Handling explicit and implicit recusion in Haskell data

Haskell

For a while I have been pondering over a problem that arises when your functionally written program has some state with cross references – for example a list of users, each of which uses a number of computers, and a list of computers, each having an owner.

Implicit referencing

For doing queries on such data, it would be convenient if every reference is just the referenced object itself. Although we would visualize this as a graph, semantically, it is more like an infinite tree. This is possible in Haskell, due to laziness, and if you create the data structure cleverly, it even uses constant memory, no matter how “deep” you enter this infinite tree (a recent post of mine talks about this). A possible data definition would be:

data User = User {
userName :: String,
uses :: [Computer]
}
data Computer = Computer {
computerName :: String,
owner :: User -- references the Users
}
data State = State [User] [Computer]

testState = let user = User "Conrad" [computer]
computer = Computer "Z3" user
in State [user] [computer]

Explicit referencing

But such a representation is very unsuitable for updates (at least I can’t think if a nice way of updating such a graph without breaking the internal cross-references) and serialization, which is a must for a HAppS based application. So what one would probably start with is this data structure:

data User = User {
userId :: Int,
userName :: String,
uses :: [Int] -- references the Computers
}
data Computer = Computer {
computerId :: Int,
computerName :: String,
owner :: Int -- references the Users
}
data State = State [User] [Computer]

testState = State
[User 0 "Conrad" [1]]
[Computer 1 "Z3" 0]

I think the semantics of this are clear. Note that the referencing is currently not type-safe, but this can be provided by phantom types. Maybe I’ll write more about that later.

Now imaging you want to display the information about the first computer with your web application. You extract the information with let State _ cs = testState in head cs and pass that to your templating engine. But what if your template wants to display the name of the owner? It only has access to his userId. You would either need to know what information the template will ever need, extract that from the state beforehand and pass it along, or give the template access to the whole state. In that case, though, there has to be lookup-logic in your output code, which is also not nice.

Woudln’t it be nice if you could, in your application logic, work with the explicit references, which are easy to modify and store, but somehow turn that into the implicit referencing?

Duplicated representation

One way would be to have two unrelated sets of data structures, ExplicitState, ExplicitUser, ExplicitComputer, which use explicit identifiers to reference each other, and ImplicitState,... which are defined as the first representation of our state. It is then mostly trivial to write a function that converts ExplicitState to ImplicitState.

The big disadvantage of this is that you have to maintain these two different hierarchies. It also means that every function on the state has to be defined twice, which often almost identical code. Clearly, this is not desirable.

Annotated representation

It would be more elegant if the state is stored in one data type that, controlled by a type parameter, comes in the one or the other representation. To do that, we need two types: One that contains a value, and one that contains just a reference:

newtype Id v = Id v deriving (Show, Typeable, Data)
newtype Ref v = Ref Int deriving (Show, Typeable, Data)

Then we need to adjust our data definitions, to make use of these. (I’ll leave out the names, to keep the code smaller)

data User ref = User {
userId :: Int,
uses :: [ref (Computer ref)]
}
data Computer ref = Computer {
computerId :: Int,
owner :: ref (User ref)
}
data State ref = State [User ref] [Computer ref]

Here we introduce a type parameter “ref”, which will later be either Id or Ref. Note that now a reference also states the object it is a reference for, which greatly increases type safety. Functions on these data types that don’t work with the references will be polymorphic in the “ref” type parameter, so only need to be written once. A User Id is a complete user with all related data, while User Ref is a user with only references. And a Ref (User Ref) is reference to a user, which contains references...

Not so kind kinds

Did you notice the lack of a deriving clause? Our data structures have the relatively peculiar kind ((* -> *) -> *), which makes it hard for the compiler to derive instances for things like Show. But we already know that we will only use Id or Ref for the type variable, so we can use ghc’s StandaloneDeriving language extension and have these instances created:

deriving instance Show (User Id)
deriving instance Show (User Ref)
deriving instance Show (Computer Id)
deriving instance Show (Computer Ref)
deriving instance Show (State Id)
deriving instance Show (State Ref)

Toggling a type parameter

The next step is to write the conversion function. It will have type

unrefState :: State Ref -> State Id

For that, and for later, we need lookup functions:

unrefUserRef :: State Id -> Ref (User Ref) -> Id (User Id)
unrefUserRef (State l _) (Ref i) = Id $ fromJust $
find (\u@(User i' _) -> i == i') l
unrefComputerRef :: State Id -> Ref (Computer Ref) -> Id (Computer Id)
unrefComputerRef (State _ l) (Ref i) = Id $ fromJust $
find (\u@(Computer i' _) -> i == i') l

These expect a State (with implicit referencing) and a reference, and look up this reference. The function unrefState then looks like this:

unrefState :: State Ref -> State Id
unrefState (State us cs) =
let unrefState = State (map (unrefUser unrefState) us)
(map (unrefComp unrefState) cs)
in unrefState
where unrefUser :: State Id -> User Ref -> User Id
unrefUser s (User i refs) = User i (map (unrefComputerRef s) refs)

unrefComp :: State Id -> Computer Ref -> Computer Id
unrefComp s (Computer i ref) = Computer i (unrefUserRef s ref)

Note how we “tie the knot” in the let expression. This is the trick that ensures constant memory consumption, because every reference points back to the same place in memory.

Satisfied already?

So what do we have? We have no duplication of data types, we can write general functions, and we can resolve the explicit referencing. We can also easily write functions like unrefUser :: State Ref -> User Ref -> User Id, which transform just a part of the state.

But writing unrefState is very tedious when the State becomes more complex. Each of the other unrefSomething functions are very similar, but need to be written anyways. This is unsatisfactory. What we want, is a generic function, something like

gunref :: State Ref -> a Ref -> a Id

which, given a state with explicit references, replaces all explicit references in the first argument (which could be State Ref or User Ref or anything like that) with implicit ones. This function can not exist, because we would not know anything about a and b. But maybe we can do this:

gunref :: (Data (a Id), Data (a Ref)) =>  State Ref -> a Ref -> a Id

Typeable and Data

Before being able to do so, we need to derive Data for our types. We can start with

deriving instance Data (User Id)
deriving instance Data (User Ref)
deriving instance Data (Computer Id)
deriving instance Data (Computer Ref)
deriving instance Data (State Id)
deriving instance Data (State Ref

but that will complain about missing Typeable instances. Unfortunately, ghc’s deriver for Typeable (even the stand-alone-one), does not handle our peculiar kind, so we need to do it by hand. With some help from quicksilver on #haskell, I got it to work:

instance Typeable1 ref => Typeable (User ref) where
typeOf _ = mkTyConApp (mkTyCon "User") [typeOf1 (undefined :: ref ())]
instance Typeable1 ref => Typeable (Computer ref) where
typeOf _ = mkTyConApp (mkTyCon "Computer") [typeOf1 (undefined :: ref ())]
instance Typeable1 ref => Typeable (State ref) where
typeOf _ = mkTyConApp (mkTyCon "State") [typeOf1 (undefined :: ref ())]

everywhere is not enough

Turning to the documentation of Data.Generics, I notice with some disappointment that there is no function that is able to change a type – they all seem to replace a value by another value of the same type. But the functions gfoldl and gunfold sounded like they could be used for this.

Warning: What comes now is a very non-haskellish hack that subverts the type system, just to get the job done. Please read it with a healthy portion of distrust. If you know of a cleaner way of doing that, please tell me!

Wrapped Data

I want to do some heavy type hackery, so I need to disable haskell’s type system. There is Data.Dynamic, but not even that is enough, as we need to carry a type’s Data instance around as well. So let’s wrap that up:

data AData where AData :: Data a => a -> AData
instance Show AData where show (AData a) = "<" ++ show (typeOf a) ++ ">"

fromADataE :: forall b. Data b => AData -> b
fromADataE (AData d) = case cast d of
Just v -> v
Nothing -> error $ "Type error, trying to convert " ++
show (typeOf d) ++ " to " ++
show (typeOf (undefined :: b)) ++ "."

There is also a function that converts an AData back to a normal type, if possible. If it’s not possible, then there is a bug in our code, so we give an informative error message.

AData transformers

Similar to everywhere, we want to have transformers that combinable. They need to have the change to convert an arbitrary value, but also signal that they could not convert something (and this something has to be recursed into). I came up with this:

type ADataT = AData -> Maybe AData

extADT :: forall a b. (Data a, Data b) => ADataT -> (a -> b) -> ADataT
extADT at t a@(AData v) = case cast v of
Just x -> Just (AData (t x))
Nothing -> at a

doNothing :: ADataT
doNothing = const Nothing

ADataT is the type for such a transformer. doNothing will not transform anything, and extADT can be used to add any function to the list of tried transformers, in the spirit of extT.

The ugly part

To apply such a transformer, I want to use this function, which I’ll describe in the code comments:

everywhereADT :: forall a b. (Data a, Data b) => ADataT -> a -> b

-- first check if we can transform this value already
everywhereADT f v = case f (AData v) of
-- if so, coerce it to the users’ requested type, which hopefully goes well
Just r -> fromADataE r
-- if not, we need to recurse into the data structure
Nothing -> recursed

-- for that, we first need to figure out the arguments to the
-- constructor. We store them in the untyped list
where args :: [AData]
(Const args) = gfoldl k z v
-- gfoldl lets us have a look at each argument. We wrap it in AData
-- and append it to the list
k (Const args) arg = Const (AData arg : args)
z start = Const []

-- We need the data constructor of the input data type. If the user did not want
-- it to be transformed, it will be the same
c = toConstr v

-- To give better error messages, we make sure that the outmost type constructor
-- of the requested type actually has the data constructor we were given. Otherwise
-- gunfold will complain in a non-helpful way
input_type = dataTypeRep (constrType c)
output_type = dataTypeRep (dataTypeOf (undefined :: b))

recursed = if input_type /= output_type
then error $ "Can not convert <" ++ show input_type ++ ">"++
" to <" ++ show output_type ++ ">."
-- the types match, so we assemble the output type, using gunfold
else snd (gunfold k' z' c)
k' :: forall b r . Data b => ([AData], b -> r) -> ([AData],r)

-- we start by reversing the input list
z' start = (reverse args,start)

-- then we call us recursively on the argument and feed the result
-- to the output constructor
k' ((AData a) : args, append) = (args, append (everywhereADT f a))

-- Used for folding (we don’t need to retain the original constructor)
data Const a b = Const a

What a beast! But surprisingly, it works. Here are some examples. Note that I always have to specify the requested output type:

bool2Int :: Bool -> Int
bool2Int False = 0
bool2Int True = 1

*Main> everywhereADT (doNothing `extADT` bool2Int) True :: Int
1
*Main> everywhereADT (doNothing `extADT` bool2Int) True :: ()
*** Exception: Type error, trying to convert Int to ().
*Main> everywhereADT (doNothing `extADT` bool2Int) (True,False) :: (Int,Int)
(1,0)
*Main> everywhereADT (doNothing `extADT` bool2Int) ([True,False],True,()) :: ([Int],Int,())
([1,0],1,())
*Main> everywhereADT (doNothing `extADT` bool2Int) ([True,False],True,()) :: [()]
*** Exception: Can not convert to .
*Main> everywhereADT (doNothing `extADT` bool2Int) [True] :: [Bool]
[*** Exception: Type error, trying to convert Int to Bool.

I hope this code does not inflict too much pain on any Haskell-loving reader. I know I violated the language, but I didn’t know how to do it better (at least not without using Template Haskell). I also know that this is not very good performance wide: Every single value in the input will be deconstructed, type-compared several times and re-assembled. If that is an issue, then this function should only be used for prototyping.

Almost done

To apply this to our state, we only need to glue it to our lookup functions from above:

gunref :: (Data (a Id), Data (a Ref)) =>  State Ref -> a Ref -> a Id
gunref s w = let unrefState = gunref' unrefState s
in gunref' unrefState w

gunref' :: (Data (a Id), Data (a Ref)) => State Id -> a Ref -> a Id
gunref' unrefState = everywhereADT unref'
where unref' = doNothing `extADT`
unrefUserRef unrefState `extADT`
unrefComputerRef unrefState

Now we have a generic unreferencer. I set the type a bit more specific than necessary, to make it safe to use (under the assumption that the list of lookup functions is complete and will not leave any Ref in the output).

*Main> testState 
State [User {userId = 0, uses = [Ref 1]}] [Computer {computerId = 1, owner = Ref 0}]
*Main> gunref testState testState
State [User {userId = 0, uses = [Id (Computer {computerId = 1, owner = Id (User {userId = 0,
uses = [Id (Computer {computerId = 1, owner = Id (User {userId = 0, uses = ..

Oh, and by the way, if you want to test this code, you’ll need at least:

{-# LANGUAGE DeriveDataTypeable, FlexibleInstances, GADTs,
FlexibleContexts, StandaloneDeriving, ScopedTypeVariables #-}

Sunday, December 28. 2008

MuMer relaunch – preview online

Haskell

A few years ago, I had some ideas about a real-world trading game. In short, a combination of the game play of “Settlers of Catan”, the cute pseudo-medieval world of “The Settlers” (the computer game), which you can play in your every day live along, without having to sit in front of the computer for a long time. I then started some code, lost motivation and let it sit there for a while.

Recently, I re-developed interest in the idea and started from scratch, using Haskell and HAppS. To avoid losing interest again, I’m now putting the code online and set up the server. I invite everyone to play around with it, maybe have a look at the code, send me patches or comments. As you can see, the web user interface is plain ugly HTML and could need some love. Some CSS is definitely needed, some AJAX would be nice.  Also, the resource tree is very small at the moment – there are a lot of things to work on, even if you don’t want to touch Haskell!

You can register at http://mumer.net/. You will be either given a forest (a source for wood) or a source of stone. You can reap your source, and trade on the “Free Market”, which is where you can always trade online, at bad prices. The idea is that you find real trade partners, to get better prices.

For now, trading without physical contact is possible. You can create so-called issue ids, which represent stips of paper. You can then load resources on them, and give the paper (i.e. the number) to other players, who can then redeem them. Eventually, it is planned that these pieces of paper are provided centrally and (sufficiently) unforgeable, so that it is clear who owns a resource.

You can also bid on certain stuff, such as a sawmil, which allows you to turn wood into boards. It will regularily be re-leased to the highest bidder.

You can get the code (in a darcs repository) from http://darcs.nomeata.de/mumer2 and also browse the code.

If you happen to be at the CCC right now and would like to talk about it, please do so!

Wednesday, October 15. 2008

Infinite loops in Haskell

Haskell

Just some small thoughts about cyclic lists in Haskell. Every Haskell beginner knows that you can use infinite lists, as long as you don’t fully evalute them. So, this is perfectly valid

endless = [0..] -- All natural numbers!
main = print (endless !! 10)

It will not crash, but print "10" as the list has not been fully used.

What happens if we take a piece of the list further down, let’s say at position 1000000000:

endles = [0..]
main = print (endless !! (10^9) )

If you try this at home, better run "ulimit -S -v 1000000" before, because then you’ll get "test: out of memory (requested 1048576 bytes)" instead of a sluggishly swapping machine. What happened? The long list will be evaluated, and fills the memory.

Does this mean that we can not use arbitrary long lists? Let’s try a special case: A list that is infinitely long, but repeating the same values all over again:

list = "Oh, happy day! Oh, happy day. "
cycle' list = list ++ cycle' list
main = do let rlist = cycle' list
          print ( rlist !! (10^9), rlist !! 0 )

We repeat the (finite) list endlessly, and then try to pick the 1000000000th element. We also pick the first element again, to make sure the compiler does not cheat by forgetting the first 999999999 elements (It’s actually pretty nice that the compiler will forget these elements, but not what I want to demonstrate here). Running that code, sure enough, fills up the memory.

But maybe I was coding badly. At least I have re-implemented a function that already exists, which is bad practice. Let’s try with Haskell’s own cycle:

list = "Oh, happy day! Oh, happy day. "
main = do let rlist = cycle list
          print ( rlist !! (10^9), rlist !! 0 )

Now it takes a while, but surprisingly, the memory gauge does not skyrocket, and in the end I’m told that the 1000000000th character in my infinite character string is 'd'. This leads me to the conclusion that the Haskell library uses black magic. Or does it? Here is the definition of cycle:

cycle xs = xs' where xs' = xs ++ xs'

What is the difference to our cycle'? Here, the result is given a name (xs'), which is used again inside the function. So while our cycle' appends the list over and over again, filling up the RAM, their cycle ties a loop and makes the end of the list refer to it’s beginning. And my list lookup will no longer evaluate the list up to infinity, but just run around in, well, cycles until it has counted down from 1000000000. I could even ask for the last element of this list, and it will not use any more RAM than a small, finite list, while endlessly searching for the end of the list.

Despite Haskell being a very high level language, I sometimes wonder how my data will look like on the physical memory. And as you can see, it can make a difference. Some more thoughts on this were written down by Duncan Coutts.

Wednesday, October 8. 2008

Haskell work in Dresden

Haskell

Since Saturday, I’m in Dresden, to work for Janis Voigtländer at the TU Dresden on some of his projects. I’m mostly working with his code from “Bidirectionalization for Free”: QuickCheck-Properties, statstics, a web interface. I’ll do this for just two weeks, and then return to Karlsruhe.

So if someone wants to meet me, maybe for some Keysigning (since DebConf8, I’m on Rank 31, so it’s worth it :-)), just drop me a note.

Saturday, May 17. 2008

FrakView: An Haskell Renderer for Iterated Function Systems

Haskell

For a recent university seminar, I wrote a haskell program to render and edit iterated function systems (IFS), which generates a certain class of fractals, namely self-similar sets. I think the result is quite nice, so I’m sharing the code.

FrakView screenshot

With FrakView you can view a rendering of the attractor of the IFS, with a choice of two algorithms (a straight forward, and a probabilistic), configurable depth and anti-aliasing. You can also modify the IFS by dragging the colored boxes with arrows you see on the screenshot. For the academically inclined, there is also support to visualize cylinder sets and otherwise explore the coding space of the IFS a bit.

The program is written in haskell and uses gtk2hs, the gtk bindings for haskell. It might be interesting for other gtk2hs programmers to see how FrakView solves some issues: For example, it uses the CoroutineT monad transformer I recently blogged about – check out the pausingForM_ function in GUI.hs. Also, the current state of the screen is in one algebraic data type (ScreenConfig) that supports equality checks, so when the user interacts, the code recomputes the new ScreenConfig (using getRenderer), but only redraws the screen if it differs from the previous. This is much easier and more robust than having to decide for each possible user interaction whether it changes what’s on the screen.

You can get the source from the FrakView darcs repository.

(Page 1 of 2, totaling 28 entries) » next page
Nach oben