Three weeks ago, I announced the automatic rule-based time tracker here, and it seems that there are actually users out there :-). Since then, it has recorded more than 240 hours of my computer’s uptime in about 15000 samples. Until now, this data was stored in a simple text file, one line per entry, and relying on Haskell’s Show/Read instances to do the serialization. Although not quite unexpected, this turned out to be a severe bottleneck: Already, it takes more than 25 seconds to parse the log file on my computer.
Following an advice given to me on the #haskell IRC channel, I switched to a binary representation of the data, using the very nice Data.Binary module. The capture program will automatically detect if the log file is in the old format, move it away and convert the entries to binary data. And voilà, the statistics program evalutes my data in about two seconds! This should be fast enough for quite a while, I hope.
In my binary instances, which you can find in Data.hs, I prepended a version tag to each entry. This hopefully allows me to add more fields to the log file later, while still being able to parse the old data directly and without conversion. To still be able to manually inspect the recorded data, the program arbtt-dump was added to the package. The new version is uploaded to hackage as 0.3.0.
One thing still worries me: With the old format, I could easily throw away unparseable lines in the log file (e.g. from a partial write) and still read the rest of it. With the new code, there is no such robustness: If the file breaks somewhere, it will be unreadable in its whole, or at least to the broken point. I’m not sure what to do about that problem, but given the very low number of bytes written each time, I hope that it will just not happen.
Trackbacks