Swirly Mein Kopf

Sunday, August 22. 2010

ipatch on hackage

Darcs Haskell

With the Beta 3 release of Darcs 2.5 now on Hackage (the Haskell library and program repository), I could upload the ipatch program I recently introduced to Hackage, too. If you use cabal-install, you can install it with a simple "cabal install ipatch".

The program now also handles patches that add or remove files, the help texts are a bit more extensive, and there is a test suite. This means that you can actually make use of ipatch as of now, to split a patch into several small patches or to apply a patch interactively. Of course it needs some more testing, and you might have feature requests; in either case, let me know.

Tuesday, August 3. 2010

ipatch, the interactive patch editor

Darcs Haskell

The problem: Splitting patches

As a Debian maintainer, I often work with patches (files listing changes to text files), for example when tracking the modifications I make to some software before I upload the package to Debian. To manage these patches, quilt is a nice tool: It helps you maintain a stack of patches on top of the original code and encourages you to keep your various modifications separate.

One use case is not supported by quilt at all: Splitting patches. One often has a large patch containing several independent changes. This might happen after you fix a few problems in the upstream code and then run dpkg-buildpackage, which will create one patch of your changes and put it in debian/patches. Until now, I had to manually edit the patch and write the hunks, which are the building blocks of patches, into separate files.
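The manual procedure can be sketched in a few lines of Haskell. This is only an illustration of what "writing the hunks into separate files" means, not how any existing tool does it; the names splitHunks and onePatchPerHunk are invented here, and the sketch assumes a unified diff that touches a single file.

```haskell
import Data.List (isPrefixOf)

-- | A hunk: its "@@ ... @@" header line followed by its body lines.
type Hunk = [String]

-- | Split the lines of a single-file unified diff into the file header
-- (everything before the first "@@" line) and the list of hunks.
splitHunks :: [String] -> ([String], [Hunk])
splitHunks ls = (header, go rest)
  where
    (header, rest) = break isHunkStart ls
    isHunkStart = ("@@" `isPrefixOf`)
    go [] = []
    go (h : xs) =
      let (body, more) = break isHunkStart xs
      in (h : body) : go more

-- | Build one standalone patch per hunk by repeating the file header.
onePatchPerHunk :: [String] -> [[String]]
onePatchPerHunk ls = [header ++ h | h <- hunks]
  where
    (header, hunks) = splitHunks ls
```

Each resulting patch still refers to the line numbers of the original file, so applying them one after the other relies on the patch tool tolerating shifted offsets.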

Where it already works

There is no such problem when using a version control system such as Darcs. Darcs in particular is rightly famous for its user-friendly interface and powerful hunk-selection features. You can even split a single hunk (which could be a change to one line) into two separate steps! Have a look at the HunkEditor page on the Darcs wiki to see how that works.

Let’s steal a feature

Well, it is not stealing if it is Free Software... Darcs has these nice capabilities and provides them in the context of version control systems, while we need them in the context of patch files. But Darcs provides an API to its code, so shouldn’t it be possible to create a program that uses the Darcs code to split patch files? As a matter of fact, it is possible: You can see that program in action in this 3-minute Ogg Theora video, or directly here if your browser supports HTML5:

Nice, can I use it?

The code is a working proof of concept. What you see works. You do not see how it handles patches that create or delete files, patches that do not apply cleanly or are already applied or any kind of error handling. That does not work yet. If you still want to try it, you can grab the code from the Darcs repository at http://darcs.nomeata.de/ipatch, but you need to build the latest development state of the Darcs library first.

I think ipatch could become a very useful and powerful tool with applications in areas where nobody would think of using Darcs. I definitely want some integration into quilt, automatically replacing the split patch in the series with the resulting patches. Maybe even a git plugin could be created? But I don’t think I can push this project far enough on my own. So this is an invitation to join me and make ipatch a great tool. This invitation goes especially to the Darcs developers: Please have a look at how the code uses the Darcs API and help to improve the collaboration here. I think we can use the darcs-users mailing list until there is need for a dedicated mailing list.

Monday, August 2. 2010

Protecting static content selectively by OpenID

Digital World

When I created my on-line photo album, it was 2001 and there was nothing wrong with publishing one’s photos on the Internet. It has been almost ten years now, and things have changed. Privacy is an issue, especially when it also affects your friends and family.

The problem

So I needed to find some protection. These were my requirements:

  1. I only want to protect parts of the site, as there are some pictures that I intentionally share with the public. This selection should be possible down to the individual image.
  2. The solution should work without any dynamic server component besides the web server itself. This rules out any self-written CGI scripts as well.
  3. One password, given to all my friends etc., is not sufficient as it might leak to an unintended audience.
  4. I do not want my visitors to have to register and remember yet another username and password just for my site. I also do not want to manage this user database.
  5. I do not want to have to make large changes to the file structure of my photo album.

This rules out most common options, e.g. protection by a .htpasswd file. Requirements 3 and 4 point to a solution based on OpenID. With OpenID, my visitors can authenticate against a service they already use (Google, Yahoo, etc.), relieving me of the burden of maintaining a user database and them of having to remember a password.

There is a mod_auth_openid module for the Apache webserver, and it is even distributed with Debian in the libapache2-mod-auth-openid package. So requirement 2 is fulfilled. The tricky part is: How do we achieve OpenID protection for some images, and not for others?

The solution

I first played around with selectively enabling or disabling mod_auth_openid based on <FilesMatch> directives in the Apache configuration, but it was not elegant and would not scale well. I have more than 20,000 pictures to manage, and have already selected over 5,000 pictures to be shown without protection. My solution is based on a partial copy of the whole directory tree that contains all public files. To save disk space, these are just symbolic links to the real files in the protected location. Some mod_rewrite magic then takes care of giving the user the impression that all files are in the same location. I set up a small example of my solution, which has this directory structure:

.:
drwxr-xr-x 2 root root 4096  2. Aug 12:03 images
lrwxrwxrwx 1 root root   18  1. Aug 12:05 index.html -> private/index.html
lrwxrwxrwx 1 root root   18  1. Aug 12:05 login.html -> private/login.html
drwxr-xr-x 3 root root 4096  2. Aug 12:03 private

./images:
lrwxrwxrwx 1 root root 33  2. Aug 12:00 pleaselogin.png -> ../private/images/pleaselogin.png
lrwxrwxrwx 1 root root 28  2. Aug 12:03 public.png -> ../private/images/public.png

./private:
drwxr-xr-x 2 root root 4096  2. Aug 12:00 images
-rw-r--r-- 1 root root  267  2. Aug 12:03 index.html
-rw-r--r-- 1 root root   94  2. Aug 12:01 loggedin.html
-rw-r--r-- 1 root root 2091  2. Aug 12:03 login.html
-rw-r--r-- 1 root root   10 18. Nov 2009  protected.html

./private/images:
-rw-r--r-- 1 root root 4074  2. Aug 11:58 pleaselogin.png
-rw-r--r-- 1 root root 2670  2. Aug 11:58 private.png
-rw-r--r-- 1 root root 2043  2. Aug 11:58 public.png

As you can see, real files only reside in private/; outside of that, only symbolic links exist.

The Apache configuration protects the private directory and blends it into the main directory:

   <directory /var/www/nomeata.de/openid-test>
        RewriteEngine On
        # Abuse the login page as an error image
        RewriteCond %{QUERY_STRING} \.(png|jpg)
        RewriteRule ^login.html$ /openid-test/images/pleaselogin.png
        # Ship private files, if they exist, unless public files exist
        RewriteCond  $1 !^private
        RewriteCond  /var/www/nomeata.de/openid-test/$1 !-f
        RewriteCond  /var/www/nomeata.de/openid-test/private/$1 -f
        RewriteRule  ^(.+)$ /openid-test/private/$1
   </directory>
   <directory /var/www/nomeata.de/openid-test/private>
        AuthOpenIDEnabled        On
        AuthOpenIDDBLocation     /var/lib/apache2/mod_auth_openid/mod_auth_openid.db
        AuthOpenIDLoginPage      /openid-test/login.html
        AuthOpenIDTrustRoot      http://nomeata.de
        AuthOpenIDCookiePath     /
        AuthOpenIDCookieLifespan 2592000
    </directory>

A special trick handles the “login page” for protected images: If the login page is requested and the referrer passed in the query string indicates that the user tried to access a .png or .jpg file, Apache will instead ship an image containing an error message.

For my photo album I have a small Perl script that, given a directory with a private/ directory therein and a list of rules in the form of glob patterns, symlinks matching files and removes symlinks that are no longer allowed.
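The decision logic of such a script can be sketched in Haskell. This is not the Perl script itself: globMatch, Plan and plan are names invented for this sketch, the patterns only support '*' and '?', and '*' is assumed to match across '/' as well, which the real rules may handle differently.

```haskell
-- | Match a path against a glob pattern: '*' matches any run of
-- characters (including '/'), '?' matches exactly one character.
globMatch :: String -> String -> Bool
globMatch [] [] = True
globMatch ('*' : ps) s =
  globMatch ps s || case s of
    []       -> False
    (_ : cs) -> globMatch ('*' : ps) cs
globMatch ('?' : ps) (_ : cs) = globMatch ps cs
globMatch (p : ps) (c : cs) = p == c && globMatch ps cs
globMatch _ _ = False

-- | Which symlinks to create and which to remove.
data Plan = Plan
  { toLink   :: [FilePath]
  , toUnlink :: [FilePath]
  } deriving (Eq, Show)

-- | Compare the files under private/ (paths relative to it) with the
-- symlinks that currently exist, given the list of public patterns.
plan :: [String] -> [FilePath] -> [FilePath] -> Plan
plan patterns privateFiles existingLinks = Plan
  { toLink   = [f | f <- allowed, f `notElem` existingLinks]
  , toUnlink = [l | l <- existingLinks, l `notElem` allowed]
  }
  where
    allowed = filter (\f -> any (`globMatch` f) patterns) privateFiles
```

Keeping the planning step pure makes it easy to check what would change before any symlink is touched.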

What’s next?

As you can see, this does not actually protect the content. It only requires the user to authenticate, then everything is visible. To select which OpenIDs are allowed to access which content, some bugs will have to be fixed in mod_auth_openid first. There has been little activity there recently; I hope that the project is not dead.

Saturday, July 17. 2010

How forky may one maintain a Debian package?

Debian

I maintain most of my Debian packages because I use them myself. Sometimes, I have some needs that go slightly beyond what is currently offered by the software. This is not a problem: Debian ships Free Software and I can program, therefore I can patch the software to also do what I want it to do. Trying to be a good member of the Free Software community, I then submit the patch to the upstream author. If he accepts the patch (which is usually the case), everything is fine. But what if he does not reply to the report or rejects it because he does not want this feature (although the patch is technically fine)? I see two options:

  1. I could continue to use a privately patched and built version of the package, while separately building packages for Debian. This way, Debian ships the software as intended by the upstream maintainer while I can use the features I need. On the other hand, I would not be using the version that I upload to Debian, which is not good, and it causes double work when a new version is released.
  2. I could upload a package to Debian that contains my patch. The technical infrastructure to add patches to Debian packages has always been there... I would actually use the package as it is in Debian and only manage one line of versions. But would I be abusing my powers as a Debian maintainer? If I were not the maintainer, I could not make this decision by myself (this happened with my patch to nagstamon). Plus it could have a negative effect on the Debian-upstream relationship.

How do other Debian Developers handle such issues? The actual case I’m considering is a feature enhancement for link-monitor-applet (but I only just wrote the patch, so it does not yet fall in the category “upstream does not reply”).

Thursday, June 24. 2010

nagstamon forklet necessary

Digital World

A while ago, I discovered nagstamon, a very useful piece of software by Henri Wahl. This program sits in the notification area of your desktop and alerts you when your nagios-monitored services have problems. Using nagstamon allows me to keep my servers under close surveillance, and it also adds another channel besides e-mail alerts, which will be helpful in case my mail server has problems.

The wish

I am not a full-time sysadmin; I only monitor very few hosts and the services rarely have problems. Therefore, I do not want nagstamon to constantly sit in the notification area but only use it when there is something, well, to notify me about. It turned out that nagstamon did not support this mode of operation, so I created a ticket and asked whether this feature could be added. The author raised two points, one being that the user would then not know when nagios crashed and the other being that you would not be able to configure nagstamon because you do not see it. He also indicated that he does not have the resources to work on it and asked if I could find the time.

The patch

Since I really liked nagstamon, but really want to keep my panel uncluttered, I found the time: I created a series of self-contained patches, adding an option for the feature, adding code to prevent more than one instance of nagstamon running in parallel and adding a "nagstamon --settings" flag that would signal the running instance to show the settings, similar to how mail-notification behaves. The author then raised the valid point that some people run more than one instance in parallel, with different configuration options. I then extended the patch to cater for that.

The rebuff

The author remained reserved, did not answer my last comment on the ticket and then, six weeks later, closed the bug without explanation and turned off the possibility to add comments. I can understand when people are reluctant to add contributed features to their code; I often feel the same way. But completely blocking more comments is not a nice way of communicating with possible contributors.

The fork(let)

So I’m left with no option but patching each released version with my changes and building my own package. As I have to do this work anyway, I’d like to share it. You can find my branch in my git repository. If you happen to want this feature as well and are using a Debian-based distribution, please let me know: I am building modified Debian packages anyway and can publish them as well. As I don’t want to maintain this fork of nagstamon, I don’t plan to diverge any more from Henri’s code, so if you have other feature requests, please talk to him first.

Saturday, June 12. 2010

bluetile in Debian

Debian

I just packaged and uploaded Jan Vornberger’s window manager bluetile to Debian. This very nice piece of software brings the benefits of a tiling window manager to users who prefer to use the mouse and who don’t want to learn a new programming language to configure their window manager. Bluetile uses the xmonad libraries and extends them with an easy-to-use and discoverable user interface.

Friday, April 23. 2010

Making dictionary passing explicit in Haskell

Haskell

Haskell provides type classes to support polymorphism. A type class defines a few methods, which can then be implemented for a concrete type in the type class instance. This is a powerful system, but it also has its drawbacks. Most notably, each type can have at most one implementation of the type class. But sometimes you need to use a different implementation.

Suppose, for example, you used the Binary class to store data on disk. Now you changed your data type and the Binary instance, and you cannot read the old data any more. One solution is to rename your type using “newtype” and implement another instance for that. Often, this is enough. But still, instances are not first-class citizens. You cannot pass them around or modify them, as you can pass around and modify data and functions.
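As a minimal sketch of the newtype approach, here it is applied to Monoid rather than Binary, so that it needs nothing beyond base. The name Reversed is made up for this example, and the separate Semigroup instance is only there because recent versions of base make Semigroup a superclass of Monoid.

```haskell
-- | A wrapper around lists, introduced only to carry a second,
-- different Monoid instance for the same underlying data.
newtype Reversed a = Reversed { getReversed :: [a] }
  deriving (Eq, Show)

-- | Appending puts the right-hand list first.
instance Semigroup (Reversed a) where
  Reversed xs <> Reversed ys = Reversed (ys ++ xs)

instance Monoid (Reversed a) where
  mempty = Reversed []
```

The instance is still chosen statically by the type: every value has to be wrapped and unwrapped, and the instance itself cannot be passed around or tweaked at run time.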

Under the hood of the compiler, things look different. GHC puts the methods of the instance in a dictionary and passes that implicitly to any function having a (Class a) constraint. (Other implementations exist, though.) If one could make that behavior explicit, one could easily modify the instance before passing it to the function. But this is unfortunately not possible.

But it is possible to pass an explicit dictionary along with the data. I use the Monoid class as an example, and define a representation of the dictionary to be passed, as well as the dictionary of the default instance:

data MonoidDict a = MonoidDict
  { ed_mempty :: a
  , ed_mappend :: a -> a -> a
  }

monoidDict :: Monoid a => MonoidDict a
monoidDict = MonoidDict mempty mappend

(For conciseness, I ignore the mconcat method.) My first idea was to pass this instance along with data: (MonoidDict a, a). But this would not work because there are methods, such as mempty, which need the dictionary without getting passed a value to use. Therefore, I need to put the dictionary both in the covariant and the contravariant position:

newtype WithMonoidDict a = WithMonoidDict (MonoidDict a -> (MonoidDict a, a))

We need functions to clamp a dictionary to a value, and to extract it again:

wrapWithCustomMonoidDict :: MonoidDict a -> a -> WithMonoidDict a
wrapWithCustomMonoidDict dict val = WithMonoidDict $ const (dict, val)

extractFromCustomMonoidDict :: MonoidDict a -> WithMonoidDict a -> a
extractFromCustomMonoidDict dict (WithMonoidDict f) = snd (f dict)

Note that both expect the dictionary, so that it can be fed into WithMonoidDict from “both sides”. For convenience, we can define variants that use the standard instance:

wrapWithMonoidDict :: Monoid a => a -> WithMonoidDict a
wrapWithMonoidDict = wrapWithCustomMonoidDict monoidDict

extractFromMonoidDict :: Monoid a => WithMonoidDict a -> a
extractFromMonoidDict = extractFromCustomMonoidDict monoidDict

We want to be able to pass the wrapped values as any other value with a Monoid instance, so we need to declare that:

instance Monoid (WithMonoidDict a) where
    mempty = WithMonoidDict (\d -> (d, ed_mempty d))
    mappend (WithMonoidDict f1) (WithMonoidDict f2) = WithMonoidDict $ \d ->
        let (d1,v1) = f1 d
            (d2,v2) = f2 d
        in  (d1, ed_mappend d1 v1 v2)

Note that mappend has the choice between three dictionaries (d, d1 and d2). This is not a good sign, but let’s hope that they are all the same.

Does it work? Let’s see:

listInstance :: MonoidDict [a]
listInstance = monoidDict

reverseInstance :: MonoidDict [a]
reverseInstance = monoidDict { ed_mappend = \l1 l2 -> l2 ++ l1 }

examples = do
    let l1 = [1,2,3]
    let l2 = [4,5,6]
    putStrLn $ "Example lists: " ++ show l1 ++ " " ++ show l2
    putStrLn $ "l1 ++ l2: " ++ show (l1 ++ l2) 
    putStrLn $ "l1 `mappend` l2: " ++ show (l1 `mappend` l2) 
    putStrLn $ "Wrapped with default instance:"
    putStrLn $ "l1 `mappend` l2: " ++ show (
        extractFromMonoidDict $ wrapWithMonoidDict l1 `mappend` wrapWithMonoidDict l2)
    putStrLn $ "Same with reversed monoid instance:"
    putStrLn $ "l1 `mappend` l2: " ++ show (
        extractFromCustomMonoidDict reverseInstance $
            wrapWithCustomMonoidDict reverseInstance l1 `mappend`
            wrapWithCustomMonoidDict reverseInstance l2)

Running examples gives this output:

Example lists: [1,2,3] [4,5,6]
l1 ++ l2: [1,2,3,4,5,6]
l1 `mappend` l2: [1,2,3,4,5,6]
Wrapped with default instance:
l1 `mappend` l2: [1,2,3,4,5,6]
Same with reversed monoid instance:
l1 `mappend` l2: [4,5,6,1,2,3]

Indeed it works.

Unfortunately, this approach is not sufficient for all cases. It is perfectly valid to have a function with signature (Monoid a => Maybe a -> Maybe a), whose behavior depends on the instance of a, even when being passed Nothing and returning Nothing. Such a function would have a problem here, because the dictionary would not be passed to the function.
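To make this concrete, here is a hypothetical function of exactly that shape. The names probe, Broken and probeTerminates are invented for this sketch, and Broken is a deliberately ill-behaved instance that exists only for the demonstration. probe always returns Nothing, yet whether it terminates depends on the instance's mempty.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- | Always returns Nothing, but first forces the mempty of the
-- instance selected by the type of the argument.
probe :: Monoid a => Maybe a -> Maybe a
probe m = emptyFor m `seq` Nothing
  where
    emptyFor :: Monoid b => Maybe b -> b
    emptyFor _ = mempty

-- | Demo-only type whose mempty diverges.
newtype Broken = Broken ()

instance Semigroup Broken where
  _ <> _ = Broken ()

instance Monoid Broken where
  mempty = error "Broken.mempty"

-- | True if evaluating probe to weak head normal form succeeds,
-- False if it throws.
probeTerminates :: Monoid a => Maybe a -> IO Bool
probeTerminates m = do
  r <- try (evaluate (probe m))
  return (either (\e -> const False (e :: SomeException)) (const True) r)
```

At type WithMonoidDict a, probe would only ever see the wrapper's own mempty, which is a function still waiting for a dictionary, so a custom dictionary never gets a chance to influence the result.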

I wonder if it would be possible to extend the Haskell language somehow to be able to properly pass an alternative dictionary to such functions. But given that not all compilers use dictionary passing, my hopes are low.

Thursday, April 15. 2010

zpub article in “Linux-Magazin”

Digital World

The 05/2010 issue of the German “Linux-Magazin” contains an article of mine about DocBook, Subversion and zpub. I was quite surprised to find it there – I submitted it in January and did not receive any feedback. But of course it is a nice surprise to find out it was accepted!

The article, the zpub website and zpub itself are only available in German so far, but there is an English blog post describing zpub.

Thursday, March 18. 2010

libnss-gw-name: A stable name for your gateway

Digital World

I often find myself running /sbin/route to get the IP address of the current gateway, especially when using a wireless LAN while traveling. For example, if the “Internet does not work” I usually ping the local gateway to see where the connectivity problem lies. I also need the IP if I want to access the router’s configuration web interface. This is somewhat tedious, so I wrote libnss-gw-name, and now:

$ sudo apt-get install libnss-gw-name
[...]
$ ping gateway.current
PING gateway.current (172.20.239.1) 56(84) bytes of data.
64 bytes from hhicalvin.stud.uni-karlsruhe.de (172.20.239.1): icmp_seq=1 ttl=64 time=2.16 ms
64 bytes from hhicalvin.stud.uni-karlsruhe.de (172.20.239.1): icmp_seq=2 ttl=64 time=1.48 ms
64 bytes from hhicalvin.stud.uni-karlsruhe.de (172.20.239.1): icmp_seq=3 ttl=64 time=2.73 ms
^C
--- gateway.current ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 1.482/2.129/2.739/0.513 ms

Once libnss-gw-name is installed, it hooks into the system’s Name Service Switch, which is, among other things, responsible for resolving hostnames to IP addresses. It will only react to the name “gateway.current”, checking the system’s routing table and returning the IP address of the current default gateway.
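The lookup step can be illustrated with a small Haskell sketch. It is not the library's code, and the post does not say how libnss-gw-name reads the routing table; the sketch assumes the Linux /proc/net/route text format on a little-endian machine, and the names decodeAddr and defaultGateway are invented here.

```haskell
import Data.Char (digitToInt, isHexDigit)
import Data.List (intercalate)

-- | Decode an address field of /proc/net/route (8 hex digits, least
-- significant byte first on little-endian machines) to dotted quad.
decodeAddr :: String -> Maybe String
decodeAddr s
  | length s == 8 && all isHexDigit s =
      Just (intercalate "." (map (show . byteValue) (reverse (pairs s))))
  | otherwise = Nothing
  where
    pairs (a : b : rest) = [a, b] : pairs rest
    pairs _              = []
    byteValue [hi, lo] = 16 * digitToInt hi + digitToInt lo
    byteValue _        = 0

-- | Gateway of the first default route (destination 00000000) found
-- in routing table text; the first line is the column header.
defaultGateway :: String -> Maybe String
defaultGateway table =
  case [ gw
       | (_iface : dest : gw : _) <- map words (drop 1 (lines table))
       , dest == "00000000" ] of
    (gw : _) -> decodeAddr gw
    []       -> Nothing
```

On a Linux machine, something like `readFile "/proc/net/route" >>= print . defaultGateway` should print the address that the special hostname would resolve to.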

It’s a pretty simple and small tool, but it could well prove very handy to the power user. I uploaded libnss-gw-name to Debian sid; you can download the source code or access the git repository.

Update: Changed the name to gateway.localhost, as that is within a reserved top-level-domain.

Video of my CeBIT talk online

Digital World

As tolimar already said, videos of talks at the Linux New Media booth at CeBIT are online now, including mine about how to submit patches. It is in German, though.

Tuesday, March 16. 2010

kexec saved my day

Digital World

Yesterday evening, when returning from a two-day trip with no connectivity, I found my server to be broken. It still reacted to ping, but no service would respond. I tried to restart it using my hoster’s web interface, but it would not come back up.

I booted into the recovery system and checked the hard disk, but could not find any issues. File system checks went through without a hitch. But it would still not boot. Unfortunately, my hoster does not provide access to the system console, so I had no idea what was going wrong.

I had never done anything with kexec (a relatively new feature of the Linux kernel that lets it act as a bootloader for another system), and I was very positively surprised to find that it works out of the box and flawlessly: I was able to load my system’s kernel and initrd from the recovery system and successfully booted it. I then ran lilo and rebooted again right away, which now worked. I’m not sure if running lilo fixed it, or the clean shut-down, nor do I know what caused the problem in the first place, but kexec saved my day here.

Tuesday, March 2. 2010

Talking at CeBIT tomorrow

Digital World

Today, I arrived at the CeBIT conference in Hannover, and had a first look around. I quickly find trade fairs like that boring, and I was glad to meet some other Debian folk and listen to some of the talks at the CeBIT Open Source Forum in hall 2, including tolimar’s talk about Debian GNU/kFreeBSD.

Tomorrow (Wednesday), I will talk at the same place at 13:45, explaining some basic stuff about patches and bug trackers. The target audience is users of Free Software who modify it for their private or company-wide use and would like to see their changes included in the official project. There will be a live stream of the talk, which I am officially holding as an employee of the ITOMIG GmbH.

Sunday, February 28. 2010

Exploiting sharing in arbtt

Haskell

My automatic rule-based time tracker (arbtt), which is written in Haskell, collects a data sample every minute, consisting mainly of the list of currently open windows (window title and program name). Naturally, this log grows rather large. Since October of last year, I have collected 70,000 samples. I already went from a text-based file format to a binary format using Data.Binary, which gave a big performance boost.

But by now, I was afraid that this is not enough. My log file is now 30MB large. Looking at the memory graph of gnome-panel, it is taking up more than half of my memory. When running arbtt-stats, the Haskell run time system reports 569 MB total memory in use and the command finishes after 28.5 seconds.

Naturally, the log file is highly redundant: Compressing it with bzip2 shrinks it to 1.6MB. But as I would like to preserve the ability to just append samples at the end, without having to read the file, I chose not just to add bzip2 or gzip compression. Rather, I am now exploiting a very obvious redundancy: Two adjacent samples usually list exactly the same windows, and a focus change only changes a flag. So now, when storing a string that is part of a sample, arbtt checks if this string was already present in the previous sample and, in this case, just stores the number of that string (one byte). Only if the string was not present will it write a zero byte and then the string. When reading the sample, the process is reversed.
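The scheme can be sketched as follows. This is an illustration, not the code of Data.Binary.StringRef: the Token type and the function names are invented here, the 1-based numbering of references is an assumption (0 being taken by the "literal follows" marker), and the actual byte serialization is left out.

```haskell
import Data.List (elemIndex)
import Data.Word (Word8)

-- | What gets written for one string of a sample.
data Token
  = Ref Word8       -- ^ 1-based position in the previous sample's strings
  | Literal String  -- ^ stands for a zero byte followed by the string
  deriving (Eq, Show)

-- | Encode one string relative to the strings of the previous sample.
encodeStr :: [String] -> String -> Token
encodeStr prev s = case elemIndex s prev of
  Just i | i < 255 -> Ref (fromIntegral i + 1)
  _                -> Literal s

-- | Reverse the process; Nothing signals a dangling reference.
decodeStr :: [String] -> Token -> Maybe String
decodeStr _ (Literal s) = Just s
decodeStr prev (Ref n)
  | n >= 1, fromIntegral n <= length prev = Just (prev !! (fromIntegral n - 1))
  | otherwise = Nothing

-- | Encode or decode all strings of a sample at once.
encodeSample :: [String] -> [String] -> [Token]
encodeSample prev = map (encodeStr prev)

decodeSample :: [String] -> [Token] -> Maybe [String]
decodeSample prev = mapM (decodeStr prev)
```

Decoding a Ref hands back the very string value held by the previous sample instead of building a new copy, so repeated window titles end up shared in memory.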

This greatly reduces the file size: It is down to 6.2MB. It also improves the memory consumption, due to Haskell’s abilities with regard to sharing: When a reference to a string in a previous sample is read, then only one instance of this string is in memory, even if it occurs several times in the log. This brings the memory consumption down to 264 MB and the runtime to 17 seconds.

I released the changes as version 0.4.5.1 to Hackage, Debian and as a Windows installer. The log file is not automatically converted, but new samples will be written in the compressed format. If you want to convert your whole file, you have to stop arbtt-capture, run arbtt-recover, and then move the hopefully noticeably smaller ~/.arbtt/capture.log.recovered to ~/.arbtt/capture.log.

The required code changes were not too big. I somewhat isolated the relevant code in the Data.Binary.StringRef module. Unfortunately, I have to use OverlappingInstances to be able to provide the special instance for String – is there a cleaner way (besides the trick used for the Show class)?

Thursday, February 4. 2010

FontForge-Article in the German Linux-Magazin

Digital World

Yesterday, I found the 3/10 issue of the German “Linux-Magazin” in my mailbox. (I don’t dare to call it the March issue – they are a bit off schedule...) On page 62, you can find my 3½ page article about creating a symbol font with FontForge. I briefly covered the topic on my blog and later thought that it would make a nice article, even though I’m not an expert in this area. The article will be freely available in about three years. This is already my third publication, after my article on the Cross-Site-Authentication attack that was published in the same magazine (circulation ~63,000) and in its international counterpart in 2005 and my recent article in the “freeX” magazine (circulation ~15,000). Looks like I’ll have to add a “Publications” section to my website soon...

Saturday, January 30. 2010

pidgin-blinklight goes subliminal

Digital World

A long while ago I wrote a plugin for gaim called gaim-thinklight that blinks one’s ThinkPad ThinkLight when a new message arrives. By now it is called pidgin-blinklight and supports some other hardware as well, but it has not changed for over a year. Today, I implemented a new feature, and I’m curious if it will actually work:

Until now, the blink pattern was hardcoded: ON, wait 150ms, OFF, wait 125ms, ON, wait 150ms, OFF. Since version 0.11, pidgin-blinklight calculates these three delay times based on the contact’s login name. So different contacts will have very slightly different blinking patterns. The idea is that, after a while, you start to recognize your frequent buddies by the blinking alone. The wait times are taken from the range of 50ms to 250ms; I hope that range works well.
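How a name could be mapped to three delays can be sketched in Haskell. The plugin itself is not written in Haskell and the post does not describe its actual calculation; blinkDelays and the toy hash below are assumptions made purely for illustration.

```haskell
import Data.Char (ord)

-- | Map a login name to three wait times in milliseconds, each in the
-- range 50 to 250, for the pattern ON, wait, OFF, wait, ON, wait, OFF.
blinkDelays :: String -> (Int, Int, Int)
blinkDelays name = (delay 1, delay 2, delay 3)
  where
    -- Toy string hash, started from a different value for each of the
    -- three delays (an assumption, not the plugin's real function).
    hash :: Int -> Int
    hash salt = foldl (\h c -> (h * 31 + ord c) `mod` 1000003) salt name

    -- Fold the hash into the 201 possible values 50..250.
    delay :: Int -> Int
    delay salt = 50 + hash salt `mod` 201
```

The same name always yields the same pattern, which is what would make a contact recognizable over time.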

Users of Debian unstable will get the new version automatically. If you want to compile pidgin-blinklight from source, you will have to grab it from the Debian FTP server. The source is in the pidgin-blinklight Darcs repository.
