SAT-solving the testing transition problem

Published 2011-07-26 in sections English, Debian.

It is half-time for me at DebCamp/DebConf11 in Banja Luka, Bosnia-Herzegovina, and that is a good time to write about my work so far. I started with some finger exercises, such as the HTML copying for gnome-terminal (as blogged previously) and cosmetic changes to the buildd.debian.org /status website. After that, I began my main project: A SAT-based solver for the testing transition problem.

In Debian, new software first enters a repository called unstable, where it is tested. Once some requirements are fulfilled (package has been in unstable for usually at least 10 days and has no new release critical bugs), it is entitled to enter the repository called testing, which will eventually form a new stable Debian release. But some packages need to migrate together, usually because either requires the new version of the other. Also, no packages in testing ought to become uninstallable by some seemingly unrelated change.

The software that decides these things is called britney. It is sufficiently good in making sure nothing bad happens, but not smart enough to figure out what to do in case more than two packages need to migrate simultaneously. My goal is to improve this. Now, the the various requirements can all be expressed as predicative formulas, and a lot of research has been going into writing good solvers for such problems, called SAT-solvers. Hence my plan was to only specify what we expect from a testing migration, but leave the search for a solution to such a general purpose and highly optimized program.

My initial progress was good and I had some result after two days of hacking, and the approach is promising. Because the data sets are pretty large (1.5GB of input data, the final SAT problem has 1.8 million clauses with 250.000 variables), so I learned quite a bit about profiling and optimizing Haskell programs, and that parsec is slower than working with BS.lines and BS.split to parse simple input. I also used the FFI to use dpkg code directly to compare Debian version numbers.

At some point I noticed that I actually want to solve PMAX-SAT problems: Given two set of clauses, hard clauses and soft clauses, find a variable assignment that fulfills the hard clauses and as many of the soft clauses as possible. Unfortunately, there are no fast Free PMAX-SAT solvers around. The ones that I found and that were fast enough that I could use them, msuncore by Joao Marques-Silva at the University College Dublin and MiniMaxSat by Federico Heras at the Universitat Politècnica de Catalunya, are only available as statically linked binaries. It is a shame that this is acceptable by the academic community; imagine mathematicians would stop including the proofs in their papers, and only share their theorems.

So if you happen to have written a PMAX-SAT solver and can solve this instance (in weighted DIMACS format) in less than five minutes, and want to brag that the Debian project is using your code, then please release it under a Free license (e.g. GPL or BSD) and tell me about it!

The code of my SAT-based solver is, of course, available, though slightly unpolished. Parts of it (notably the interface to picosat and the PMAX-SAT solvers) might be of general interest and I plan to put them on Hackage as a separate license.

Update: I have contacted the authors of the SAT solvers mentioned above, and they have reconfirmed that they have no intention of releasing the source. Now I put my hope in maxsatz2009 by Chumin LI, which is GPL but was not able to cope with my large instance directly; I think I need to change the memory management.

Comments

Interesting. Do you have a more precise description of the testing transition problem, and/or perhaps a syntax or format for expressing the individual problem instances? I think that would help understanding this a lot better...

(I too am in DebConf)

I could try attacking them with ASP (answer set programming, aka stable models) solvers too. Many of them support weighted rules and minimize clauses.

#1 Sami Liedes am 2011-07-26

You can have a look at the sources, in the file TransRules.hs, to get an idea of the clauses generated. Or even better, talk to me at DebConf if you can find me :-)

#2 Joachim Breitner (Homepage) am 2011-07-27

clasp solves it in 1.4s here and is available under the GPL - see http://www.cs.uni-potsdam.de/clasp/ .

SAT4J-maxsat also solves it in a few seconds and is available under the Eclipse Public License and/or the LGPL.

These same problems can also be encoded as Weighted Pseudo-boolean problems.

See http://maxsat.ia.udl.cat/solvers/ and http://www.cril.univ-artois.fr/PB11/results/results.php?idev=54 .

If the problems are aren't that difficult, you could probably role your own incremental solver if necessary. You just translate it as a series of SAT problems that say "we can falsify <= k soft clauses" and then you keep decreasing k. This can be done by introducing a new variable which is true if it's corresponding soft constraint is satisfied. Then a cardinality constraint to say "<=k variables are true". Then just keep decreasing k until the problem becomes unsatisfiable.

I'll look into your code a little later. If there's any way I can help, let me know!

#3 Matthew Gwynne (Homepage) am 2011-07-28

Hmm.. it seems my message got cut off.

If you want to role your own Partial MaxSAT solver, you can do it fairly easily using incremental SAT solving.

Just encode that you can satisfy all hard clauses (i.e., they are just normal clauses) and falsify less than k soft clauses.

You can do this by just introducing a new variable for each soft clause, which is true if the soft clause isn't satisfied. Then encode a cardinality constraint which says that the sum of all the new variables must be less than k.

Interatively decrease k until the problem becomes unsatisfiable, then your optimum is k+1 :).

#4 Matthew Gwynne (Homepage) am 2011-07-28

Great, perfect. How comes that clasp, which does not even seem to be a dedicated MAX-SAT-solver beats all the rest? Maybe because of pre-processing that better suits such real-life input.

Anyways, folding constant variables in my code, before passing it to the max solver, gets the instance size down very quickly, so speed has become less of an issue.

#5 Joachim Breitner (Homepage) am 2011-07-29

Have something to say? You can post a comment by sending an e-Mail to me at <mail@joachim-breitner.de>, and I will include it here.

Go up