Joachim Breitner

Handling explicit and implicit recursion in Haskell data

Published 2008-12-31 in sections English, Haskell.

For a while I have been pondering over a problem that arises when your functionally written program has some state with cross references – for example a list of users, each of which uses a number of computers, and a list of computers, each having an owner.

Implicit referencing

For doing queries on such data, it would be convenient if every reference is just the referenced object itself. Although we would visualize this as a graph, semantically, it is more like an infinite tree. This is possible in Haskell, due to laziness, and if you create the data structure cleverly, it even uses constant memory, no matter how “deep” you enter this infinite tree (a recent post of mine talks about this). A possible data definition would be:

data User = User {
	userName :: String,
	uses :: [Computer]
	}
data Computer = Computer {
	computerName :: String,
	owner :: User -- references the Users
	}
data State = State [User] [Computer]

testState = let user = User "Conrad" [computer]
                computer = Computer "Z3" user 
            in State [user] [computer]

Explicit referencing

But such a representation is very unsuitable for updates (at least I can’t think if a nice way of updating such a graph without breaking the internal cross-references) and serialization, which is a must for a HAppS based application. So what one would probably start with is this data structure:

data User = User {
	userId :: Int,
	userName :: String,
	uses :: [Int] -- references the Computers
	}
data Computer = Computer {
	computerId :: Int,
	computerName :: String,
	owner :: Int -- references the Users
	}
data State = State [User] [Computer]

testState = State
		[User 0 "Conrad" [1]]
		[Computer 1 "Z3" 0]

I think the semantics of this are clear. Note that the referencing is currently not type-safe, but this can be provided by phantom types. Maybe I’ll write more about that later.

Now imaging you want to display the information about the first computer with your web application. You extract the information with let State _ cs = testState in head cs and pass that to your templating engine. But what if your template wants to display the name of the owner? It only has access to his userId. You would either need to know what information the template will ever need, extract that from the state beforehand and pass it along, or give the template access to the whole state. In that case, though, there has to be lookup-logic in your output code, which is also not nice.

Woudln’t it be nice if you could, in your application logic, work with the explicit references, which are easy to modify and store, but somehow turn that into the implicit referencing?

Duplicated representation

One way would be to have two unrelated sets of data structures, ExplicitState, ExplicitUser, ExplicitComputer, which use explicit identifiers to reference each other, and ImplicitState,... which are defined as the first representation of our state. It is then mostly trivial to write a function that converts ExplicitState to ImplicitState.

The big disadvantage of this is that you have to maintain these two different hierarchies. It also means that every function on the state has to be defined twice, which often almost identical code. Clearly, this is not desirable.

Annotated representation

It would be more elegant if the state is stored in one data type that, controlled by a type parameter, comes in the one or the other representation. To do that, we need two types: One that contains a value, and one that contains just a reference:

newtype Id v = Id v deriving (Show, Typeable, Data)
newtype Ref v = Ref Int deriving (Show, Typeable, Data)

Then we need to adjust our data definitions, to make use of these. (I’ll leave out the names, to keep the code smaller)

data User ref = User {
	userId :: Int,
	uses :: [ref (Computer ref)]
	}
data Computer ref = Computer {
	computerId :: Int,
	owner :: ref (User ref)
	}
data State ref = State [User ref] [Computer ref]

Here we introduce a type parameter “ref”, which will later be either Id or Ref. Note that now a reference also states the object it is a reference for, which greatly increases type safety. Functions on these data types that don’t work with the references will be polymorphic in the “ref” type parameter, so only need to be written once. A User Id is a complete user with all related data, while User Ref is a user with only references. And a Ref (User Ref) is reference to a user, which contains references...

Not so kind kinds

Did you notice the lack of a deriving clause? Our data structures have the relatively peculiar kind ((* -> *) -> *), which makes it hard for the compiler to derive instances for things like Show. But we already know that we will only use Id or Ref for the type variable, so we can use ghc’s StandaloneDeriving language extension and have these instances created:

deriving instance Show (User Id)
deriving instance Show (User Ref)
deriving instance Show (Computer Id)
deriving instance Show (Computer Ref)
deriving instance Show (State Id)
deriving instance Show (State Ref)

Toggling a type parameter

The next step is to write the conversion function. It will have type

unrefState :: State Ref -> State Id

For that, and for later, we need lookup functions:

unrefUserRef :: State Id -> Ref (User Ref) -> Id (User Id)
unrefUserRef (State l _) (Ref i) = Id $ fromJust $
	find (\u@(User i' _) -> i == i') l 
unrefComputerRef :: State Id -> Ref (Computer Ref) -> Id (Computer Id)
unrefComputerRef (State _ l) (Ref i) = Id $ fromJust $
	find (\u@(Computer i' _) -> i == i') l 

These expect a State (with implicit referencing) and a reference, and look up this reference. The function unrefState then looks like this:

unrefState :: State Ref -> State Id
unrefState (State us cs) =
	let unrefState = State (map (unrefUser unrefState) us)
	                       (map (unrefComp unrefState) cs)
	in unrefState
  where unrefUser :: State Id -> User Ref -> User Id
        unrefUser s (User i refs) = User i (map (unrefComputerRef s) refs)

	unrefComp :: State Id -> Computer Ref -> Computer Id
	unrefComp s (Computer i ref) = Computer i (unrefUserRef s ref)

Note how we “tie the knot” in the let expression. This is the trick that ensures constant memory consumption, because every reference points back to the same place in memory.

Satisfied already?

So what do we have? We have no duplication of data types, we can write general functions, and we can resolve the explicit referencing. We can also easily write functions like unrefUser :: State Ref -> User Ref -> User Id, which transform just a part of the state.

But writing unrefState is very tedious when the State becomes more complex. Each of the other unrefSomething functions are very similar, but need to be written anyways. This is unsatisfactory. What we want, is a generic function, something like

gunref :: State Ref -> a Ref -> a Id

which, given a state with explicit references, replaces all explicit references in the first argument (which could be State Ref or User Ref or anything like that) with implicit ones. This function can not exist, because we would not know anything about a and b. But maybe we can do this:

gunref :: (Data (a Id), Data (a Ref)) =>  State Ref -> a Ref -> a Id

Typeable and Data

Before being able to do so, we need to derive Data for our types. We can start with

deriving instance Data (User Id)
deriving instance Data (User Ref)
deriving instance Data (Computer Id)
deriving instance Data (Computer Ref)
deriving instance Data (State Id)
deriving instance Data (State Ref

but that will complain about missing Typeable instances. Unfortunately, ghc’s deriver for Typeable (even the stand-alone-one), does not handle our peculiar kind, so we need to do it by hand. With some help from quicksilver on #haskell, I got it to work:

instance Typeable1 ref => Typeable (User ref) where
	typeOf _ = mkTyConApp (mkTyCon "User")     [typeOf1 (undefined :: ref ())]
instance Typeable1 ref => Typeable (Computer ref) where 
	typeOf _ = mkTyConApp (mkTyCon "Computer") [typeOf1 (undefined :: ref ())]
instance Typeable1 ref => Typeable (State ref) where 
	typeOf _ = mkTyConApp (mkTyCon "State")    [typeOf1 (undefined :: ref ())]

everywhere is not enough

Turning to the documentation of Data.Generics, I notice with some disappointment that there is no function that is able to change a type – they all seem to replace a value by another value of the same type. But the functions gfoldl and gunfold sounded like they could be used for this.

Warning: What comes now is a very non-haskellish hack that subverts the type system, just to get the job done. Please read it with a healthy portion of distrust. If you know of a cleaner way of doing that, please tell me!

Wrapped Data

I want to do some heavy type hackery, so I need to disable haskell’s type system. There is Data.Dynamic, but not even that is enough, as we need to carry a type’s Data instance around as well. So let’s wrap that up:

data AData where AData :: Data a => a -> AData
instance Show AData where show (AData a) = "<" ++ show (typeOf a) ++ ">"

fromADataE :: forall b. Data b => AData -> b
fromADataE (AData d) = case cast d of
	Just v -> v
	Nothing -> error $ "Type error, trying to convert " ++
			   show (typeOf d) ++ " to  " ++
			   show (typeOf (undefined :: b)) ++ "."

There is also a function that converts an AData back to a normal type, if possible. If it’s not possible, then there is a bug in our code, so we give an informative error message.

AData transformers

Similar to everywhere, we want to have transformers that combinable. They need to have the change to convert an arbitrary value, but also signal that they could not convert something (and this something has to be recursed into). I came up with this:

type ADataT = AData -> Maybe AData

extADT :: forall a b. (Data a, Data b) => ADataT -> (a -> b) -> ADataT
extADT at t a@(AData v) = case cast v of 
		Just x ->  Just (AData (t x))
		Nothing -> at a

doNothing :: ADataT
doNothing = const Nothing

ADataT is the type for such a transformer. doNothing will not transform anything, and extADT can be used to add any function to the list of tried transformers, in the spirit of extT.

The ugly part

To apply such a transformer, I want to use this function, which I’ll describe in the code comments:

everywhereADT :: forall a b. (Data a, Data b) => ADataT -> a -> b

-- first check if we can transform this value already
everywhereADT f v = case f (AData v) of
        -- if so, coerce it to the users’ requested type, which hopefully goes well
	Just r ->  fromADataE r 
	-- if not, we need to recurse into the data structure
	Nothing -> recursed

	-- for that, we first need to figure out the arguments to the 
        -- constructor. We store them in the untyped list
  where args :: [AData]
        (Const args) = gfoldl k z v
	-- gfoldl lets us have a look at each argument. We wrap it in AData
        -- and append it to the list
	k (Const args) arg = Const (AData arg : args)
	z start = Const []

	-- We need the data constructor of the input data type. If the user did not want
        -- it to be transformed, it will be the same
	c = toConstr v

	-- To give better error messages, we make sure that the outmost type constructor
        -- of the requested type actually has the data constructor we were given. Otherwise
        -- gunfold will complain in a non-helpful way
	input_type = dataTypeRep (constrType c)
	output_type = dataTypeRep (dataTypeOf (undefined :: b))

	recursed = if input_type /= output_type
                   then error $ "Can not convert <" ++ show input_type ++ ">"++
                                            " to <" ++ show output_type ++ ">."
	-- the types match, so we assemble the output type, using gunfold
                   else snd (gunfold k' z' c)
	k' :: forall b r . Data b => ([AData], b -> r) -> ([AData],r)

	-- we start by reversing the input list
	z' start = (reverse args,start)
	
	-- then we call us recursively on the argument and feed the result
        -- to the output constructor
	k' ((AData a) : args, append) = (args, append (everywhereADT f a))

-- Used for folding (we don’t need to retain the original constructor)
data Const a b = Const a 

What a beast! But surprisingly, it works. Here are some examples. Note that I always have to specify the requested output type:

bool2Int :: Bool -> Int
bool2Int False = 0
bool2Int True = 1

*Main> everywhereADT (doNothing `extADT` bool2Int) True :: Int
1
*Main> everywhereADT (doNothing `extADT` bool2Int) True :: ()
*** Exception: Type error, trying to convert Int to  ().
*Main> everywhereADT (doNothing `extADT` bool2Int) (True,False) :: (Int,Int)
(1,0)
*Main> everywhereADT (doNothing `extADT` bool2Int) ([True,False],True,()) :: ([Int],Int,())
([1,0],1,())
*Main> everywhereADT (doNothing `extADT` bool2Int) ([True,False],True,()) :: [()]
*** Exception: Can not convert  to .
*Main> everywhereADT (doNothing `extADT` bool2Int) [True] :: [Bool]
[*** Exception: Type error, trying to convert Int to  Bool.

I hope this code does not inflict too much pain on any Haskell-loving reader. I know I violated the language, but I didn’t know how to do it better (at least not without using Template Haskell). I also know that this is not very good performance wide: Every single value in the input will be deconstructed, type-compared several times and re-assembled. If that is an issue, then this function should only be used for prototyping.

Almost done

To apply this to our state, we only need to glue it to our lookup functions from above:

gunref :: (Data (a Id), Data (a Ref)) =>  State Ref -> a Ref -> a Id
gunref s w = let unrefState = gunref' unrefState s
             in gunref' unrefState w

gunref' :: (Data (a Id), Data (a Ref)) =>  State Id -> a Ref -> a Id
gunref' unrefState = everywhereADT unref'
  where unref' = doNothing `extADT`
		 unrefUserRef unrefState `extADT`
		 unrefComputerRef unrefState

Now we have a generic unreferencer. I set the type a bit more specific than necessary, to make it safe to use (under the assumption that the list of lookup functions is complete and will not leave any Ref in the output).

*Main> testState 
State [User {userId = 0, uses = [Ref 1]}] [Computer {computerId = 1, owner = Ref 0}]
*Main> gunref testState testState 
State [User {userId = 0, uses = [Id (Computer {computerId = 1, owner = Id (User {userId = 0,
uses = [Id (Computer {computerId = 1, owner = Id (User {userId = 0, uses = ..

Oh, and by the way, if you want to test this code, you’ll need at least:

{-# LANGUAGE DeriveDataTypeable, FlexibleInstances, GADTs,
             FlexibleContexts, StandaloneDeriving, ScopedTypeVariables #-}

Comments

Scary.
#1 Felipe am 2008-12-31
i just hope folks like you who have the ability to think and work through such details will be brave enough to continue to do so, and that eventually it will be understood enough to be boiled down to something the rest of us can understand and use :-)



thanks for forging ahead.
#2 Raoul Duke am 2008-12-31
Thanks very much for this;

It strikes me that the problem of cross-referencing may require a lot of work in Haskell, where as in an OO environment it is trivial.

Any comments?

Thanks again,

Paul
#3 Paul Marganian am 2009-01-09
Well, that’s a price to pay for referential transparency (which is probably worth a high price). Note that you can have other ways of representing references (e.g. IORefs) that are closer to what a imperative programmer is used to – I was just exploring some other, more elegant, techniques. This is by no means meant to be the best way.
#4 Joachim Breitner (Homepage) am 2009-01-09
Object orientation does not cost you referential transparency. Nor does OO programming imply imperative. Even in haskell does instancing a type class associate methods with a data type which is at heart OO. The complexities are due to Haskell's limitations but its pros out weigh this con.
#5 Eric Jone am 2009-04-21
Reminds me of http://www.contextfreeart.org/
#6 Brian DeRocher (Homepage) am 2009-06-16
I know Context Free, but I don’t see the relation to that haskell post. Could you elaborate? :-)
#7 Joachim Breitner (Homepage) am 2009-06-16

Have something to say? You can post a comment by sending an e-Mail to me at <mail@joachim-breitner.de>, and I will include it here.