======================================================== REDUCERON MEMO 6 Case factorisation and special treatment of constructors Matthew N, 17 November ======================================================== Compilation of the program type Var = Int data Exp = Const Bool | Var Var | And Exp Exp eval :: [(Var,Bool)] -> Exp -> Bool eval env (Const b) = b eval env (Var v) = lookup v env eval env (And e0 e1) = eval env e0 && eval env e1 produces the following combinators. eval env e = e const (var env) (add env) (1) const b = b (2) var env v = lookup v env (3) add env e0 e1 = eval env e0 && eval env e1 (4) Look at the RHS of eval: the variable "env" must be passed down to each case alternative that uses it. In general, several variables may need to be passed down to each case alternative and the number of case alternatives could be large. It is a waste of time and space to write these variables onto the heap several times. One possible solution is to factorise out the environment as follows. eval env e = e const var add env (5) const b env = b (6) var v env = lookup v env (7) add e0 e1 env = eval env e0 && eval env e1 (8) Notice that env is only referred to once in (5) rather than twice as in (1). Not only is the body of (5) three nodes smaller than that of (1) but it also contains one wide application and no small applications. Case memory ----------- Factorising the environment also opens a new oppertunity. Notice that in the body of (5) each case alternative is a *constant*. Instead of writing a group of constants on the heap every time eval is applied, they could just be stored once in "case memory". To illustrate, we might define an array to represent the case alternative constants for eval as evalAlts = [| const, var, add |] and redefine equation (5) as eval env e = e evalAlts env Now constructors must be encoded differently. Instead of encoding them by the equations Const x c v a = c x Var x c v a = v x And x y c v a = a x y we would define them as Const x alts = alts[0] x Var x alts = alts[1] x And x y alts = alts[2] a x y where square brackets represent array lookup. Case memory could be accessible in parallel with other memories, reducing any potential contentions. It does not need to be a wide memory. Special constructors -------------------- There is a more efficient way of dealing with these new kinds of constructors if we avoid treating them as functions. First, observe that a constructor contains two pieces of information: 1. Its index in the list of all constructors of the same type, say i. 2. Its arity, say a. A constructor can be represented by the packed pair (i,a). When such a pair appears on top of the stack, it can be reduced simply to s[sp+a][i] where s is the stack, sp the stack pointer. The expression s[sp+a] points to the array of alternatives in case memory. The ith element of this array up contains the case alternative corresponding the the constructor being scrutinised. To allow constructors to be reduced in this way, we need to make every case alternative ignore the case table sitting "a" places from the top of the stack. For example, we'd need to redefine the case alternatives (equations 6, 7, 8) as follows. const b alts env = b var v alts env = lookup v env add e0 e1 alts env = eval env e0 && eval env e1 Notice that alts is ignored. Potential impact ---------------- Case expressions with many alternatives can be handled more efficiently and constructors can be reduced in a single cycle. Open questions -------------- Colin asked: What is the criterion for deciding when this kind of factorisation is applied? I suppose that even with such a new fast encoding of constructors, it will still be worth in-lining/reducing "standard" constructors at compile/transform time?