Full vs. partial reification

My research project is called interactive programming. The central concepts are (1) reification, where a big-step semantics is taken to characterise a data type, and evaluation to produce instances of this data type; (2) reactivity, the interpretation of a program as a transition system whose states are such instances; and (3) persistence, a means for sharing state across transitions so that common substructure is not duplicated. Since none of these ideas is particularly novel, I need to be careful in my treatment of related work.

Reified evaluation arises, in a restricted form, in Umut Acar’s self-adjusting computation, as well as in debuggers that record execution traces or similar structures. Reactivity is central to visual programming, spreadsheet languages, functional reactive programming (FRP), and, again, self-adjusting computation. Persistence also arises in self-adjusting computation, as a form of memoisation. Self-adjusting computation is therefore of special importance: it is the only prior work which combines all three concepts. What I want to do here is compare Acar’s (mature, well-worked-out) system with what I’m hoping to do.

Before I start, I should point the reader to this interim technical report. My report has a number of problems, which I won’t attempt to enumerate, and unfortunately it doesn’t use the term “reification”. (That was Tom Harke’s idea.) It should shed some light on what I mean by a persistent evaluation graph, which is essentially Acar’s notion of a “trace”. My notation is intentionally similar to Acar’s.

Self-adjusting computation

Self-adjusting computation is a language and runtime system for incremental computation. After an initial evaluation, the inputs of a program can be repeatedly modified, and the resulting changes to the output observed. During the initial evaluation, the runtime records a trace identifying how parts of the computation depend on other parts. When an input is modified, the output is re-calculated by a change propagation algorithm, which exploits the information in the trace to perform the update efficiently.

Early formulations required the programmer to deal explicitly with much of the technical machinery involved in making a computation self-adjusting. Although the resulting programs were still much easier to write than hand-coded dynamic algorithms, the approach was difficult and error-prone.
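
To give a flavour of that machinery, the sketch below paraphrases, from memory, the core interface of the early adaptive library of Acar, Blelloch and Harper; the names and types here are approximate rather than authoritative. Modifiables are allocated in destination-passing style, and every read must be explicitly delimited so that the runtime can record it as a dependency.

(* An approximate paraphrase of the early adaptive-library interface;
   names and types are from memory and may differ from the original. *)
signature ADAPTIVE =
sig
  type 'a mod          (* a modifiable: data that may change later *)
  type 'a dest         (* a write-once destination for a modifiable *)
  type changeable      (* the result type of adaptive code *)

  (* Allocate a modifiable in destination-passing style; the equality
     test lets change propagation stop when a write changes nothing.
     (The original calls this mod.) *)
  val newMod : ('a * 'a -> bool) -> ('a dest -> changeable) -> 'a mod

  (* an explicitly delimited read, recorded as a dependency *)
  val read   : 'a mod * ('a -> changeable) -> changeable
  val write  : 'a dest * 'a -> changeable

  (* the meta level: modify an input, then update the output *)
  val change    : 'a mod * 'a -> unit
  val propagate : unit -> unit
end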

The most recent version, a self-adjusting extension of Standard ML called ΔML, provides more direct language support. This brings it closer in concept to interactive programming. Nevertheless it differs in several important respects, thanks to its focus on incremental performance. The following discussion is organised around the three common features of reification, reactivity and persistence. The figure below shows some typical ΔML code.


structure ModList =
struct
  datatype 'a cell = NIL | CONS of 'a * 'a modlist
  withtype 'a modlist = 'a cell box        (* box marks changeable data *)
  type 'a t = 'a modlist

  (* afun introduces an adaptive function; $ is adaptive application *)
  afun lengthLessThan (n: int) (l: 'a modlist) : bool box =
    let val putM = mkPut $ ()
        afun len (i, l) =
          if i >= n then false
          else case get $ l of
                   NIL => true
                 | CONS (h, t) => len $ (i+1, t)
    in putM $ (NONE, len $ (0, l)) end

  afun map (f: 'a -> 'b) (l: 'a modlist) : 'b modlist =
    let val putM = mkPut $ ()
        (* mfun additionally memoises the function *)
        mfun m l =
          case get $ l of
              NIL => NIL
            | CONS (h, t) => CONS (f h, putM $ (SOME h, m $ t))
    in putM $ (NONE, m $ l) end
end

Reification

In both systems, the initial evaluation is reified into a data structure capturing the dependencies between parts of the computation.

The key difference seems to be in whether the reification is “full” or only “partial”. In self-adjusting computation, the trace captures only the aspects of evaluation relevant to efficient incremental update. In interactive programming, the entire evaluation is reified.
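
To make “data structures all the way down” concrete, here is a toy sketch of full reification; it is my own illustration over a two-constructor arithmetic language, not a definition from the report. The interpreter returns the entire big-step derivation rather than just a value.

(* Full reification for a toy arithmetic language: eval returns the
   whole big-step derivation, one node per rule instance. *)
datatype expr  = Num  of int
               | Plus of expr * expr

datatype deriv = NumD  of int                  (* n ⇓ n *)
               | PlusD of deriv * deriv * int  (* e1 ⇓ n1,  e2 ⇓ n2,  n = n1+n2 *)

fun result (NumD n)          = n
  | result (PlusD (_, _, n)) = n

fun eval (Num n)         = NumD n
  | eval (Plus (e1, e2)) =
      let val d1 = eval e1
          val d2 = eval e2
      in PlusD (d1, d2, result d1 + result d2) end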

Self-adjusting computation: The programmer must explicitly identify the adaptive aspects of the computation. The type constructor box distinguishes changeable data, which may change after the initial evaluation, from stable data, which cannot. The keywords afun and mfun identify adaptive functions, which produce and consume changeable data.
Interactive programming: All data is ‘changeable’; all functions are ‘adaptive’.

Self-adjusting computation: Computations are partly reified into traces: data structures which record the dependencies between the adaptive parts of the computation, and the code fragments associated with them.
Interactive programming: Computations are fully reified into evaluations: data structures “all the way down”, recording the individual steps of the big-step semantics.

Self-adjusting computation: There is no direct requirement that traces reflect the derivation tree given by the original semantics, only that the final computed values are consistent.
Interactive programming: The evaluation graph can, at any state, be unravelled into the derivation tree given by the original semantics.

Self-adjusting computation: The runtime interface exports operations for querying and updating changeable data, but not for observing traces, although trace information could in principle be exported.
Interactive programming: The runtime interface exports evaluation graphs as well.

Reactivity

In both systems, data can be modified after the initial evaluation, causing the output to be updated. Synchronisation updates the trace as well as the output, producing a result consistent with a from-scratch evaluation on the new inputs, and leaving the system ready to respond to another input change. Both implementations are imperative, to allow state to be reused across transitions.

The differences seem to follow from the different treatment of traces: partial reification means that self-adjusting computation has an (efficient, but potentially problematic) hybrid execution model.

Self-adjusting computation: Partial reification means that change propagation must re-execute, rather than merely synchronise, adaptive computations when the modifiables they read have changed.
Interactive programming: Full reification of evaluation means there is no notion of “re-execution”. Synchronisation is the reactive mutation of structure, “all the way down” (see the sketch after this table).

Self-adjusting computation: Re-execution interacts poorly with imperative features such as I/O (effects may be duplicated) and memory allocation (effects may differ). The occurrences of the mkPut primitive in the figure above are a symptom of the latter problem.
Interactive programming: Full reification implies a purely functional treatment of effects: the evaluation can include descriptions of effects, and these descriptions can be updated by synchronisation. This invites the question of how interactive programming accommodates “traditional” I/O, such as launching missiles or writing snapshots of execution to a file.

Self-adjusting computation: Modification is to data values only. Since ΔML is higher-order, these values may be of function type; however, functional values are opaque, rather than represented as changeable data structures.
Interactive programming: The intention is to permit modification to code as well as data, as a consideration separate from higher-order vs. first-order. This requires programs to have a representation similar to that of user-defined data types: in Acar’s terms, modifiable code as well as modifiable data.

Self-adjusting computation: Functional specification; mutable traces are not part of the observable behaviour.
Interactive programming: Imperative specification; mutable evaluation nodes are an essential part of the observable behaviour.

Self-adjusting computation: Imperative, bottom-up implementation, based on a priority queue. There is no proof that the algorithm conforms to the functional specification.
Interactive programming: Top-down implementation, corresponding closely to the imperative specification.

Self-adjusting computation: The implementation is shown to be optimally efficient for programs which are monotone with respect to a particular class of input changes.
Interactive programming: Inefficient; no performance guarantees.
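
To make the first row concrete, here is the toy language again (still my own illustration, and grossly simplified), now with mutable cells: inputs and cached results live in refs, and sync revises the existing derivation nodes in place, with no re-execution of code.

(* Toy sketch of synchronisation: inputs are mutable cells, and each
   derivation node caches its result in a ref that sync updates in place. *)
datatype expr  = Num  of int ref               (* a mutable input *)
               | Plus of expr * expr

datatype deriv = NumD  of int ref * int ref    (* input cell, cached result *)
               | PlusD of deriv * deriv * int ref

fun result (NumD (_, r))     = !r
  | result (PlusD (_, _, r)) = !r

fun eval (Num c)         = NumD (c, ref (!c))
  | eval (Plus (e1, e2)) =
      let val d1 = eval e1
          val d2 = eval e2
      in PlusD (d1, d2, ref (result d1 + result d2)) end

(* revise the same nodes; nothing is re-executed *)
fun sync (NumD (c, r))       = r := !c
  | sync (PlusD (d1, d2, r)) = (sync d1; sync d2; r := result d1 + result d2)

(* for example: *)
val x = ref 1
val d = eval (Plus (Num x, Num (ref 2)))   (* result d = 3  *)
val _ = (x := 10; sync d)                  (* result d = 12 *)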

Persistence

In both systems, sub-computations are shared across states by a generalised memoisation facility which stores not just computed values but associated traces.

Here, the differences arise because interactive programming makes the identities of evaluation nodes part of the observable structure. This shifts memoisation from being a performance trick to playing an essential role in the treatment of evaluation graphs as purely functional data structures.

Self-adjusting computation: Traces are not exported by the runtime interface, so memoisation can only be observed indirectly, as incremental performance.
Interactive programming: Evaluations are exported by the runtime interface, so memoisation can be observed as an evaluation subgraph shared by adjacent states.

Self-adjusting computation: Only adaptive functions can be memoised, not arbitrary terms. The programmer specifies when adaptive functions are memoised, via the keyword mfun.
Interactive programming: The evaluation of every term is memoised, effectively by memoising the eval function of the interpreter (sketched after this table). Common substructure is therefore always shared by adjacent states; the evaluation graph is a purely functional data structure.

Self-adjusting computation: Memoisation only occurs across evaluations, not within a single evaluation.
Interactive programming: Memoisation occurs within evaluations too, meaning that common sub-computations are shared.

Self-adjusting computation: The essential role of memoisation in performance restricts the language to CBV; it is unclear what kind of incremental performance would be possible under CBN.
Interactive programming: Not limited to CBV, because performance is disregarded; the intention is to support CBN.
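
The idea of memoising the interpreter’s eval function can be sketched in the same toy setting; again this is my illustration, with a crude association list standing in for a real memo table. A term that has already been evaluated maps to its existing derivation node, so common substructure is shared both within one evaluation and across adjacent states.

(* Memoised eval for the toy language above: re-evaluation returns
   existing derivation nodes, so substructure is shared. *)
datatype expr  = Num  of int ref
               | Plus of expr * expr
datatype deriv = NumD  of int ref * int ref
               | PlusD of deriv * deriv * int ref

fun result (NumD (_, r))     = !r
  | result (PlusD (_, _, r)) = !r

(* crude memo table; input cells compare by identity, so equal terms
   over the same cells map to the same derivation node *)
val memo : (expr * deriv) list ref = ref []

fun evalMemo e =
    case List.find (fn (e', _) => e' = e) (!memo) of
        SOME (_, d) => d                        (* reuse the shared node *)
      | NONE =>
          let val d =
                  case e of
                      Num c         => NumD (c, ref (!c))
                    | Plus (e1, e2) =>
                        let val d1 = evalMemo e1
                            val d2 = evalMemo e2
                        in PlusD (d1, d2, ref (result d1 + result d2)) end
          in memo := (e, d) :: !memo; d end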

The $64,000 question is: can full reification ever be done efficiently?
