Changing the past in open systems

This post is about something I call retroactive update and its significance for editing a live program. Execution environments that allow you to hack running programs are an idea constantly being reinvented. I take that as a sign that it’s a good idea. There are debuggers and virtual machines that allow you to change definitions while execution is suspended at a breakpoint, and then to resume under the new definitions. In live programming, which means different things to different people but which here I will just define by example (impromptu, Bricks, fluxus, SuperCollider), you hack concurrent processes to change their ongoing behaviour. Lots of web frameworks have something like Rails “development mode” where you change code and have it reloaded into the browser automatically without restarting the app. Erlang has hot code loading. And I haven’t even mentioned the old stuff (ok, I can’t help myself: SketchPad, ThingLab, VisiProg, ARK), or the new stuff (Bret Victor, Light Table, VisiPro, Subtext, Kahn Academy Computer Science, RDP). Then there are spreadsheet languages and visual programming and…I could go on. This just scratches the surface of what’s already been done. Although these things differ wildly in what they mean by “execution”, “program”, and “make a change”, there is something they have in common. My personal suspicion is that Light Table will be the thing that brings this “something” to the masses, and at that point the pressure will be on to understand what that something actually is.

I’ll call this general ability to modify a running program online update. Online update is important — essential, I would say — for lots of reasons beyond the “immediate feedback” motivation often touted. I really like David Barbour’s short list of reasons why we should care. But although online update is fast becoming essential, in most of its current forms it is ad hoc and unsound. It’s basically just hacking and hoping. The user’s interactions and the program together form a crazy hybrid computation that interweaves changes to the program with execution of the program, in an informal and unspecified way. Sometimes this is the point; sometimes it’s not the point, but it doesn’t matter much. But in general, it matters a lot. We want implementations of paradigms, not just “implementations”. Informality by design. By way of example, in all the discussion around Light Table, no-one seems to have wondered out loud, “what are we actually trying to do here?”. Somehow it doesn’t matter, we’ll just hack something together and see what happens.

Programming in the subjunctive

One important step towards a more systematic approach to online update is to make the dimension of interaction explicit. This is one of the things I’ve focused on in my own research, which I call interactive programming, although that term has probably already been laid claim to. I allow the user to step sideways in time, into a “counterfactual” execution where it is “as though” the program had been written differently from the outset. Inspired by Demaine etal‘s retroactive data structures, which are imperative data structures which permit modifications to the historical sequence of operations performed on them, I’ll refer to this notion of online update as retroactive update. Retroactive update allows the “computational past” to be changed. Self-adjusting computation (SAC) is another system based on retroactive update. SAC explores another crucial aspect of online update: efficient update, via an algorithm called change propagation. SAC’s commitment to retroactivity appears in the correctness of change propagation, which is defined as consistency with a from-scratch run under the modified code. (The kinds of changes you can make in SAC are quite limited, but this need not distract us here.)

There is an apparent thorn in the side of retroactive update: computational effects. In order for it to be “as though” the previous computation never happened, effects have to be reversible. We see this in SAC, where all mutating operations must be revocable. (Related, but probably another distraction: effects also need to be persistent in spacetime, which amounts to idempotence. There must be no sense in which the “same” effect can happen “twice”.) The problem is when the application being updated is embedded in a larger computational system, as live systems usually are. Then its effectful interactions with other agents are typically irreversible. Writing output to a remote console, committing a financial transaction, and emitting a sound through a speaker are all examples of actions that cannot usually be revoked.

Even when we are able to revoke all the effects we would need to for semantic consistency with a from-scratch run, it might just be impractical to do so. For example, for a long-running application such as an online retail website, which writes many transactions to a database, an update semantically equivalent to re-running all past transactions with the new code against an empty database would quite possibly be infeasible to execute. I can imagine applications where this would make sense. But it cannot in general be a correctness requirement.

Retroactive update, at least naively construed, thus embodies a closed-world assumption: that every past interaction falls within the scope of the update. For open systems, we need an approach to online update which is not so committed to reinventing the past. Dynamic software updating (DSU) is one attempt at this. In its most basic form, DSU tries to balance flexibility of update – permitting as broad a class of updates as possible – with safety, meaning that subsequent execution will not fail with a runtime type error as a consequence of the update. I’ll explain the basics of DSU with reference to the Proteus system and its successor Proteus-tx, which I believe are roughly representative of the state of the art. What we’ll see is that, while consistency with a from-scratch run is apparently too strong a property for many applications, merely guaranteeing that an update will not introduce a type error is far too weak. It’s probably a necessary condition, for online update in a typed language, but it’s insufficient. It’s essentially just a type-safe version of hack-and-hope. Proteus-tx is a partial fix, but I think what that improvement actually points towards is a revisiting of the retroactive semantics in a new light.

In Proteus, the user first specifies a collection of modifications called an update which they intend to apply to a running system. An update is a set of new bindings for selected global symbols, with no fine-grained information about how they changed. (This is in contrast with interactive programming, where changes are syntactic deltas, but this isn’t the root of the problem.) DSU also permits data type definitions to change. In that case, the update must additionally include a “migration” function which, when the update is applied, will be used to translate values created under the old data type definition to forms consistent with the new definition.

An update point is a static program location specified by the user. At runtime, if an update is pending and an update point is reached for which the update is deemed type-safe, the update is performed and then execution resumed. Otherwise execution continues as normal. What distinguishes a DSU update, compared with a retroactive one, is that invocations of functions that are active when the update is applied will finish executing with the old version of the code, and only subsequent invocations will use the new code. For this to be sound, DSU relies on a property called “con-freeness”. Intuitively, the idea behind con-freeness (for an update to a data type definition, say) is that old code that will resume after the update will not concretely manipulate values of that type. Accessing a field or pattern-matching are concrete usage; simply mentioning a value of that type, without relying on its representation, does not compromise con-freeness. Notions of con-freeness are also defined for function and variable update, and then the authors give an algorithm which statically approximates con-freeness for a given update site.

This form of DSU ensures that updating is sound from a typing point of view. However, it doesn’t rule out other runtime errors which would not arise under retroactive update, but which can arise under a hybrid execution of old and new code. Here is an example, taken from the Proteus-tx paper:

   proc f () =                            proc f () =
      ...                                    ...
                                             g();

   proc h () =                            proc h () =
      ...                                    ...
      // update point                        // update point
      f();                                   f();
      g();

Suppose the program on the left is edited into the program on the right by moving the call of g in h into the body of f. Don’t worry about whether there are other calls of f which are adjusted similarly, i.e. whether this is a refactoring, or a behaviour-changing edit. It doesn’t matter. We need only note is that if we dynamically update the program during an invocation of h, just before h makes the call to f (at the point indicated above), then that call to f will invoke the new version of f, which will call g. Then, the old definition of h will finish executing, resulting in g being called twice, even though this is not possible under either version of the program alone. If g has side-effects, this could be catastrophic. In fact, this isn’t really a solution. It’s just a type-safe version of the problem.

Transactional version consistency

Needless to say, this observation didn’t escape the DSU folks. It motivated an improved approach to DSU, called transactional version consistency, which has been implemented in the Proteus-tx system mentioned above. The idea of transactional version consistency is to allow the user to designate blocks of code as transactions, whose execution will always be attributable to a single version of the program. A contextual effect system, for any expression, statically computes approximations of the effect of the computation that has already taken place (the prior effect), and of the effect that has yet to take place (the future effect). An update is permitted during the execution of a transaction if it will be “as though” that transaction had run under the new code from the outset, or under the old code from the outset, but not some hybrid of the two, using the prior and future effects of the update point to decide this conservatively. If neither of these is the case, either the update or the transaction must be rejected. This approach is quite effective for so-called “long-running” applications, since (as the designers note) such applications are in fact often structured around relatively short-lived concurrently executing user sessions. Each session either aborts or executes to completion, committing changes to a shared persistent store. For these systems, transactional version consistency offers a less insane version of DSU.

Stoyle and colleages, in the Proteus paper, attribute the difficulties with DSU, such as those addressed by transactional version consistency, to the flexibility of being able to change the program in the future. But equally they can be attributed to the inflexibility of being unable to change the program in the past. So here’s my tentative conclusion: transactional version consistency is a hankering, in long-running systems, for some of the semantic consistency of retroactive update. Their observation is that what we mean, in practice, by a “long-running system” is not that the code is long-running, but rather that the effects of code (changes to data, etc) are external and long-lived. That’s a good observation. But what it suggests to me is that we can use retroactive update after all, even in a long-running system, by just amending the semantic correctness criterion: requiring consistency with a from-scratch run of the transaction or session, rather than a from-scratch run of a closed program.

What’s the advantage of thinking in terms of retroactive update, rather than transactional version consistency? Well, suppose we want to push a change to pricing, purchasing terms or other business logic while a session is active. With DSU, even with transactional version consistency, there are many undesirable outcomes. The session may not see the update, because it was applied too late in its execution (it will complete under the old code). This is consistent, but it may not be what we wanted. Or, the update may be rejected entirely, because there was no consistent update point available. Or, we might have pushed the update through, but only by rejecting some user transactions.

With retroactive update, we can reflect changes to the session at a point that is convenient to the user, but preserving retroactive consistency. Retroactive update certainly presents challenges of its own, in particular efficient implementation with arbitrary code changes, but I think these are outweighed by the benefits. Finally, another reason to prefer retroactive update is that implementing it properly implies reification of the computational history. This means we can show how the computation changed, by making the delta visible to the user. This is the philosophy behind LambdaCalc. After all, the use case the DSU folk are explicitly considering is deploying a change to an application in the middle of a transaction. It’s not hard to think of situations where it would be important for the user to have visibility on that. We’re yanking them into a parallel universe, so we should at least show them what happened differently.