Monday, January 07, 2008

The art of the science of software

Gregor Kiczales gave one of the keynotes at last year's OOPSLA (slides and audio). The abstract was promising, and it sounds like the talk was well received. Nonetheless, I think Kiczales is off target. But as the talk has a number of important themes running through it, and the subject matter is extremely pertinent to my own research interests, I will attempt some course correction, rather than just pick holes. As with my last post, I intend these remarks as friendly constructive criticism. The talk was in an exploratory spirit, after all, and it deserves a response in the same spirit. Unfortunately the various strands of the talk are woven together in ways that are both confusing and confused.

Pluralism in software, pluralism in science

Kiczales' central observation is this: software abstractions are transient beasts, subservient to our equally transient purposes. Abstractions in a sense represent points of view, and thus flex and mutate as we shift our perspective. Philosophers sometimes call this kind of "perspective-friendly" meta-perspective pluralism. Pluralism acknowledges that there can be a plurality of distinct descriptions of a system - perhaps with radically different ontologies - without requiring that at most one of these descriptions is "correct". Put like this, pluralism sounds like plain common sense. And it seems inevitable that one day our languages and tools will reflect this pluralism. What's less clear of course is how painful the journey will be.

Kiczales also makes the interesting point that the shift towards perspectivalism in software in many respects mirrors a similar shift in how we think about natural systems. (At one point he asks the audience, leadingly, whether anyone believes in "scientific objectivity".) Here it seems that Kiczales has been heavily influenced by Brian Smith's book On the Origin of Objects. Smith was in the office next door to Kiczales for several years, and some of his ideas seem to have cross-pollinated. I'll come back to Smith's book shortly. But first let's consider this analogy between software systems and natural systems. (I apologise for the brief philosophical digression, but I think this is one of the threads of the talk that hits on an important point.)

Twentieth-century philosophy of science, certainly since the decline of logical empiricism, was dominated by various flavours of a metaphysical position called realism. The basic supposition of realism is that there exist theory-independent facts about the macroscopic organisation of the physical world, and that these facts determine whether a given successful theory is really true, or "merely" capable of accounting for all the empirical data one could, in principle, obtain. A realist might, for example, claim that there is an objective fact about exactly how many hairs there are on my head, and that this fact exists independently of - is ontologically prior to - any particular theory of what hairs are, or indeed what heads are, or what it is for a hair to be attached to a head. Some such commitment to an objective, theory-independent, "natural kind" ontology - what Nelson Goodman famously called a "ready-made world" - is the cornerstone of any realist world-view.

What's wrong with the realist picture is that there is something smelly about the idea of a fact which is in principle beyond the reach of empirical science. Scientific theories typically "parse" low-level ontologies into higher-level ontologies; these macro-ontologies are really nothing more than patterns swirling in the low-level structure, and the "truth", or otherwise, of such a theory is, in scientific terms, fully exhausted by its empirical success. From science's point of view, "molecules" are just patterns in the quantum-mechanical substrate (or whatever) which satisfy a certain behavioural or structural description, and the extent to which the theory of molecules is more (or less) objectively true than any other theory is just the extent to which that theory successfully (or unsuccessfully) systematises the phenomena. There are no further scientific facts - no "trans-empirical" facts - which determine which theory is "the one true theory".

I can't hope to have done justice to this topic in that one paragraph, even if I knew enough about the subject to give it proper treatment (and I don't). But my aim here is only to concede to Kiczales and Smith what I think is a fair concession: that any plausible alternative to realism must be pluralistic. It must allow for there to be multiple descriptions of the same natural system - perhaps with radically differing ontologies - without imposing the requirement that at most one of them is "correct". A theory of quantum gravity, if we ever find one, will not reveal General Relativity to have been "false" - but mysteriously successful - all along. It will just be a better theory.

So let us grant the point that realism is a metaphysical red herring. And I think we can also agree that the analogy with what's wrong with our traditional conception of software is compelling. We tend to think of there as being a unique objective fact about what a piece of software "does" - a unique theory of its behaviour. Our awareness of the existence of some underlying source code tends to fuel this intuition. But really we need to be much more pluralistic, and accept that what a piece of software "does" inescapably depends on your point of view. A security engineer might have a completely different view of a system than an end user. Each end user probably has a different view than other users, inasmuch as she can't see what other users are doing. Reports or audits produced for management are really nothing more than abstractions of how the system behaves. Even a bug-fix, without too much of a stretch of the imagination, is just a view of an erroneous program that applies a correcting delta to its behaviour. And many of these views and perspectives aren't just design-time artifacts, but are live perspectives on a running program. This pluralistic way of thinking about software is even more dynamic and fluid than fluid AOP: let's call it superfluid AOP.
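
To make the picture concrete, here is a minimal sketch in Haskell (all the names and views here are invented for illustration): a single underlying system, over which several perspectives coexist as abstraction functions, none of them privileged as "the" behaviour of the system.

import qualified Data.Map as Map

type User  = String
type Event = String

-- The "system" is just a log of which user did what.
newtype System = System [(User, Event)]

-- Each perspective is an abstraction over the same underlying state.
auditView :: System -> Map.Map User Int      -- management: event counts per user
auditView (System evs) = Map.fromListWith (+) [(u, 1) | (u, _) <- evs]

userView :: User -> System -> [Event]        -- a user sees only her own events
userView u (System evs) = [e | (u', e) <- evs, u' == u]

securityView :: System -> [(User, Event)]    -- security: privileged events only
securityView (System evs) = [(u, e) | (u, e) <- evs, e == "login-failure"]

None of these views distorts some uniquely correct account; each is a different parse of the same substrate, in the service of a different purpose.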

An exciting possibility, then, is that fixing our philosophy of the natural world and learning how to think properly about software might end up dovetailing rather nicely. God's not a mathematician, he's a programmer, right?

Deconvolve this

So far so good. But where Kiczales' talk goes awry is in its leap from pluralism, to the ushering in of a new era of "formality-free computing". In this fluffy new future, we will sit around engaging in social negotiation and situated action, interactions which will somehow manage not to be "formal all the way down". Unfortunately, these are just Smith's bad memes at play. I guess I can see how the kind of pluralism just discussed might be innocently mistaken for post-modernism of the sort offered by Smith in his cryptic book, but it's a serious mistake. All I know is that if there is a place for post-modernist, lit-crit, social constructivist thinking in the modern world, it's nowhere near the field of computing.

The following excerpt from the Amazon "review" of Smith's book (presumably written by his publisher) captures the sickly flavour of Smith's vision:
Critics of programming practice have compared it to alchemy and Smith recalls the characterisation of Newton as the last of the magicians. Is this a pre-Newtonian phase, lacking "Laws", awaiting the differential calculus? Another position is suggested:

"... that we are post-Newtonian, in the sense of being inappropriately wedded to a particular reductionist form of scientism, inapplicable to so rich an intentional phenomenon. Another generation of scientists may be the last thing we need. Maybe, instead, we need a new generation of magicians". [p362]

Magician? Magus? Seeking the secret of how it is we "deconvolve the deixis" - plus ça change, plus c'est la même chose. The Alchemist: not a charlatan, but one possessed of much empirical wisdom stumbling after the scheme of things; as this new Science of the Artificial must do, self constructed, self referential, post-post-modern, a metaphysics for the 21st century.
I'm sorry, what?? When exactly did Gary Gygax get together with Jacques Derrida? It's somewhere between uninformative and downright misleading to attach significance to the idea that software is intentional (in the philosophical sense originally popularised by Dennett, and somewhat misappropriated by Smith). We can likewise skip gaily past Smith's notions of registration and zest with no fear that we're missing any useful insights. Like it or not, the bedrock of computing is the scientific world view, and Smith's anti-scientistic stance and vaguely Continental-style philosophy are about as compatible with this world view as creationism. Indeed, with the situation arguably inverting - and computation gradually becoming the conceptual foundation upon which science is built - it is even more important that we keep computing free of this kind of pretentious twaddle. It matters too much.

And while it may be true that interfaces are, unsurprisingly, often socially negotiated, we must be careful what we infer from this. So are the spelling of identifiers, the pattern of whitespace in a source file, and the arrangement of plant pots in an office, after all. What we must cleanly demarcate is the line between the forces that define a particular technical problem and any particular solution to that problem. The problem that Kiczales has quite rightly identified is just this: abstractions are essentially dynamic and context-sensitive. There is no unique "correct" ontology for any man-made system, any more than there is a unique correct theory of any natural system. And one of the key forces that happens to drive this dynamism and context-sensitivity - but only one among many - is social interaction. ("One man's constant is another man's variable", as Alan Perlis nicely put it.) But it is a mistake to think that any observations about computing as a social activity offer insight into potential solutions to this problem.

Formality all the way down

This leads us to the final Smithesque strand we need to extract from Kiczales' talk and lay to one side. We are all familiar with the observation that simple interactions between parts often give rise to "emergent" phenomena, behaviours that are somehow novel or surprising, such as the macroscopic behaviour of ant colonies or eBay shoppers, but which are not in any way mystical or magical. As Figure 1 attempts to show, emergent behaviours are in a sense dual to the requirements on a solution. Requirements are known and obligate the system in certain ways, whereas emergent behaviours ("emergents", one could call them) are those which are permitted by the system, but which were not known a priori.

Figure 1: Required behaviours vs. emergent behaviours
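
The duality can be put crudely in code. Here is a toy classification, with the predicates left abstract and every name invented: requirements are obligations known up front, whereas emergents are behaviours the system permits but nobody foresaw.

-- A toy rendering of Figure 1 (names invented for illustration).
data Status = Required | Anticipated | Emergent | Excluded deriving Show

classify :: (b -> Bool)   -- obligated by the requirements
         -> (b -> Bool)   -- permitted by the system as built
         -> (b -> Bool)   -- anticipated a priori
         -> b -> Status
classify obligated permitted anticipated b
  | obligated b                    = Required
  | permitted b && anticipated b   = Anticipated
  | permitted b                    = Emergent     -- permitted, but not foreseen
  | otherwise                      = Excluded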


Emergence is an important topic. But again, we must be careful not to make the leap from the uncontroversial phenomenon of emergence, to the highly controversial idea that reality (and by analogy software) might not be "formal all the way down", as Kiczales, following Smith, suggests. Formal all the way down is exactly what reality is. What else could it possibly be?

Smith's new-age version of emergentism is just an invalid inference from the failure of the reductionist programme in science. In the 1960s, many scientists, as well as philosophers such as Ernest Nagel, were optimistic that we would eventually be able to deductively derive all of science from fundamental physics, by establishing the right "bridge laws" between theories. Half a century later, this optimism looks naive. There has been only limited success, for example, in deriving much of chemistry from quantum mechanics on a "first principles" basis.

But the failure of this kind of reductionist programme does not mean giving up on formalism. We simply need a more mature perspective on the relationship between theories, perhaps seeing the relationship as closer to one of computational abstraction than one of deductive derivation. During the Q&A session after the talk, someone asked whether "biology rather than formalism" was a better model of software. Yet much of the recent success in biology has come from developing new technical perspectives and formalisms. Witness the fast-evolving population of process calculi, equipped with ambients, stochasticity, branes, and whatnot, which are rapidly becoming mainstream tools in systems biology. Biological reductionism may no longer be plausible, but the inadequacy of formalism simply doesn't follow.
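
To give a flavour of the kind of formalism in question, here is a toy fragment of a stochastic process calculus, in the spirit of the stochastic pi-calculus used in systems biology. This is only an illustrative sketch - the constructors and rates are invented, and the real calculi are considerably richer:

type Channel = String
type Rate    = Double   -- an exponential rate, as in Gillespie-style simulation

data Process
  = Nil                          -- the inert process
  | Send Channel Rate Process    -- output on a channel at a given rate
  | Recv Channel Rate Process    -- input on a channel at a given rate
  | Par Process Process          -- parallel composition: P | Q
  | Repl Process                 -- replication: !P, an unbounded population

-- e.g. a reversible binding reaction: bind at rate 0.1, unbind at rate 0.01
binding :: Process
binding = Par (Repl (Send "bind" 0.1 Nil))
              (Repl (Recv "bind" 0.1 (Send "unbind" 0.01 Nil)))

The point being: the biological turn in computing has so far meant more formal vocabulary, not less.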

Once we concede this, then as with social negotiation, we can see that emergence is only indirectly related to the technical problem of enabling "perspectival programming". We don't need to design for emergence; what we mean by "emergent" is, after all, just that which doesn't come built-in. There are no insights we can export from emergence itself to the foundations of computing. Emergence comes about from the way we use things, the way things contingently interact, not from the mechanisms of interaction per se, which can be - indeed, can only be - strictly formal.

The technical challenge: a new paradigm for interactive computing

So at last, I think we can distill the central challenge lurking at the heart of Kiczales' talk. How do we expect to realise the task-centric, perspectival model of programming that we know is coming? If abstractions indeed need exist only in the service of specific interactions the programmer or user has with the program, then in the future we may be abstracting and unabstracting as frequently as we switch between edit buffers today. In their various ways, systems like Mylyn, fluid AOP, and Subtext offer a glimpse of what this world might look like (although Subtext is the only one of these that offers a glimpse of just how fluid the new paradigm might be). But do we have the technical maturity to realise this superfluid, aspects-on-steroids vision?

I suspect that Kiczales would agree that the answer is no. We simply lack a compelling paradigm for building robust interactive systems. But contra Kiczales, and as I argued in my last post, working out this new paradigm will require us to embrace the formal, not reject it. The answer is not going to be to make things less effective (in his semi-technical sense of the word - roughly synonymous with "executable", I think, or perhaps "live" or "connected") but precisely the opposite. My own suspicion is that to make this sort of fancy stuff work properly, we will ultimately need a paradigm where software components are intrinsically, persistently and bidirectionally connected, and where interactive computation is the automatic and incremental synchronisation of distributed state. I've talked about this before, and will hopefully have more to say about it in the future, but for now I only wish to suggest that this, or something like it, is where we should focus our attention. It's where the real challenge lies.
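
For a very rough flavour of what I mean - a toy only, with every name hypothetical - picture two pieces of state joined by a bidirectional link, so that an edit on either side is pushed across the link rather than the whole system being rebuilt. A serious version would propagate deltas incrementally and handle links that are relations rather than bijections; this sketch just re-derives the far side:

data Link a b = Link { toB :: a -> b, toA :: b -> a }

data Connected a b = Connected a b

editA :: Link a b -> a -> Connected a b -> Connected a b
editA l a' _ = Connected a' (toB l a')   -- push the change across the link

editB :: Link a b -> b -> Connected a b -> Connected a b
editB l b' _ = Connected (toA l b') b'

-- e.g. a temperature kept persistently consistent in two units:
celsiusFahrenheit :: Link Double Double
celsiusFahrenheit = Link (\c -> c * 9/5 + 32) (\f -> (f - 32) * 5/9)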

Conclusion: less pop, more sci

To sum up, I sincerely doubt that there is an impending "post-formalist" reconstruction of the foundations of computing. If we want things to be fluffy, they're damn well going to have to be fluffy in some kind of technical, mathematically robust sense, not in some...well, fluffy sense. We should embrace pluralism and perspectivalism, both in science and computing, but not at the price of sloppy pop sci or new-age philosophy. The member of the audience who wondered whether Kiczales' "radical thesis" had more in common with quantum mechanics than classical mechanics should be sent to bed without any tea.

What this kind of question, the popularity of Smith's book, and to a lesser degree Kiczales' talk, ultimately bring home is perhaps this: that if it's socially negotiated artifacts we're after, we need look no further than the world of technical ideas, mediated by the situated action of conference attendance. Smart people often believe strange things; maybe that's the object lesson.


Sunday, November 11, 2007

Why syntax won't go away

Some time ago Jonathan Edwards wrote:
Let’s face it: syntax is a simple and intuitive way to represent structure. But a fundamental principle of Subtext is to avoid syntactically encoding programs so that their semantics are explicit and immediate. It should be possible to use syntactic techniques as a visualization in the UI instead of as a representation of the source.
Jonathan's hope is that we will one day extricate ourselves from the muddy waters of syntax and float free in the semantic stratosphere. It's hard to argue with the spirit of this hope, and his recent achievement with Subtext 2.0 is an exciting glimpse of the kinds of virtual realities for programming we might inhabit in 5 to 10 years' time. (Let's recap...he did all that thinking about schematic tables on paper - ok, electronic paper - and then just went ahead and implemented it, only to have it actually work...I'm jealous.)

The demo itself is a nice reminder of how we are still in the pre-dawn of the computing age, fumbling forlornly in the dark, largely ignorant of our own ignorance. (Paraphrasing Jonathan's OOPSLA presentation: "In the beginning was the static void...") Subtext is far from a fumbling exercise, though; or at least it's clever fumbling, fumbling in the direction of the light switch. But Subtext is both nascent and philosophically radical, and like most radical youth it's still a little confused: not all of its ideas quite make sense yet. My own research interests are largely compatible with Jonathan's, and I have a few thoughts on how Subtext might come of age...a little friendly constructive criticism (although I guess Jonathan would agree with much of what I have to say), but also some more general considerations for the long-term future of these sorts of highly visual, highly interactive languages.

So where is Jonathan trying to go with his idea of "explicit and immediate semantics"? Just what is a UI visualisation of a program, after all, if it is not a representation of "the source"? Conversely, what modern tools really use unstructured text - character sequences interspersed with line-terminators - as representations for programs? Compilers certainly don't, and nor do any non-trivial editors. Most work with richer representations. (How successfully they do so is another question. The fact that they are often flaky and fragile is perhaps what feeds our intuition that they are still just working with flat files "under the hood".) Flat files endure as an interoperability format, but they are not how anything that does anything with programs represents programs. So the thought must be something deeper: that we should be able to somehow present the "semantics" of the program directly to the user - for example using direct links in place of textual names - reaping significant ergonomic benefits by so doing.

Appealing though this idea obviously is, it is problematic. The problem is not with the claim that there are significant ergonomic gains to be had from advances in UIs, but with the idea that non-textual UIs are doing something different - deeply different - from textual UIs. Because user interfaces are syntactic...and not just in some academic sense, but quite straightforwardly. They have a recursive, tree-like structure; they can be described inductively by a term algebra over a fixed set of constructors. They have semantically irrelevant flourishes, like rounded corners and drop shadows; like most textual syntaxes, in fact, with their curly braces and fancy keywords. UIs must have syntax because that's how they convey meaning: there's no way to represent a set or a function (qua mathematical object), or whatever it is you take your program to mean, "directly"; you have to denote it in some way. (And unless you're some kind of mathematical Platonist, perhaps there are only denotations. If you are, you need help.)
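
In case that sounds abstract, here is how plainly syntactic a UI is when you write it down - a toy widget language, with all the constructors invented for illustration. The drop shadows and corner radii live in the terms; the semantics quotients them away, exactly as a compiler quotients away curly braces:

data UI
  = Label  String
  | Button String Action
  | Row    [UI]
  | Styled Style UI            -- semantically irrelevant decoration

data Style  = Style { rounded :: Bool, dropShadow :: Bool }
data Action = Increment | Decrement

-- A semantics that ignores the flourishes:
meaning :: UI -> [Action]
meaning (Label _)    = []
meaning (Button _ a) = [a]
meaning (Row us)     = concatMap meaning us
meaning (Styled _ u) = meaning u   -- styling contributes no meaning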

I'm being pedantic, I admit, but I'm not just being pedantic. I think it's crucial that we start to treat fancy UIs as fancy syntaxes, for the following reason. As UIs become more powerful and interactive - and as VPLs become more compelling - they need to become better behaved. We need to stop thinking of our programming environments as (metaphorically, as well as often quite literally) little "windows" through which we squint awkwardly at the "real" program underneath. Instead our UIs need to become our programs, and our programs our UIs. This is exactly the vision Jonathan has for Subtext, of course. But it can only be practically achieved if we understand UIs as bona fide syntactic structures in their own right, so that we can treat interaction (by end-users or other software processes) as the creation and manipulation of views and transformations of that structure. Think of those cool projections that approximate "method views" in Subtext: if they are to be any use for building real-world programs (whose [partial] meaning we need to be able to grasp from a [partial] visualisation), then these views need to have both a well-defined structure - yes, a syntax - and a well-defined interpretation. To borrow from Jonathan again: notation matters.

So here's my main thesis: only if we take the syntactic nature of UIs seriously can we turn advanced browsing and editing features into the powerful multi-dimensional programming constructs that they should be. Such features become robust and well-defined transformations: sometimes preserving behaviour, sometimes changing behaviour in well-defined ways. In this hypothetical future, refactoring, browsing and AOP become closely interrelated: browsing becomes a kind of refactoring, where the "factors" are something like aspects. This may sound fanciful, but this is just what the Subtext 2.0 UI gives us a tantalising glimpse of. Refactoring is already an interactive activity, but is nothing like robust enough: for refactoring to come of age it needs to be as formally reliable as the kind of batch-mode transformations that a (well-behaved) optimising compiler might implement. The same applies to advanced browsing features, particularly if they support "putback" editing - otherwise we won't be able to rely on these views or trust edits mediated by them. And conversely, traditional AOP is a robust batch-mode transformation, but lacks interactivity: and so for AOP to come of age, we need to turn aspect-weaving into a live, interactive browsing activity - a UI, in other words, that allows programs to be woven and unwoven while they are being edited [HoK05].

So VPLs have the potential to turn UIs, in an important and I hope non-contrived sense, into rich and full-featured languages for meta-programming. This is potentially in tension with the fact that, to be practical to design, describe, and implement, programming languages need to be simple, or at least modular. Are schematic tables part of a programming language or part of a UI? Think about all the advanced features that schematic tables provide: rendering of predicates in Blake canonical form, enforcement of partitioning. Are these features part of the Subtext language or the Subtext UI? And if this distinction is never to be made, how is anyone else to implement Subtext? Must we require bona fide implementations to be isomorphic to the full Subtext UI? (Must it be possible to encode the entire state of the Subtext UI in the state of the candidate implementation, and vice versa?) If we agree this is almost certainly an unreasonable burden to place on implementors, then it is not clear where to draw the line. The engineering risk to which Subtext is thus exposed is non-modularity: neatly sidestepping all the horrible complexity of maintaining a live mapping to textual syntax, only to gradually mire itself in a new kind of complexity that arises from pushing all those fancy views and direct manipulation features into a single "language".
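
To make the dilemma vivid, consider just the partitioning check. Here is a miniature of it (the representation is mine, not Subtext's): predicates as sums of products, with a table well-formed only if its rows are pairwise disjoint. Is this little theory part of "Subtext the language", which every implementation must replicate, or part of "Subtext the UI", which implementations are free to vary?

type Var = String
data Lit = Pos Var | Neg Var deriving (Eq, Show)
type Clause = [Lit]        -- a conjunction of literals (one row of a table)
type DNF    = [Clause]     -- a disjunction of clauses (a whole table)

-- Two rows are disjoint if some variable occurs positively in one
-- and negatively in the other: they can never both be true.
disjoint :: Clause -> Clause -> Bool
disjoint c1 c2 = any (\l -> opposite l `elem` c2) c1
  where opposite (Pos v) = Neg v
        opposite (Neg v) = Pos v

-- A table partitions its input space only if its rows are pairwise disjoint.
partitions :: DNF -> Bool
partitions rows = and [ disjoint r1 r2 | (i, r1) <- zip [0 ..] rows
                                       , (j, r2) <- zip [0 ..] rows
                                       , i < (j :: Int) ]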

The way to avoid this potential tarpit is, I suggest, not to shun syntax, but to embrace it. Visual programming languages do not get rid of syntax; they enrich it and make it come alive. But to think of powerful UIs as syntaxes and interactions as syntactic transformations that satisfy certain semantic properties presents a significant architectural challenge. We need to be able to layer views on top of views on top of views...so that it's views all the way down, until we reach a "primitive" view. (In MVC language, we could just as well call them models: the point is that "model" and "view" are, as with "syntax" and "UI", just relative notions.) The primitive view, or model, is simply the lowest level of syntax for the language, but it is not otherwise distinguished ontologically. These syntaxes are related by semantic functions (homomorphisms), and ideally those mappings would execute both automatically and incrementally, so that changes in one syntactic structure are interpreted as changes in any dependent syntactic structures. The biggest challenge facing this paradigm is perhaps that many of these semantic functions will need to be bidirectional - i.e. functional in both directions, like Pierce's lenses [FGMP05] - so that we can edit at our chosen perspective...and it is this that really blurs the distinction between syntax and user interface from the end-user's perspective, providing the rich, integrated experience required for a programming tool, without compromising the modularity of the language. In fact each tier of a system so conceived is a separate language, and each homomorphism on languages is a denotational semantics which interprets the syntax of one language in the syntax of another. But it's syntax all the way down. These languages are live, like visual programming languages, but they need not be visual; my preferred term is interactive programming language (IPL).
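
Concretely, the tiers would be glued together by something like the following - a sketch in the style of Pierce's lenses [FGMP05], with the types left schematic. Each level exposes a get that interprets the level below as a view, and a put that translates edits on the view back down:

data Lens s v = Lens
  { get :: s -> v          -- interpret the model as a view
  , put :: v -> s -> s     -- push an edited view back into the model
  }

-- Lenses compose, which is what lets views stack on top of views:
(>>>) :: Lens s u -> Lens u v -> Lens s v
l1 >>> l2 = Lens
  { get = get l2 . get l1
  , put = \v s -> put l1 (put l2 v (get l1 s)) s
  }

-- Well-behavedness (the "putback" laws):  get (put v s) == v
--                                         put (get s) s == s

The composition is the crucial bit: it is what lets views stack indefinitely without the lower tiers knowing anything about the upper ones.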

So do schematic tables belong in a programming language or a UI? The question turns out to be ill-posed, because these are really relative terms. A similar analysis can be applied to Subtext's proposed treatment of names (or lack thereof). Theoreticians are familiar with the formal redundancy of names (or at least of bound names): they prefer to work with terms "up to alpha conversion of bound variables", or to rely on encodings like de Bruijn indices to sidestep name-related issues. But for programming purposes, does life without names get easier or harder? There are no doubt ergonomic benefits to be had from using direct links instead of names. But there are almost certainly ergonomic benefits associated with names too. Imagine a Subtext-like language with only direct links equipped with comment-like annotations. For any non-trivial programming exercise it would probably not be long before we started embedding naming conventions into comments - in other words treating them as structured, rather than flat. We might for example adopt an informal rule whereby no two sibling program elements can have the same comment, so that we can unambiguously find an element by finding its parent and then searching the comments of its children. We might then want to be able to "repoint a link" by overtyping on the comment. At other times, we might expect to be able to maintain the identity of the referent and update the text of the comment instead.
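
For reference, here is the formal redundancy of names in miniature - the standard textbook encodings, nothing Subtext-specific. Terms that differ only in their choice of bound names are distinct as named terms but collapse into one nameless term:

data Named    = NVar String | NLam String Named | NApp Named Named
data Nameless = Var Int     | Lam Nameless      | App Nameless Nameless

-- \x. \y. x  and  \a. \b. a  are distinct Named terms...
k1, k2 :: Named
k1 = NLam "x" (NLam "y" (NVar "x"))
k2 = NLam "a" (NLam "b" (NVar "a"))

-- ...but one and the same Nameless term, where the index 1 just
-- counts binders: "skip one lambda, then bind".
kDB :: Nameless
kDB = Lam (Lam (Var 1))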

I don't propose any resolution to the tension between names and links here, because I suspect the ultimate requirement is going to be to have characteristics of both. Methodologically, at any rate, it seems unwise to rule out forever the use of textual names on a priori grounds. So the risk for Subtext as it stands is that if its goal of emancipating us from the world of textual names turns out to be untenable for the kinds of reasons just mentioned, it may be forced to reintroduce names on the sly, by providing various naming-like features in its UI (and effectively embedding a complex language extension). It then faces the very "creeping complexity" dilemma that I suggested arises with schematic tables. On the one hand, if we are not careful to design the language to be layered and modular, implementations become unmanageably complex and lack clarity on exactly what they must provide. Yet if we accept the challenge of designing our language to be layered and modular, and propose language extensions to deal specifically with names - new syntaxes, plus semantics which map terms in our "nominal algebra" to direct links - then we still seem to be lacking a robust implementation paradigm. (And no, MVC doesn't touch the sides.)

As Alan Perlis said, "There's no such thing as a free variable." Or as the counterpart of Fred Brooks probably said in a nearby possible world, no syntactic silver bullets. Maybe we're forced to accept that there's a significant bootstrapping issue here: to build mature IPLs, we will need powerful IPLs. Closing the bootstrap gap is one of the focuses of my own research, so maybe one day Jonathan and I will meet in the middle.

From the sublimity of Subtext to...the Charles Petzoldiness of Charles Petzold. I have the disturbing (but in a good way) feeling that this guy is completely serious; as a precautionary measure, however, I shall be taking his observation with a suitably large pinch of monosodium glutamate. Clearly this dude is a comic genius, and the joke is on me for not spotting it. He writes:

Here’s another example of perhaps the most notorious syntactical absurdity in all of traditional C# — the for loop:

for (i=0, j=0; i < 1000; i++)
    if (IsPrime(i))
        j++;

In CSAML, that jumble of symbols and semicolons is abandoned for a structural beauty that can almost induce the modern programmer to weep with joy:

<ForLoop>
  <ForLoop.Initializer>
    <StatementExpressionList>
      <Assignment LValue="i">
        <Assignment.Expression>
          <Literal Type="{x:Type Int32}" Value="0" />
        </Assignment.Expression>
      </Assignment>
    </StatementExpressionList>
  </ForLoop.Initializer>
  <ForLoop.Condition>
    <BooleanExpression>
      <LessThanExpression LeftSide="i">
        <LessThanExpression.RightSide>
          <Literal Type="{x:Type Int32}" Value="1000" />
        </LessThanExpression.RightSide>
      </LessThanExpression>
    </BooleanExpression>
  </ForLoop.Condition>
  <ForLoop.Iterator>
    <StatementExpressionList>
      <PreIncrementExpression Identifier="i" />
    </StatementExpressionList>
  </ForLoop.Iterator>
  <ForLoop.EmbeddedStatement>
    <IfStatement>
      <IfStatement.Condition>
        <BooleanExpression>
          <InvocationExpression MemberAccess="IsPrime">
            <InvocationExpression.ArgumentList>
              <Variable Identifier="i" />
            </InvocationExpression.ArgumentList>
          </InvocationExpression>
        </BooleanExpression>
      </IfStatement.Condition>
      <IfStatement.EmbeddedStatement>
        <StatementList>
          <PreIncrementExpression Identifier="j" />
        </StatementList>
      </IfStatement.EmbeddedStatement>
    </IfStatement>
  </ForLoop.EmbeddedStatement>
</ForLoop>

I'm not sure that weeping with joy is quite the response the CSAML code would elicit (but I think I'm in agreement that it would involve some kind of bodily function).


Saturday, March 10, 2007

Whither domain/j?

One or two people have wondered what happened to our self-styled refactoring tool to end all refactoring tools, domain/j. It no longer really exists; it is a mere flash demo of its former self. This demo and associated papers and abstracts are pretty much all that remain.


Friday, March 02, 2007

FInCo 2007

Here is the draft of a FInCo 2007 paper which elaborates (although still entirely informally) on the programming model, called declarative interaction, that I mentioned in my first posting. FInCo is one of the satellite workshops of ETAPS 2007.

The review comments were useful, and reasonably positive. I'm looking forward to some interesting discussions. I conspicuously failed to give even a passing nod to agent-oriented programming; given the likely focus at the workshop, this is something I need to learn about.
