transparency, explorability and explainability in data science and data visualisation

My interest is in new programming language foundations for transparent and explorable digital media. A scientific paper should allow a reviewer to drill into a bar chart to see the underlying distribution, tweak model parameters, or remove an outlier from the dataset and observe how the results change, all within the context of the paper.

Modern interactive notebooks for data science, such as Jupyter, provide a certain level of transparency by capturing the steps of a workflow and showing intermediate results such as summary statistics or charts. However, the steps themselves remain black boxes: if I want to understand how a complex chart is obtained, there is no easy way to break that step into substeps, or to investigate how particular aspects of the chart depend on specific pieces of data or code.
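To make the problem concrete, here is a hypothetical notebook cell of the kind described above (the data and names are invented). The reader sees the final bar chart, but the grouping and aggregation that produce it are fused into a single opaque step: nothing in the output records which rows fed which bar.

```python
# A typical notebook "step": one expression from raw data to chart.
# Data and column names here are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "species": ["a", "a", "b", "b", "c"],
    "mass":    [4.2, 3.9, 7.1, 6.8, 2.5],
})

# The chart appears as an intermediate result, but the substeps
# (grouping, averaging, rendering) and the row-to-bar provenance
# are not recoverable from the output.
df.groupby("species")["mass"].mean().plot.bar()
plt.show()
```

A reader who wonders how the outlier at mass 7.1 shifted its group's mean has no way to ask the chart itself; they must reverse-engineer the code.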

My research focuses on tools and techniques that expose this fine-grained structure to readers as well as authors. This will make it possible to create digital media that reveal how parts of a visualisation relate to the data and code behind it. For example, removing a data point or changing the visualisation logic will automatically update only the affected parts of a dependent visualisation, and moreover make it visually apparent which parts changed. Both authors and readers will be able to conduct “experiments”: formulating hypotheses about the fine-grained relationships between code and data, then testing them by changing things and observing the consequences.
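As an illustration only, and not the actual system, the following sketch shows one way such fine-grained dependencies might be recorded: each bar remembers the ids of the data points it was computed from, so removing a point identifies exactly which bars change and which are untouched. All names and data are hypothetical.

```python
# Minimal provenance-tracking sketch (assumed design, not a real library):
# each output part carries the set of input ids it depends on.
from collections import defaultdict

data = {0: ("a", 4.2), 1: ("a", 3.9), 2: ("b", 7.1), 3: ("b", 6.8), 4: ("c", 2.5)}

def bars_with_provenance(points):
    """Compute one bar (group mean) per group, recording which point ids fed it."""
    groups = defaultdict(list)
    for pid, (group, _value) in points.items():
        groups[group].append(pid)
    return {g: (sum(points[p][1] for p in pids) / len(pids), set(pids))
            for g, pids in groups.items()}

before = bars_with_provenance(data)

# An "experiment": remove the outlier (point id 2) and recompute.
after = bars_with_provenance({p: v for p, v in data.items() if p != 2})

# The recorded dependencies make it apparent which parts changed and why:
# only the bar whose input set mentioned point 2 is affected.
for g, (height, pids) in before.items():
    status = "changed" if 2 in pids else "unchanged"
    print(f"bar {g}: {height:.2f} -> {after[g][0]:.2f} ({status})")
```

Running this prints that only bar `b` changes (6.95 to 6.80) while bars `a` and `c` are reported unchanged, the same information a transparent visualisation could surface directly, for instance by highlighting only the affected bar.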