W8 - Version Management
Author: R. Chan
Last updated
Using version management is standard practice in software development. Tools like Git offer many features (branching, merging, reverting, ...) that allow for effective collaboration through version control. In fact, many of us have previously written more or less exactly the following lines:
Version management has many benefits, such as easy collaboration, and tracing back or recovering older code versions, as shown in a typical development workflow illustrated below.
The benefits of version management are not only relevant to large-scale production codebases, but also to coding environments that work on smaller scales, such as the rapid, exploratory prototyping and data analysis frequently done in Jupyter notebooks. However, managing Jupyter notebook versions with standard versioning tools such as Git is notoriously difficult and messy. This raises the following questions:
What messes or conflicts may arise during such prototyping? Which ones may be addressed by (automated) version management?
How can we adapt "traditional" version management to notebook development such that it addresses developers' needs?
This blog post mainly discusses the 2019 CHI paper Code Gathering: Managing messes in computational notebooks [1] and its broader impact on version management for computational notebooks.
Developing code in Jupyter notebooks is... a messy process. Getting to a final version may take many iterations of exploration, testing, and cleaning up. (A related study in the reading list analyzes this process in more detail; see Code code evolution.)
During exploration, it is natural to prioritize discovery over writing high-quality code. However, when it comes to sharing or revisiting old results, most analysts first want to clean up their code in order to
keep track of old versions of notebook code, and
be able to have portable (minimal) snippets of code, which they can easily copy and re-use.
But why are these things even hard? To answer this, the authors first evaluate the issues that commonly arise when developing code for notebooks, which make the aforementioned goals generally difficult. Namely, they are:
disorder: because the Jupyter kernel keeps its state in memory between cell executions, code may be developed out of execution order. If the kernel is restarted and the code is re-run from the top (e.g., by a collaborator), it may raise an error.
dispersal: relevant code snippets may be spread across cells, and first need to be gathered to reproduce a cell output.
deletion: relevant code may be accidentally deleted and old code versions overwritten during analysis or cleanup. This is especially pertinent to notebooks, where, e.g., persistent kernel memory may hide accidental deletion.
Such messes occur in most notebooks. For instance, nearly half of public notebooks on GitHub have cell disorder, i.e., the cells were executed in a different order than they are listed.
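To make cell disorder concrete, here is a minimal sketch that flags a notebook whose code cells are listed in a different order than they were executed. It assumes the standard .ipynb JSON layout (a `cells` list whose code cells carry an `execution_count`); the function names and the example notebook are made up for illustration, and the check is only one possible definition of disorder, not necessarily the one behind the statistic above.

```python
import json


def load_notebook(path):
    """Read an .ipynb file (plain JSON) into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def executed_out_of_order(notebook):
    """Return True if executed code cells are not listed in execution order."""
    counts = [
        cell["execution_count"]
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
        and cell.get("execution_count") is not None  # skip never-run cells
    ]
    # Disorder: some cell has a smaller execution count than the cell above it.
    return any(later < earlier for earlier, later in zip(counts, counts[1:]))


# Hypothetical notebook: the cell listed second was actually run first.
messy = {
    "cells": [
        {"cell_type": "code", "execution_count": 2, "source": "print(total)"},
        {"cell_type": "code", "execution_count": 1, "source": "total = 1 + 2"},
        {"cell_type": "markdown", "source": "Some notes"},
    ]
}
tidy = {
    "cells": [
        {"cell_type": "code", "execution_count": 1, "source": "total = 1 + 2"},
        {"cell_type": "code", "execution_count": 2, "source": "print(total)"},
    ]
}

print(executed_out_of_order(messy))  # True
print(executed_out_of_order(tidy))   # False
```

Re-running the `messy` notebook from the top after a kernel restart would fail with a NameError, because `total` is used before the cell that defines it.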
Code Gathering tries to address the data analyst's requirements through automatic version management: it keeps track of the order in which the code cells were executed (i.e., it scans the Jupyter execution log). This allows for a straightforward implementation of two features:
the user can select a result (any code output, e.g., a plot), and the Code Gathering tool generates a minimal snippet that reproduces the result if run top to bottom in the provided order.
the user may then see the result versions for code cells that were executed multiple times, e.g., when testing different parameters for the same model.
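The idea behind both features can be sketched on a toy execution log. The code below is a simplified illustration, not the authors' implementation: it works on whole cells, tracks dependencies only through variable names found with Python's `ast` module, and ignores in-place mutation (e.g., `data.append(...)`). The names `Execution`, `gather`, and `versions` are hypothetical.

```python
import ast
from dataclasses import dataclass


@dataclass
class Execution:
    """One entry of the (hypothetical) execution log."""
    cell_id: str
    source: str


def defs_and_uses(source):
    """Collect the names a snippet defines and the names it reads."""
    defs, uses = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)
            else:
                uses.add(node.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defs.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defs.add(node.name)
    return defs, uses


def gather(log, target_index):
    """Walk the log backwards, keeping only executions the target depends on."""
    needed, kept = set(), []
    for i in range(target_index, -1, -1):
        defs, uses = defs_and_uses(log[i].source)
        if i == target_index or defs & needed:
            kept.append(log[i])
            # Names defined here are resolved; names read here become needed.
            needed = (needed - defs) | uses
    return list(reversed(kept))


def versions(log, cell_id):
    """All sources a cell was executed with, oldest first."""
    return [entry.source for entry in log if entry.cell_id == cell_id]


log = [
    Execution("c1", "import math"),
    Execution("c2", "radius = 2"),
    Execution("c3", "label = 'unused'"),
    Execution("c2", "radius = 3"),  # cell c2 was edited and re-run
    Execution("c4", "area = math.pi * radius ** 2"),
    Execution("c5", "print(area)"),
]

snippet = gather(log, target_index=5)
print("\n".join(entry.source for entry in snippet))
print(versions(log, "c2"))  # ['radius = 2', 'radius = 3']
```

For the selected result `print(area)`, the gathered snippet contains only `import math`, `radius = 3`, the `area` computation, and the print itself: the unused `label` cell and the stale `radius = 2` run are dropped. Working on whole cells over-approximates dependencies, so a real tool would want a finer-grained analysis.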
[Version Management in Jupyter Notebooks]: This work proposes one of the earliest approaches to "version management" in notebooks. Unlike traditional Git versioning, which relies on manual staging, the developer's analysis progress is tracked automatically and feeds both the minimal snippet and the result versions features. To this end, the authors drew inspiration from related work on automated code refactoring and code walkthroughs in non-notebook coding environments and adapted it to the unique challenges of exploratory notebook development.
[Challenges in End-to-End Jupyter Notebook Version Management]: While this tool provides some foundational capabilities for version management that suit the rapid-prototyping nature of Jupyter notebooks, it still leaves open questions about its end-to-end usage. For instance:
According to the user evaluation, the minimal snippet feature is considered highly useful, but the result versions feature less so. This raises two questions: how can we design better features for exploring notebook versions beyond simply listing the changed snippets of code? And how do we make notebook versioning scale to a more collaborative setting with multiple users or longer project edit histories?
Notebooks are still supposed to be literate programming tools, so how can we properly handle markdown during the automatic cleanup process? Markdown cells are not executed like code cells and are currently ignored by the Code Gathering tools.
Finally, how does this integrate with other version management tools, like git?
[Opportunities for Interactivity and Visualization for Versioning]: The limitations of the Code Gathering tools highlighted above show the potential for follow-up work that helps users better navigate result versions. One avenue for future work is improving notebook diff visualizations across text, code, and plots at different granularities, to help a user understand and aggregate changes in notebooks with large edit histories. For instance, Loops proposes compact and detailed diff widgets that give a user a fast overview of notebook versions. Further, navigational guidance between versions is highly relevant to the issue of scale, and could potentially be supported through interactive features such as intelligent semantic version grouping and search, improving on those implemented by Verdant, a notebook provenance tool.
Code Gathering: Head, A., Hohman, F., Barik, T., Drucker, S. M., & DeLine, R. (2019, May). Managing messes in computational notebooks. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (pp. 1-12).
Code code evolution: Raghunandan, D., Roy, A., Shi, S., Elmqvist, N., & Battle, L. (2023). Understanding how people change data science notebooks over time. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23) (Article 863, pp. 1–12). Association for Computing Machinery.
Rule, A. (2018). Design and Use of Computational Notebooks. UC San Diego. ProQuest ID: Rule_ucsd_0033D_17498. Merritt ID: ark:/13030/m5fn63pf.
Verdant: Kery, M. B., John, B. E., O'Flaherty, P., Horvath, A., & Myers, B. A. (2019, May). Towards effective foraging by data scientists to find past analysis choices. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (pp. 1-13).
Loops: Eckelt, K., Gadhave, K., Lex, A., & Streit, M. (2024). Leveraging provenance and visualization to support exploratory data analysis in notebooks. IEEE Transactions on Visualization and Computer Graphics, 31(1), 1213–1223.