|
1 |
| -# Welcome to GitHub Actions for Scientific Workflows |
| 1 | +# Welcome to GitHub Actions for Scientific Data Workflows |
2 | 2 |
|
| 3 | + |
| 4 | +Tutorial presented at [SciPy 2024 Conference](https://www.scipy2024.scipy.org/) |
| 5 | + |
| 6 | +Authors: Valentina Staneva, Quinn Brencher, Scott Henderson |
| 7 | + |
| 8 | +## Abstract |
| 9 | + |
| 10 | +In this tutorial we will introduce GitHub Actions to scientists as a tool for lightweight automation of scientific data workflows. We will |
| 11 | +demonstrate that GitHub Actions are not just a tool for software testing, but can be used in various ways to improve the reproducibility |
| 12 | +and impact of scientific analysis. Through a sequence of examples, we will demonstrate some of GitHub Actions' applications to scientific |
| 13 | +workflows, such as scheduled deployment of algorithms to sensor streams, updating visualizations based on new data, processing large |
| 14 | +datasets, model versioning and performance benchmarking. GitHub Actions can particularly empower Python scientific programmers who are not |
| 15 | +willing to build fully-fledged applications or set up complex computational infrastructure, but would like to increase the impact of their |
| 16 | +work. The goal is that participants will leave with their own ideas of how to integrate GitHub Actions into their own work. |
| 17 | + |
| 18 | +## Description |
| 19 | + |
| 20 | +GitHub Actions are quite popular within the software engineering community, but a scientific Python programmer may not have seen their use |
| 21 | +beyond a continuous integration framework for unit testing. We would like to increase their visibility through a scientific workflow lens. |
| 22 | +We will use examples that are relevant to the community: wrangling a messy realtime hydrophone data stream to display noise sounds from the |
| 23 | +Puget Sound (not far from the conference venue!) or processing hundreds of satellite radar images over glacial lakes in High-Mountain Asia |
| 24 | +to study flood hazards. We assume no prior knowledge of GitHub Actions and will start slowly with a “Hello World” step, but build quickly to |
| 25 | +create complex and exciting workflows. We will also showcase their value for scientific collaborations across institutions as a means to |
| 26 | +share reproducible workflows and computing infrastructure. |
| 27 | + |
| 28 | +## Prerequisites |
| 29 | +A GitHub account; familiarity with git (commits, versioning), GitHub (push, pull requests), and Python (conda, scipy, matplotlib); some maturity in manipulating scientific data and |
| 30 | +exposure to the challenges associated with it; and the ability to read code (our examples may use libraries unfamiliar to the audience, but the |
| 31 | +focus will be on the steps these libraries accomplish rather than the details). |
| 32 | + |
| 33 | +## Installation Instructions |
| 34 | +Participants can make edits from the GitHub interface, but those who prefer to make updates locally need a working git installation |
| 35 | +([setup instructions](https://swcarpentry.github.io/git-novice/#installing-git)). |
| 36 | + |
| 37 | +## Outline |
| 38 | + |
| 39 | +### Short Version |
3 | 40 | ```{tableofcontents}
|
4 | 41 | ```
|
| 42 | + |
| 43 | +### Long Version (with approximate schedule) |
| 44 | +* Overview of GitHub Actions and Workflows and their popular uses in Python software development (examples of testing, linting, |
| 45 | +packaging) (20 min) |
| 46 | + * We will explain the main components of GitHub Actions and associated terminology |
| 47 | + * We will summarize their typical uses in software development |
| 48 | + * We will point to popular GitHub Actions used in Python software development and packaging (the focus of this tutorial will not be |
| 49 | +on them but rather on scientific pipelines) |
| 50 | + |
| 51 | +* Setting up your first workflow: a scientific Python environment (20 min) |
| 52 | + * participants will update a workflow `.yml` file to create an environment with their favorite Python libraries |
| 53 | + * participants will inspect the GitHub interface to see the workflow runs |
| 54 | + |
| 55 | +* Scheduled algorithm deployment to a realtime stream (30 min) |
| 56 | + * we will deploy a typical scientific workflow: reading data, converting to a new format, and making a visualization |
| 57 | + * participants will update the deployment schedule to trigger a new workflow and will monitor the progress in the GitHub interface |
| 58 | + |
| 59 | +* Break (15 min) |
| 60 | + |
| 61 | +* Exporting results (30 min) |
| 62 | + * participants will learn about various ways to store the results: |
| 63 | + * caching |
| 64 | + * committing to GitHub |
| 65 | + * creating GitHub artifacts |
| 66 | + * storing to personal storage |
| 67 | + * they will modify the code to make a new plot which will be automatically updated |
| 68 | + * they will use either matplotlib or an interactive library such as plotly |
| 69 | + |
| 70 | +* Update results on a webpage (30 min) |
| 71 | + * we will overview different ways to display scientific results on a webpage |
| 72 | + * we will demonstrate the workflow to deploy the webpage |
| 73 | + * participants will rerender the webpage based on the updates in GitHub |
| 74 | + |
| 75 | +* Large-scale data processing (45 min) |
| 76 | + * we will demonstrate a use-case of processing large data sets with GitHub Actions |
| 77 | + * participants will fiddle with problem size to understand the power and limits of the computational infrastructure |
| 78 | + * we will discuss connections to cluster/cloud computing |
| 79 | + |
| 80 | +* Break (10 min) |
| 81 | + |
| 82 | +* Model Versioning and Benchmarking (20 min) |
| 83 | + * we will introduce how to leverage GitHub’s version control to version different models and track their performance |
| 84 | + * participants can contribute a new model and check its performance |
| 85 | + * we will discuss how this can be used as a community network to share methods and results |
| 86 | + |
| 87 | +* Recap and Discussion (or buffer time) (20 min) |
| 88 | + * we will have a discussion on potential uses of GitHub Actions within the work of the participants |
| 89 | + |
| 90 | + |
| 91 | +# References |
| 92 | +* [*GitHub Actions for Scientific Data Workflows*](https://github.com/valentina-s/GithubActionsTutorial-USRSE23), Valentina Staneva, |
| 93 | +[US-RSE 2023 Tutorial](https://us-rse.org/usrse23/program/tutorials/) |
| 94 | +* [*Characterizing glacial lake outburst flood hazard at regional scale using fused InSAR-speckle tracking surface displacement time |
| 95 | +series*](https://escience.washington.edu/2024-incubator-projects/), Quinn Brencher and Scott Henderson, eScience Institute Data Incubator |
| 96 | +Project, 2024, [[repo](https://github.com/relativeorbit/actions-batch-demo)] |
| 97 | +* [*GitHub Actions Workflows for Scheduled Algorithm |
| 98 | +Deployment*](https://summerofcode.withgoogle.com/archive/2021/projects/5026942771789824), Dmitry Volodin, Jesse Lopez, Scott Veirs, Val |
| 99 | +Veirs, Valentina Staneva, Orcasound Google Summer Of Code 2021 Project, [[repo]](https://github.com/orcasound/orca-action-workflow) |
| 100 | +* [*GitHub Actions Documentation*](https://docs.github.com/en/actions/learn-github-actions) |
| 101 | + |
| 102 | + |
| 103 | + |