
Commit 720c85b

Merge pull request #16 from uwescience/lesson_content
Split sections in jupyter book
2 parents 89ceacc + 334b961 commit 720c85b

12 files changed

+264
-266
lines changed

docs/_config.yml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,8 @@
11
# Book settings
22
# Learn more at https://jupyterbook.org/customize/config.html
33

4-
title: My sample book
5-
author: The Jupyter Book Community
4+
title: GitHub Actions for Scientific Workflows (SciPy 2024)
5+
author: Valentina Staneva, George (Quinn) Brencher, Scott Henderson
66
logo: logo.png
77

88
# Force re-execution of notebooks on each build.

docs/_toc.yml

Lines changed: 8 additions & 1 deletion
@@ -4,4 +4,11 @@
44
format: jb-book
55
root: intro
66
chapters:
7-
- file: lesson
7+
- file: getting-started
8+
- file: python-environment-workflow
9+
- file: scheduled-algorithm-deployment-workflow
10+
- file: caching
11+
- file: exporting-results
12+
- file: visualizing-results-webpage
13+
- file: ../glacier_image_correlation/README
14+
title: Batch Computing

docs/caching.md

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
1+
# Caching
2+
3+
Reinstalling dependencies on every workflow run is time consuming and usually unnecessary. The process can be sped up by caching the builds of the packages. Caches are removed automatically if they are not accessed for 7 days, and a repository can store up to 10 GB of caches. A cache can also be removed manually to reset the installation.
4+
5+
## Caching `pip` installs
6+
7+
`pip` packages can be cached by adding the `cache: 'pip'` setting to the Python setup action. If the dependencies are not listed in the default `requirements.txt` file, also provide a `cache-dependency-path`.
8+
9+
![alt text](https://raw.githubusercontent.com/uwescience/SciPy2024-GitHubActionsTutorial/main/img/pip-caching.png)
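
As a text counterpart to the settings described above, here is a minimal sketch of the steps, assuming `actions/setup-python@v5`. The Python version and the requirements path are placeholders, not values taken from the tutorial's workflow.

```yaml
- uses: actions/setup-python@v5
  with:
    python-version: '3.11'                             # assumed version
    cache: 'pip'                                       # cache pip downloads between runs
    cache-dependency-path: 'path/to/requirements.txt'  # hypothetical path; only needed for a non-default file

- name: Install dependencies
  run: pip install -r path/to/requirements.txt         # same hypothetical path as above
```

The cache key is derived from the contents of the dependency file, so changing the requirements should trigger a fresh install.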
10+
11+
## Caching `conda` installs
12+
13+
Conda packages can similarly be cached within the conda setup action.
14+
15+
![alt text](https://raw.githubusercontent.com/uwescience/SciPy2024-GitHubActionsTutorial/main/img/conda-caching.png)
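
The text does not name the conda setup action, so the following is only one possible sketch. It assumes `mamba-org/setup-micromamba@v1` and a hypothetical `environment.yml` at the repository root; the tutorial's own workflow may use a different action with different inputs.

```yaml
- uses: mamba-org/setup-micromamba@v1
  with:
    environment-file: environment.yml   # hypothetical environment file
    cache-environment: true             # reuse the solved environment between runs
    cache-downloads: true               # reuse downloaded package archives
```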
16+
17+
## Caching `apt-get` installs
18+
19+
Packages such as `ffmpeg` can take a long time to install. There is no official action to cache `apt-get` packages, but they can be cached with the [awalsh128/cache-apt-pkgs-action](https://github.com/marketplace/actions/cache-apt-packages).
20+
21+
```yaml
# Install ffmpeg via apt and cache it for later runs
- uses: awalsh128/cache-apt-pkgs-action@latest
  with:
    packages: ffmpeg
```
26+
27+
## Caching any data
28+
29+
The general [`cache`](https://github.com/marketplace/actions/cache) action can cache data at any path. Apart from builds of packages, one can use this option to avoid regenerating results while testing.
30+
31+
```yaml
# Restore img/ from the cache if an entry with this key exists
- uses: actions/cache@v4
  id: cache
  with:
    path: img/
    key: img

# Regenerate the files only when the cache was not restored
- name: Get all files
  if: steps.cache.outputs.cache-hit != 'true'
  run: …
```
42+
43+
[Caching Documentation](https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows)
44+

docs/exporting-results.md

Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
1+
# Exporting Results
2+
3+
We will discuss several different ways to export results.
4+
5+
## Uploading to the GitHub Repository
6+
7+
One of the easiest ways to display results is to store them in the GitHub repository. This can be a quick solution, for example, to display a small plot or a table within the `Readme.md` of the repository and update it as the workflow is rerun. This is not practical for big outputs, as GitHub repositories are recommended not to exceed 1 GB, and all versions of the files will be preserved in the repository's history (thus slowing down cloning).
8+
9+
It is possible to write out all the steps to add, commit, and push a file to GitHub, but there is already a [GitHub Auto Commit Action](https://github.com/marketplace/actions/git-auto-commit) that does this.
10+
11+
![alt text](https://raw.githubusercontent.com/uwescience/SciPy2024-GitHubActionsTutorial/main/img/auto-commit-action.png)
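
A minimal sketch of a workflow that commits a regenerated plot back to the repository, assuming the marketplace action is `stefanzweifel/git-auto-commit-action@v5`. The job name, script name, commit message, and file pattern are hypothetical.

```yaml
on: workflow_dispatch

permissions:
  contents: write              # allow the workflow token to push commits

jobs:
  update-results:              # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Generate results
        run: python make_plot.py          # hypothetical script that writes img/plot.png

      - uses: stefanzweifel/git-auto-commit-action@v5
        with:
          commit_message: Update results  # hypothetical message
          file_pattern: 'img/*.png'       # commit only the regenerated plots
```

If nothing matching the pattern has changed, the action should skip the commit rather than fail.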
12+
13+
14+
## Uploading as a GitHub Workflow Artifact
15+
16+
GitHub provides an option for temporary storage of GitHub Action data as Workflow Artifacts. These are kept on the GitHub website as zipped files and can be downloaded within 90 days by default. The retention period is configurable, up to 90 days for public repositories and up to 400 days for private repositories.
17+
18+
There is a GitHub Action that can upload one or more files as GitHub Artifacts.
19+
20+
![alt text](https://raw.githubusercontent.com/uwescience/SciPy2024-GitHubActionsTutorial/main/img/artifact-upload-action.png)
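
A minimal sketch of such an upload step, assuming the official `actions/upload-artifact@v4` action. The artifact name, path, and retention period are illustrative values.

```yaml
- uses: actions/upload-artifact@v4
  with:
    name: results        # hypothetical artifact name shown in the run summary
    path: img/           # a file, a directory, or a glob pattern
    retention-days: 30   # optional; illustrative value
```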
21+
22+
The artifact can be found by clicking on the workflow run and scrolling down to the Artifacts section.
23+
24+
![alt text](https://raw.githubusercontent.com/uwescience/SciPy2024-GitHubActionsTutorial/main/img/artifact_github_interface.png)
25+
26+
27+
The artifact can be downloaded directly from the interface or through the GitHub CLI.
28+
29+
```
gh run download
```
32+
33+
The workflow run also provides a publicly available link for downloading the artifact:
34+
35+
Artifact download URL: [https://github.com/uwescience/SciPy2024-GitHubActionsTutorial/actions/runs/9591972369/artifacts/1619380017](https://github.com/uwescience/SciPy2024-GitHubActionsTutorial/actions/runs/9591972369/artifacts/1619380017)
38+
39+
There is a `download-artifact` action to download artifacts and share them between jobs within a workflow run (note that this is limited to the individual workflow run; for downloading across runs, use the other options).
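
A minimal sketch of sharing an artifact between two jobs in the same run, assuming `actions/upload-artifact@v4` and `actions/download-artifact@v4`. The job names, artifact name, and placeholder file are hypothetical.

```yaml
jobs:
  process:                     # hypothetical producer job
    runs-on: ubuntu-latest
    steps:
      - name: Create a placeholder result
        run: mkdir -p img && echo "placeholder" > img/result.txt
      - uses: actions/upload-artifact@v4
        with:
          name: results
          path: img/

  publish:                     # hypothetical consumer job
    needs: process             # wait until the artifact has been uploaded
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: results        # must match the uploaded artifact name
          path: img/
      - run: ls img/
```

The `needs` key matters here: without it the two jobs may run in parallel and the download could start before the upload finishes.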
40+
41+
[Here](https://docs.github.com/en/actions/using-workflows/storing-workflow-data-as-artifacts) is more detailed documentation on GitHub Artifacts.
43+
44+
45+
46+
## Uploading to Personal Storage
47+
48+
A more long-term solution is to store outputs in personal storage, for example Google Drive or cloud object storage such as an AWS S3 bucket. To get write access to these storage systems, one needs to provide the credentials securely to GitHub Actions. This can be achieved by storing the credentials as Action Secrets.
49+
50+
The write operation can be performed directly from the Python code or from the GitHub Action configuration. Here we will demonstrate how to upload data to Google Drive with `rclone`, a tool for transferring data between storage systems that is largely provider-agnostic.
51+
52+
The approach consists of a few steps:
53+
54+
1. use an `rclone` GitHub Action to avoid installing `rclone` manually
55+
* we will use [AnimMouse/setup-rclone](https://github.com/marketplace/actions/setup-rclone-action)
56+
* configure a Google Drive remote locally
57+
* encode the text in the config file and save it as a secret `RCLONE_CONFIG`
58+
* macOS: `openssl base64 -in ~/.config/rclone/rclone_drive.conf`
59+
* run the `rclone` command to upload the plots to Google Drive
60+
* `rclone copy ambient_sound_analysis/img/broadband.png mydrive:rclone_uploads/`
61+
62+
63+
![alt txt](https://raw.githubusercontent.com/uwescience/SciPy2024-GitHubActionsTutorial/main/img/rclone_upload.png)
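
The steps above can be sketched as the following workflow fragment. It assumes `AnimMouse/setup-rclone@v1` with its `rclone_config` input, which is expected to receive the base64-encoded config stored in the `RCLONE_CONFIG` secret, and a remote named `mydrive` defined in that config.

```yaml
- uses: AnimMouse/setup-rclone@v1
  with:
    rclone_config: ${{ secrets.RCLONE_CONFIG }}   # base64-encoded rclone config stored as a secret

- name: Upload plot to Google Drive
  # "mydrive" must match the remote name in the encoded config
  run: rclone copy ambient_sound_analysis/img/broadband.png mydrive:rclone_uploads/
```

Because the config comes from a secret, the credentials never appear in the repository or in the workflow logs.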
64+
65+
[Secrets Documentation](https://docs.github.com/en/actions/security-guides/using-secrets-in-github-actions)
66+
67+
68+

docs/getting-started.md

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
1+
# Setup
2+
* Fork this repo
3+
* Enable GitHub Actions:
4+
* Settings -> Actions -> Allow actions and reusable workflows
5+
* [Managing Permissions
6+
Documentation](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/enabling-features-for-your-repository/managing-github-actions-settings-for-a-repository)
7+
8+
9+
All workflow configurations are stored in the [`.github/workflows`](https://github.com/uwescience/SciPy2024-GitHubActionsTutorial/tree/main/.github/workflows) directory, and we will go through them in the following order:
10+
11+
1. [`python_env.yml`](https://github.com/uwescience/SciPy2024-GitHubActionsTutorial/blob/main/.github/workflows/python_env.yml)
12+
2. [`conda_env.yml`](https://github.com/uwescience/SciPy2024-GitHubActionsTutorial/blob/main/.github/workflows/conda_env.yml)
13+
3. [`noise_processing.yml`](https://github.com/uwescience/SciPy2024-GitHubActionsTutorial/blob/main/.github/workflows/noise_processing.yml)
14+
4. [`create_website_spectrogram.yml`](https://github.com/uwescience/SciPy2024-GitHubActionsTutorial/blob/main/.github/workflows/create_website_spectrogram.yml)
15+
5. [`create_website.yml`](https://github.com/uwescience/SciPy2024-GitHubActionsTutorial/blob/main/.github/workflows/create_website.yml)
16+
6. ...

docs/intro.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
# Welcome to SciPy 2024 GitHub Actions for Scientific Workflows Tutorial
1+
# Welcome to GitHub Actions for Scientific Workflows
22

33
```{tableofcontents}
44
```
