Split sections in jupyter book #16

Merged 10 commits on Jun 25, 2024
4 changes: 2 additions & 2 deletions docs/_config.yml
@@ -1,8 +1,8 @@
# Book settings
# Learn more at https://jupyterbook.org/customize/config.html

title: My sample book
author: The Jupyter Book Community
title: GitHub Actions for Scientific Workflows (SciPy 2024)
author: Valentina Staneva, George (Quinn) Brencher, Scott Henderson
logo: logo.png

# Force re-execution of notebooks on each build.
9 changes: 8 additions & 1 deletion docs/_toc.yml
@@ -4,4 +4,11 @@
format: jb-book
root: intro
chapters:
- file: lesson
- file: getting-started
- file: python-environment-workflow
- file: scheduled-algorithm-deployment-workflow
- file: caching
- file: exporting-results
- file: visualizing-results-webpage
- file: ../glacier_image_correlation/README
title: Batch Computing
44 changes: 44 additions & 0 deletions docs/caching.md
@@ -0,0 +1,44 @@
# Caching

Reinstalling dependencies on every workflow run is time-consuming and usually unnecessary. The process can be sped up by caching the builds of the packages. Caches are removed automatically if they are not accessed for 7 days, and the total cache size per repository can be up to 10 GB. One can also remove a cache manually to reset the installation.

## Caching `pip` installs

`pip` packages can be cached by adding the `cache: 'pip'` setting to the Python setup action (`actions/setup-python`). If the dependencies are not listed in the default `requirements.txt` file, one should also provide a `cache-dependency-path`.

![alt text](https://raw.githubusercontent.com/uwescience/SciPy2024-GitHubActionsTutorial/main/img/pip-caching.png)
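
As a text counterpart to the screenshot, here is a minimal sketch of the steps involved. The Python version and the `docs/requirements.txt` path are assumptions chosen for illustration; the `cache` and `cache-dependency-path` inputs belong to `actions/setup-python`.

```yaml
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-python@v5
    with:
      python-version: '3.12'   # assumed version, adjust as needed
      cache: 'pip'             # restore and save pip's download cache between runs
      # Only needed when dependencies are not listed in ./requirements.txt;
      # this path is a hypothetical example.
      cache-dependency-path: docs/requirements.txt
  - run: pip install -r docs/requirements.txt
```

Note that this caches the files `pip` downloads, not the installed environment, so `pip install` still runs but skips most downloads.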

## Caching `conda` installs

Conda packages can be cached similarly within the conda setup action.

![alt text](https://raw.githubusercontent.com/uwescience/SciPy2024-GitHubActionsTutorial/main/img/conda-caching.png)
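
The text above does not name the conda setup action, so the following is only a sketch under one assumption: that `mamba-org/setup-micromamba` is used, which exposes caching through its `cache-environment` and `cache-downloads` inputs. The `environment.yml` file name is also hypothetical.

```yaml
steps:
  - uses: actions/checkout@v4
  - uses: mamba-org/setup-micromamba@v1   # assumed action, not confirmed by the text above
    with:
      environment-file: environment.yml   # hypothetical environment file
      cache-environment: true             # reuse the installed environment between runs
      cache-downloads: true               # reuse downloaded package archives
```

Other conda setup actions may need a separate `actions/cache` step pointed at the package directory instead.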

## Caching `apt-get` installs

Packages such as `ffmpeg` can take a long time to install. There is no official action to cache `apt-get` packages, but they can be cached with [awalsh128/cache-apt-pkgs-action](https://github.com/marketplace/actions/cache-apt-packages).

```yaml
- uses: awalsh128/cache-apt-pkgs-action@latest
  with:
    packages: ffmpeg
```

## Caching any data

The general [`cache`](https://github.com/marketplace/actions/cache) action can cache data at any path. Apart from package builds, one can use this option to avoid regenerating results while testing.

```yaml
- uses: actions/cache@v4
  id: cache
  with:
    path: img/
    key: img

- name: Get all files
  if: steps.cache.outputs.cache-hit != 'true'
  run: …
```
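
With the fixed key `img`, the cached directory is restored on every run until the cache is deleted or expires. One possible variation, shown here as a sketch, derives the key from the file that produces the results, so a new cache is created whenever that file changes. The script path `scripts/make_plots.py` is a hypothetical name.

```yaml
- uses: actions/cache@v4
  id: cache
  with:
    path: img/
    # hashFiles() changes whenever the (hypothetical) script changes,
    # so results cached from an older version are not restored.
    key: img-${{ hashFiles('scripts/make_plots.py') }}
```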

[Caching Documentation](https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows)

68 changes: 68 additions & 0 deletions docs/exporting-results.md
@@ -0,0 +1,68 @@
# Exporting Results

We will discuss several different ways to export results.

## Uploading to the GitHub Repository

One of the easiest ways to display results is to store them in the GitHub repository. This can be a quick solution, for example, to display a small plot or a table within the `Readme.md` of the repository and update it as the workflow is rerun. This is not a practical solution for large outputs: GitHub recommends that repositories stay under 1 GB, and all versions of the files are preserved in the repository's history (which slows down cloning).

It is possible to run all the steps to add, commit, and push a file to GitHub manually, but there is already a [GitHub Auto Commit Action](https://github.com/marketplace/actions/git-auto-commit) that achieves this.

![alt text](https://raw.githubusercontent.com/uwescience/SciPy2024-GitHubActionsTutorial/main/img/auto-commit-action.png)
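
A minimal sketch of such a workflow fragment follows. The job name, the script `make_plot.py`, and the output path `img/plot.png` are hypothetical; `commit_message` and `file_pattern` are inputs of `stefanzweifel/git-auto-commit-action`.

```yaml
permissions:
  contents: write   # lets the workflow token push the new commit

jobs:
  update-plot:      # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python make_plot.py   # hypothetical script that writes img/plot.png
      - uses: stefanzweifel/git-auto-commit-action@v5
        with:
          commit_message: Update plot
          file_pattern: img/plot.png   # commit only this file
```

If nothing changed, the action makes no commit, so rerunning the workflow does not clutter the history.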


## Uploading as a GitHub Workflow Artifact

GitHub provides temporary storage for GitHub Actions data as Workflow Artifacts. These are kept on the GitHub website as zipped files and can be downloaded for 90 days by default; the retention period is configurable, up to 90 days for public repositories and up to 400 days for private repositories.

There is a GitHub Action, `actions/upload-artifact`, which uploads one or more files as GitHub Artifacts.

![alt text](https://raw.githubusercontent.com/uwescience/SciPy2024-GitHubActionsTutorial/main/img/artifact-upload-action.png)
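
As a text counterpart to the screenshot, an upload step might look like the sketch below. The artifact name `results`, the `img/` path, and the 30-day retention are assumptions for illustration.

```yaml
- uses: actions/upload-artifact@v4
  with:
    name: results        # hypothetical artifact name shown in the Artifacts section
    path: img/           # file, directory, or glob pattern to upload
    retention-days: 30   # optional; falls back to the repository default if omitted
```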

The artifact can be found by clicking on the workflow run and scrolling down to the Artifacts section.

![alt text](https://raw.githubusercontent.com/uwescience/SciPy2024-GitHubActionsTutorial/main/img/artifact_github_interface.png)


The artifact can be downloaded directly from the web interface or through the GitHub CLI (`gh`).

```
gh run download
```

The workflow run also provides a direct link for downloading the artifact (one must be signed in to GitHub to use it):

Artifact download URL: [https://github.com/uwescience/SciPy2024-GitHubActionsTutorial/actions/runs/9591972369/artifacts/1619380017](https://github.com/uwescience/SciPy2024-GitHubActionsTutorial/actions/runs/9591972369/artifacts/1619380017)

There is a `download-artifact` action to download artifacts and share them between jobs within a workflow run (note that this is limited to the individual workflow run; to download across runs, use the other options).
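
A sketch of sharing data between two jobs in the same run is shown below. The job names, the script `process.py`, and the `output/` directory are hypothetical; the second job uses `needs` so that it starts only after the artifact has been uploaded.

```yaml
jobs:
  process:                 # hypothetical producer job
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python process.py   # hypothetical script that writes to output/
      - uses: actions/upload-artifact@v4
        with:
          name: results
          path: output/

  report:                  # hypothetical consumer job
    needs: process         # wait for the upload to finish
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: results    # must match the uploaded artifact name
          path: output/
      - run: ls output/
```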

[Here](https://docs.github.com/en/actions/using-workflows/storing-workflow-data-as-artifacts) is more detailed documentation on GitHub Artifacts.



## Uploading to Personal Storage

A more long-term solution is to store outputs in personal storage. This could be, for example, Google Drive or cloud object storage such as an AWS S3 bucket. To get write access to these storage systems, one needs to provide the credentials securely to GitHub Actions. This can be achieved by storing the credentials as Action Secrets.

The write operation can be performed directly from the Python code or from the GitHub Action configuration. Here we demonstrate how to upload data to Google Drive with `rclone`, a tool for transferring data between storage systems that is largely provider-agnostic.

The approach consists of a few steps:

1. use an `rclone` GitHub Action to avoid installing `rclone` manually
   * we will use [AnimMouse/setup-rclone](https://github.com/marketplace/actions/setup-rclone-action)
2. configure a Google Drive remote locally
3. encode the text in the config file and save it as a secret `RCLONE_CONFIG`
   * macOS: `openssl base64 -in ~/.config/rclone/rclone_drive.conf`
4. run the `rclone` command to upload the plots to Google Drive
   * `rclone copy ambient_sound_analysis/img/broadband.png mydrive:rclone_uploads/`


![alt txt](https://raw.githubusercontent.com/uwescience/SciPy2024-GitHubActionsTutorial/main/img/rclone_upload.png)
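
The steps above can be sketched as a workflow fragment. It assumes that the secret `RCLONE_CONFIG` holds the base64-encoded config and that the remote in that config is named `mydrive`; `rclone_config` is the input through which `AnimMouse/setup-rclone` receives the encoded config.

```yaml
steps:
  - uses: actions/checkout@v4
  - uses: AnimMouse/setup-rclone@v1
    with:
      # Base64-encoded rclone config, stored as a repository secret
      rclone_config: ${{ secrets.RCLONE_CONFIG }}
  # Assumes the plot was produced by an earlier step in the same job
  - run: rclone copy ambient_sound_analysis/img/broadband.png mydrive:rclone_uploads/
```

Because the config is passed as a secret, its value is masked in the workflow logs.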

[Secrets Documentation](https://docs.github.com/en/actions/security-guides/using-secrets-in-github-actions)



48 changes: 48 additions & 0 deletions docs/getting-started.md
@@ -0,0 +1,48 @@
# Setup
* Fork this repo
* Enable GitHub Actions:
  * Settings -> Actions -> Allow actions and reusable workflows
* [Managing Permissions Documentation](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/enabling-features-for-your-repository/managing-github-actions-settings-for-a-repository)


All workflow configurations are stored in the [`.github/workflows`](https://github.com/uwescience/SciPy2024-GitHubActionsTutorial/tree/main/.github/workflows) directory, and we will go through them in the following order:

1. [`python_env.yml`](https://github.com/uwescience/SciPy2024-GitHubActionsTutorial/blob/main/.github/workflows/python_env.yml)
2. [`conda_env.yml`](https://github.com/uwescience/SciPy2024-GitHubActionsTutorial/blob/main/.github/workflows/conda_env.yml)
3. [`noise_processing.yml`](https://github.com/uwescience/SciPy2024-GitHubActionsTutorial/blob/main/.github/workflows/noise_processing.yml)
4. [`create_website_spectrogram.yml`](https://github.com/uwescience/SciPy2024-GitHubActionsTutorial/blob/main/.github/workflows/create_website_spectrogram.yml)
5. [`create_website.yml`](https://github.com/uwescience/SciPy2024-GitHubActionsTutorial/blob/main/.github/workflows/create_website.yml)
6. ...
































2 changes: 1 addition & 1 deletion docs/intro.md
@@ -1,4 +1,4 @@
# Welcome to SciPy 2024 GitHub Actions for Scientific Workflows Tutorial
# Welcome to GitHub Actions for Scientific Workflows

```{tableofcontents}
```