Commit 285fb28

xuzhao9 authored and facebook-github-bot committed
Add A10g workflow to gain access to A10G GPU (#2558)
Summary: As the title says. To gain SSH access:
1. Add the label `with-ssh` to the PR.
2. Log in following the instructions (Corp-VPN connection required).

Pull Request resolved: #2558

Reviewed By: davidberard98

Differential Revision: D67227271

Pulled By: xuzhao9

fbshipit-source-id: d385ffbcf23580ca664451b0ecc31ec666664e7c
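The labeling step in the summary can also be done from a terminal. The following is a minimal sketch, assuming an authenticated GitHub CLI (`gh`) is installed; the helper name `request_ssh_runner`, the `DRY_RUN` switch, and the PR number and repository in the usage line are hypothetical and not part of this commit.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper (not part of this commit): add the `with-ssh` label
# to a PR so that the A10G workflow's `if:` condition can match.
# $1: PR number, $2: repository in owner/name form.
# With DRY_RUN=1 the command is only printed, not executed.
request_ssh_runner() {
  local pr_number="$1"
  local repo="$2"
  local cmd=(gh pr edit "${pr_number}" --repo "${repo}" --add-label with-ssh)
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Usage (placeholder values): `DRY_RUN=1 request_ssh_runner 1234 owner/repo` prints the `gh` command that would be run. Since a bare `pull_request:` trigger defaults to the `opened`, `synchronize`, and `reopened` activity types, adding the label probably does not start a run by itself; a subsequent push to the PR should.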
1 parent 6f191e9 commit 285fb28

2 files changed: +55 -0 lines changed

.ci/torchbench/check-ssh.sh

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+#!/bin/bash
+set -eou pipefail
+
+echo "Holding runner for 2 hours until all ssh sessions have logged out"
+for _ in $(seq 1440); do
+    # Break if no ssh session exists anymore
+    if [ "$(who)" = "" ]; then
+        break
+    fi
+    echo "."
+    sleep 5
+done
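
The "2 hours" in the message follows from the loop bounds: 1440 polls at 5 seconds each is 7200 seconds. The sketch below makes that arithmetic explicit and shows a variant of the polling loop in which the session check is passed in as a command, so the logic can be exercised without real logins. Both function names are hypothetical and not part of this commit.

```shell
#!/bin/bash
set -euo pipefail

# Upper bound on the hold, in seconds: number of polls times the sleep interval.
hold_budget_seconds() {
  local polls="$1" interval="$2"
  echo $(( polls * interval ))
}

# Hypothetical variant of the loop in check-ssh.sh.
# $1: maximum number of polls
# $2: seconds to sleep between polls
# $3...: command that prints the active sessions (the script itself uses `who`)
# Prints how many polls found a session still open.
wait_for_sessions() {
  local polls="$1" interval="$2"
  shift 2
  local i
  for (( i = 0; i < polls; i++ )); do
    # Stop as soon as the check prints nothing, i.e. no session is left
    if [ -z "$("$@")" ]; then
      echo "${i}"
      return 0
    fi
    sleep "${interval}"
  done
  echo "${polls}"
}
```

For example, `hold_budget_seconds 1440 5` gives the 7200-second budget, and `wait_for_sessions 1440 5 who` would mirror the committed loop. Note that the job's `timeout-minutes: 240` also has to cover the install steps that run before the hold.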

.github/workflows/linux-test-a10g.yml

Lines changed: 43 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,43 @@
+name: TorchBench PR Test on A10G
+on:
+  workflow_dispatch:
+  pull_request:
+
+jobs:
+  linux-test-a10g:
+    # Don't run on forked repos
+    # Only run on PR labeled 'with-ssh'
+    if: github.repository_owner == 'pytorch' && contains(github.event.pull_request.labels.*.name, 'with-ssh')
+    runs-on: linux.g5.4xlarge.nvidia.gpu
+    timeout-minutes: 240
+    environment: docker-s3-upload
+    env:
+      CONDA_ENV: "pr-test-cuda"
+      TEST_CONFIG: "cuda"
+      HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
+    steps:
+      - name: Checkout TorchBench
+        uses: actions/checkout@v3
+      - name: Setup SSH (Click me for login details)
+        uses: pytorch/test-infra/.github/actions/setup-ssh@main
+        with:
+          github-secret: ${{ secrets.TORCHBENCH_ACCESS_TOKEN }}
+      - name: Install Conda
+        run: |
+          bash ./.ci/torchbench/install-conda.sh
+      - name: Install TorchBench
+        run: |
+          bash ./.ci/torchbench/install.sh
+      - name: Wait for SSH session to end
+        if: always()
+        run: |
+          bash ./.ci/torchbench/check-ssh.sh
+      - name: Clean up Conda env
+        if: always()
+        run: |
+          . ${HOME}/miniconda3/etc/profile.d/conda.sh
+          conda remove -n "${CONDA_ENV}" --all
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
+  cancel-in-progress: true
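
The `concurrency.group` expression decides which runs cancel each other: runs with the same group string are treated as one queue, and `cancel-in-progress: true` cancels the older one. The sketch below imitates how that string is assembled, to show that pushes to the same PR share a group while a manual dispatch gets its own. It is only an illustration in shell, not how GitHub evaluates expressions; the function name and the sample SHA are hypothetical.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical illustration of the group key built by the workflow:
#   <workflow name>-<PR number, or commit SHA if there is no PR>-<true|false>
# $1: workflow name, $2: PR number (may be empty), $3: commit SHA, $4: event name
concurrency_group() {
  local workflow="$1" pr_number="$2" sha="$3" event_name="$4"
  # Mirrors `pull_request.number || github.sha`: fall back to the SHA
  local id="${pr_number:-${sha}}"
  local is_dispatch="false"
  if [ "${event_name}" = "workflow_dispatch" ]; then
    is_dispatch="true"
  fi
  echo "${workflow}-${id}-${is_dispatch}"
}
```

Two `pull_request` runs for the same PR number yield the same key, so the newer run replaces the older one; a `workflow_dispatch` run has no PR number and ends in `-true`, so it does not collide with PR runs.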
