hisplan/seqc-docker

docker-seqc

Dockerized SEQC

Prerequisites

Docker

Install Docker version 2 (Engine version 18+). On a Mac, this requires macOS Sierra 10.12 or newer (for example, Mojave).

Python / Dependencies

Make sure Python 3 is installed on your computer. Install PyYAML if you don't have it already:

pip install pyyaml

AWS Credentials

Skip this step if you're not going to have SEQC automatically launch on an AWS EC2 instance. Otherwise, configure your AWS credentials:

$ aws configure

Ensure the .aws directory (which contains your AWS credentials and configuration) is located in your home directory (e.g. /home/john/.aws).
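
As a quick sanity check for the step above, here is a minimal Python sketch that looks for the two files that aws configure normally writes (credentials and config). The function name missing_aws_files is hypothetical and is not part of this repository.

```python
from pathlib import Path
from typing import List


def missing_aws_files(home: Path) -> List[str]:
    """Return the expected file names that are absent from <home>/.aws."""
    aws_dir = home / ".aws"
    # `aws configure` stores its settings in these two files.
    expected = ["credentials", "config"]
    return [name for name in expected if not (aws_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_aws_files(Path.home())
    if missing:
        print("Missing in ~/.aws:", ", ".join(missing))
    else:
        print("~/.aws looks complete")
```

An empty result means both files were found; it does not verify that the credentials themselves are valid.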

Make sure your EC2 key pair file (*.pem) is NOT accessible by others. You can do this by running this command:

$ chmod 400 /path/my-key.pem
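
If you want to confirm the result of the chmod command, the following sketch checks that neither group nor others have any permission bits set on the key file. The helper name is_private is an assumption made for illustration.

```python
import os
import stat


def is_private(path: str) -> bool:
    """True if group and others have no permissions at all on the file."""
    mode = os.stat(path).st_mode
    # S_IRWXG / S_IRWXO cover read, write and execute for group / others.
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

For example, is_private("/path/my-key.pem") should return True after running chmod 400 on that file.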

How to Install

Note that the steps described here have only been tested on macOS.

Pull the Docker image from Quay.io:

docker pull quay.io/hisplan/seqc:0.2.11

Confirm that the image is now available locally:

$ docker images
REPOSITORY                            TAG                       IMAGE ID       CREATED         SIZE
quay.io/hisplan/seqc                  0.2.11                     604ba1ae0d17   3 minutes ago   2.75GB

Run the following command from your Bash terminal:

aws s3 cp s3://dp-lab-home/software/install-seqc-0.2.11.sh - | bash

If you run tree, you should see something like this:

$ tree
.
├── config
│   └── jobs.template.yml
├── seqc-progress.sh
├── seqc-submit.sh
├── seqc_submit_mjobs.py
├── show-ami-list.sh
└── config.sh

How to Submit Multiple Jobs to AWS (Multiple Samples)

Input Configuration

Jump start by duplicating the template:

$ cp config/jobs.template.yml config/jobs.yml

Edit jobs.yml:

$ nano config/jobs.yml

jobs:
  - job: 1
    ami-id: ${PLACE_AMI_ID_HERE}
    platform: ten_x_v2
    user-tags:
      Job: 1
      Project: 10178
      Sample: DEV_IGO_00001
    index: s3://seqc-public/genomes/hg38_long_polya/
    barcode-files: s3://seqc-public/barcodes/ten_x_v2/flat/
    genomic-fastq: s3://seqc-public/test/ten_x_v2/genomic/
    barcode-fastq: s3://seqc-public/test/ten_x_v2/barcode/
    upload-prefix: s3://dp-lab-home/chunj/seqc-test/ten_x_v2/seqc-results/
    output-prefix: test1
    email: [email protected]
    star-args: "runRNGseed=0"
  - job: 2
    ami-id: ${PLACE_AMI_ID_HERE}
    platform: ten_x_v2
    user-tags:
      Job: 2
      Project: 10178
      Sample: DEV_IGO_00002
    index: s3://seqc-public/genomes/hg38_long_polya/
    barcode-files: s3://seqc-public/barcodes/ten_x_v2/flat/
    genomic-fastq: s3://seqc-public/test/ten_x_v2/genomic/
    barcode-fastq: s3://seqc-public/test/ten_x_v2/barcode/
    upload-prefix: s3://dp-lab-home/chunj/seqc-test/ten_x_v2/seqc-results/
    output-prefix: test2
    email: [email protected]
    star-args: "runRNGseed=0"
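
Before submitting, it can help to check the edited file for obvious mistakes. The sketch below works on the already-parsed configuration (for example, the dictionary returned by yaml.safe_load from PyYAML). The list of keys is simply taken from the template above; it is an assumption, not a statement of what seqc_submit_mjobs.py actually requires, and the function names are hypothetical.

```python
# Keys that appear in every job of the template above (an assumption about
# what a complete job looks like, not a rule enforced by the submit script).
TEMPLATE_KEYS = (
    "job", "ami-id", "platform", "index", "barcode-files",
    "genomic-fastq", "barcode-fastq", "upload-prefix", "output-prefix",
)
PLACEHOLDER = "${PLACE_AMI_ID_HERE}"


def check_job(job: dict) -> list:
    """Return a list of human-readable problems found in one job entry."""
    problems = [f"missing key: {key}" for key in TEMPLATE_KEYS if key not in job]
    if job.get("ami-id") == PLACEHOLDER:
        problems.append("ami-id still contains the template placeholder")
    return problems


def check_config(config: dict) -> dict:
    """Map each problematic job's identifier to its list of problems."""
    report = {}
    for job in config.get("jobs", []):
        problems = check_job(job)
        if problems:
            report[job.get("job", "?")] = problems
    return report
```

An empty dictionary from check_config means no problems of these two kinds were found.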

Note that you must specify which SEQC AMI (Amazon Machine Image) to use via ami-id. If you do not know the AMI ID, you can run show-ami-list.sh. The recommended AMI (as of Aug 26, 2021) is ami-0fa8f038a73ccd865.

$ ./show-ami-list.sh
[
    {
        "ID": "ami-02f92579154b6edf8",
        "Name": "seqc-v0.2.11_a1"
    },
    {
        "ID": "ami-0530a8e9d69e60500",
        "Name": "seqc-v0.2.4_a1"
    },
    {
        "ID": "ami-05fd54e8d80f2665f",
        "Name": "seqc-v0.2.3-alpha.5_a1"
    },
    {
        "ID": "ami-0a4d2955fe21dee72",
        "Name": "seqc-v0.2.5_a2"
    },
    {
        "ID": "ami-0c97def6c08694a9a",
        "Name": "seqc-v0.2.9_a1"
    },
    {
        "ID": "ami-0f7bddb56c574069c",
        "Name": "seqc-v0.2.7_a3"
    },
    {
        "ID": "ami-0fa8f038a73ccd865",
        "Name": "seqc-v0.2.10_a1"
    }
]
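
Since show-ami-list.sh prints JSON, the AMI ID for a given SEQC version can be looked up programmatically. This is a minimal sketch that assumes the output has exactly the shape shown above (a list of objects with "ID" and "Name"); the function name ami_id_for is hypothetical.

```python
import json


def ami_id_for(ami_list_json: str, name: str):
    """Return the AMI ID whose "Name" matches, or None if there is none."""
    for entry in json.loads(ami_list_json):
        if entry["Name"] == name:
            return entry["ID"]
    return None


if __name__ == "__main__":
    # Two entries copied from the sample output above.
    sample = """
    [
        {"ID": "ami-02f92579154b6edf8", "Name": "seqc-v0.2.11_a1"},
        {"ID": "ami-0fa8f038a73ccd865", "Name": "seqc-v0.2.10_a1"}
    ]
    """
    print(ami_id_for(sample, "seqc-v0.2.11_a1"))  # ami-02f92579154b6edf8
```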

If you want to specify any of the SEQC parameters, you can add a new line to the job description using the same format. For example, to specify --min-poly-t=0 and --no-filter-low-coverage, add the following two lines:

min-poly-t: "0"
no-filter-low-coverage: ""
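
To illustrate how such YAML keys could end up on the SEQC command line, here is one plausible mapping: each key becomes --key, an empty value becomes a bare flag, and the user-tags mapping is joined as Key:Value pairs. This is a sketch that is consistent with the command echoed in the submission log in the next section; it is not necessarily how seqc_submit_mjobs.py is implemented, and job_to_argv is a hypothetical name.

```python
def job_to_argv(job: dict) -> list:
    """Build an argument list for `SEQC run` from one parsed job entry."""
    argv = ["SEQC", "run", str(job["platform"])]
    for key, value in job.items():
        if key in ("job", "platform"):
            continue  # job is only a label; platform is positional
        flag = f"--{key}"
        if isinstance(value, dict):
            # e.g. user-tags -> Job:1,Project:10178,Sample:DEV_IGO_00001
            argv += [flag, ",".join(f"{k}:{v}" for k, v in value.items())]
        elif value is None or value == "":
            argv.append(flag)  # switch without a value
        else:
            argv += [flag, str(value)]
    return argv


if __name__ == "__main__":
    job = {
        "job": 1,
        "platform": "ten_x_v2",
        "min-poly-t": "0",
        "no-filter-low-coverage": "",
    }
    print(" ".join(job_to_argv(job)))
    # SEQC run ten_x_v2 --min-poly-t 0 --no-filter-low-coverage
```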

Job Submission

$ python seqc_submit_mjobs.py --help
usage: seqc_submit_mjobs.py [-h] --config PATH_YAML_INPUT --pem
                            PATH_EC2_KEYPAIR [--key-name EC2_KEYPAIR_NAME]
                            [--dry-run]

optional arguments:
  -h, --help            show this help message and exit
  --config PATH_YAML_INPUT, -c PATH_YAML_INPUT
                        path to jobs.yaml
  --pem PATH_EC2_KEYPAIR, -k PATH_EC2_KEYPAIR
                        path to AWS EC key pair file (*.pem)
  --key-name EC2_KEYPAIR_NAME, -n EC2_KEYPAIR_NAME
                        the name of your AWS EC2 key pair
  --dry-run             Dry run (i.e. don't actually submit the job)

$ python seqc_submit_mjobs.py \
    --pem ~/dpeerlab-chunj.pem \
    --config config/jobs.yml
2020-10-07 20:09:10,083 - INFO - Starting...
2020-10-07 20:09:10,086 - INFO - JOB NAME=PBMC1k-10x-v3, LOG FILE=./logs/PBMC1k-10x-v3.log
SEQC run ten_x_v3 \
  --ami-id ami-07ef40419e641a43c \
  --user-tags Job:PBMC1k-10x-v3,Project:v0.2.7,Sample:PBMC1k-10x-v3 \
  --index s3://seqc-public/genomes/hg38_long_polya/ \
  --barcode-files s3://seqc-public/barcodes/ten_x_v3/flat/ \
  --genomic-fastq s3://dp-lab-test/seqc/datasets/PBMC-1k-10x-v3/genomic/ \
  --barcode-fastq s3://dp-lab-test/seqc/datasets/PBMC-1k-10x-v3/barcode/ \
  --upload-prefix s3://dp-lab-test/seqc/datasets/PBMC-1k-10x-v3/seqc-results/ \
  --output-prefix v0.2.7 \
  --no-filter-low-coverage \
  --min-poly-t 0 \
  --email [email protected] \
  --star-args runRNGseed=0
2020-10-07 20:09:10,086 - INFO - Submitting a job...
2020-10-07 20:09:18,550 - INFO - Cleaning up the unused security groups:
2020-10-07 20:09:18,573 - INFO - SEQC: 2020-10-08 00:09:18: writing script to file:
2020-10-07 20:09:18,573 - INFO - #!/bin/bash -x
2020-10-07 20:09:18,573 - INFO -
2020-10-07 20:09:18,573 - INFO - SEQC run ten_x_v3 --ami-id ami-07ef40419e641a43c --user-tags Job:PBMC1k-10x-v3,Project:v0.2.7,Sample:PBMC1k-10x-v3 --index s3://seqc-public/genomes/hg38_long_polya/ --barcode-files s3://seqc-public/barcodes/ten_x_v3/flat/ --genomic-fastq s3://dp-lab-test/seqc/datasets/PBMC-1k-10x-v3/genomic/ --barcode-fastq s3://dp-lab-test/seqc/datasets/PBMC-1k-10x-v3/barcode/ --upload-prefix s3://dp-lab-test/seqc/datasets/PBMC-1k-10x-v3/seqc-results/ --output-prefix v0.2.7 --no-filter-low-coverage --min-poly-t 0 --email [email protected] --star-args runRNGseed=0 --local --terminate
2020-10-07 20:09:18,573 - INFO -
2020-10-07 20:09:18,752 - INFO - SEQC: 2020-10-08 00:09:18: Created new security group: sg-0e8190b78d1e639bf (name=SEQC-1941258).
2020-10-07 20:09:19,593 - INFO - SEQC: 2020-10-08 00:09:19: Enabled ssh access via port 22 for security group sg-0e8190b78d1e639bf
2020-10-07 20:09:21,255 - INFO - SEQC: 2020-10-08 00:09:21: Instance i-08a1eff31d49c1631 created, waiting until running
2020-10-07 20:09:36,528 - INFO - SEQC: 2020-10-08 00:09:36: Instance i-08a1eff31d49c1631 in running state
2020-10-07 20:09:36,715 - INFO - SEQC: 2020-10-08 00:09:36: Connecting to instance i-08a1eff31d49c1631 via ssh
2020-10-07 20:10:08,049 - INFO - SEQC: 2020-10-08 00:10:08: Formatting and mounting /dev/xvdf to /home/ec2-user
2020-10-07 20:10:10,329 - INFO - SEQC: 2020-10-08 00:10:10: Successfully mounted new volume onto /home/ec2-user.
2020-10-07 20:10:10,330 - INFO - SEQC: 2020-10-08 00:10:10: Setting aws credentials.
2020-10-07 20:10:32,778 - INFO - SEQC: 2020-10-08 00:10:32: SEQC setup complete.
2020-10-07 20:10:32,854 - INFO - SEQC: 2020-10-08 00:10:32: Instance login: ssh -i <path to your key file> [email protected]
2020-10-07 20:10:32,854 - INFO - SEQC: 2020-10-08 00:10:32: Connecting to instance i-08a1eff31d49c1631 via ssh
2020-10-07 20:10:33,871 - INFO -
2020-10-07 20:10:33,871 - INFO - DONE.
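
The instance ID printed in this output is what seqc-progress.sh expects later on. A small sketch for pulling it out of the log text, assuming the "Instance i-... created" wording shown above stays the same (find_instance_ids is a hypothetical helper):

```python
import re

# Matches lines such as "... Instance i-08a1eff31d49c1631 created, waiting ..."
INSTANCE_RE = re.compile(r"Instance (i-[0-9a-f]+) created")


def find_instance_ids(log_text: str) -> list:
    """Return every EC2 instance ID reported as created in the log text."""
    return INSTANCE_RE.findall(log_text)


if __name__ == "__main__":
    sample = (
        "2020-10-07 20:09:21,255 - INFO - SEQC: 2020-10-08 00:09:21: "
        "Instance i-08a1eff31d49c1631 created, waiting until running"
    )
    print(find_instance_ids(sample))  # ['i-08a1eff31d49c1631']
```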

Single-Nucleus RNA Sequencing

Everything is the same except the following three lines in YAML:

For human (hg38):

  index: s3://seqc-public/genomes/hg38_long_polya_snRNAseq/
  filter-mode: snRNA-seq
  max-insert-size: 2304700

For mouse (mm38):

  index: s3://seqc-public/genomes/mm38_long_polya_snRNAseq/
  filter-mode: snRNA-seq
  max-insert-size: 4434881
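
If you prepare many jobs, the three overrides can be applied to a parsed job entry in code. The values below are copied from the two snippets above; the names SNRNA_OVERRIDES and to_snrna_job are hypothetical.

```python
# Overrides copied from the human (hg38) and mouse (mm38) snippets above.
SNRNA_OVERRIDES = {
    "human": {
        "index": "s3://seqc-public/genomes/hg38_long_polya_snRNAseq/",
        "filter-mode": "snRNA-seq",
        "max-insert-size": 2304700,
    },
    "mouse": {
        "index": "s3://seqc-public/genomes/mm38_long_polya_snRNAseq/",
        "filter-mode": "snRNA-seq",
        "max-insert-size": 4434881,
    },
}


def to_snrna_job(job: dict, species: str) -> dict:
    """Return a copy of the job with the snRNA-seq settings applied."""
    return {**job, **SNRNA_OVERRIDES[species]}
```

The original job dictionary is left unchanged, so the same base entry can be reused for both single-cell and single-nucleus runs.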

How to Submit a Single Job to AWS (Single Sample)

$ ./seqc-submit.sh ~/dpeerlab-chunj.pem run ten_x_v2 \
    --index s3://seqc-public/genomes/hg38_long_polya/ \
    --barcode-files s3://seqc-public/barcodes/ten_x_v2/flat/ \
    --genomic-fastq s3://seqc-public/test/ten_x_v2/genomic/ \
    --barcode-fastq s3://seqc-public/test/ten_x_v2/barcode/ \
    --upload-prefix s3://dp-lab-home/chunj/seqc-test/ten_x_v2/seqc-results/ \
    --output-prefix test \
    --ami ami-05fd54e8d80f2665f \
    --email [email protected]

Checking Progress

Run the following command to see the log messages in real time:

$ ./seqc-progress.sh ~/dpeerlab-chunj.pem i-0fbffa334be875092

If the instance has already been stopped/terminated, you will see:

socket.gaierror: [Errno -2] Name or service not known

If the instance is not fully up and running, you will see:

ChildProcessError: cat: ./seqc_log.txt: No such file or directory
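
When progress checks are scripted, the two messages above can be told apart with a simple text match. This sketch assumes the error text appears exactly as shown; diagnose_progress_output is a hypothetical helper and not part of seqc-progress.sh.

```python
def diagnose_progress_output(text: str) -> str:
    """Map known seqc-progress.sh error text to a short explanation."""
    if "socket.gaierror" in text:
        return "instance stopped or terminated"
    if "seqc_log.txt: No such file or directory" in text:
        return "instance not fully up yet"
    return "unknown"
```

In the second case, retrying after a short wait is a reasonable next step.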

Development

Building Container Image

./build.sh

Pushing to Docker Registry

You can either use the docker push command or run push.sh (requires SCING):

./push.sh

Debugging through Console

Specify your own AWS EC2 keypair file for the -k parameter:

$ ./console.sh -k ~/dpeerlab-chunj.pem -d

Inside the container, run the following command to spawn a new EC2 instance:

$ SEQC start

Testing

inDrop v2

Inside the container:

$ SEQC run in_drop_v2 \
    --index s3://seqc-public/genomes/hg38_chr19/ \
    --barcode-files s3://seqc-public/barcodes/in_drop_v2/flat/ \
    --genomic-fastq s3://dp-lab-home/chunj/seqc-test/in_drop_v2/genomic/ \
    --barcode-fastq s3://dp-lab-home/chunj/seqc-test/in_drop_v2/barcode/ \
    --upload-prefix s3://dp-lab-home/chunj/seqc-test/in_drop_v2/seqc-results/ \
    --output-prefix test \
    --email [email protected]

10x v2 Chemistry

$ SEQC run ten_x_v2 \
    --index s3://seqc-public/genomes/hg38_long_polya/ \
    --barcode-files s3://seqc-public/barcodes/ten_x_v2/flat/ \
    --genomic-fastq s3://seqc-public/test/ten_x_v2/genomic/ \
    --barcode-fastq s3://seqc-public/test/ten_x_v2/barcode/ \
    --upload-prefix s3://dp-lab-home/chunj/seqc-test/ten_x_v2/seqc-results/ \
    --output-prefix test \
    --email [email protected]

10x v3 Chemistry

$ SEQC run ten_x_v3 \
    --index s3://seqc-public/genomes/hg38_long_polya/ \
    --barcode-files s3://seqc-public/barcodes/ten_x_v3/flat/ \
    --genomic-fastq s3://dp-lab-home/chunj/seqc-test/ten_x_v3/genomic/ \
    --barcode-fastq s3://dp-lab-home/chunj/seqc-test/ten_x_v3/barcode/ \
    --upload-prefix s3://dp-lab-home/chunj/seqc-test/ten_x_v3/seqc-results/ \
    --output-prefix test \
    --email [email protected]

Local Unit Testing

$ docker run \
    -it --rm \
    --mount source="$HOME/.aws",target=/root/.aws,type=bind \
    --entrypoint bash \
    seqc

Once you're inside the container, run the following command:

$ nose2 seqc.test.TestSEQC.test_local