Commit 80df440

Merge pull request #259 from drpatelh/aspera
Add Aspera CLI download support to pipeline
2 parents 7018fc3 + e33eef4 commit 80df440

20 files changed: 293 additions, 18 deletions

CHANGELOG.md

Lines changed: 11 additions & 0 deletions
@@ -30,6 +30,7 @@ Thank you to everyone else that has contributed by reporting bugs, enhancements
 - [PR #253](https://github.com/nf-core/fetchngs/pull/253) - Add implicit tags in nf-test files for simpler testing strategy
 - [PR #257](https://github.com/nf-core/fetchngs/pull/257) - Template update for nf-core/tools v2.12
 - [PR #258](https://github.com/nf-core/fetchngs/pull/258) - Fixes for [PR #253](https://github.com/nf-core/fetchngs/pull/253)
+- [PR #259](https://github.com/nf-core/fetchngs/pull/259) - Add Aspera CLI download support to pipeline ([#68](https://github.com/nf-core/fetchngs/issues/68))
 
 ### Software dependencies
 
@@ -43,6 +44,16 @@ Thank you to everyone else that has contributed by reporting bugs, enhancements
 >
 > **NB:** Dependency has been **removed** if new version information isn't present.
 
+### Parameters
+
+| Old parameter | New parameter          |
+| ------------- | ---------------------- |
+|               | `--force_ftp_download` |
+
+> **NB:** Parameter has been **updated** if both old and new parameter information is present.
+> **NB:** Parameter has been **added** if just the new parameter information is present.
+> **NB:** Parameter has been **removed** if new parameter information isn't present.
+
 ## [[1.11.0](https://github.com/nf-core/fetchngs/releases/tag/1.11.0)] - 2023-10-18
 
 ### Credits

CITATIONS.md

Lines changed: 2 additions & 0 deletions
@@ -10,6 +10,8 @@
 
 ## Pipeline tools
 
+- [Aspera CLI](https://github.com/IBM/aspera-cli)
+
 - [Python](http://www.python.org)
 
 - [Requests](https://docs.python-requests.org/)

README.md

Lines changed: 4 additions & 2 deletions
@@ -66,8 +66,10 @@ Via a single file of ids, provided one-per-line (see [example input file](https:
 1. Resolve database ids back to appropriate experiment-level ids and to be compatible with the [ENA API](https://ena-docs.readthedocs.io/en/latest/retrieval/programmatic-access.html)
 2. Fetch extensive id metadata via ENA API
 3. Download FastQ files:
-   - If direct download links are available from the ENA API, fetch in parallel via `curl` and perform `md5sum` check
-   - Otherwise use [`sra-tools`](https://github.com/ncbi/sra-tools) to download `.sra` files and convert them to FastQ
+   - If direct download links are available from the ENA API:
+     - Fetch in parallel via `aspera-cli` and perform `md5sum` check (default)
+     - Fetch in parallel via `wget` and perform `md5sum` check. Use `--force_ftp_download` to force this behaviour.
+   - Otherwise use [`sra-tools`](https://github.com/ncbi/sra-tools) to download `.sra` files and convert them to FastQ. Use `--force_sratools_download` to force this behaviour.
 4. Collate id metadata and paths to FastQ files in a single samplesheet
 
 ### Synapse ids
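
Reviewer note: the download decision described in step 3 of the README hunk above can be sketched roughly as follows. This is a minimal illustration and not the pipeline's own code. The function name `choose_download_method` and its `has_ena_links` argument are hypothetical, and the precedence when both force flags are set is an assumption, since the README does not state it.

```python
def choose_download_method(
    has_ena_links: bool,
    force_ftp_download: bool = False,
    force_sratools_download: bool = False,
) -> str:
    """Return 'aspera', 'ftp' or 'sratools', following the README wording.

    Assumption: --force_sratools_download wins if both force flags are set.
    """
    # Without direct ENA links, or when explicitly forced, fall back to sra-tools.
    if force_sratools_download or not has_ena_links:
        return "sratools"
    # Direct links exist: FTP only when the user asks for it.
    if force_ftp_download:
        return "ftp"
    # Direct links exist and nothing is forced: Aspera is the default.
    return "aspera"
```

For example, `choose_download_method(True, force_ftp_download=True)` selects the FTP route, while the same call without the flag selects Aspera.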

conf/test_full.config

Lines changed: 2 additions & 2 deletions
@@ -14,6 +14,6 @@ params {
     config_profile_name = 'Full test profile'
     config_profile_description = 'Full test dataset to check pipeline function'
 
-    // Input data for full size test
-    input = 'https://raw.githubusercontent.com/nf-core/test-datasets/bb634bcfef520552e8314dfa3f8a764e1d62f7dc/testdata/v1.12.0/sra_ids_test_full.csv'
+    // File containing SRA ids from nf-core/rnaseq -profile test_full for full-sized test
+    input = 'https://raw.githubusercontent.com/nf-core/test-datasets/100736c99d87667fb7c247c267bc8acfac647bed/testdata/v1.12.0/sra_ids_rnaseq_test_full.csv'
 }

docs/usage.md

Lines changed: 27 additions & 0 deletions
@@ -66,6 +66,33 @@ You can use the `--nf_core_pipeline` parameter to customise this behaviour e.g.
 
 From v1.9 of this pipeline the default `strandedness` in the output samplesheet will be set to `auto` when using `--nf_core_pipeline rnaseq`. This will only work with v3.10 onwards of nf-core/rnaseq which permits the auto-detection of strandedness during the pipeline execution. You can change this behaviour with the `--nf_core_rnaseq_strandedness` parameter which is set to `auto` by default.
 
+### Accessions with more than 2 FastQ files
+
+Using `SRR9320616` as an example, if we run the pipeline with default options to download via Aspera/FTP the ENA API indicates that this sample is associated with a single FastQ file:
+
+```
+run_accession experiment_accession sample_accession secondary_sample_accession study_accession secondary_study_accession submission_accession run_alias experiment_alias sample_alias study_alias library_layout library_selection library_source library_strategy library_name instrument_model instrument_platform base_count read_count tax_id scientific_name sample_title experiment_title study_title sample_description fastq_md5 fastq_bytes fastq_ftp fastq_galaxy fastq_aspera
+SRR9320616 SRX6088086 SAMN12086751 SRS4989433 PRJNA549480 SRP201778 SRA900583 GSM3895942_r1 GSM3895942 GSM3895942 GSE132901 PAIRED cDNA TRANSCRIPTOMIC RNA-Seq Illumina HiSeq 2500 ILLUMINA 11857688850 120996825 10090 Mus musculus Old 3 Kidney Illumina HiSeq 2500 sequencing: GSM3895942: Old 3 Kidney Mus musculus RNA-Seq A murine aging cell atlas reveals cell identity and tissue-specific trajectories of aging Old 3 Kidney 98c939bbae1a1fcf9624905516485b67 7763114613 ftp.sra.ebi.ac.uk/vol1/fastq/SRR932/006/SRR9320616/SRR9320616.fastq.gz ftp.sra.ebi.ac.uk/vol1/fastq/SRR932/006/SRR9320616/SRR9320616.fastq.gz fasp.sra.ebi.ac.uk:/vol1/fastq/SRR932/006/SRR9320616/SRR9320616.fastq.gz
+```
+
+However, this sample actually has 2 additional FastQ files that are flagged as technical and can only be obtained by running sra-tools. This is particularly important for certain preps like 10x and others using UMI barcodes.
+
+```
+$ fasterq-dump --threads 6 --split-files --include-technical SRR9320616 --outfile SRR9320616.fastq --progress
+
+SRR9320616_1.fastq
+SRR9320616_2.fastq
+SRR9320616_3.fastq
+```
+
+This highlights that there is a discrepancy between the read data hosted on the ENA API and what can actually be fetched from sra-tools, where the latter seems to be the source of truth. If you anticipate that you may have more than 2 FastQ files per sample, it is recommended to use this pipeline with the `--force_sratools_download` parameter.
+
+See [issue #260](https://github.com/nf-core/fetchngs/issues/260) for more details.
+
+### Bypass Aspera data download
+
+If the appropriate download links are available, the pipeline uses the Aspera CLI by default to download FastQ files. If you are having issues and prefer to use FTP or sra-tools instead, you can use the [`--force_ftp_download`](https://nf-co.re/fetchngs/parameters#force_ftp_download) and [`--force_sratools_download`](https://nf-co.re/fetchngs/parameters#force_sratools_download) parameters, respectively.
+
 ### Bypass `FTP` data download
 
 If FTP connections are blocked on your network use the [`--force_sratools_download`](https://nf-co.re/fetchngs/parameters#force_sratools_download) parameter to force the pipeline to download data using sra-tools instead of the ENA FTP.
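
Reviewer note: to see how many FastQ files the ENA metadata advertises for a run, as in the `SRR9320616` example in the usage.md hunk above, one could count the links in the `fastq_ftp` (or `fastq_aspera`) field. The sketch below assumes that multiple links in one field are separated by `;`, which is how ENA file reports appear to encode them. The helper name `count_fastq_links` is hypothetical and not part of the pipeline.

```python
def count_fastq_links(field_value: str, separator: str = ";") -> int:
    """Count the non-empty download links in one ENA metadata field.

    Assumption: multiple links in a field are joined with ';'.
    """
    links = [link.strip() for link in field_value.split(separator)]
    return sum(1 for link in links if link)


# fastq_ftp value for SRR9320616, copied from the example in docs/usage.md.
srr9320616_ftp = "ftp.sra.ebi.ac.uk/vol1/fastq/SRR932/006/SRR9320616/SRR9320616.fastq.gz"
n_links = count_fastq_links(srr9320616_ftp)
```

Here `n_links` is 1, matching the single file reported by the ENA API, whereas `fasterq-dump --include-technical` produces three files for the same run. A count from the metadata alone therefore cannot reveal technical reads.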
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+name: aspera_cli
+channels:
+  - conda-forge
+  - bioconda
+  - defaults
+dependencies:
+  - bioconda::aspera-cli=4.14.0

modules/local/aspera_cli/main.nf

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+process ASPERA_CLI {
+    tag "$meta.id"
+    label 'process_medium'
+
+    conda "${moduleDir}/environment.yml"
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+        'https://depot.galaxyproject.org/singularity/aspera-cli:4.14.0--hdfd78af_1' :
+        'biocontainers/aspera-cli:4.14.0--hdfd78af_1' }"
+
+    input:
+    tuple val(meta), val(fastq)
+    val user
+
+    output:
+    tuple val(meta), path("*fastq.gz"), emit: fastq
+    tuple val(meta), path("*md5") , emit: md5
+    path "versions.yml" , emit: versions
+
+    script:
+    def args = task.ext.args ?: ''
+    if (meta.single_end) {
+        """
+        ascp \\
+            $args \\
+            -i \$CONDA_PREFIX/etc/aspera/aspera_bypass_dsa.pem \\
+            ${user}@${fastq[0]} \\
+            ${meta.id}.fastq.gz
+
+        echo "${meta.md5_1} ${meta.id}.fastq.gz" > ${meta.id}.fastq.gz.md5
+        md5sum -c ${meta.id}.fastq.gz.md5
+
+        cat <<-END_VERSIONS > versions.yml
+        "${task.process}":
+            aspera_cli: \$(ascli --version)
+        END_VERSIONS
+        """
+    } else {
+        """
+        ascp \\
+            $args \\
+            -i \$CONDA_PREFIX/etc/aspera/aspera_bypass_dsa.pem \\
+            ${user}@${fastq[0]} \\
+            ${meta.id}_1.fastq.gz
+
+        echo "${meta.md5_1} ${meta.id}_1.fastq.gz" > ${meta.id}_1.fastq.gz.md5
+        md5sum -c ${meta.id}_1.fastq.gz.md5
+
+        ascp \\
+            $args \\
+            -i \$CONDA_PREFIX/etc/aspera/aspera_bypass_dsa.pem \\
+            ${user}@${fastq[1]} \\
+            ${meta.id}_2.fastq.gz
+
+        echo "${meta.md5_2} ${meta.id}_2.fastq.gz" > ${meta.id}_2.fastq.gz.md5
+        md5sum -c ${meta.id}_2.fastq.gz.md5
+
+        cat <<-END_VERSIONS > versions.yml
+        "${task.process}":
+            aspera_cli: \$(ascli --version)
+        END_VERSIONS
+        """
+    }
+}
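
Reviewer note: each download in the module above is followed by an `md5sum -c` check against the checksum carried in `meta.md5_1` / `meta.md5_2`. For readers less familiar with that step, the sketch below shows an equivalent check using only the Python standard library. The function name `md5_matches` is hypothetical, and the module itself shells out to `md5sum` rather than doing anything like this.

```python
import hashlib


def md5_matches(path: str, expected_md5: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's MD5 hex digest equals expected_md5."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large FastQ files are not loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.strip().lower()
```

A caller would pass the downloaded file and the value from the ENA `fastq_md5` field, for example `md5_matches("SRX9626017_SRR13191702_1.fastq.gz", "89c5be920021a035084d8aeb74f32df7")`. A `False` result corresponds to `md5sum -c` exiting non-zero and failing the task.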
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+process {
+    withName: 'ASPERA_CLI' {
+        ext.args = '-QT -l 300m -P33001'
+        publishDir = [
+            [
+                path: { "${params.outdir}/fastq" },
+                mode: params.publish_dir_mode,
+                pattern: "*.fastq.gz"
+            ],
+            [
+                path: { "${params.outdir}/fastq/md5" },
+                mode: params.publish_dir_mode,
+                pattern: "*.md5"
+            ]
+        ]
+    }
+}
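
Reviewer note: the `ext.args` string in the config above is spliced into the `ascp` call in `main.nf`. The sketch below shows how the final command line is assembled from those pieces. The helper `build_ascp_command` is hypothetical and only mirrors the argument order used in the module. The flag meanings given in the comments reflect my reading of the `ascp` options and should be checked against the Aspera documentation.

```python
import shlex
from typing import List


def build_ascp_command(
    user: str,
    remote: str,
    dest: str,
    key_path: str,
    ext_args: str = "-QT -l 300m -P33001",
) -> List[str]:
    """Assemble the ascp argument list in the same order as the module script."""
    # As I understand the flags: -l caps the transfer rate, -P sets the SSH port,
    # -Q selects the fair transfer policy and -T disables encryption.
    return ["ascp", *shlex.split(ext_args), "-i", key_path, f"{user}@{remote}", dest]


command = build_ascp_command(
    user="era-fasp",
    remote="fasp.sra.ebi.ac.uk:/vol1/fastq/SRR131/002/SRR13191702/SRR13191702_1.fastq.gz",
    dest="SRX9626017_SRR13191702_1.fastq.gz",
    key_path="aspera_bypass_dsa.pem",
)
```

The user, remote path and output name are taken from the nf-test inputs in this commit. The key path is shortened here, since the module resolves it under `$CONDA_PREFIX/etc/aspera/`.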
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+nextflow_process {
+
+    name "Test process: ASPERA_CLI"
+    script "../main.nf"
+    process "ASPERA_CLI"
+
+    tag "ASPERA_CLI"
+
+    test("Should run without failures") {
+
+        when {
+            params {
+                outdir = "$outputDir"
+            }
+
+            process {
+                """
+                input[0] = [
+                    [ id:'SRX9626017_SRR13191702', single_end:false, md5_1: '89c5be920021a035084d8aeb74f32df7', md5_2: '56271be38a80db78ef3bdfc5d9909b98' ], // meta map
+                    [
+                        'fasp.sra.ebi.ac.uk:/vol1/fastq/SRR131/002/SRR13191702/SRR13191702_1.fastq.gz',
+                        'fasp.sra.ebi.ac.uk:/vol1/fastq/SRR131/002/SRR13191702/SRR13191702_2.fastq.gz'
+                    ]
+                ]
+                input[1] = 'era-fasp'
+                """
+            }
+        }
+
+        then {
+            assertAll(
+                { assert process.success },
+                { assert snapshot(process.out).match() }
+            )
+        }
+    }
+}
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
+{
+    "Should run without failures": {
+        "content": [
+            {
+                "0": [
+                    [
+                        {
+                            "id": "SRX9626017_SRR13191702",
+                            "single_end": false,
+                            "md5_1": "89c5be920021a035084d8aeb74f32df7",
+                            "md5_2": "56271be38a80db78ef3bdfc5d9909b98"
+                        },
+                        [
+                            "SRX9626017_SRR13191702_1.fastq.gz:md5,baaaea61cba4294ec696fdfea1610848",
+                            "SRX9626017_SRR13191702_2.fastq.gz:md5,8e43ad99049fabb6526a4b846da01c32"
+                        ]
+                    ]
+                ],
+                "1": [
+                    [
+                        {
+                            "id": "SRX9626017_SRR13191702",
+                            "single_end": false,
+                            "md5_1": "89c5be920021a035084d8aeb74f32df7",
+                            "md5_2": "56271be38a80db78ef3bdfc5d9909b98"
+                        },
+                        [
+                            "SRX9626017_SRR13191702_1.fastq.gz.md5:md5,055a6916ec9ee478e453d50651f87997",
+                            "SRX9626017_SRR13191702_2.fastq.gz.md5:md5,c30ac785f8d80ec563fabf604d8bf945"
+                        ]
+                    ]
+                ],
+                "2": [
+                    "versions.yml:md5,a51a1dfc6308d71058ddc12c46101dd3"
+                ],
+                "fastq": [
+                    [
+                        {
+                            "id": "SRX9626017_SRR13191702",
+                            "single_end": false,
+                            "md5_1": "89c5be920021a035084d8aeb74f32df7",
+                            "md5_2": "56271be38a80db78ef3bdfc5d9909b98"
+                        },
+                        [
+                            "SRX9626017_SRR13191702_1.fastq.gz:md5,baaaea61cba4294ec696fdfea1610848",
+                            "SRX9626017_SRR13191702_2.fastq.gz:md5,8e43ad99049fabb6526a4b846da01c32"
+                        ]
+                    ]
+                ],
+                "md5": [
+                    [
+                        {
+                            "id": "SRX9626017_SRR13191702",
+                            "single_end": false,
+                            "md5_1": "89c5be920021a035084d8aeb74f32df7",
+                            "md5_2": "56271be38a80db78ef3bdfc5d9909b98"
+                        },
+                        [
+                            "SRX9626017_SRR13191702_1.fastq.gz.md5:md5,055a6916ec9ee478e453d50651f87997",
+                            "SRX9626017_SRR13191702_2.fastq.gz.md5:md5,c30ac785f8d80ec563fabf604d8bf945"
+                        ]
+                    ]
+                ],
+                "versions": [
+                    "versions.yml:md5,a51a1dfc6308d71058ddc12c46101dd3"
+                ]
+            }
+        ],
+        "timestamp": "2024-01-29T13:00:29.847293"
+    }
+}

nextflow.config

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ params {
     ena_metadata_fields = null
     sample_mapping_fields = 'experiment_accession,run_accession,sample_accession,experiment_alias,run_alias,sample_alias,experiment_title,sample_title,sample_description'
     synapse_config = null
+    force_ftp_download = false
     force_sratools_download = false
     skip_fastq_download = false
     dbgap_key = null

nextflow_schema.json

Lines changed: 6 additions & 0 deletions
@@ -54,6 +54,12 @@
                     "help_text": "The default is 'auto' which can be used with nf-core/rnaseq v3.10 onwards to auto-detect strandedness during the pipeline execution.",
                     "default": "auto"
                 },
+                "force_ftp_download": {
+                    "type": "boolean",
+                    "fa_icon": "fas fa-tools",
+                    "description": "Force download FASTQ files via FTP instead of via the Aspera CLI.",
+                    "help_text": "If the Aspera CLI is not working on your infrastructure use this flag to force the pipeline to download data via FTP."
+                },
                 "force_sratools_download": {
                     "type": "boolean",
                     "fa_icon": "fas fa-tools",
