Commit 996b81b

GH copilot, fix my sh*t
1 parent 699fb18 commit 996b81b

1 file changed: +10 -11 lines changed

docs/anglerfish-run.md

Lines changed: 10 additions & 11 deletions
@@ -1,13 +1,13 @@
 # Run demultiplexing, the standard method

-This describes the the `anglerfish run` mode of running this tool, which expects an input samplesheet specifying the expected Illumina barcodes found in the sequenced ONT flowcell or barcode. Here we discuss some subjects in further details than the overview given in [Usage](#usage)
+This describes the `anglerfish run` mode of running this tool, which expects an input samplesheet specifying the expected Illumina barcodes found in the sequenced ONT flowcell or barcode. Here we discuss some subjects in further details than the overview given in [Usage](#usage)

 ## Use cases

 The primary use of anglerfish would be to detect issues in Illumina sequencing pools using a method (ONT sequencing) independent from Illumina. Specific use cases:

 - When samples are pooled evenly, detect outliers
-- Detect baroding issues or potential sample-mixups
+- Detect barcoding issues or potential sample-mixups
 - And downstream of anglerfish: identify samples by mapping

 ```{figure} figure1.png
@@ -22,7 +22,7 @@ Bottom row: Downstream uses cases of Anglerfish. Detecting library insert sizes,

 ## Output

-Example of file output from `angerfish run` with a single setup pool (as opposed the a [complex](#mixed-setup-pools) one) and without specifying an `--out_fastq` option thus generating a default name for the output folder.
+Example of file output from `anglerfish run` with a single setup pool (as opposed to a [complex](#mixed-setup-pools) one). Without specifying an `--out_fastq` option, it generates a default name for the output folder.

 ```
 anglerfish_run_YYYY_MM_DD_HHMMSS
@@ -44,10 +44,10 @@ minimap2 - output as an alignment file to `index_len(indexlength).paf`.
 :width: 70%
 :align: center

-Figure 2. And example of how read mapping in anglerfish works. Adapter templates for I7 and I5 map to the ends of the read "5b42", the Illumina barcodes are read from the "N" gap in the templates.
+Figure 2. An example of how read mapping in Anglerfish works. Adapter templates for I7 and I5 map to the ends of the read "5b42", the Illumina barcodes are read from the "N" gap in the templates.
 ```

-Anglerfish reports the stats of the run to a report called `anglerfish_stats.txt`, with the same number found in a machine readable JSON format (`anglerfish_stats.json`) Let's look at a few field from this report and number the lines:
+Anglerfish reports the stats of the run to a report called `anglerfish_stats.txt`, with the same number found in a machine readable JSON format (`anglerfish_stats.json`). Let's look at a few fields from this report and number the lines:

 ```
 01: Anglerfish v. 0.7.0 (run: anglerfish_2024_10_28_153312, 5c98ad62-784d-4b27-8dd4-a69bbfe553ac)
@@ -66,8 +66,7 @@ Anglerfish reports the stats of the run to a report called `anglerfish_stats.txt
 - 06: Reads matching the template (even partially) adapter1-insert-adapter2
 - 08-09: Any reads falling outside of the adapter1-insert-adapter2 expectation. One reason for this could be incomplete [splitting](https://web.archive.org/web/20250207143034/https://nanoporetech.com/document/kit-14-device-and-informatics#introduction-to-read-splitting) of chimeric reads by the sequencing software. Anglerfish will not resolve such reads, and these cases have not been studied by the anglerfish authors.

-`anglerfish_dataframe.csv` is an attempt to summarize all index level stats (samplesheet samples and unknown indexes) into a
-single "flat" table.
+The `anglerfish_dataframe.csv` file summarizes all index level stats (samplesheet samples and unknown indexes) into a single "flat" table.
 And finally, the DNA inserts of each demultiplexed read will be output into fastq files according the samplesheet in `sample1.fastq.gz`, `sample2.fastq.gz`, etc.

 ## Mixed setup pools
@@ -86,7 +85,7 @@ single2,truseq,CTGACTGA,/path/to/ONTreads.fastq.gz
 single3,truseq,TCTCAGTG,/path/to/ONTreads.fastq.gz
 ```

-The these are handled are, for each adapter-type and index length combination present, seperate minimap runs and read clustering is performed, then the results are aggregated in the report.
+The way these are handled are, for each adapter-type and index length combination present, separate minimap runs and read clustering is performed, then the results are aggregated in the report.
 The path the fastq files supports glob'ing, e.g. you can specify multiple files like `/path/to/flowcell/fastq_passed/*.fastq.gz`

 ## Multiple ONT barcodes
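
Review note on the hunk above: the changed sentence says that each adapter-type and index-length combination present in the samplesheet gets its own minimap run before the results are aggregated. A minimal Python sketch of that grouping step follows. It is not Anglerfish's implementation. The function name `group_by_setup`, the `dual1` row, the `truseq_dual` adaptor name, the second index `CAGGACGT`, and the hyphen-separated dual-index notation are assumptions made for illustration; only the `single2` and `single3` rows are taken from the diff.

```python
import csv
import io
from collections import defaultdict

# Illustrative samplesheet. The two "single" rows appear in the diff;
# the "dual1" row is a hypothetical dual-index entry (assumption).
SAMPLESHEET = """\
dual1,truseq_dual,TAATGCGC-CAGGACGT,/path/to/ONTreads.fastq.gz
single2,truseq,CTGACTGA,/path/to/ONTreads.fastq.gz
single3,truseq,TCTCAGTG,/path/to/ONTreads.fastq.gz
"""


def group_by_setup(samplesheet_text):
    """Group sample names by (adaptor type, index lengths)."""
    groups = defaultdict(list)
    for name, adaptor, index, _path in csv.reader(io.StringIO(samplesheet_text)):
        # Assumed notation: a dual index is written as "I7-I5".
        index_lengths = tuple(len(part) for part in index.split("-"))
        groups[(adaptor, index_lengths)].append(name)
    return dict(groups)


groups = group_by_setup(SAMPLESHEET)
for setup, names in groups.items():
    # Each key would correspond to one separate alignment run.
    print(setup, names)
```

Under these assumptions the sheet splits into two groups, one per setup, and each group would be aligned and clustered on its own before the report combines them.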
@@ -112,10 +111,10 @@ single2,truseq,CTGACTGA,/path/to/20250207_1125_1F_NNN12345_fa78ca0f/barcode02/*.

 ## Unknown indexes

-A list of indexes that do not match (within a set edit distance) of the indexes in the samplesheet will be listed in descending order at the bottom of the [report](#output).
-These unknown matches are not clustered by sequence, such that each read error will get its' own entry therefore the list is truncated at `# samples in samplesheet` + 10. The column `closest_match` lists the samples(s) which have the shortest edit distance to this sequence.
+Indexes that do not match (within a set edit distance) the indexes in the samplesheet will be listed in descending order at the bottom of the [report](#output).
+These unknown matches are not clustered by sequence, such that each read error will get its own entry therefore the list is truncated at `# samples in samplesheet` + 10. The column `closest_match` lists the samples(s) which have the shortest edit distance to this sequence.

-The results might be distorted when the input fastq file(s) contain a [mixed adapter setup](#mixed-setup-pools). For the samplesheet given [above](#mixed-setup-pools) there unknown index list of both group of adaptor-index groups will be combined in the report. So the single-index group will contain hits to the "real", known dual index indices. E.g. "TAATGCGC" from "dual1" and "dual2" will be listed.
+The results might be distorted when the input fastq file(s) contain a [mixed adapter setup](#mixed-setup-pools). For the samplesheet given [above](#mixed-setup-pools), the unknown index list of both groups of adapter-index groups will be combined in the report. So the single-index group will contain hits to the "real", known dual index indices. E.g. "TAATGCGC" from "dual1" and "dual2" will be listed.

 ## Lenient mode
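
Review note on the "Unknown indexes" hunk: the text describes counting index sequences that fall outside a set edit distance of every samplesheet index, listing them in descending order, truncating at the number of samples plus 10, and reporting the closest sample(s). The Python sketch below illustrates that logic only. It is not taken from the Anglerfish code base: the use of Levenshtein distance, the threshold `max_distance=1`, the function names, and the observed index sequences are all assumptions.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance via dynamic programming (assumed metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def unknown_index_report(observed, samples, max_distance=1, extra_rows=10):
    """Return (index, count, closest_match) rows for unmatched indexes."""
    # Keep only indexes that are too far from every samplesheet index.
    counts = Counter(
        idx
        for idx in observed
        if all(edit_distance(idx, known) > max_distance for known in samples.values())
    )
    rows = []
    # Descending by count, truncated at "# samples in samplesheet" + extra_rows.
    for idx, n in counts.most_common(len(samples) + extra_rows):
        distances = {name: edit_distance(idx, known) for name, known in samples.items()}
        best = min(distances.values())
        closest = sorted(name for name, d in distances.items() if d == best)
        rows.append((idx, n, closest))
    return rows


# Sample indexes come from the diff; the observed reads are made up.
samples = {"single2": "CTGACTGA", "single3": "TCTCAGTG"}
observed = ["CTGACTGA", "CTGACTGT", "CTGAAAGA", "CTGAAAGA", "TCTCTTTG"]
report = unknown_index_report(observed, samples)
for row in report:
    print(row)
```

Because identical sequences are only counted and never clustered, every distinct read error becomes its own row, which is why a cap on the list length is needed.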
