
Commit fa268a8

Merge pull request #26 from johandahlberg/validation_testing
Fixing minor things found when preparing for validation testing
2 parents 880a934 + d0c7a54 commit fa268a8

9 files changed (+131, -95 lines changed)

README.md

Lines changed: 16 additions & 14 deletions
@@ -3,12 +3,13 @@ checkQC
 [![Build Status](https://travis-ci.org/Molmed/checkQC.svg?branch=master)](https://travis-ci.org/Molmed/checkQC)
 [![codecov](https://codecov.io/gh/Molmed/checkQC/branch/master/graph/badge.svg)](https://codecov.io/gh/Molmed/checkQC)
 
-**NOTICE**<br>
-This is is pre-alpha stage software, it is not yet ready for any kind of real usage.
-Please return once we have a release. :D
+CheckQC is a program designed to check a set of quality criteria against an Illumina runfolder.
 
-`checkQC` is a program designed to check a set of quality criteria against an Illumina runfolder. It has been designed
-to be modular, and exactly which "qc handlers" are executed with which parameters for a specific run type (i.e. machine
+This is useful as part of a pipeline, where one needs to evaluate a set of quality criteria after demultiplexing. CheckQC is fast, and
+should finish a few seconds. It will warn if there are problems breaching warning criteria, and will emit a non-zero exit status if it finds
+any errors, thus making it easy to stop further processing if the run that is being evaluated needs troubleshooting.
+
+CheckQC has been designed to be modular, and exactly which "qc handlers" are executed with which parameters for a specific run type (i.e. machine
 type and run length) is determined by a configuration file.
 
 Instrument types supported in checkQC are the following:
@@ -19,31 +20,32 @@ Instrument types supported in checkQC are the following:
 
 Install instructions
 --------------------
+Right now the Illumina Interop library needs to be installed separately before moving on to
+installing checkqc.
 
-TODO: Note that this is still a work in progress description
 ```
-pip install -f https://github.com/Illumina/interop/releases/latest interop
-pip install checkQC
+pip install -f https://github.com/Illumina/interop/releases/tag/v1.1.1 interop
+pip install checkqc
 ```
 
 Running checkQC
 ---------------
 
-After installing `checkQC` you can run it by specifying the path to the runfolder you want to
+After installing CheckQC you can run it by specifying the path to the runfolder you want to
 analyze like this:
 
 ```
 checkqc <RUNFOLDER>
 ```
 
-This will use the default configuration file packaged with `checkQC` if you want to specify
+This will use the default configuration file packaged with CheckQC if you want to specify
 your own custom file, you can do so by adding a path to the config like this:
 
 ```
 checkqc --config_file <path to your config> <RUNFOLDER>
 ```
 
-When `checkQC` starts and no path to the config file is specified it will give you
+When CheckQC starts and no path to the config file is specified it will give you
 the path to where the default file is located on your system, if you want a template
 that you can customize according to your own needs.
 
@@ -52,7 +54,7 @@ Running in a Singularity container
 ----------------------------------
 
 [Singularity](http://singularity.lbl.gov/index.html) is a container system focusing on scientific use cases.
-`checkQC` can be run in a Singularity container by first creating a container using the following:
+CheckQC can be run in a Singularity container by first creating a container using the following:
 
 ```
 singularity create checkQC.img
@@ -69,10 +71,10 @@ singularity run checkQC.img tests/resources/MiSeqDemo/
 General architecture notes
 --------------------------
 
-`checkQC` attempts to be as modular as possible, with respect to adding support for reading more file types (via
+CheckQC attempts to be as modular as possible, with respect to adding support for reading more file types (via
 the implementation of new parsers) and QC criteria (via the implementation of new handlers).
 
-Once `checkQC` starts it will read the configuration file provided, and based on the run type of being analyzed, it will
+Once CheckQC starts it will read the configuration file provided, and based on the run type of being analyzed, it will
 determine which handlers should be run, and with which parameters. The handlers specify which parser they require
 and a single instance of each such parser will be instantiated (i.e. if multiple handlers use the same parser, there
 should still only be a single parser instance of that type present). The handlers are then subscribed to events
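The README text added in this commit says CheckQC exits with a non-zero status when it finds errors (warnings are reported but, per that text, only errors fail the run), which is what lets it gate a pipeline. A minimal Python sketch of such a gate, assuming the `checkqc` executable is installed and on `PATH` and using only the `--config_file` option documented above; the function names `run_checkqc` and `stop_if_failed` and the `command` parameter are hypothetical, the latter existing only so the wrapper can be exercised without CheckQC installed:

```python
import subprocess
import sys


def run_checkqc(runfolder, config_file=None, command=("checkqc",)):
    """Run CheckQC on a runfolder and return True if it exited with status 0.

    Hypothetical wrapper: `command` defaults to the `checkqc` executable,
    which is assumed to be on PATH.
    """
    args = list(command)
    if config_file is not None:
        # --config_file is the option shown in the README
        args += ["--config_file", config_file]
    args.append(runfolder)
    # A non-zero exit status means CheckQC found at least one error
    completed = subprocess.run(args)
    return completed.returncode == 0


def stop_if_failed(runfolder, config_file=None):
    """Abort the current pipeline step if CheckQC reports errors."""
    if not run_checkqc(runfolder, config_file):
        sys.exit("CheckQC reported errors for {}; stopping further processing.".format(runfolder))
```

In a pipeline script, `stop_if_failed(runfolder)` would be called right after demultiplexing, so later steps never see a run that needs troubleshooting.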
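The architecture notes describe handlers declaring which parser they require, with a single parser instance per type even when several handlers share it, and handlers then subscribing to that parser. A rough Python sketch of that wiring, written from the README description only; every class name (`Parser`, `Handler`, `SummaryFileParser`, `LaneYieldHandler`, and so on) and the `parser_class`/`collect` attributes are hypothetical, not CheckQC's actual API:

```python
class Parser:
    """Hypothetical parser: sends each parsed value to all subscribed handlers."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def parse(self):
        raise NotImplementedError

    def run(self):
        for value in self.parse():
            for handler in self.subscribers:
                handler.collect(value)


class Handler:
    """Hypothetical handler: declares the parser class it depends on."""

    parser_class = Parser

    def __init__(self):
        self.collected = []

    def collect(self, value):
        self.collected.append(value)


def instantiate_parsers(handlers):
    """Create exactly one parser per required class and subscribe handlers to it."""
    parsers = {}
    for handler in handlers:
        parser = parsers.get(handler.parser_class)
        if parser is None:
            # First handler needing this parser type triggers the only instance
            parser = handler.parser_class()
            parsers[handler.parser_class] = parser
        parser.subscribe(handler)
    return parsers


# Hypothetical concrete types for illustration
class SummaryFileParser(Parser):
    def parse(self):
        yield {"lane": 1, "clusters_pf": 150_000_000}


class MetricsFileParser(Parser):
    def parse(self):
        yield {"lane": 1, "percent_q30": 85.0}


class LaneYieldHandler(Handler):
    parser_class = SummaryFileParser


class SampleYieldHandler(Handler):
    parser_class = SummaryFileParser


class QualityScoreHandler(Handler):
    parser_class = MetricsFileParser
```

Keying the dictionary on the parser class is what guarantees one instance per type: two handlers that need `SummaryFileParser` end up subscribed to the same object, so the underlying file would only be read once.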

checkQC/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,2 +1,2 @@
 
-__version__ = "0.1"
+__version__ = "1.0.0"

checkQC/default_config/config.yaml

Lines changed: 63 additions & 55 deletions
@@ -1,32 +1,37 @@
 
-# First letter in instrument name is indicative of model
-# i.e. SN7001335 is a HiSeq3500 since its name begins with
-# SN
-instrument_type_mappings:
-M: miseq
-D: hiseq2500
-ST: hiseqx
-A: novaseq
 
-# Please note that intervals for read lengths are specified as: min < x <= max (i.e. upper inclusive, lower exclusive)
+# Usage instruction for config
+# -----------------------------
+# - Please note that intervals for read lengths are specified as: min < x <= max (i.e. upper inclusive, lower exclusive)
+# - All other intervals are exclusive.
+# - Values that are specified under each handler, e.g.
+#
+# - name: ClusterPFHandler
+# warning: 180 # Millons of clusters
+# error: unknown
+#
+# are specific to that partiular handler, but in general any value can be substituted with "unknown", in which case
+# this will not be evaluated.
+#
+# - Handlers specified under "default_handlers" will be run regardless of instrument type. For all other cases
+# it is possible to specify handlers per instrument and read length interval.
+#
+
 
 default_handlers:
 - name: UndeterminedPercentageHandler
 warning: 10
 error: 30
-# - name: PoolingEvennessHandler
-# warning: 50
-# error: unknown # Each sample should have >= x % of the clusters divded by the number of samples in the pool
 
 hiseq2500_rapidhighoutput_v4:
 50-70:
 handlers:
 - name: ClusterPFHandler
-warning: 180
+warning: 180 # Millons of clusters
 error: unknown
 - name: Q30Handler
-warning: 7 # Give lowest nbr in Gbp
-error: unknown # Give lowest nbr in Gbp
+warning: 80 # Give percentage for reads greater than Q30
+error: unknown # Give percentage for reads greater than Q30
 - name: ErrorRateHandler
 warning: 1.5
 error: unknown
@@ -36,11 +41,11 @@ hiseq2500_rapidhighoutput_v4:
 100-110:
 handlers:
 - name: ClusterPFHandler
-warning: 180
+warning: 180 # Millons of clusters
 error: unknown
 - name: Q30Handler
-warning: 14 # Give lowest nbr in Gbp
-error: unknown # Give lowest nbr in Gbp
+warning: 80 # Give percentage for reads greater than Q30
+error: unknown # Give percentage for reads greater than Q30
 - name: ErrorRateHandler
 warning: 2
 error: unknown
@@ -50,11 +55,11 @@ hiseq2500_rapidhighoutput_v4:
 120-130:
 handlers:
 - name: ClusterPFHandler
-warning: 180
+warning: 180 # Millons of clusters
 error: unknown
 - name: Q30Handler
-warning: 18 # Give lowest nbr in Gbp
-error: unknown # Give lowest nbr in Gbp
+warning: 80 # Give percentage for reads greater than Q30
+error: unknown # Give percentage for reads greater than Q30
 - name: ErrorRateHandler
 warning: 2
 error: unknown
@@ -66,11 +71,11 @@ hiseq2500_rapidrun_v2:
 50-70:
 handlers:
 - name: ClusterPFHandler
-warning: 110
+warning: 110 # Millons of clusters
 error: unknown
 - name: Q30Handler
-warning: 4.4 # Give lowest nbr in Gbp
-error: uknown # Give lowest nbr in Gbp
+warning: 80 # Give percentage for reads greater than Q30
+error: unknown # Give percentage for reads greater than Q30
 - name: ErrorRateHandler
 warning: 1.5
 error: unknown
@@ -80,11 +85,11 @@ hiseq2500_rapidrun_v2:
 100-125:
 handlers:
 - name: ClusterPFHandler
-warning: 110
+warning: 110 # Millons of clusters
 error: unknown
 - name: Q30Handler
-warning: 8.8 # Give lowest nbr in Gbp
-error: unknown # Give lowest nbr in Gbp
+warning: 80 # Give percentage for reads greater than Q30
+error: unknown # Give percentage for reads greater than Q30
 - name: ErrorRateHandler
 warning: 2
 error: unknown
@@ -94,25 +99,25 @@ hiseq2500_rapidrun_v2:
 150-175:
 handlers:
 - name: ClusterPFHandler
-warning: 110
+warning: 110 # Millons of clusters
 error: unknown
 - name: Q30Handler
-warning: unknown # Give lowest nbr in Gbp
-error: 12.3 # Give lowest nbr in Gbp
+warning: unknown # Give percentage for reads greater than Q30
+error: 12.3 # Give percentage for reads greater than Q30
 - name: ErrorRateHandler
 warning: unknown
-error: 3
+error: 80
 - name: ReadsPerSampleHandler
 warning: 55 # 50 % of threshold for clusters pass filter
 error: unknown
 250-265:
 handlers:
 - name: ClusterPFHandler
-warning: 110
+warning: 110 # Millons of clusters
 error: unknown
 - name: Q30Handler
-warning: 20 # Give lowest nbr in Gbp
-error: unknown # Give lowest nbr in Gbp
+warning: 80 # Give percentage for reads greater than Q30
+error: unknown # Give percentage for reads greater than Q30
 - name: ErrorRateHandler
 warning: 5
 error: unknown
@@ -124,11 +129,11 @@ hiseqx_v2:
 150:
 handlers:
 - name: ClusterPFHandler
-warning: 400
+warning: 400 # Millons of clusters
 error: unknown
 - name: Q30Handler
-warning: 39 # Give lowest nbr in Gbp
-error: unknown # Give lowest nbr in Gbp
+warning: 80 # Give percentage for reads greater than Q30
+error: unknown # Give percentage for reads greater than Q30
 - name: ErrorRateHandler
 warning: 5
 error: unknown
@@ -142,24 +147,27 @@ novaseq_v1:
 150:
 handlers:
 - name: ClusterPFHandler
-warning: 400
+warning: 400 # Millons of clusters
 error: unknown
 - name: Q30Handler
-warning: 39 # Give lowest nbr in Gbp
-error: unknown # Give lowest nbr in Gbp
+warning: 80 # Give percentage for reads greater than Q30
+error: unknown # Give percentage for reads greater than Q30
 - name: ErrorRateHandler
 warning: 5
 error: unknown
+- name: ReadsPerSampleHandler
+warning: 200 # 50 % of threshold for clusters pass filter
+error: unknown
 
 miseq_v2:
 25-50:
 handlers:
 - name: ClusterPFHandler
-warning: 10
+warning: 10 # Millons of clusters
 error: unknown
 - name: Q30Handler
-warning: 0.2 # Give lowest nbr in Gbp
-error: unknown # Give lowest nbr in Gbp
+warning: 80 # Give percentage for reads greater than Q30
+error: unknown # Give percentage for reads greater than Q30
 - name: ErrorRateHandler
 warning: 1
 error: unknown
@@ -169,11 +177,11 @@ miseq_v2:
 150:
 handlers:
 - name: ClusterPFHandler
-warning: 10
+warning: 10 # Millons of clusters
 error: unknown
 - name: Q30Handler
-warning: 1.1 # Give lowest nbr in Gbp
-error: unknown # Give lowest nbr in Gbp
+warning: 80 # Give percentage for reads greater than Q30
+error: unknown # Give percentage for reads greater than Q30
 - name: ErrorRateHandler
 warning: 2
 error: uknown
@@ -183,11 +191,11 @@ miseq_v2:
 250:
 handlers:
 - name: ClusterPFHandler
-warning: 10
+warning: 10 # Millons of clusters
 error: unknown
 - name: Q30Handler
-warning: 1.8 # Give lowest nbr in Gbp
-error: unknown # Give lowest nbr in Gbp
+warning: 80 # Give percentage for reads greater than Q30
+error: unknown # Give percentage for reads greater than Q30
 - name: ErrorRateHandler
 warning: 5
 error: unknown
@@ -199,11 +207,11 @@ miseq_v3:
 75:
 handlers:
 - name: ClusterPFHandler
-warning: 18
+warning: 18 # Millons of clusters
 error: unknown
 - name: Q30Handler
-warning: 1.1 # Give lowest nbr in Gbp
-error: unknown # Give lowest nbr in Gbp
+warning: 80 # Give percentage for reads greater than Q30
+error: unknown # Give percentage for reads greater than Q30
 - name: ErrorRateHandler
 warning: 1.5
 error: unknown
@@ -213,11 +221,11 @@ miseq_v3:
 300:
 handlers:
 - name: ClusterPFHandler
-warning: 18
+warning: 18 # Millons of clusters
 error: unknown
 - name: Q30Handler
-warning: 3.7 # Give lowest nbr in Gbp
-error: unknown # Give lowest nbr in Gbp
+warning: 80 # Give percentage for reads greater than Q30
+error: unknown # Give percentage for reads greater than Q30
 - name: ErrorRateHandler
 warning: 5
 error: unknown
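The usage notes added at the top of `config.yaml` define how a section is chosen: read-length keys such as `50-70` mean `min < x <= max`, `default_handlers` always run, and any value set to `unknown` is not evaluated. A minimal Python sketch of that lookup, operating on the config already loaded into a dict (for example by a YAML parser). Assumptions: the section name such as `hiseq2500_rapidhighoutput_v4` is passed in directly rather than derived from the runfolder, a single-number key like `150` matches exactly that read length, and the function names are hypothetical:

```python
UNKNOWN = "unknown"


def parse_interval(key):
    """Convert a read-length key into (low, high) meaning low < x <= high.

    '50-70' -> (50, 70). A single number such as 150 is assumed to match
    only that read length, so it becomes (149, 150).
    """
    text = str(key)
    if "-" in text:
        low, high = text.split("-", 1)
        return int(low), int(high)
    value = int(text)
    return value - 1, value


def handlers_for_run(config, section_name, read_length):
    """Return default handlers followed by those of the matching interval."""
    handlers = list(config.get("default_handlers", []))
    for key, interval_conf in config.get(section_name, {}).items():
        low, high = parse_interval(key)
        # Lower bound exclusive, upper bound inclusive
        if low < read_length <= high:
            return handlers + list(interval_conf["handlers"])
    raise KeyError("no read-length interval in {!r} matches {}".format(section_name, read_length))


def active_thresholds(handler_conf):
    """Keep only the warning/error thresholds that are not set to 'unknown'."""
    return {
        level: handler_conf[level]
        for level in ("warning", "error")
        if handler_conf.get(level, UNKNOWN) != UNKNOWN
    }


# Abridged excerpt of the updated default config, as it might look once loaded
config = {
    "default_handlers": [
        {"name": "UndeterminedPercentageHandler", "warning": 10, "error": 30},
    ],
    "hiseq2500_rapidhighoutput_v4": {
        "50-70": {"handlers": [
            {"name": "ClusterPFHandler", "warning": 180, "error": "unknown"},
            {"name": "Q30Handler", "warning": 80, "error": "unknown"},
        ]},
        "100-110": {"handlers": [
            {"name": "ClusterPFHandler", "warning": 180, "error": "unknown"},
        ]},
    },
}
```

Because the sentinel is compared as a plain string, a misspelling such as the `error: uknown` still present in the `miseq_v2` 150 section above would not be recognized, and this sketch would keep it as a threshold.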

checkQC/handlers/cluster_pf_handler.py

Lines changed: 6 additions & 4 deletions
@@ -24,11 +24,13 @@ def check_qc(self):
 lane_nbr = lane_dict["LaneNumber"]
 lane_pf = lane_dict["TotalClustersPF"]
 
-if self.error() != self.UNKNOWN and lane_pf <= float(self.error())*pow(10, 6):
-yield QCErrorFatal("Clusters PF was to low on lane {}, it was: {}".format(lane_nbr, lane_pf),
+if self.error() != self.UNKNOWN and lane_pf < float(self.error())*pow(10, 6):
+yield QCErrorFatal("Clusters PF was to low on lane {}, "
+"it was: {:.2f} M".format(lane_nbr, lane_pf/pow(10, 6)),
 ordering=int(lane_nbr))
-elif self.warning() != self.UNKNOWN and lane_pf <= float(self.warning())*pow(10, 6):
-yield QCErrorWarning("Cluster PF was to low on lane {}, it was: {}".format(lane_nbr, lane_pf),
+elif self.warning() != self.UNKNOWN and lane_pf < float(self.warning())*pow(10, 6):
+yield QCErrorWarning("Cluster PF was to low on lane {}, "
+"it was: {:.2f} M".format(lane_nbr, lane_pf/pow(10, 6)),
 ordering=int(lane_nbr))
 else:
 continue
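This change makes the Clusters PF comparison strict (`<` instead of `<=`), so a lane exactly at the configured threshold now passes, and reports the value in millions with two decimals, matching the "Millons of clusters" unit noted in the config. A standalone Python sketch of the same decision logic outside CheckQC's handler classes; the function name and its return shape are hypothetical, and the message wording is paraphrased rather than copied:

```python
def check_clusters_pf(lane_nbr, lane_pf, warning, error, unknown="unknown"):
    """Classify one lane's clusters passing filter against thresholds in millions.

    Returns ("error", message), ("warning", message), or None if the lane passes.
    """
    million = 10 ** 6
    message = "Clusters PF was too low on lane {}, it was: {:.2f} M".format(
        lane_nbr, lane_pf / million)
    # Strict comparison: a lane exactly at the threshold is not flagged
    if error != unknown and lane_pf < float(error) * million:
        return "error", message
    if warning != unknown and lane_pf < float(warning) * million:
        return "warning", message
    return None
```

Because the error threshold is checked first, a lane below both thresholds is reported once, as an error, mirroring the `if`/`elif` order in the diff.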

checkQC/handlers/q30_handler.py

Lines changed: 6 additions & 6 deletions
@@ -25,14 +25,14 @@ def check_qc(self):
 percent_q30 = error_dict["percent_q30"]
 
 if self.error() != self.UNKNOWN and percent_q30 < self.error():
-yield QCErrorFatal("% Q30 {} was too low on lane: {} for read: {}".format(percent_q30,
-lane_nbr,
-read),
+yield QCErrorFatal("%Q30 {:.2f} was too low on lane: {} for read: {}".format(percent_q30,
+lane_nbr,
+read),
 ordering=int(lane_nbr))
 elif self.warning() != self.UNKNOWN and percent_q30 < self.warning():
-yield QCErrorWarning("% Q30 {} was too low on lane: {} for read: {}".format(percent_q30,
-lane_nbr,
-read),
+yield QCErrorWarning("%Q30 {:.2f} was too low on lane: {} for read: {}".format(percent_q30,
+lane_nbr,
+read),
 ordering=int(lane_nbr))
 else:
 continue
