Negative GQ values after vcf_to_hdf5

Hi, I was comparing some different VCF extraction packages and have been really impressed by the extraction speed in the h5 output of vcf_to_hdf5. However, I was doing some basic tests just to check outputs were the same from each method and have found some strange behaviour in the scikit-allel output.

Specifically I found that my GQ values >128 appear to have a negative offset...

Here is an example plotting GQ against sample DP and you can see the (orange) scikit-allel result has this strange offset:
![image](https://user-images.githubusercontent.com/30181254/93878709-6c634c00-fcd2-11ea-93b3-62937622ef79.png)

For example, row 265 of the VCF is:
_chr17	1005289	.	A	G	431133	PASS	AC=6;AF=0.823;AN=6;DP=109787;FS=0;MQ=249.96;MQRankSum=5.615;QD=3.94;ReadPosRankSum=3.074;SOR=0.7	GT:AD:AF:DP:GQ:FT:F1R2:F2R1:PL:GP	1/1:0,53:1:53:156:PASS:.:.:244,159,0:206,156,0 1/1:0,40:1:40:117:PASS:.:.:205,120,0:167.3,117.3,0	1/1:0,50:1:50:147:PASS:.:.:235,150,0:197.4,147.4,0_

You can see that the correct GQ for sample 1 is 156, but the value I get through scikit-allel is -100:
```
allel.vcf_to_hdf5('fpath.vcf.gz', 'fpath.h5', fields='*', overwrite=True)
callset = h5py.File(fpath.h5,mode='r')
GQ = callset['calldata/GQ']
GQ_0 = GQs[:,0]
print(GQ_0[265])

>>> -100
```
I found 271 cases where the GQ is offset and in each cases it is offset by either 28 (256) or 211 (2048) which can't be a coincidence.

Any idea what might be causing this issue?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Negative GQ values after vcf_to_hdf5 #341

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Negative GQ values after vcf_to_hdf5 #341

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions