## README

Problem:

> However, current predictors analyse variants as isolated events, which can
> lead to incorrect predictions when adjacent variants alter the same codon, or
> when a frame-shifting indel is followed by a frame-restoring indel.
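
To make the quoted problem concrete, here is a toy Python sketch. It translates a short made-up coding sequence twice: once with each variant applied on its own, and once with both variants applied to the same haplotype. A haplotype-aware caller accounts for exactly this difference. The sequences, the 0-based coordinates and the helpers `translate` and `apply_variants` are hypothetical and chosen for illustration. This is not how bcftools csq is implemented.

```python
# Standard genetic code: codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = dict(zip(CODONS, AMINO_ACIDS))


def translate(seq: str) -> str:
    """Translate all complete codons of seq, starting at its first base."""
    end = len(seq) - len(seq) % 3
    return "".join(GENETIC_CODE[seq[i:i + 3]] for i in range(0, end, 3))


def apply_variants(ref: str, variants: list) -> str:
    """Apply non-overlapping (pos, ref_allele, alt_allele) variants to one haplotype.

    pos is 0-based here (VCF POS is 1-based). Variants are applied right to left
    so that an indel does not shift the coordinates of variants still to be applied.
    """
    seq = ref
    for pos, r, a in sorted(variants, reverse=True):
        if seq[pos:pos + len(r)] != r:
            raise ValueError(f"REF allele mismatch at {pos}")
        seq = seq[:pos] + a + seq[pos + len(r):]
    return seq


# Case 1: two SNVs in the same codon (GAT, Asp).
REF1 = "ATGGATTGG"  # M D W
snv_a, snv_b = (3, "G", "T"), (5, "T", "A")
print(translate(apply_variants(REF1, [snv_a])))         # MYW  (missense)
print(translate(apply_variants(REF1, [snv_b])))         # MEW  (missense)
print(translate(apply_variants(REF1, [snv_a, snv_b])))  # M*W  (stop gained)

# Case 2: a frame-shifting deletion followed by a frame-restoring insertion.
REF2 = "ATGAAACCCGGGTTT"  # M K P G F
deletion, insertion = (3, "AA", "A"), (9, "G", "GA")
print(translate(apply_variants(REF2, [deletion])))             # MNPG  (frameshift)
print(translate(apply_variants(REF2, [insertion])))            # MKPEV (frameshift)
print(translate(apply_variants(REF2, [deletion, insertion])))  # MNPRF (frame restored)
```

Annotated one at a time, the two SNVs look like two missense changes. On the same haplotype, they create a stop codon. Similarly, the indel pair looks like two frameshifts when each indel is considered alone. Together, the two indels change two residues and restore the reading frame.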

BCFtools/csq is a fast program for haplotype-aware consequence calling that
can take known phase into account.

Several popular programs already exist for variant annotation, including:

1. Ensembl Variant Effect Predictor (VEP)
2. SnpEff
3. ANNOVAR

However, these programs do not take phasing into account.

## BCFtools/csq

`bcftools csq` takes three inputs: a VCF file (with the default `--phase r`,
heterozygous genotypes must be phased), a GFF3 file with gene predictions, and
a reference FASTA file.

```
About: Haplotype-aware consequence caller.
Usage: bcftools csq [OPTIONS] in.vcf

Required options:
   -f, --fasta-ref FILE              Reference file in fasta format
   -g, --gff-annot FILE              GFF3 annotation file

CSQ options:
   -B, --trim-protein-seq INT        Abbreviate protein-changing predictions to max INT aminoacids
   -c, --custom-tag STRING           Use this tag instead of the default BCSQ
   -l, --local-csq                   Localized predictions, consider only one VCF record at a time
   -n, --ncsq INT                    Maximum number of per-haplotype consequences to consider for each site [15]
   -p, --phase a|m|r|R|s             How to handle unphased heterozygous genotypes: [r]
                                       a: take GTs as is, create haplotypes regardless of phase (0/1 -> 0|1)
                                       m: merge *all* GTs into a single haplotype (0/1 -> 1, 1/2 -> 1)
                                       r: require phased GTs, throw an error on unphased het GTs
                                       R: create non-reference haplotypes if possible (0/1 -> 1|1, 1/2 -> 1|2)
                                       s: skip unphased hets
Options:
   -e, --exclude EXPR                Exclude sites for which the expression is true
       --force                       Run even if some sanity checks fail
   -i, --include EXPR                Select sites for which the expression is true
       --no-version                  Do not append version and command line to the header
   -o, --output FILE                 Write output to a file [standard output]
   -O, --output-type b|u|z|v|t[0-9]  b: compressed BCF, u: uncompressed BCF, z: compressed VCF
                                     v: uncompressed VCF, t: plain tab-delimited text output, 0-9: compression level [v]
   -r, --regions REGION              Restrict to comma-separated list of regions
   -R, --regions-file FILE           Restrict to regions listed in a file
       --regions-overlap 0|1|2       Include if POS in the region (0), record overlaps (1), variant overlaps (2) [1]
   -s, --samples -|LIST              Samples to include or "-" to apply all variants and ignore samples
   -S, --samples-file FILE           Samples to include
   -t, --targets REGION              Similar to -r but streams rather than index-jumps
   -T, --targets-file FILE           Similar to -R but streams rather than index-jumps
       --targets-overlap 0|1|2       Include if POS in the region (0), record overlaps (1), variant overlaps (2) [0]
       --threads INT                 Use multithreading with <int> worker threads [0]
   -v, --verbose INT                 Verbosity level 0-2 [1]

Example:
   bcftools csq -f hs37d5.fa -g Homo_sapiens.GRCh37.82.gff3.gz in.vcf

   # GFF3 annotation files can be downloaded from Ensembl. e.g. for human:
   ftp://ftp.ensembl.org/pub/current_gff3/homo_sapiens/
   ftp://ftp.ensembl.org/pub/grch37/release-84/gff3/homo_sapiens/
```
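
The `-p/--phase` modes in the help text control how each sample genotype is turned into haplotypes before consequences are called. The following Python sketch models only the example mappings that the help text documents. Several details are assumptions and not bcftools internals: the function names, the rule that phased or homozygous genotypes pass through unchanged, and the "keep the first non-reference allele" reading of mode `m`.

```python
from typing import Optional, Tuple


def split_gt(gt: str) -> Tuple[bool, int, int]:
    """Parse a diploid GT such as '0/1' or '1|2' (missing alleles not handled)."""
    phased = "|" in gt
    a, b = (int(x) for x in gt.replace("|", "/").split("/"))
    return phased, a, b


def to_haplotypes(gt: str, mode: str = "r") -> Optional[Tuple[int, ...]]:
    """Hypothetical model of --phase; returns the allele index on each haplotype.

    Returns None when the genotype is skipped at this site.
    """
    phased, a, b = split_gt(gt)
    if mode == "m":
        # m: merge *all* GTs into a single haplotype. Keeping the first
        # non-reference allele is an assumption matching 0/1 -> 1 and 1/2 -> 1.
        return (a if a > 0 else b,)
    if phased or a == b:
        # Phased or homozygous GTs need no phasing decision (assumed).
        return (a, b)
    if mode == "a":
        return (a, b)  # take GTs as is: 0/1 -> 0|1
    if mode == "R":
        if 0 in (a, b):
            alt = max(a, b)
            return (alt, alt)  # 0/1 -> 1|1
        return (a, b)  # 1/2 -> 1|2
    if mode == "s":
        return None  # skip unphased hets
    if mode == "r":
        raise ValueError(f"unphased heterozygous genotype: {gt}")
    raise ValueError(f"unknown --phase mode: {mode}")


for mode in "amRs":
    print(mode, to_haplotypes("0/1", mode), to_haplotypes("1/2", mode))
```

With the default `r`, the sketch raises an error for `0/1` but accepts `0|1`. This mirrors the help text: the other modes exist for input that has not been phased.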

The program first parses the gene predictions in the GFF3 file. It then streams
through the VCF file, using a fast region lookup at each site to find overlaps
with regions of supported genomic types (exons, CDS, UTRs, or general
transcripts). For more details, see the paper listed under
[Further reading](#further-reading).
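
The streaming overlap step can be pictured as a sweep over a position-sorted VCF. The Python sketch below keeps the currently "open" features in a min-heap keyed on their end coordinate. It is a stand-in for illustration, not bcftools' own region index. It also simplifies each VCF record to its POS, ignoring the span of REF and the `--regions-overlap` options. The `Feature` type and the `stream_overlaps` function are hypothetical.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Feature:
    """A GFF3-style feature; coordinates are 1-based and inclusive."""
    chrom: str
    start: int
    end: int
    ftype: str  # e.g. "exon", "CDS", "five_prime_UTR"


def stream_overlaps(
    features: Iterable[Feature], sites: Iterable[Tuple[str, int]]
) -> Iterator[Tuple[Tuple[str, int], List[Feature]]]:
    """Yield each (chrom, pos) site together with the features that contain pos.

    Sites must be grouped by chromosome and sorted by position, as in a sorted VCF.
    """
    index: Dict[str, List[Feature]] = {}
    for f in sorted(features, key=lambda f: (f.chrom, f.start)):
        index.setdefault(f.chrom, []).append(f)

    chrom = None
    feats: List[Feature] = []
    nxt = 0
    active: List[Tuple[int, int, Feature]] = []  # min-heap keyed on feature end
    for site_chrom, pos in sites:
        if site_chrom != chrom:  # new chromosome: reset the sweep
            chrom, feats, nxt, active = site_chrom, index.get(site_chrom, []), 0, []
        # Activate every feature that starts at or before this site.
        while nxt < len(feats) and feats[nxt].start <= pos:
            heapq.heappush(active, (feats[nxt].end, nxt, feats[nxt]))
            nxt += 1
        # Retire features that ended before this site; sites only move forward.
        while active and active[0][0] < pos:
            heapq.heappop(active)
        hits = sorted((f for _, _, f in active), key=lambda f: (f.start, f.end, f.ftype))
        yield (site_chrom, pos), hits


genes = [
    Feature("chr1", 100, 200, "exon"),
    Feature("chr1", 150, 180, "CDS"),
    Feature("chr1", 300, 400, "exon"),
]
sites = [("chr1", 120), ("chr1", 160), ("chr1", 250), ("chr1", 350), ("chr2", 5)]
for site, hits in stream_overlaps(genes, sites):
    print(site, [f.ftype for f in hits])
```

Both inputs are sorted, so each feature enters and leaves the heap at most once per chromosome. After the features have been sorted, the sweep therefore does logarithmic work per feature and never rescans features already passed.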

## Further reading

* [BCFtools/csq: haplotype-aware variant consequences](https://academic.oup.com/bioinformatics/article/33/13/2037/3000373)