Home

Installing MerCat2

There are two ways to install MerCat2: the Bioconda installer and the source installer.

Note

Conda must be installed before running either option in a terminal. Download and install the latest version of Conda for your system first.

Option 1: Bioconda Installer

Install mamba using conda

conda activate base
conda install mamba

Important

Make sure you install mamba in your base conda environment. We have found that mamba is faster than conda for installing packages and creating environments. Using conda might fail to resolve dependencies.

Install MerCat2

This step installs MerCat2 from Bioconda into a new conda environment named mercat2.

mamba create -n mercat2 -c conda-forge -c bioconda mercat2
conda activate mercat2

Option 2: Source Installer

Clone MerCat2 from GitHub, then run install_mercat2.sh to install all required dependencies.

This script creates a conda environment for you

git clone https://github.com/raw-lab/mercat2.git
cd mercat2
bash install_mercat2.sh
conda activate mercat2

Dependencies

MerCat2 requires Python 3.9 or higher.

External dependencies

Depending on the options used, MerCat2 can run without any external dependencies.

Required dependencies:

- When a raw read .fastq file is given:
  - FastQC
  - fastp
- For bacteria/archaea-rich samples (-prod option):
  - Prodigal
- For eukaryote-rich samples or general applications (-fgs option):
  - FragGeneScanRs

Install the BioConda dependencies with:

conda install -c bioconda fastqc fastp prodigal

Note

These are all available through BioConda except FragGeneScanRs, which is included in the MerCat2 distribution.
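Before passing raw .fastq files or using the -prod option, it can help to confirm that the matching tools are on your PATH. The following is a minimal sketch, not part of MerCat2: the helper name check_tools is hypothetical, and the executable names fastqc, fastp, and prodigal are assumptions based on the usual BioConda packages.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MerCat2): reports which external tools are on PATH.
# Assumed executable names: fastqc, fastp, prodigal (as installed from BioConda).
# FragGeneScanRs is not checked because it is included in the MerCat2 distribution.
check_tools() {
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: missing"
        fi
    done
}

check_tools fastqc fastp prodigal
```

Run it inside the activated mercat2 environment; any tool reported as missing can be installed with the conda command above.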

Usage

usage: mercat2.py [-h] [-i I [I ...]] [-f F] -k K [-n N] [-c C] [-prod] [-fgs] [-s S] [-o O] [-replace] [-lowmem LOWMEM] [-skipclean] [-toupper] [-pca] [--version]

Example: mercat2.py -h

Option	Description
`-h, --help`	Show this help message and exit
`-i I [I ...]`	Path to input file(s)
`-f F`	Path to a folder containing input files
`-k K`	k-mer length
`-n N`	Number of cores [auto-detect]
`-c C`	Minimum k-mer count [10]
`-prod`	Run Prodigal on fasta files
`-fgs`	Run FragGeneScanRs on fasta files
`-s S`	Split input into files of S MB [100]
`-o O`	Output folder [mercat_results in the current directory]
`-replace`	Replace an existing output directory [False]
`-lowmem LOWMEM`	Use incremental PCA when little memory is available [auto]
`-skipclean`	Skip trimming of fastq files
`-toupper`	Convert all input sequences to uppercase
`-pca`	Create an interactive PCA plot of the samples (minimum of 4 fasta files required)
`--version, -v`	Show the version number and exit

MerCat2 infers the input file format from the file extension:

- Raw fastq file: ['.fastq', '.fq']
- Nucleotide fasta: ['.fa', '.fna', '.ffn', '.fasta']
- Amino acid fasta: ['.faa']

Gzipped versions of these file types, with an added '.gz' suffix, are also accepted.
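To check how a file name maps onto these formats before a run, the extension rule listed above can be sketched as a small shell function. This is an illustration only, not MerCat2's own code: the function name classify_input is hypothetical, and this sketch matches extensions case-sensitively, which MerCat2 may not.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MerCat2): mirrors the extension-to-format
# mapping listed above so you can see how a file would be treated.
classify_input() {
    local name="${1%.gz}"   # a trailing .gz suffix does not change the format
    case "$name" in
        *.fastq|*.fq)              echo "fastq" ;;
        *.fa|*.fna|*.ffn|*.fasta)  echo "nucleotide" ;;
        *.faa)                     echo "protein" ;;
        *)                         echo "unknown" ;;
    esac
}

for f in sample1.fq.gz contigs.fna proteins.faa notes.txt; do
    printf '%s\t%s\n' "$f" "$(classify_input "$f")"
done
```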

Usage examples

The following examples show how to run MerCat2 on different types of samples:

Input type	Command
Protein file (protein fasta - '.faa')	`mercat2.py -i file-name.faa -k 3 -c 10`
Nucleotide file (nucleotide fasta - '.fa', '.fna', '.ffn', '.fasta')	`mercat2.py -i file-name.fna -k 3 -n 8 -c 10`
Raw nucleotide reads (nucleotide fastq - '.fastq')	`mercat2.py -i file-name.fastq -k 3 -n 8 -c 10`
Many samples within a folder	`mercat2.py -f /path/to/input-folder -k 3 -n 8 -c 10`
Sample with the Prodigal option (raw reads or nucleotide contigs - '.fa', '.fna', '.ffn', '.fasta', '.fastq')	`mercat2.py -i /path/to/input-file -k 3 -n 8 -c 10 -prod`
Sample with the FragGeneScanRs option (raw reads or nucleotide contigs - '.fa', '.fna', '.ffn', '.fasta', '.fastq')	`mercat2.py -i /path/to/input-file -k 3 -n 8 -c 10 -fgs`

Note

With the -prod or -fgs option, MerCat2 runs the k-mer counter on both the nucleotide contigs and the predicted amino acid sequences.
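When running on a folder of samples (-f), the -pca option needs at least 4 input files. The sketch below is a hypothetical wrapper, not part of MerCat2: count_inputs and build_mercat2_cmd are illustrative names, the -n 8 -c 10 values are copied from the examples above, and it prints the command rather than running it so you can review it first.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of MerCat2).

# Count files in a folder whose extension MerCat2 recognizes (optionally gzipped).
count_inputs() {
    local folder="$1" count=0 f name
    for f in "$folder"/*; do
        [ -f "$f" ] || continue          # skips subfolders and an unmatched glob
        name="${f%.gz}"
        case "$name" in
            *.fastq|*.fq|*.fa|*.fna|*.ffn|*.fasta|*.faa) count=$((count + 1)) ;;
        esac
    done
    echo "$count"
}

# Print a mercat2.py command for the folder, adding -pca only for 4 or more inputs.
# Note: the folder path is not quoted in the printed command.
build_mercat2_cmd() {
    local folder="$1" k="$2"
    local cmd="mercat2.py -f $folder -k $k -n 8 -c 10"
    if [ "$(count_inputs "$folder")" -ge 4 ]; then
        cmd="$cmd -pca"
    fi
    echo "$cmd"
}
```

For example, `build_mercat2_cmd /path/to/input-folder 3` prints the command; once it looks right, it can be run as printed.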

Outputs

Results are stored in the output folder (default: 'mercat_results' in the current working directory).

- The 'report' folder contains an HTML report with interactive Plotly figures.
  - If at least 4 samples are provided, a PCA plot is included in the HTML report.
- The 'tsv' folder contains count tables in tab-separated format.
  - If protein files are given or the -prod option is used, a .tsv file is created for each sample containing k-mer counts and pI, molecular weight, and hydrophobicity metrics.
  - If nucleotide files are given, a .tsv file is created for each sample containing k-mer counts and GC content.
- If raw .fastq read files are used, a 'clean' folder is created with the cleaned fasta file.
- If the -prod option is used, a 'prodigal' folder is created with the amino acid .faa and .gff files.
- If the -fgs option is used, a 'fgs' folder is created with the amino acid .faa file.
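A quick way to inspect the count tables after a run is to print the first lines of each file in the 'tsv' folder. This is a hypothetical helper (preview_tsv is not part of MerCat2); it assumes only the folder layout described above, since the exact column names depend on the input type.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MerCat2): print the first lines of every
# .tsv file in <results>/tsv. Defaults: results folder 'mercat_results', 5 lines.
preview_tsv() {
    local results="${1:-mercat_results}" lines="${2:-5}" tsv
    for tsv in "$results"/tsv/*.tsv; do
        [ -f "$tsv" ] || continue        # no .tsv files: the glob stays unexpanded
        echo "== ${tsv##*/}"
        head -n "$lines" "$tsv"
    done
}
```

For example, `preview_tsv mercat_results 10` shows the first 10 lines of each sample's table.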


Diversity estimation

Alpha and Beta diversity metrics provided by MerCat2 are experimental. We are currently working on the robustness of these measures.

Alpha diversity metrics provided	Beta diversity metrics provided
shannon	euclidean
simpson	cityblock
simpson_e	braycurtis
goods_coverage	canberra
fisher_alpha	chebyshev
dominance	correlation
chao1	cosine
chao1_ci	dice
ace	hamming
	jaccard
	mahalanobis
	manhattan (same as City Block in this case)
	matching
	minkowski
	rogerstanimoto
	russellrao
	seuclidean
	sokalmichener
	sokalsneath
	sqeuclidean
	yule
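For orientation, the textbook definitions of a few of the listed metrics are given below, written for k-mer counts: n_i is the count of k-mer i in a sample, N is the total count, S is the number of distinct k-mers observed, F_1 and F_2 are the numbers of k-mers seen exactly once and exactly twice, and x_i, y_i are the counts of k-mer i in two samples x and y. These are standard formulas, not taken from MerCat2's source. Conventions such as the logarithm base, or whether "simpson" reports D or 1 - D, vary between libraries, so MerCat2's values may differ.

```latex
\begin{aligned}
p_i &= \frac{n_i}{N}, \qquad N = \sum_{i=1}^{S} n_i \\
H_{\text{shannon}} &= -\sum_{i=1}^{S} p_i \ln p_i \\
D_{\text{dominance}} &= \sum_{i=1}^{S} p_i^2, \qquad 1 - D = \text{Gini--Simpson index} \\
\hat{S}_{\text{chao1}} &= S + \frac{F_1^2}{2F_2} \qquad (F_2 > 0) \\
BC_{xy} &= \frac{\sum_{i} \lvert x_i - y_i \rvert}{\sum_{i} (x_i + y_i)}
\end{aligned}
```

For two k-mers with equal counts, p_1 = p_2 = 0.5, so H = ln 2 ≈ 0.693 and D = 0.5.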

Notes on memory usage and speed

- MerCat2 uses a substantial amount of memory when the k-mer length is large.
- Running MerCat2 on a personal computer with a k-mer length of about 4 should be feasible.
- Total memory usage can be reduced with the chunker (-s option). Keep in mind that in testing, when the chunk size was too small (1 MB), some of the least significant k-mers were lost. This does not seem to affect the overall results, but it is worth noting.
- Using the chunker and reducing the number of CPUs available (-n option) can both help reduce memory requirements.
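One reason memory grows quickly with k is that the number of possible distinct k-mers is the alphabet size raised to the power k: 4^k for nucleotides and 20^k for the 20 standard amino acids. The sketch below only tabulates that upper bound; it is not a model of MerCat2's actual memory use, which also depends on how many distinct k-mers actually occur in the data. The function name possible_kmers is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MerCat2): number of possible distinct k-mers,
# alphabet_size^k. This is an upper bound on the k-mers a counter may track.
possible_kmers() {
    local alphabet="$1" k="$2" total=1 i
    for ((i = 0; i < k; i++)); do
        total=$((total * alphabet))
    done
    echo "$total"
}

printf 'k\tnucleotide (4^k)\tprotein (20^k)\n'
for k in 3 4 6 8; do
    printf '%s\t%s\t%s\n' "$k" "$(possible_kmers 4 "$k")" "$(possible_kmers 20 "$k")"
done
```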

Note

MerCat2 runs faster on a cluster with more memory or compute nodes available, using a chunk size of about 100 MB.

Copyright

This is copyrighted by University of North Carolina at Charlotte, Jose L. Figueroa III, Andrew Redinbo, and Richard Allen White III. All rights reserved. MerCat2 is a bioinformatic tool that can be distributed freely for academic use only. Please contact us for commercial use. The software is provided “as is” and the copyright owners or contributors are not liable for any direct, indirect, incidental, special, or consequential damages including but not limited to, procurement of goods or services, loss of use, data or profits arising in any way out of the use of this software.

Citing MerCat2

If you are publishing results obtained using MerCat2, please cite:
Figueroa JL*, Redinbo A*, Panyala A, Colby S, Friesen M, Tiemann L, White III RA. 2024.
MerCat2: a versatile k-mer counter and diversity estimator for database-independent property analysis obtained from omics data.
Bioinformatics Advances, vbae061.
*Co-first authors

bioRxiv preprint
Figueroa JL, Panyala A, Colby S, Friesen M, Tiemann L, White III RA. 2022.
MerCat2: a versatile k-mer counter and diversity estimator for database-independent property analysis obtained from omics data.
bioRxiv.

CONTACT
