Description
Current behavior:
Recently, I have been working on evaluating essential genes, and I found issues with the current evaluation workflow (which is also used in the auto-tasks on GitHub) in `estimateEssentialGenes`.
```matlab
ihuman = readYAMLmodel('model/Human-GEM.yml');
taskStruct = parseTaskList('data/metabolicTasks/metabolicTasks_Essential.txt');
[eGenes, INIT_output] = estimateEssentialGenes(ihuman, 'Hart2015_RNAseq.txt', taskStruct);
results = evaluateHart2015Essentiality(eGenes);
```
The resulting context-specific models looked wrong: every cell type produced the same very small model, as you can see below.
cell_type | DLD1 | GBM | HCT116 | HELA | RPE1 |
---|---|---|---|---|---|
genes | 475 | 475 | 475 | 475 | 475 |
rxns | 250 | 250 | 250 | 250 | 250 |
mets | 339 | 339 | 339 | 339 | 339 |
Further investigation revealed that the cause is the fourth parameter of `estimateEssentialGenes`, `useGeneSymbol`, which defaults to `true` and converts the genes in the template model to gene symbol format. However, the genes in `Hart2015_RNAseq.txt` are Ensembl gene IDs (the 'ENSG0000' format), so no genes match and no gene expression is detected by default.
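
A quick way to confirm this kind of mismatch is to count how many gene IDs from the expression data can be found in the model. The sketch below is only an illustration: it uses made-up toy IDs (not taken from the actual files), hypothetical variable names, and plain MATLAB `ismember`.

```matlab
% Toy data only: hypothetical stand-ins for the model genes and for the
% gene IDs found in the expression file.
modelGenesSymbol  = {'GENE_A', 'GENE_B', 'GENE_C'};              % model after conversion to symbols
modelGenesEnsembl = {'ENSG_TOY_1', 'ENSG_TOY_2', 'ENSG_TOY_3'};  % model with its original IDs
exprGenes         = {'ENSG_TOY_1', 'ENSG_TOY_3', 'ENSG_TOY_9'};  % IDs in the expression data

% Count the expression genes that can be matched to the model in each case.
nMatchSymbol  = sum(ismember(exprGenes, modelGenesSymbol));
nMatchEnsembl = sum(ismember(exprGenes, modelGenesEnsembl));

fprintf('Matches with symbol-format model:  %d of %d\n', nMatchSymbol, numel(exprGenes));
fprintf('Matches with Ensembl-format model: %d of %d\n', nMatchEnsembl, numel(exprGenes));
```

With the symbol-format model nothing matches, which is the situation described above; with matching ID formats the overlap is recovered.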
So I tried manually setting the fourth parameter to `false`, and the content of the resulting models was much more normal:
cell_type | DLD1 | GBM | HCT116 | HELA | RPE1 |
---|---|---|---|---|---|
genes | 1734 | 1731 | 1772 | 1743 | 1669 |
rxns | 6870 | 6265 | 6888 | 6902 | 6097 |
mets | 5680 | 4986 | 5649 | 5665 | 4845 |
However, the essential gene evaluation then returned all zeros, because the genes in `Hart2015_TableS2.xlsx` (the experimental results) are in gene symbol format. So I believe that, after the models are generated, all genes in the models (including the template model) need to be converted to gene symbol format before performing the essential gene evaluation.
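
To illustrate the proposed order of operations (build the models with Ensembl IDs, then convert to gene symbols before comparing with the experimental list), here is a minimal sketch with toy data. The lookup table, the gene names, and all variable names are hypothetical, and it assumes every predicted ID has an entry in the lookup table; it is not the actual implementation of `evaluateHart2015Essentiality`.

```matlab
% Toy data only: a hypothetical Ensembl-to-symbol lookup table.
ensemblIds = {'ENSG_TOY_1', 'ENSG_TOY_2', 'ENSG_TOY_3'};
symbols    = {'GENE_A', 'GENE_B', 'GENE_C'};
id2symbol  = containers.Map(ensemblIds, symbols);

% Essential genes predicted from a model that still uses Ensembl-style IDs.
predictedEssential = {'ENSG_TOY_1', 'ENSG_TOY_3'};

% Experimental essential genes, reported as gene symbols.
experimentalEssential = {'GENE_A', 'GENE_B'};

% Without conversion, no predicted gene can match the experimental list.
nBefore = sum(ismember(predictedEssential, experimentalEssential));

% Convert the predicted IDs to symbols, then compare again.
predictedSymbols = cellfun(@(g) id2symbol(g), predictedEssential, ...
                           'UniformOutput', false);
nAfter = sum(ismember(predictedSymbols, experimentalEssential));

fprintf('Overlap before conversion: %d, after conversion: %d\n', nBefore, nAfter);
```

In this toy case the overlap is zero before conversion and nonzero afterwards, mirroring the all-zero evaluation result reported above.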