A bioinformatics tool that integrates gene expression data with metabolic models for pathway-level disease analysis. Extends Metabolitics for more comprehensive systems biology analysis.

AmmarRashed/Genobolitics

Genobolitics

Genobolitics is a bioinformatics tool that extends Metabolitics to include gene expression data for pathway-level analysis of diseases. This project aims to bridge the gap between genome-scale modeling and pathway-level analysis by integrating metabolic data with gene expression data.

Metabolitics is described in the paper:

Cakmak A, Celik MH. Personalized Metabolic Analysis of Diseases. IEEE/ACM Trans Comput Biol Bioinform. 2021 May-Jun;18(3):1014-1025. doi: 10.1109/TCBB.2020.3008196.

Project Overview

Genobolitics uses gene expression data to predict the flux distribution of metabolic reactions and to analyze pathway-level alterations in diseased cells. The approach includes:

  • Flux Variability Analysis (FVA) with dynamically built linear programming models
  • Translation of boolean relations among genes to reaction coefficients
  • Calculation of Diff values at reaction and pathway levels
  • Machine Learning-based classification for disease association
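
The translation of boolean gene relations into reaction-level values can be illustrated with a small sketch. It assumes a common convention that this README does not confirm: genes joined by `and` (all subunits required) take the minimum expression value, and genes joined by `or` (alternative isozymes) take the maximum. The function name `gpr_to_value`, the gene identifiers, and the expression values are hypothetical and are not taken from `genobolitics.py`.

```python
import re


def gpr_to_value(rule, expression, and_fn=min, or_fn=max):
    """Evaluate a boolean gene rule on expression values.

    Assumption: 'and' maps to min and 'or' maps to max. Pass other
    functions (for example sum for 'or') to try a different convention.
    """
    # Split the rule into parentheses and gene/operator tokens.
    tokens = re.findall(r"\(|\)|[^\s()]+", rule)
    pos = 0

    def parse_or():
        nonlocal pos
        values = [parse_and()]
        while pos < len(tokens) and tokens[pos].lower() == "or":
            pos += 1
            values.append(parse_and())
        return or_fn(values)

    def parse_and():
        nonlocal pos
        values = [parse_atom()]
        while pos < len(tokens) and tokens[pos].lower() == "and":
            pos += 1
            values.append(parse_atom())
        return and_fn(values)

    def parse_atom():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_or()
            pos += 1  # skip the closing parenthesis
            return value
        return expression[token]  # look up the gene's expression value

    return parse_or()


# Hypothetical expression values for three genes.
expression = {"G1": 0.8, "G2": 0.5, "G3": 1.0}
complex_or_isozyme = gpr_to_value("(G1 and G2) or G3", expression)
subunit_with_alternatives = gpr_to_value("G1 and (G2 or G3)", expression)
```

The resulting value could then be used to scale a reaction's coefficient or bounds before the linear program is built; how Genobolitics applies it is not specified here.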

The project has been tested on breast cancer and lung cancer datasets, achieving classification accuracy between 82% and 90%.
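
As a rough illustration of the Diff values mentioned in the approach, the sketch below compares the FVA flux range of each reaction in a sample against a healthy reference and then averages the result per pathway. The exact definition used by Genobolitics is not given in this README, so the formula here (shift of the minimum plus shift of the maximum flux) is an assumption, and the function names, reaction ids, and pathway names are hypothetical.

```python
from statistics import mean


def reaction_diffs(sample_fva, healthy_fva):
    """Return one Diff value per reaction for a single sample.

    Both arguments map reaction id -> (min_flux, max_flux).
    Assumed definition: (min - ref_min) + (max - ref_max).
    """
    diffs = {}
    for rxn, (lo, hi) in sample_fva.items():
        ref_lo, ref_hi = healthy_fva[rxn]
        diffs[rxn] = (lo - ref_lo) + (hi - ref_hi)
    return diffs


def pathway_diffs(diffs, pathways):
    """Average the reaction-level Diff values over each pathway."""
    return {name: mean(diffs[r] for r in rxns) for name, rxns in pathways.items()}


# Hypothetical FVA ranges for a healthy reference and one patient sample.
healthy = {"R1": (0.0, 10.0), "R2": (-5.0, 5.0), "R3": (0.0, 4.0)}
sample = {"R1": (2.0, 12.0), "R2": (-5.0, 3.0), "R3": (0.0, 4.0)}
pathways = {"Glycolysis": ["R1", "R2"], "TCA": ["R3"]}

rxn_diff = reaction_diffs(sample, healthy)
path_diff = pathway_diffs(rxn_diff, pathways)
```

A positive pathway value in this sketch means the sample's flux ranges are shifted upward relative to the reference, and a negative value means they are shifted downward.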

Repository Structure

.
├── classes/
│   ├── ParsingDataset.pyc
│   ├── geno_classifier.py       # Contains classification algorithms
│   ├── geno_utils.py            # Utility functions for gene processing
│   └── genobolitics.py          # Core implementation
├── dataset/                     # Contains the datasets used
├── notebooks/                   # Jupyter notebooks with analysis
├── results/                     # Analysis results
├── requirements.txt             # Project dependencies
└── final_presentation.pdf       # Project presentation

Installation

git clone https://github.com/AmmarRashed/Genobolitics.git
cd Genobolitics
pip install -r requirements.txt

Dependencies

The project depends on several Python libraries including:

  • NumPy
  • Pandas
  • scikit-learn
  • COBRApy
  • Joblib
  • GEOparse
  • PyHGNC
  • Metabolitics

See requirements.txt for the complete list.

Features

  • Integration of gene expression data with metabolic models
  • Dynamic Linear Programming for flux analysis
  • Pathway-level differential analysis with Diff values
  • Machine learning classification for disease prediction
  • Hierarchical clustering for disease ontology construction
  • Statistical significance analysis with ANOVA
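
To make the ANOVA step concrete, here is a minimal sketch of the one-way ANOVA F-statistic computed for a single pathway's Diff values across sample groups. It is written in plain Python for clarity and is not the project's code; the function name `anova_f` and the sample values are hypothetical.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F-statistic for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical Diff values of one pathway in healthy and diseased samples.
healthy_values = [1.0, 2.0, 3.0]
disease_values = [3.0, 4.0, 5.0]
f_stat = anova_f([healthy_values, disease_values])
```

A larger F-statistic indicates that the pathway separates the groups more strongly. In practice, library routines such as `scipy.stats.f_oneway` or `sklearn.feature_selection.f_classif` also return the corresponding p-values; whether Genobolitics uses either of them is not stated here.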

Project Context

This project was developed as a class project for the "Bioinformatics" course at Istanbul Şehir University, taught by Prof. Ali Çakmak in 2018.

Note: Istanbul Şehir University was placed under the control of Marmara University in December 2019 and later had its permit to operate revoked in June 2020, with students transferred to Marmara University. This project was completed in 2018, prior to these events.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

Thanks to Prof. Ali Çakmak and Muhammed Hasan Çelik for their support throughout the research cycle.

How to Contribute

Contributions to this project are welcome. Please feel free to submit a pull request or open an issue for bugs, questions, or feature requests.