korem-lab/ICU_Analysis: This repository contains code for a study evaluating whether gut microbiome profiles can improve prediction of infection and mortality in critically ill ICU patients. Data were sourced with the help of Dr. Daniel Freedberg at CUMC.

Project Overview

This repository contains code and results for the analysis of gut microbiome profiles and their ability to predict ICU outcomes, including infection and mortality. The analysis includes two cohorts of ICU patients profiled with 16S rRNA sequencing, combined with clinical metadata.

Summary of Findings

Microbiome-based models did not consistently improve prediction of ICU-acquired infection or mortality compared to standard clinical scores (e.g., SOFA). External validation revealed limited generalizability, with predictive performance varying by outcome, cohort, and time horizon. These results suggest that, in their current form, gut microbiome features offer limited incremental value for clinical risk prediction in the ICU.

Reproduction Steps

Access Raw Data

Original cohort: /manitou/pmg/projects/korem_lab/Data/Freedberg_inulin_trial/
Validation cohort: /manitou/pmg/projects/korem_lab/Data/Freedberg_inulin_trial/validation_data/

Preprocess 16S Sequencing Data (on manitou)

Follow the first two steps from the pipeline in /burg/pmg/users/se2481/scripts/16S_pipeline/README.md:

(a) Human genome filtering

Remove human reads with the MMMBP pipeline:

pybatch run_mmmbp.py

Output in tmp/HGF2 contains filtered FASTQs and a df_path table of human-read counts.

(b) Primer trimming

conda activate shared
python /burg/pmg/users/se2481/scripts/16S_pipeline/trim_primers.py \
  --reads /manitou/pmg/users/mc5672/orig_data/hgf2_filtered/tmp/HGF2 \
  --fwd CCTACGGGNGGCWGCAG \
  --rev GACTACHVGGGTATCTAATCC \
  --batch 20 \
  --out /manitou/pmg/users/mc5672/orig_data/primer_trimmed \
  --exclude m014,m015 \
  --paired

Verify trimming:

cat *.log | grep 'with adapter'

The reported percentage of reads with an adapter should be high (e.g., 98-99%).
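The same check can be scripted so that low-trimming batches stand out. The following is a minimal sketch, not part of the pipeline: it assumes the logs contain cutadapt-style summary lines such as `Read 1 with adapter:   9,850 (98.5%)`, which this README does not confirm, and the function names and the 98% threshold are illustrative.

```python
import re

# Assumed line format (cutadapt-style), e.g.
#   "  Read 1 with adapter:   9,850 (98.5%)"
# Adjust the pattern if trim_primers.py logs look different.
ADAPTER_LINE = re.compile(r"with adapter[^(]*\(([\d.]+)%\)")


def adapter_percentages(log_text):
    """Return every 'with adapter' percentage found in one log's text."""
    return [float(m.group(1)) for m in ADAPTER_LINE.finditer(log_text)]


def low_trimming(logs, threshold=98.0):
    """Map log name -> percentages, for logs with any value below threshold."""
    flagged = {}
    for name, text in logs.items():
        pcts = adapter_percentages(text)
        if any(p < threshold for p in pcts):
            flagged[name] = pcts
    return flagged


# Hypothetical log contents, for illustration only.
example_logs = {
    "batch_0.log": (
        "  Read 1 with adapter:   9,850 (98.5%)\n"
        "  Read 2 with adapter:   9,910 (99.1%)\n"
    ),
    "batch_1.log": "  Read 1 with adapter:   7,200 (72.0%)\n",
}
print(low_trimming(example_logs))
```

To run it on real logs, build the dictionary from disk, for instance with `{p.name: p.read_text() for p in pathlib.Path(".").glob("*.log")}`.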

(Optional) Transfer Trimmed Files to Local Machine (and gunzip)

Process Data

(a) DADA2 + Taxonomy + SCRuB

Run Data_Processing.ipynb

This notebook:

Performs denoising with DADA2
Assigns taxonomy
Removes contaminants using SCRuB
Computes α- and β-diversity
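To illustrate the diversity step, here is a minimal sketch of one common α-diversity metric (Shannon index) and one common β-diversity metric (Bray-Curtis dissimilarity). Which metrics Data_Processing.ipynb actually computes is not stated here, so both choices are assumptions, as are the function names.

```python
import math


def shannon(counts):
    """Shannon index (natural log) for one sample's ASV counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples' ASV count vectors."""
    denominator = sum(a) + sum(b)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


# Toy count vectors (hypothetical), one entry per ASV.
sample_a = [10, 0, 5, 85]
sample_b = [0, 20, 5, 75]
print(shannon(sample_a), bray_curtis(sample_a, sample_b))
```

Shannon summarizes richness and evenness within a sample, while Bray-Curtis ranges from 0 (identical composition) to 1 (no shared ASVs).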

(b) Enrich with Metadata

Run Data_Enriching.ipynb

This joins ASV tables with clinical reference data, performs CLR-transformation, and adds derived features (e.g., SOFA scores, infection timing).
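As a reference for the CLR step, the centered log-ratio of a sample is the log of each count minus the mean of the logs across that sample. The sketch below is illustrative only: the pseudocount of 0.5 used to handle zero counts is an assumption, and the notebook may treat zeros differently.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's ASV counts.

    The pseudocount (an assumed value) avoids log(0) for absent ASVs.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Toy count vector (hypothetical), one entry per ASV.
transformed = clr([10, 0, 5, 85])
print(transformed)
```

CLR values within a sample sum to zero, which is a quick sanity check after transforming a table.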

Run Predictive Models

Choose one of the models in prediction_models/, e.g.:

pybatch Death_Next_7_SOFA.py

Model Naming Convention: Model scripts are named {target}_{timepoint}_{features} (for example, Death_Next_7_SOFA.py), where:

target is the outcome (e.g., infection, death)
timepoint is the prediction window: 0 for ICU admission samples only, any for samples from the full ICU stay, or next_7 / next_10 for events within 7 or 10 days of sample collection
features is the input data: asv, sofa, or sofaasv for the combined model
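The naming convention can be decoded programmatically, which helps when looping over scripts in prediction_models/. This is a minimal sketch under two assumptions taken from the single example Death_Next_7_SOFA.py: the parts are separated by underscores, and case does not matter. The target list contains only the two outcomes named above, so other targets would need to be added.

```python
import re

# Assumed pattern: {target}_{timepoint}_{features}, case-insensitive.
MODEL_NAME = re.compile(
    r"^(?P<target>infection|death)"
    r"_(?P<timepoint>0|any|next_7|next_10)"
    r"_(?P<features>sofaasv|sofa|asv)$",
    re.IGNORECASE,
)


def parse_model_name(filename):
    """Split a model script name into target, timepoint, and features."""
    stem = filename[:-3] if filename.endswith(".py") else filename
    match = MODEL_NAME.match(stem)
    if match is None:
        raise ValueError(f"unrecognized model name: {filename!r}")
    return {key: value.lower() for key, value in match.groupdict().items()}


print(parse_model_name("Death_Next_7_SOFA.py"))
```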

Evaluate and Plot Results

To evaluate model AUROCs and plot ROC curves, run Evaluate_Model.ipynb
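For readers unfamiliar with the metric, AUROC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition. It is not the code in Evaluate_Model.ipynb, and the labels and scores are made up for illustration.

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    # Each pair scores 1 if the positive ranks higher, 0.5 on a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical outcome labels (1 = event) and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))
```

The pairwise form is quadratic in sample count, which is fine for small cohorts. Comparing models, such as SOFA alone against SOFA plus ASV features, amounts to calling the function once per model on the same held-out labels.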

(Helpful utility) To generate plots for all models in prediction_models/, run: Generate_Plots.ipynb
