Running CellFie in Matlab

Anne Richelle edited this page Aug 1, 2020 · 22 revisions

System Requirements

  • A Matlab version newer than R2014b

Installation

Download CellFie from a bash shell

# in a bash shell
git clone https://github.com/LewisLabUCSD/CellFie.git <desired path to cellfie>/CellFie

Add CellFie to the Matlab path

% in matlab
cd PathToCellfie % Example - cd C:\User\CellFie-master
addpath(genpath('PathToCellfie')) % Example - addpath(genpath('C:\User\CellFie-master'))
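
To confirm that the path was set correctly, a quick check along these lines can help. This is only a sketch: it assumes the main entry point is a function named CellFie (as used in the Quick Start) and relies on the built-in `which`, which returns an empty result when a name cannot be resolved.

```matlab
% Check whether the CellFie function can be resolved on the current path.
cellfieLocation = which('CellFie');
isFound = ~isempty(cellfieLocation);

if isFound
    fprintf('CellFie found at: %s\n', cellfieLocation);
else
    warning('CellFie was not found on the path; re-check the addpath call.');
end
```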

Quick Start

%% Load an example dataset (expression matrix: entrez ids x samples)
load 'test/suite/dataTest.mat'
%% Define the number of samples (equal to the column number of the expression matrix)
SampleNumber=3;

%% Define the reference genome-scale model you want to use (all are listed in test/suite)
ref='MT_recon_2_2_entrez.mat';

%% Define the parameters of the method
param.ThreshType='local';
param.LocalThresholdType='minmaxmean';
param.percentile_or_value='percentile';
param.percentile_low=25;
param.percentile_high=75;

[score, score_binary, taskInfos, detailScoring]=CellFie(data,SampleNumber,ref,param);

You should obtain the following results for the score, the score_binary and the taskInfos

Details on the function

USAGE:

[score, score_binary, taskInfos, detailScoring]=CellFie(data,SampleNumber,ref,param)

INPUTS:

  • data.gene - cell array containing GeneIDs in the same format as model.genes
  • data.value - mRNA expression matrix (genes x samples) giving the expression values of each gene listed in data.gene
  • SampleNumber - number of samples (equal to the number of columns of data.value)
  • ref - reference model used to compute the metabolic task scores (e.g., 'MT_recon_2_2_entrez.mat')
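
As an illustration of the expected input layout, the sketch below builds a small data structure by hand. The gene IDs and expression values are made up, and storing the IDs as strings is an assumption; use whatever format matches model.genes in the reference model you select.

```matlab
% Hypothetical toy data: 4 genes x 3 samples (values are illustrative only).
data.gene  = {'1'; '10'; '100'; '1000'};   % assumed: IDs stored as strings
data.value = [ 5.2  0.0  3.1;
               0.4  1.7  0.9;
              12.0 15.3  9.8;
               2.2  2.5  2.1];             % rows follow data.gene, columns are samples

% Derive the sample count from the matrix instead of hard-coding it.
SampleNumber = size(data.value, 2);

% Basic consistency check before calling CellFie.
assert(numel(data.gene) == size(data.value, 1), ...
    'data.gene and data.value must have the same number of rows');
```

Deriving SampleNumber from data.value keeps the two in sync if samples are added or removed.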

OPTIONAL INPUTS:

  • param.ThreshType - type of thresholding approach to use: 'global' or 'local' (default: 'local')

Parameters related to the use of a GLOBAL thresholding approach - the threshold value is the same for all the genes

  • param.percentile_or_value - the threshold can be set to a value supplied by the user ('value') or derived from a percentile of the distribution of expression values for all genes across all samples of your dataset ('percentile')
  • param.percentile - percentile of the distribution of expression values for all genes across all samples, used to define the threshold value
  • param.value - expression value used as the threshold to decide whether a gene is active (e.g., 5)
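
The global approach can be sketched as follows. This is not CellFie's internal code: the nearest-rank percentile and the strict "greater than" comparison are assumptions chosen to keep the example free of toolbox dependencies, and the matrix `expr` and the names `paramGlobal`, `threshold` and `isActive` are hypothetical.

```matlab
% Toy expression matrix (genes x samples); values are made up for illustration.
expr = [1 2; 3 4; 5 6; 7 8];

% Settings as they would be passed to CellFie for a global percentile threshold.
paramGlobal.ThreshType          = 'global';
paramGlobal.percentile_or_value = 'percentile';
paramGlobal.percentile          = 50;

% Nearest-rank percentile over all genes and all samples (assumed method).
sortedVals = sort(expr(:));
rankIdx    = max(1, ceil(paramGlobal.percentile / 100 * numel(sortedVals)));
threshold  = sortedVals(rankIdx);

% One threshold for every gene: values above it are marked active (assumed strict >).
isActive = expr > threshold;
```

With 'value' instead of 'percentile', the same comparison would use param.value directly as the threshold.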

Parameters related to the use of a LOCAL thresholding approach - each gene has its own threshold value

  • param.percentile_or_value - the threshold can be set to a value supplied by the user ('value') or derived from a percentile of the distribution of expression values of a specific gene across all samples of your dataset ('percentile', the default)

  • param.LocalThresholdType - option to define the type of local thresholding approach to use

     'minmaxmean' (default) - the threshold for a gene is the mean of the expression values observed for that gene across all samples, tissues, or conditions, with two constraints: (i) it must be greater than or equal to a lower bound and (ii) it must be less than or equal to an upper bound.
    
     'mean' - the threshold for a gene is the mean expression value of that gene across all samples, tissues, or conditions
    
  • param.percentile_low - lower percentile used to define which genes are always inactive when the 'minmaxmean' local thresholding approach is used (default = 25)

  • param.percentile_high - upper percentile used to define which genes are always active when the 'minmaxmean' local thresholding approach is used (default = 75)

  • param.value_low - lower expression value used to define which genes are always inactive when the 'minmaxmean' local thresholding approach is used (e.g., 5)

  • param.value_high - upper expression value used to define which genes are always active when the 'minmaxmean' local thresholding approach is used (e.g., 5)
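
The difference between the two local options can be illustrated with a short sketch. It is a simplified reading of the description above, not CellFie's own implementation: the bounds are fixed numbers here (in the spirit of param.value_low and param.value_high), and `expr`, `lowBound`, `highBound`, `thrMean` and `thrMinMaxMean` are hypothetical names.

```matlab
% Toy expression matrix (genes x samples); values are made up for illustration.
expr = [  1   2   3;
         10  20  30;
        100 200 300];

% Hypothetical bounds, playing the role of value_low and value_high.
lowBound  = 5;
highBound = 50;

% 'mean': each gene's threshold is its mean expression across samples.
thrMean = mean(expr, 2);

% 'minmaxmean': the same mean, clamped so that lowBound <= threshold <= highBound.
thrMinMaxMean = min(max(thrMean, lowBound), highBound);
```

Here the first gene's mean (2) is raised to the lower bound and the third gene's mean (200) is capped at the upper bound, while the second gene keeps its own mean (20).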

OUTPUTS:

  • score - relative quantification of the activity of a metabolic task in a specific condition based on the availability of data for multiple conditions

  • score_binary - binary version of the metabolic task score to determine whether a task is active or inactive in specific conditions

  • taskInfos - Description of the metabolic task assessed

  • detailScoring - Matrix detailing the scoring

     1st column = sample ID
    
     2nd column = task ID
    
     3rd column = task score for this sample
    
     4th column = task score in binary version for this sample
    
     5th column = essential reaction associated to this task
    
     6th column = expression score associated to the reaction listed in the 5th column
    
     7th column = gene used to determine the expression of the reaction listed in the 5th column
    
     8th column = original expression value of the gene listed in the 7th column
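
To show how the column layout above can be navigated, the sketch below filters a small stand-in table by task ID. Everything in it is hypothetical: the variable `detail`, the reaction names, and the values are invented, and representing the rows as a cell array is an assumption, so check the actual type and shape of detailScoring returned by your run.

```matlab
% Hypothetical stand-in using the eight-column layout described above.
detail = { ...
    1, 1, 0.8, 1, 'RXN_A', 0.8, '10',  4.0; ...
    1, 2, 0.2, 0, 'RXN_B', 0.2, '100', 1.0; ...
    2, 1, 0.6, 1, 'RXN_A', 0.6, '10',  3.0};

% 2nd column = task ID: keep only the rows that belong to task 1.
taskIds   = cell2mat(detail(:, 2));
rowsTask1 = detail(taskIds == 1, :);

% 3rd column = task score for each of those rows.
scoresTask1 = cell2mat(rowsTask1(:, 3));
```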
    