Skip to content

Unable to run tutorial with 3 bases with esophageal and simulated data #2

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
jennprk opened this issue Jun 19, 2021 · 8 comments
Open

Comments

@jennprk
Copy link

jennprk commented Jun 19, 2021

Hello Zhi,

I am trying to run the HiLDA tutorial provided with 3 flanking bases instead of 5 bases on your simulated data and esophageal data. This is the code that I ran (for the simulated data, I aggregated the 5-base counts and features to 3-base):

# HiLDA simulated data
set.seed(123)
## Run hildaTest with 3-base (aggregated simulated data)
hildaGlobal <- hildaTest(inputG=newG, numSig=3, localTest=FALSE, refGroup=1:4, nIter=1000)

# Espohageal data
inputFile <- system.file("extdata/esophageal.mp.txt.gz", package="HiLDA")
G <- hildaReadMPFile(inputFile, numBases=3, trDir=TRUE)
hildaGlobal <- hildaTest(inputG=G, numSig=4, localTest=FALSE, refGroup=1:60, nIter=1000)

Both running these codes gives me errors while running JAGS:
Error in update.jags(object, n.iter, ...) : Error in node sCat1[4,76] Cannot normalize density

Error in update.jags(object, n.iter, ...) : Error in node sCat1[60,187] Cannot normalize density

Would you have any suggestions as to what might be going wrong here?

Best,
Ji-Eun

@zhiiiyang
Copy link
Collaborator

@jennprk, sorry about the delayed response.

For the simulated data, we can only read it in with 5 bases since it is saved as the .rdata data format.

set.seed(123)
load(system.file("extdata/sample.rdata", package = "HiLDA"))
hildaGlobal <- hildaTest(inputG=G, numSig=3, localTest=FALSE, refGroup=1:4, nIter=1000)

For the esophageal data, I am still trying to debug the problem by reproducing the example. Somehow, my new computer got stuck at the loading step.

library(HiLDA)
inputFile <- system.file("extdata/esophageal.mp.txt.gz", package="HiLDA")
G <- hildaReadMPFile(inputFile, numBases=3, trDir=TRUE)
hildaGlobal <- hildaTest(inputG=G, numSig=4, localTest=FALSE, refGroup=1:60, nIter=1000)

@jennprk
Copy link
Author

jennprk commented Jul 19, 2021

No worries ! And thank you for helping me out.

Actually, I aggregated the counts and features so that I can apply 3-base from your original simuated data. (you can find how I did here: https://gist.github.com/jennprk/595b68be98d75aec88c1011249f93418)

And for the esophageal data, it also took me an hour (or even more) to get the error, so I believe it might take a while for you as well.

@zhiiiyang
Copy link
Collaborator

@jennprk I have pushed a change to fix this issue.

B4910F63-4D5B-4C26-9C2E-80700943BFA2_4_5005_c

@jennprk
Copy link
Author

jennprk commented Feb 16, 2022

@zhiiiyang Hi Zhi, I am so sorry for such a late response! And thank you for updating the package. I've been reattempting the code, and it's still not working on my end. Would it be possible to get your code on how you generated the inputG?

[Update]
I was able to run the code with 3 flanking bases! I think there was some version mix-up earlier. I have some follow-up questions:

  1. I have used the code I shared via gist above, and the Rhats I get are very high. Even after initializing the parameters with pmgetSignature, my Rhat is ~1.5. Would you possibly have a fix to improve the convergence?

  2. Additionally, is there a way to extract the mutational signatures found by HiLDA in a format comparable to COSMIC signatures? I was reading through the manual but I only found a code to plot the signatures, but couldn't find a function to extract them in a table of probabilities like COSMIC.

@zhiiiyang
Copy link
Collaborator

@jennprk, please see the following code to generate the inputG

> library(HiLDA)
Loading required package: ggplot2
Need help getting started? Try the R Graphics Cookbook: https://r-graphics.org
> load(system.file("extdata/sample.rdata", package = "HiLDA"))
> G@sampleList
 [1] "sample_1"  "sample_2"  "sample_3"  "sample_4"  "sample_5"  "sample_6"  "sample_7"  "sample_8"  "sample_9"  "sample_10"
> set.seed(123)
> hildaGlobal <- hildaTest(inputG=G, numSig=2, localTest=FALSE, 
+                          refGroup=1:4, nIter=1000)
module glm loaded
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
Graph information:
   Observed stochastic nodes: 6370
   Unobserved stochastic nodes: 1299
   Total graph size: 14880

Initializing model

  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
  |**************************************************| 100%

Also, to reply to your questions.

  1. To ensure convergence, you just need to increase the number of iterations nIter.
  2. Do you mean converting the pmsignature to COSMIC signature? Currently, there is no easy way to do it. I have to write additional functions to make it happen.

@jennprk
Copy link
Author

jennprk commented Feb 22, 2022

@zhiiiyang Thank you so much for sharing the codes! And I'll try increasing the iterations for the convergence :-)

As for my previous last question, it actually does not need to be in COSMIC format. Is there any way to extract the signatures? My guess is to use the estimated posteriors for pStates1 and pStates2 but I was not sure. Please let me know if there's any way to extract them! :)

@zhiiiyang
Copy link
Collaborator

zhiiiyang commented Mar 2, 2022

@jennprk, sorry about the late reply. Yes, you can extract them by using the code from the plotting function. https://github.com/USCbiostats/HiLDA/blob/master/R/hilda_plot.R#L19-L83

@jennprk
Copy link
Author

jennprk commented Mar 3, 2022

@zhiiiyang No worries! Thank you so much for directing me to the code :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants