v2: Clarification MultiModel format #622
Conversation
There is a small inconsistency, since the "PEtab" way would be to have a new models table, I think.
Instead of a modelId column, the user could associate one measurement file with one model file in the YAML directly.
But the suggestion is fine for me, especially since I have no use cases currently.
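For illustration, a minimal sketch of what associating measurement files with models directly in the YAML could look like; the per-model measurement_files key is purely hypothetical and not part of the current draft:

model_files:
  model1:
    language: sbml
    location: model1.xml
    measurement_files: [measurements_for_model1.tsv]
  model2:
    language: sbml
    location: model2.xml
    measurement_files: [measurements_for_model2.tsv]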
Not sure if adding the modelId to the measurementTable is the best solution.
I was thinking about adding the modelId column to the conditionTable, parameterTable, and observableTable. I.e. wherever an ID is referenced, it should be possible to provide an optional modelId to specify which model the ID should be from. This would allow you to validate the table against the model. Otherwise you have to go back from the measurementTable to the other tables to figure out which model was meant for which variable. This makes it very difficult to work with multiple models.
I could imagine a problem with multiple models where parameters from each model are used, as well as observables from each model, and, in the time series, conditions are applied to each model. One wants to see in the respective table which model each piece of information comes from.
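As a rough sketch, an observable table with such an optional modelId column might look like this (the modelId column is the proposal under discussion; all IDs and formulas are made up):

observableId  modelId  observableFormula  noiseFormula
obs_A         model1   A                  sigma_A
obs_B         model2   B                  sigma_B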
doc/v2/_static/petab_schema_v2.yaml
Looks good, but there is a lot of diff in the file, which makes it difficult to see the actual change.
Briefly: removed the problems list/array and moved all properties to the highest level.
I'd see the yaml file as kind of the "model table" here:

# [...]
model_files:
  model1:
    language: sbml
    location: model1.xml
  model2:
    language: sbml
    location: model2.xml
# [...]
That was the original idea behind the following:

format_version: 2.0.0
parameter_file: all_parameters.tsv
problems:
  - condition_files: [conditions_for_model1.tsv]
    experiment_files: [experiments_for_model1.tsv]
    measurement_files: [measurements_for_model1.tsv]
    model_files:
      model1:
        language: sbml
        location: model1.xml
    observable_files: [observables_for_model1.tsv]
  - condition_files: [conditions_for_model2.tsv]
    experiment_files: [experiments_for_model2.tsv]
    measurement_files: [measurements_for_model2.tsv]
    model_files:
      model2:
        language: sbml
        location: model2.xml
    observable_files: [observables_for_model2.tsv]

However, I am not sure what is preferable.
doc/v2/documentation_data_format.rst
introduced in the observable table may be referenced, but circular definitions
must be avoided.

In problems with multiple models, symbols not defined in the currently simulated [...]
Should this not rather throw an error?
If, for example, there is a species in the observable formula that is not present in the model, it feels like the user has made an error.
Yes, I considered that alternative. This specification allows reuse of observables across models; the user can always introduce model-specific observables. We could provide different options for linting that are more or less strict.
As I understand it, one will have a single optimization problem which uses two models. I.e., in the end there is a single cost function which has contributions from the different models. Having a list of problems implies these are separate optimization problems with multiple cost functions which could be optimized independently.
That is a good point. I think we have to clearly write down the use cases we want to implement with the multiple models and then figure out the best way to encode this. My use cases are:
For me the main reason is to
1. be able to split large models into submodels for part of the cost function calculation. This is mainly an optimization aspect and could perhaps also be done on a tool basis if there is some metadata.
2. encode multiple problems in a single PEtab problem (this is mainly convenience and reuse of information). Basically a level on top of a single optimization problem.
Optimizing a single parameter from multiple models could require a mapping of the parameters, because the parameters do not have to be named identically in the different models (especially if existing models from different sources are used). The more I think about the multiple model problems the more complicated it gets. We really have to think this through.
Yes.
All of the tables are now set up in a way that allows adding extra columns/metadata. Any user can easily start with "master tables" with some metadata and programmatically generate the necessary tables; I don't think we need to formalise that process.
This is similar to the use cases that we have and what I was trying to address.
I think this could easily be solved by having multiple yaml files and reusing tables across problems.
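A sketch of that idea, assuming the flattened layout proposed in this PR (file names are illustrative): two separate problem YAML files that point to the same parameter table.

# problem_model1.yaml
format_version: 2.0.0
parameter_file: all_parameters.tsv
model_files:
  model1:
    language: sbml
    location: model1.xml
measurement_files: [measurements_for_model1.tsv]
observable_files: [observables_for_model1.tsv]

# problem_model2.yaml
format_version: 2.0.0
parameter_file: all_parameters.tsv
model_files:
  model2:
    language: sbml
    location: model2.xml
measurement_files: [measurements_for_model2.tsv]
observable_files: [observables_for_model2.tsv]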
I thought this was addressed by PEtab-select, but I am not familiar with the details. |
This would already be covered by the implementation above with the mapping performed via the condition table. Things get problematic if the same parameter IDs are reused across models but with different meanings, but at the end of the day that's just asking for trouble.
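A sketch of how that mapping via the condition table could look, assuming a long-format condition table with conditionId/targetId/targetValue columns (column names and all IDs are illustrative, not taken from the draft): a single estimated parameter k_deg_shared is mapped onto differently named parameters in the two models.

conditionId       targetId             targetValue
condition_model1  k_deg_m1             k_deg_shared
condition_model2  degradation_rate_m2  k_deg_shared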
Co-authored-by: Dilan Pathirana <[email protected]>
Co-authored-by: Sebastian Persson <[email protected]>
Fine for me, no strong opinion for or against
While I am happy to get rid of the problems list in the yaml file for the single-model case, I thought the editor meeting discussion ended with going back to the old style of having different sets of condition, observable, ... files for each model, with only the parameter table being shared. (Which would not preclude referencing the same observable/condition/experiment file from multiple sub-problems.)
I can live with either and would follow those who already have a clear application in mind.
Co-authored-by: Dilan Pathirana <[email protected]>
That wasn't my takeaway. I thought we mainly discussed how forgiving conditions/observables are in terms of validation. Thinking about it, splitting up the problem at the file level is actually simpler to implement and more verbose for the user, so that definitely is a viable alternative that is equivalent in terms of what can be expressed. But yes, it would be great if we don't have a problems list for the single-model case. We could drop the model ID column from the measurements table and introduce an additional sub-problems table with model IDs as columns and files as rows, with True/False values to indicate which file applies to which model? The default would be to apply tables to all models, minimising the number of rows that have to be added.
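A sketch of how such a sub-problems table could look (entirely hypothetical, including the column layout): model IDs as columns, files as rows, and True/False values marking which file applies to which model.

file                       model1  model2
conditions_for_model1.tsv  True    False
conditions_for_model2.tsv  False   True
observables_shared.tsv     True    True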
See #609
One could also think about adding the option to specify models for individual conditions, but that definitely sounds like extension territory.