v2: Clarification MultiModel format #622

Open: wants to merge 8 commits into main
Conversation

FFroehlich (Collaborator)

See #609

One could also think about adding the option to specify models for individual conditions, but that definitely sounds like extension territory.

@FFroehlich requested a review from a team as a code owner April 22, 2025 15:55
@dilpath (Member) left a comment

  • There is a small inconsistency, since the "PEtab" way would be to have a new models table, I think.

  • Instead of a modelId column, the user could instead associate one measurement file with one model file in the YAML directly.

But the suggestion is fine for me, especially since I have no use cases currently.

@matthiaskoenig self-requested a review April 22, 2025 16:49
@matthiaskoenig (Contributor) left a comment

I'm not sure that adding the modelId to the measurementTable is the best solution.
I was thinking about adding a modelId column to the conditionTable, parameterTable and observableTable, i.e. wherever an id is referenced, it should be possible to provide an optional modelId to specify which model the id comes from. This would allow validating the table against the model. Otherwise you have to go back from the measurementTable to the other tables to figure out which model was meant for which variable, which makes it very difficult to work with multiple models.

I could imagine a problem with multiple models where parameters from each model are used, as well as observables from each model, and conditions are applied to each model in the time series. One wants to see, in each table, which model each piece of information belongs to.
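
As a rough sketch of this idea (the ids and species names here are made up, and modelId is only the optional column suggested above, not part of the current proposal), an observable table could then look like:

observableId    observableFormula    modelId
glucose_obs     glc_liver            model1
insulin_obs     ins_plasma           model2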

Contributor:

Looks good, but there is a lot of diff in the file, which makes it difficult to see the actual change.

Collaborator Author:

Briefly: I removed the problems list/array and moved all properties to the top level.

@dweindl (Member) commented Apr 23, 2025

  • There is a small inconsistency, since the "PEtab" way would be to have a new models table, I think.

I'd see the yaml file as kind of the "model table" here:

# [...]
  model_files:
    model1:
      language: sbml
      location: model1.xml
    model2:
      language: sbml
      location: model2.xml
# [...]

@dweindl (Member) commented Apr 23, 2025

  • Instead of a modelId column, the user could instead associate one measurement file with one model file in the YAML directly.

That was the original idea behind the problems list in the yaml file:

format_version: 2.0.0
parameter_file: all_parameters.tsv
problems:
- condition_files: [conditions_for_model1.tsv]
  experiment_files: [experiments_for_model1.tsv]
  measurement_files: [measurements_for_model1.tsv]
  model_files:
    model1:
      language: sbml
      location: model1.xml
  observable_files: [observables_for_model1.tsv]
- condition_files: [conditions_for_model2.tsv]
  experiment_files: [experiments_for_model2.tsv]
  measurement_files: [measurements_for_model2.tsv]
  model_files:
    model2:
      language: sbml
      location: model2.xml
  observable_files: [observables_for_model2.tsv]

However, I am not sure what is preferable.

Excerpt of the proposed specification text under review:

… introduced in the observable table may be referenced, but circular definitions must be avoided.

In problems with multiple models, symbols not defined in the currently simulated …

Contributor:

Should this not rather throw an error?
If, for example, there is a species in the observable formula that is not present in the model, it feels like the user has made an error.

Collaborator Author:

Yes, I considered that alternative. This specification allows reuse of observables across models; the user can always introduce model-specific observables. We could provide linting options that are more or less strict.

@sebapersson (Contributor)

I was thinking about adding a modelId column to the conditionTable, parameterTable and observableTable, i.e. wherever an id is referenced, it should be possible to provide an optional modelId to specify which model the id comes from. This would allow validating the table against the model. Otherwise you have to go back from the measurementTable to the other tables to figure out which model was meant for which variable, which makes it very difficult to work with multiple models.

If modelId is added to the parameterTable, would it not then be quite involved to have the same parameter in multiple models (e.g. one would have to add multiple model ids)? I think having the modelId only in the measurements table is doable; however, the drawback, as pointed out here, is that it will be quite a bit of work for the importer to figure out which parameters and species appear in each model.
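
To illustrate the concern (purely hypothetical syntax and ids, not part of any proposal), a parameter shared by two models might then need something like a list of model ids in the parameter table:

parameterId           modelId          estimate
hepatic_blood_flow    model1;model2    1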

@matthiaskoenig (Contributor)

That was the original idea behind the problems list in the yaml file.

As I understand it, one will have a single optimization problem which uses two models, i.e. in the end there is a single cost function with contributions from the different models. Having a list of problems implies that these are separate optimization problems with multiple cost functions which could be optimized independently.

If modelId is added to the parameterTable, would it not then be quite involved to have the same parameter in multiple models (e.g. one would have to add multiple model ids)?

That is a good point. I think we have to clearly write down the use cases we want to support with multiple models and then figure out the best way to encode this.
Personally, I am always more in favour of more fine-grained control (i.e. adding model ids to the different tables), which in the long run allows much more flexibility and supports use cases nobody has thought about yet. Adding the modelId only to the measurements table lets us distribute measurements to different models, but nothing else. Having modelIds in the separate tables would allow a lot of different things in the future without changing the format much, e.g. adding multiple modelIds to indicate that parameters or conditions should be shared between models.

My use cases are:

  1. Smaller models by distributing the problem into submodels (shared parameters). I have multiple models which map to different datasets, and there are some parameters which occur in the different models (shared parameters). I want to optimize some of the parameters which occur in all models as a single parameter (e.g. hepatic blood flow, ...). Splitting the problem makes the ODE integration of the submodels and the code generation much faster. Code generation via LLVM and other C++ code optimization does not scale linearly, i.e. it scales worse than O(n). This becomes the bottleneck for large models (we have to wait >12 hours for some code generation with roadrunner, while the simulation itself takes only a few seconds, if we have >100 000 state variables).
  2. Shared experiments between models. I have different models and want to run the same experiments on them. No shared parameters, but shared experiments. This could be encoded as separate optimization problems, but duplicating all the information is error-prone. Similar to 3.
  3. Encoding multiple optimization problems in a single file. I want to change parameters/settings of the optimization problem/model and run all the variants. Something in the direction of a sensitivity analysis with respect to settings (hyperparameter scans), e.g. testing different rate laws, different priors, ... I could generate a lot of different PEtab problems, but it would be much nicer to encode these in a single PEtab file by providing the different models. This would require different cost functions for the different models/settings and goes in the direction of model selection. It does not require multiple models in a single problem, but multiple problems in a single file. If one has larger models, a ton of unnecessary data is generated, with the model files being the largest assets of the optimization problem.

For me the main reasons are: 1. being able to split large models into submodels for part of the cost function calculation (this is mainly an optimization aspect and could perhaps also be done on a per-tool basis if there is some metadata), and 2. encoding multiple problems in a single PEtab problem (mainly convenience and reuse of information), basically a level on top of a single optimization problem.

@matthiaskoenig (Contributor)

Optimizing a single parameter across multiple models could require a mapping of the parameters, because the parameters do not have to be named identically in the different models (especially if existing models from different sources are used). The more I think about multiple-model problems, the more complicated it gets. We really have to think this through.

@FFroehlich (Collaborator, Author) commented Apr 23, 2025

That was the original idea behind the problems list in the yaml file.

As I understand it, one will have a single optimization problem which uses two models, i.e. in the end there is a single cost function with contributions from the different models. Having a list of problems implies that these are separate optimization problems with multiple cost functions which could be optimized independently.

Yes.

If modelId is added to the parameterTable, would it not then be quite involved to have the same parameter in multiple models (e.g. one would have to add multiple model ids)?

That is a good point. I think we have to clearly write down the use cases we want to support with multiple models and then figure out the best way to encode this. Personally, I am always more in favour of more fine-grained control (i.e. adding model ids to the different tables), which in the long run allows much more flexibility and supports use cases nobody has thought about yet. Adding the modelId only to the measurements table lets us distribute measurements to different models, but nothing else. Having modelIds in the separate tables would allow a lot of different things in the future without changing the format much, e.g. adding multiple modelIds to indicate that parameters or conditions should be shared between models.

All of the tables are now set up in a way that allows adding extra columns/metadata. Any user can easily start with "master tables" containing some metadata and programmatically generate the necessary tables; I don't think we need to formalise that process.

My use cases are:

  1. Smaller models by distributing the problem into submodels (shared parameters). I have multiple models which map to different datasets, and there are some parameters which occur in the different models (shared parameters). I want to optimize some of the parameters which occur in all models as a single parameter (e.g. hepatic blood flow, ...). Splitting the problem makes the ODE integration of the submodels and the code generation much faster. Code generation via LLVM and other C++ code optimization does not scale linearly, i.e. it scales worse than O(n). This becomes the bottleneck for large models (we have to wait >12 hours for some code generation with roadrunner, while the simulation itself takes only a few seconds, if we have >100 000 state variables).

This is similar to the use cases that we have and what I was trying to address.

  2. Shared experiments between models. I have different models and want to run the same experiments on them. No shared parameters, but shared experiments. This could be encoded as separate optimization problems, but duplicating all the information is error-prone. Similar to 3.

I think this could easily be solved by having multiple yaml files and reusing tables across problems.
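
For example (file names here are made up; the top-level keys just follow the yaml structure shown earlier, assuming the problems list has been dropped), two separate problem files could reference the same experiment and observable tables:

# model1_problem.yaml
format_version: 2.0.0
parameter_file: parameters_for_model1.tsv
model_files:
  model1:
    language: sbml
    location: model1.xml
measurement_files: [measurements_for_model1.tsv]
experiment_files: [shared_experiments.tsv]
observable_files: [shared_observables.tsv]

# model2_problem.yaml
format_version: 2.0.0
parameter_file: parameters_for_model2.tsv
model_files:
  model2:
    language: sbml
    location: model2.xml
measurement_files: [measurements_for_model2.tsv]
experiment_files: [shared_experiments.tsv]
observable_files: [shared_observables.tsv]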

  3. Encoding multiple optimization problems in a single file. I want to change parameters/settings of the optimization problem/model and run all the variants. Something in the direction of a sensitivity analysis with respect to settings (hyperparameter scans), e.g. testing different rate laws, different priors, ... I could generate a lot of different PEtab problems, but it would be much nicer to encode these in a single PEtab file by providing the different models. This would require different cost functions for the different models/settings and goes in the direction of model selection. It does not require multiple models in a single problem, but multiple problems in a single file. If one has larger models, a ton of unnecessary data is generated, with the model files being the largest assets of the optimization problem.

I thought this was addressed by PEtab-select, but I am not familiar with the details.

@FFroehlich (Collaborator, Author)

Optimizing a single parameter across multiple models could require a mapping of the parameters, because the parameters do not have to be named identically in the different models (especially if existing models from different sources are used). The more I think about multiple-model problems, the more complicated it gets. We really have to think this through.

This would already be covered by the implementation above with the mapping performed via the condition table. Things get problematic if the same parameter IDs are reused across models but with different meanings, but at the end of the day that's just asking for trouble.
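
As a sketch of that mapping (all ids made up, and shown in v1-style wide condition-table notation purely for illustration; the exact v2 condition table layout may differ): if model1 calls the parameter Q_liver and model2 calls it Q_hep, both condition tables can point to a single estimated parameter hepatic_blood_flow from the shared parameter table:

conditions_for_model1.tsv:
conditionId    Q_liver
cond1          hepatic_blood_flow

conditions_for_model2.tsv:
conditionId    Q_hep
cond1          hepatic_blood_flow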

FFroehlich and others added 2 commits April 23, 2025 10:35
@dweindl added this to the PEtab 2.0.0 milestone May 21, 2025
@dilpath (Member) left a comment

Fine for me, no strong opinion for or against

@dweindl (Member) left a comment

While I am happy to get rid of the problems list in the yaml file for the single-model case, I thought the editor meeting discussion ended with going back to the old style of having different sets of condition, observable, ... files for each model, with only the parameter table being shared. (Which would not preclude referencing the same observable/condition/experiment file from multiple sub-problems.)

I can live with either and would follow those who already have a clear application in mind.

@FFroehlich (Collaborator, Author)

While I am happy to get rid of the problems list in the yaml file for the single-model case, I thought the editor meeting discussion ended with going back to the old style of having different sets of condition, observable, ... files for each model, with only the parameter table being shared. (Which would not preclude referencing the same observable/condition/experiment file from multiple sub-problems.)

I can live with either and would follow those who already have a clear application in mind.

That wasn't my takeaway. I thought we mainly discussed how forgiving conditions/observables are in terms of validation.

Thinking about it, splitting up the problem at the file level is actually simpler to implement and more verbose for the user, so it definitely is a viable alternative that is equivalent in terms of what can be expressed. But yes, it would be great if we don't have a problems list for the single-model case.

We could drop the model id column from the measurements table and introduce an additional sub-problems table, with model ids as columns and files as rows and True/False values to indicate which file applies to which model? The default would be to apply tables to all models, minimising the number of rows that have to be added.
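
A rough sketch of what such a sub-problems table could look like (file and model names made up; the layout is just one possibility):

file                           model1    model2
measurements_for_model1.tsv    True      False
measurements_for_model2.tsv    False     True

With the proposed default, a table that applies to all models (e.g. a shared observables file) would not need a row at all.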

@FFroehlich changed the title from "Clarification MultiModel format" to "v2: Clarification MultiModel format" May 23, 2025