Skip to content

Metadata fbc3 #1237

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 122 commits into
base: devel
Choose a base branch
from
Open

Metadata fbc3 #1237

wants to merge 122 commits into from

Conversation

akaviaLab
Copy link
Contributor

Updated version of #988. Tried to simplify and tidy the code in addition to merging. Modified according to @cdiener requests

  • Ignored changes in notes for now
  • renamed constructors a bit
  • added eq comparisons that build classes if necessary

Original description of #988 follows

Metadata Classes: A separate directory for handling metadata information is made inside cobra/core directory. Every object derived from SBase can have meta information like annotation (CVTerms), notes, history attached to it.
The CVTerms class for storing externally linked resources to each component derived from SBase. This class is maintaining the new format annotation as well as old format annotation simultaneously, and both are kept synchronized with each other. Changing one will modify the other accordingly. This new class for annotation can handle any type of annotation data (be it the case of nested annotation or alternative annotation). It can read the old format as well as the new format annotation data from JSON and other formats. At the time of writing back the model, the new data format is used because it contains the complete data organized in the same way as SBML.
The History class used for storing the history, validating dates, etc, is now attached to each component derived from SBase.
The KeyValuePair class for storing key-value pairs, defined by fbc-v3.
The last three metadata objects (i.e CVTerms, History, KeyValuePair) are present inside a single attribute of SBase (Object) class and can be accessed via object.annotation.cvterms, object.annotation.history and object.annotation.key_value_data attributes. Calling simply the annotation attribute (object.annotation) will return the annotation data in old format (making it backward compatible).

Group to JSON: The support of the group package is extended to JSON.

JSON schema v2: The version2 of JSON schema has been added which defines the new format annotation, history, key-value pair, notes, group package data, user-defined constraints data and basic SBML info.
@akaviaLab comment - not sure if the v2 schema works or is done. Seems unclear from the code.

Issues Fixed
fixes issue #954: The support of group package has been extended to JSON and other formats.
fixes issue #856: The infinity values are also enabled for storing bounds.
fixes issue #810: The sbml_info storing basic information of SBML is written to JSON to store the basic SBML document information like packages, level, version, notes, annotation attached to the SBML component etc.
fixes issue #736: If annotation is present in the form of list of list, it is first modified and then data is read.
fixes issue #706: Single annotation resource are now put into a string. While reading from JSON also, if a single resource attribute is in the form of string, it is first fixed to list and then read into the model.
fixes issue #684: The complete metadata structure (CVTerms, Notes, History) has been redesigned with backward compatibility. Old format data is read, fixed and then written in new format metadata structures. The fbc-v3 "KeyValuePair" class is also a part of this new format annotation. Its corresponding class and its support to JSON and other format has been added. However, the SBML parser for it has to be updated when libsbml adds support for fbc-v3.
fixes issue #937: The annotation format has been updated, which is backward compatible too.
JSON schema v1: The JSON schema version 1 is modified to resolve some pre-existing issues.

Tests
Tests for all the newly implemented features are added to check the functionalities. A few old tests are also modified accordingly. Some tests which were initially marked 'xfail' are now working dew to modified formats.

sbml document history is saved
added relevant tests
@matthiaskoenig
Copy link
Contributor

Hi all,
I could review/update the code.

For clarification with the key/value store. This is basically the new fbc-version3 package which just needs a second implementation (which is this implementation) to be officially accepted. I.e. this is basically official as soon as we merge this in develop. So the key-value pair has to stay in.

A lot of discussion here. Are there any open things which have to be decided or is everything down to implementation details at this point?
Best Matthias

@akaviaLab
Copy link
Contributor Author

akaviaLab commented Jul 13, 2022 via email

@matthiaskoenig
Copy link
Contributor

Yep, pretty much. I would probably allow the dates to be provided directly as string and have init convert it to datetime. And maybe have some things initialized to default dates. For instance, the creation date could just be the current time by default. Might also be helpful if the creator info can be provided in the configuration and is then used by default.

Please don't set any current times! This will create diffs on your SBML models and model files every time you run your scripts resulting in large git diffs! I.e. don't set a current time as default. Perhaps an option for that, but please not the default behavior. The HistoryElement can be set on all core objects, so having diffs on all objects every time a script is executed is a bad idea.

@akaviaLab
Copy link
Contributor Author

akaviaLab commented Jul 13, 2022 via email

@akaviaLab
Copy link
Contributor Author

akaviaLab commented Jul 13, 2022 via email

@cdiener
Copy link
Member

cdiener commented Jul 13, 2022

@matthiaskoenig

Key-value pairs will be in there. They are super useful and a regularly requested feature.

The current date was only supposed to use current time as default value for the constructor. So it would only be set the first time the history is created and only if the user does not specify a date. EDIT: see reply below. Only set on writin gif no date is set.

The other discussion was whether we should limit the format of IDs for the v2 JSON schema. I don't think there is a good solution here, since the SBML and COBRAPY ID requirements are not compatible. So it will come down to whether we want the JSON schema 100% SBML compliant, with the limitation that not all COBRAPY models can be saved to that format.

The rest is just naming of classes etc. to make it a bit more accessible to devs and users without dep knowledge of SBML abstractions.

@akaviaLab
Copy link
Contributor Author

The current date was only supposed to use current time as default value for the constructor. So it would only be set the first time the history is created and only if the user does not specify a date. So only when explicitly doing model.annotation.history = History(...) the first time and when not providing the date explicitly in the constructor. We could skip this, but I suspect many users will do something like model.annotation.history = History(created_date=datetime.now(), ...). Right now, it will initialize to None if not specified. If you prefer that, we can leave it as is...

Right now, History() is called as part of every annotation, so modifying any default there would lead to every entity having a created date. That's a bad idea.
sbml.py has a small section when writing a document - if that document (not the model, but the document) has no creation date, it will be set to current time.

elif isinstance(sbase, libsbml.SBMLDocument): time = datetime.datetime.now() timestr = time.strftime("%Y-%m-%dT%H:%M:%S%z") date = libsbml.Date(timestr) _check(comp_history.setCreatedDate(date), "set creation date for document")

That's can be removed or stay.

@matthiaskoenig
Copy link
Contributor

matthiaskoenig commented Jul 13, 2022

I would highly prefer to have SBML SIds as valid cobrapy ids.

  • These can be serialized to all formats
  • These work in optlang and the solver interfaces
  • These are valid object identifiers (which is sometimes important)
  • These are very simple to validate with the following regex [a-zA-Z][a-zA-Z0-9_]*
  • Most models already use valid SIds because SBML is the de facto exchange standard

The only exceptions are some databases such as BiGG (no longer maintained, which also provides SBML with valid IDs).

These would allow:

  • to get rid of all code for id replacements besides on the importers for JSON/Matlab
  • to have no id replacements on SBML and valid cobrapy models (these create a ton of issues when mapping data/models, ...) and create a ton of bugs.

I strongly advocate for this solution and just enforce valid SIds and move id_replacement in helper functions which would be applied for import besides SBML (and as toolbox for people to fix there invalid models).

edit: fixed regex

@akaviaLab
Copy link
Contributor Author

@cdiener @matthiaskoenig
Is the documentation in af5f3fa for CVTerm and Qualifier better? What the documentation should be in general?

@matthiaskoenig
Copy link
Contributor

@cdiener @matthiaskoenig Is the documentation in af5f3fa for CVTerm and Qualifier better? What the documentation should be in general?

Yes, the documentation is better. But please use a string enum for such enums.
This will allow to parse strings much easier to the actual instances. I.e. something along the lines of

class MyEnum(str, Enum):
    state1 = 'state1'
    state2 = 'state2'

@cdiener
Copy link
Member

cdiener commented Jul 14, 2022

I would highly prefer to have SBML SIds as valid cobrapy ids.

* These can be serialized to all formats

* These work in optlang and the solver interfaces

* These are valid object identifiers (which is sometimes important)

* These are very simple to validate with the following regex `[a-zA-Z][a-zA-Z0-9_]*`

* Most models already use valid SIds because SBML is the de facto exchange standard

I think this is hard because of backwards compatibility as mentioned above, and it would be a bit inconsistent for instance when reading models with JSON v1 schema. But I so see your point and agree that all the SBML-like formats IDs should be read as is and there should be helpers to convert them to that format. Point 2 is not fully correct since there are some IDs that are SBML-compliant but don't work in optlang in some solver interfaces (IDs longer 255 chars in GLPK, or metabolites named "st" in Gurobi for instance). So basically we push the defaults toward SBML-compliant IDs without breaking other formats such as Matlab etc.

uri.akavia added 8 commits July 19, 2022 20:23
…ist - need examples (TODO)

Qualifiers as string enum
CVTermList will output list of dicts, not a dict
update_pickles.py
minor renaming
changed sbml.py according to previous changes. cleaned up processing of history in sbml.py
changed test_metadata.py according to these changes
improve equality of History object
some tidying of metadata.py and adding __eq__ function
added test_old_style_annotation to test_metadata.py
updated some data and tests
Some minor modifications and comments on sbml and test_sbml.py
@pascalaldo
Copy link

Hi everyone, what are the current plans with this pull request? It contains a great set of features. We're currently looking at updating the BiGG repository and having improved annotation + group support in cobrapy would benefit us greatly.
I get that the changes to the representation of annotations done here could break quite some existing code. But if there are more hurdles, I can allocate time to update and improve the code here.

@Midnighter
Copy link
Member

That would be fantastic! As I understand it, @akaviaLab has moved on to other topics and will not contribute further.

@pascalaldo
Copy link

Great, I'll start by merging devel branch changes into here and get all the code working and tests running again.

pascalaldo added a commit to pascalaldo/cobrapy that referenced this pull request Mar 28, 2025
…nt names as a single value (supported in SBML Core 3 Level 2 using vCard4) (based on comments by cdiener in opencobra#1237). Creator names given as a separate given and family name are converted to a single name value and stored as such. That means that reading given+family names (from SBML files) is still supported, but they will always be concatenated and written as a single value when saving that model. Tests were modified to reflect this change. Separate given+family name support was also removed from the JSON v2 schema, whilst organisation name was added as optional field.
@pascalaldo pascalaldo mentioned this pull request Apr 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale The issue or pull request lacks activity.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants