Skip to content

kbase/cdm-task-service

Repository files navigation

CDM Task Service

This is currently a prototype

Enables running jobs on remote compute from the KBase CDM cluster.

Nomenclature

  • CDM: Central Data Model
    • The KBase data model
  • CTS: CDM Task Service

Usage notes

The CTS uses CRC64/NVME checksums on all input and output files for data integrity checks and to ensure files do not unexpectedly change over time. As such, all input files in S3 must have the checksums stored in their metadata. A user can check this via various methods, but here we use the Minio client, mc. Assuming the alias for the S3 installation is local9002:

$ ./mc stat local9002/test-bucket/test-file-with-checksum
Name      : test-file-with-checksum
Date      : 2025-02-26 16:52:49 PST 
Size      : 8 B    
ETag      : 1b9554867d35f0d59e4705f6b2712cd1 
Type      : file 
Checksum  : CRC64NVME:S8AXmd7A3f8=
Metadata  :
  Content-Type: application/octet-stream

Files without the CRC64NVME checksum will be rejected by the service.

To include a checksum on upload with mc, use the --checksum argument:

./mc cp --checksum crc64nvme tinyfile.crap local9002/test-bucket/test-file-with-checksum
...yfile.crap: 8 B / 8 B ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 60 B/s 0s

If a file already exists in S3 but has no checksum, one can be added via a copy:

$ ./mc stat local9002/test-bucket/test-file-no-checksum
Name      : test-file-no-checksum
Date      : 2025-02-26 16:52:33 PST 
Size      : 8 B    
ETag      : 1b9554867d35f0d59e4705f6b2712cd1 
Type      : file 
Metadata  :
  Content-Type: application/octet-stream 

~$ ./mc cp --checksum crc64nvme local9002/test-bucket/test-file-no-checksum local9002/test-bucket/test-file-now-with-more-checksums
...o-checksum: 8 B / 8 B ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 37 B/s 0s

~$ ./mc stat local9002/test-bucket/test-file-now-with-more-checksums
Name      : test-file-now-with-more-checksums
Date      : 2025-02-26 16:57:06 PST 
Size      : 8 B    
ETag      : 1b9554867d35f0d59e4705f6b2712cd1 
Type      : file 
Checksum  : CRC64NVME:S8AXmd7A3f8=
Metadata  :
  Content-Type: application/octet-stream 

Further usage documentation

Service Requirements

  • Python 3.11+
  • crane
  • An S3 instance for use as a file store, but see "S3 requirements" below
  • MongoDB 7+
  • Kafka 2.1+

S3 requirements

  • Path style access is required.
  • The service does not support objects encrypted with customer supplied keys or with the AWS key management service.
  • The provided credentials must enable listing readable buckets, as the service performs that operation to check the host and credentials on startup.
  • If using Minio, the minimum version is 2025-02-07T23-21-09Z.

Powered by

JGI JAWS

Development

Adding code

  • In this alpha / prototype stage, we will be PRing (do not push directly) to main. In the future, once we want to deploy beyond CI, we will add a develop branch.
  • The PR creator merges the PR and deletes branches (after builds / tests / linters complete).

Code requirements for prototype code

  • Any code committed must at least have a test file that imports it and runs a noop test so that the code is shown with no coverage in the coverage statistics. This will make it clear what code needs tests when we move beyond the prototype stage.
  • Each module should have its own test file. Eventually these will be expanded into unit tests (or integration tests in the case of app.py)
  • Any code committed must have regular code and user documentation so that future devs converting the code to production can understand it.
  • Release notes are not strictly necessary while deploying to CI, but a concrete version (e.g. no -dev* or -prototype* suffix) will be required outside of that environment. On a case by case basis, add release notes and bump the prototype version (e.g. 0.1.0-prototype3 -> 0.1.0-prototype4) for changes that should be documented.

Running tests

Docker must be installed.

Copy test.cfg.example to test.cfg and fill it in appropriately.

pipenv sync --dev  # only the first time or when Pipfile.lock changes
pipenv shell
PYTHONPATH=. pytest test

Exit from prototype status

About

A service for enabling running jobs on remote compute from the KBase CDM

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors 2

  •  
  •  

Languages