
Questions Regarding Storage Setup and Using scPRINT with Tahoe Dataset in Lamindb #28

Open
@Mimianmy

Description

We are trying to **use scPRINT to pretrain on the Tahoe dataset**. However, the Lamindb collection named "tahoe" is currently empty. To work around this, we tried to generate and preprocess the dataset ourselves with the provided adding_tahoe.ipynb notebook, and ran into several issues related to Storage configuration in Lamindb:

1. Storage Configuration Mismatch: The default Storage in Lamindb points to an S3 bucket (s3://arc-virtual-cell-atlas/), whereas our local dataset and Storage are located on a local path (/workspace/ssl_in_scg). This mismatch causes problems when saving Artifacts linked to the local data.

2. Issues Registering and Using Local Storage: We successfully created a local Storage record with ln.Storage(root=..., type='local'). However, when creating Artifacts, Lamindb raises errors indicating that it cannot find the Storage record during Artifact save operations.

3. Overriding the Default Storage: Attempts to programmatically override or modify the default Storage setting (e.g., ln.setup.settings.storage) fail due to attribute restrictions.

4. Artifact Creation and Workflow Challenges: We have to pass the storage argument explicitly when creating Artifacts to avoid errors, and the current workflow and documentation do not clearly explain how to register and use a local Storage in this context. Queries on Collections and preprocessing with LaminPreprocessor do not appear to be affected by the Storage issues, but file operations on Artifacts clearly are.
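To make the mismatch in points 1 and 2 concrete, here is a minimal, hypothetical sketch of the kind of lookup a storage-aware save could depend on: a file path only resolves to a Storage record if it falls under one of the registered roots. This is not Lamindb's actual implementation; the helper `find_storage_root` and the file name `plate1.h5ad` are illustrative assumptions.

```python
from typing import List, Optional


def _with_trailing_slash(uri: str) -> str:
    # A trailing slash stops "/data/ab" from matching the root "/data/a".
    return uri if uri.endswith("/") else uri + "/"


def find_storage_root(path: str, roots: List[str]) -> Optional[str]:
    """Hypothetical helper: return the most specific registered root containing `path`, or None."""
    candidate = _with_trailing_slash(path)
    matches = [root for root in roots if candidate.startswith(_with_trailing_slash(root))]
    # If several roots are nested, prefer the longest (most specific) one.
    return max(matches, key=len, default=None)


# Only the default S3 root is registered: the local file resolves to nothing.
default_roots = ["s3://arc-virtual-cell-atlas/"]
local_file = "/workspace/ssl_in_scg/tahoe/plate1.h5ad"  # hypothetical file name
print(find_storage_root(local_file, default_roots))  # None

# After registering the local root, the same file resolves to it.
print(find_storage_root(local_file, default_roots + ["/workspace/ssl_in_scg"]))  # /workspace/ssl_in_scg
```

Under this assumption, a save fails whenever the lookup returns None, which would be consistent with point 4: passing the intended storage explicitly sidesteps the lookup against the default S3 root.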

Given these challenges, we would greatly appreciate your guidance on:

  • How to successfully generate the Tahoe collection using the adding_tahoe.ipynb notebook with a local Storage setup, and then preprocess it.
  • Or alternatively, how we might directly access the existing Tahoe collection on Lamin to enable pretraining with scPRINT.

Thank you very much for your support! We look forward to your advice to continue our work~:)

Labels: bug, help wanted
