[BUG] special character in values of var_registration variable leads to unexpected behavior in registration_pseudobulk #103

Open
@boyiguo1

Describe the bug

Disclaimer: this is not necessarily a bug in spatialLIBD::registration_pseudobulk.

I'm using registration_pseudobulk to create pseudobulked data. The values of the variable passed to the var_registration parameter contain special characters, e.g. spd03-L3/4 instead of spd03. As a result, the column names of the pseudobulked data are incomplete: the colnames of sce_pseudo are missing the var_registration value (e.g., donor_ instead of donor_spd03).
This in turn causes unexpected behavior in the subsequent edgeR filtering and normalization steps (see the log below) and leads to incorrect pseudobulked data.

e.g.
NOTE: one pseudo-bulked sample is dropped
2025-03-17 14:53:19.498221 make pseudobulk object
2025-03-17 14:53:37.693132 dropping 1 pseudo-bulked samples that are below 'min_ncells'.
2025-03-17 14:53:37.76985 drop lowly expressed genes
Repeated column names found in count matrix
2025-03-17 14:53:38.02077 normalize expression
Repeated column names found in count matrix
Warning message:
In filterByExpr.DGEList(y, design = design, group = group, lib.size = lib.size, :
All samples appear to belong to the same group.

I just want to report this unexpected behavior or corner case.
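
A possible workaround, sketched under assumptions: sanitize the labels before calling registration_pseudobulk so that they contain only letters, digits, and underscores. The helper `sanitize_labels` below is hypothetical (it is not part of spatialLIBD), and the object and column names in the commented usage (`sce`, `spd`, `sample_id`) are placeholders. I have not confirmed which characters actually trigger the problem, so the replacement pattern is deliberately conservative.

```r
# Hypothetical helper (not part of spatialLIBD): replace every character that
# is not a letter, digit, or underscore, and refuse to merge distinct labels.
sanitize_labels <- function(x) {
    x <- as.character(x)
    cleaned <- gsub("[^A-Za-z0-9_]", "_", x)
    # Each distinct original label must keep its own distinct cleaned label
    if (length(unique(x)) != length(unique(cleaned))) {
        stop("Sanitizing merged distinct labels; rename them manually.")
    }
    cleaned
}

# Example labels modeled on the value mentioned in this report
labels <- c("spd03-L3/4", "spd01-WM", "spd03-L3/4")
cleaned <- sanitize_labels(labels)
print(cleaned)

# Assumed usage; `sce`, `spd`, and `sample_id` are placeholder names:
# sce$spd_clean <- sanitize_labels(sce$spd)
# sce_pseudo <- spatialLIBD::registration_pseudobulk(
#     sce, var_registration = "spd_clean", var_sample_id = "sample_id"
# )
# stopifnot(anyDuplicated(colnames(sce_pseudo)) == 0)
```

The uniqueness check inside the helper guards against two different labels collapsing into one after cleaning, which would silently merge groups. The final commented `anyDuplicated` check is one way to catch the "Repeated column names" condition early.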
