Description
Describe the bug
Disclaimer: this is not necessarily a bug in spatialLIBD::registration_pseudobulk.
I'm using registration_pseudobulk to create pseudobulked data. The values of the variable passed to the var_registration parameter contain special characters, e.g. spd03-L3/4 instead of spd03. As a result, the column names of the pseudobulked data are incomplete: the colnames of sce_pseudo are missing the var_registration value (e.g., donor_ instead of donor_spd03).
This in turn causes unexpected behavior in the subsequent filtering and normalization steps in edgeR (see the repeated-column-name messages and the warning in the log below) and leads to incorrect pseudobulked data, e.g.
NOTE: one pseudo-bulked sample is dropped
2025-03-17 14:53:19.498221 make pseudobulk object
2025-03-17 14:53:37.693132 dropping 1 pseudo-bulked samples that are below 'min_ncells'.
2025-03-17 14:53:37.76985 drop lowly expressed genes
Repeated column names found in count matrix
2025-03-17 14:53:38.02077 normalize expression
Repeated column names found in count matrix
Warning message:
In filterByExpr.DGEList(y, design = design, group = group, lib.size = lib.size, :
All samples appear to belong to the same group.
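A quick way to confirm the symptom is to look for repeated or truncated column names on the pseudobulked object. The sketch below uses a toy matrix in place of the real counts; the column names (donorA_, donorB_spd01) are made up for illustration, and on real data the same checks would be run on colnames(sce_pseudo).

```r
# Toy count matrix standing in for the pseudobulked counts.
# All row and column names here are hypothetical.
counts <- matrix(
  1:6,
  nrow = 2,
  dimnames = list(
    c("gene1", "gene2"),
    c("donorA_", "donorB_spd01", "donorA_")
  )
)

# Column names that occur more than once.
dup <- unique(colnames(counts)[duplicated(colnames(counts))])

# Column names ending in "_", i.e. with the var_registration part missing.
trailing <- grep("_$", colnames(counts), value = TRUE)

print(dup)
print(trailing)
```

If either vector is non-empty, the edgeR steps are likely running on mislabeled samples.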
I just want to report this unexpected behavior or corner case.
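A possible workaround, which I have not verified against every case, is to sanitize the labels before pseudobulking. This is a minimal sketch in base R: sanitize_labels is a hypothetical helper, the example labels are illustrative, and it assumes that plain alphanumeric/underscore labels avoid the problem.

```r
# Illustrative labels resembling the ones that triggered the problem.
labels <- c("spd03-L3/4", "spd01-L1", "spd03-L3/4", "spd02-L2")

# Hypothetical helper: replace every character that is not a letter,
# digit, or underscore with "_".
sanitize_labels <- function(x) {
  gsub("[^A-Za-z0-9_]", "_", as.character(x))
}

clean <- sanitize_labels(labels)

# Guard: sanitizing must not merge two labels that were distinct before.
stopifnot(length(unique(clean)) == length(unique(labels)))

print(clean)

# Assumed usage on real data (column names "spd" and "sample_id" are
# placeholders for whatever the object actually uses):
# sce$spd_clean <- sanitize_labels(sce$spd)
# sce_pseudo <- spatialLIBD::registration_pseudobulk(
#   sce,
#   var_registration = "spd_clean",
#   var_sample_id = "sample_id"
# )
```

The uniqueness guard matters because a substitution like this could map two different labels (say, a-b and a/b) to the same string, which would silently merge groups.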