Difficulty generalizing with train3d #230

Open
asarnow opened this issue Apr 1, 2025 · 1 comment

asarnow commented Apr 1, 2025

Hi Tristan,
I've been experimenting with train3d in the dev branch. I'm using cryoCARE- or Topaz-denoised tomograms with ~4000 manually annotated particles (from 5 tomograms), and picking fails to generalize to certain tomograms, or to regions within particular tomograms (there are 42 tomograms in the whole dataset). Generalization to unlabeled particles within the tomograms used for training is much better.

The particles are in fairly dense membrane-anchored arrays. For example, they look like this:

[Two images: example views of the particle arrays]

I was wondering if you had any recommendations for which parameters to look at for optimization, and whether you find it's better to pick a few tomograms completely or many of them with relatively few labels. I annotated ~1200 particles on the source tomogram for the example above, and Topaz picks another 9000 or so; replicating that across the dataset would be a bit more challenging. Relatedly, is retraining after using Topaz to more completely pick individual tomograms a reasonable approach?

Thanks for any advice you can share. I understand the 3D picking is under active development.

DarnellGranberry (Collaborator) commented

Hi @asarnow,

Tristan can correct me, but I believe Topaz expects images that have not been denoised. I would also double-check that your tomograms are all normalized (I suspect this is contributing to the variation in performance between images). If neither of those gives you better performance and there still seems to be some difference between tomograms, then you could try using training particles from more images to capture that diversity.
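
As a rough illustration of the normalization check, here is a minimal sketch that standardizes each tomogram to zero mean and unit standard deviation and then reports the per-tomogram statistics. It is a generic per-volume standardization written with NumPy, not Topaz's own normalization routine. The function names and the synthetic arrays are assumptions for illustration, and loading real volumes (for example from MRC files) is left out.

```python
import numpy as np

def standardize_tomogram(volume, eps=1e-8):
    """Return a float32 copy of `volume` with zero mean and unit std."""
    vol = np.asarray(volume, dtype=np.float32)
    # Accumulate in float64 so large volumes do not lose precision.
    mean = vol.mean(dtype=np.float64)
    std = vol.std(dtype=np.float64)
    return ((vol - mean) / (std + eps)).astype(np.float32)

def summarize(volumes):
    """Map tomogram name -> (mean, std) so differently scaled volumes stand out."""
    return {name: (float(v.mean()), float(v.std())) for name, v in volumes.items()}

# Synthetic stand-ins for two tomograms with different intensity scales.
rng = np.random.default_rng(0)
tomograms = {
    "tomo_a": rng.normal(loc=10.0, scale=2.0, size=(8, 32, 32)),
    "tomo_b": rng.normal(loc=-3.0, scale=0.5, size=(8, 32, 32)),
}

normalized = {name: standardize_tomogram(v) for name, v in tomograms.items()}
stats = summarize(normalized)
for name, (mean, std) in stats.items():
    print(f"{name}: mean={mean:.4f} std={std:.4f}")
```

Running `summarize` on the tomograms before standardizing is a quick way to see whether the ones that pick poorly sit on a different intensity scale from the ones used for training.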

You can repick your images and retrain with more data, but you will need to manually curate those particles/class averages between rounds to remove false positives.
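
One bookkeeping step in that repick-and-retrain loop is folding the curated picks back into the manual labels without duplicating particles that were already annotated. The sketch below shows one way to do that for a single tomogram. It assumes coordinates are plain `(x, y, z)` tuples in voxels, and both `merge_picks` and the `min_dist` threshold are hypothetical names chosen for illustration, not part of Topaz.

```python
import math

def merge_picks(manual, curated, min_dist):
    """Combine manual labels with curated picks for one tomogram.

    A curated pick is skipped if it lies closer than `min_dist` voxels
    to any coordinate that has already been kept.
    """
    merged = list(manual)
    for pick in curated:
        if all(math.dist(pick, kept) >= min_dist for kept in merged):
            merged.append(pick)
    return merged

# Toy coordinates standing in for one tomogram's labels and curated picks.
manual = [(10.0, 10.0, 5.0), (40.0, 12.0, 5.0)]
curated = [(11.0, 10.0, 5.0), (70.0, 30.0, 8.0), (70.5, 30.0, 8.0)]

training_set = merge_picks(manual, curated, min_dist=5.0)
print(len(training_set))
```

The pairwise comparison is quadratic in the number of coordinates, so for thousands of picks per tomogram a spatial index such as a KD-tree would likely be a better fit.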

I hope that helps, but let us know if you still face issues.
