Hey @long21wt, storing per-example gradients is memory-hungry: you need a full copy of the gradient for every example in the batch. Try reducing the batch size until it fits in memory.
The naive way would be to run …
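One way to follow that advice without shrinking the whole training batch is to compute the per-example gradients in chunks: `vmap` within each chunk, and `jax.lax.map` (a sequential loop) across chunks, so roughly only one chunk's intermediates are live at a time. This is a minimal sketch assuming a toy linear model; `loss_fn`, `chunked_per_example_grads` and `chunk_size` are hypothetical names, not from this thread. The returned per-example gradients still occupy batch size × parameter count memory, so if that array alone does not fit, the batch itself has to shrink.

```python
import jax
import jax.numpy as jnp


def loss_fn(params, x, y):
    # Hypothetical single-example loss: squared error of a linear model.
    pred = jnp.dot(x, params["w"]) + params["b"]
    return (pred - y) ** 2


# Gradient of the single-example loss with respect to params.
per_example_grad = jax.grad(loss_fn)


def chunked_per_example_grads(params, xs, ys, chunk_size):
    """Per-example gradients: vmapped within chunks, looped across chunks."""
    n = xs.shape[0]
    if n % chunk_size != 0:
        raise ValueError("batch size must be divisible by chunk_size")
    num_chunks = n // chunk_size
    # Reshape the batch to (num_chunks, chunk_size, ...).
    xs_c = xs.reshape((num_chunks, chunk_size) + xs.shape[1:])
    ys_c = ys.reshape((num_chunks, chunk_size) + ys.shape[1:])

    def grads_for_chunk(chunk):
        x_chunk, y_chunk = chunk
        # vmap only over the examples inside this chunk.
        return jax.vmap(per_example_grad, in_axes=(None, 0, 0))(params, x_chunk, y_chunk)

    # lax.map applies grads_for_chunk sequentially along the leading (chunk) axis.
    grads_c = jax.lax.map(grads_for_chunk, (xs_c, ys_c))
    # Merge (num_chunks, chunk_size, ...) back into (n, ...).
    return jax.tree_util.tree_map(lambda g: g.reshape((n,) + g.shape[2:]), grads_c)
```

Under `jax.jit`, `chunk_size` has to be static (e.g. `jax.jit(chunked_per_example_grads, static_argnames="chunk_size")`) because it determines array shapes; smaller chunks lower peak memory at the cost of speed.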
Hi,
I have an OOM problem when I try to get per-example gradients with `vmap`. Is there another way to do it without using `vmap`? For context, I'm using `vmap` plus my own gradient accumulation, implemented by replacing the state. Thanks!
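To avoid `vmap` entirely, one option is a `jax.lax.scan` over the examples that computes one gradient at a time and folds it into an accumulator, so only the accumulator and the current example's gradient need to be held at once. This is a sketch assuming the same kind of toy loss; `scan_per_example_grads` is a hypothetical helper, and the per-example gradient norm it returns is only a stand-in for whatever per-example quantity is actually needed. It is slower than `vmap` because examples are processed sequentially.

```python
import jax
import jax.numpy as jnp


def loss_fn(params, x, y):
    # Hypothetical single-example loss: squared error of a linear model.
    pred = jnp.dot(x, params["w"]) + params["b"]
    return (pred - y) ** 2


def scan_per_example_grads(params, xs, ys):
    """Loop over examples with lax.scan (no vmap), accumulating their gradients."""
    # Accumulator with the same structure, shapes and dtypes as params.
    zeros = jax.tree_util.tree_map(lambda p: jnp.zeros(p.shape, p.dtype), params)

    def step(acc, example):
        x, y = example
        g = jax.grad(loss_fn)(params, x, y)  # gradient for this single example
        # Per-example quantity kept for later: the L2 norm of g over all leaves.
        norm = jnp.sqrt(sum(jnp.sum(leaf ** 2) for leaf in jax.tree_util.tree_leaves(g)))
        # Fold this example's gradient into the running sum.
        acc = jax.tree_util.tree_map(jnp.add, acc, g)
        return acc, norm

    # scan iterates over the leading axis of (xs, ys); norms are stacked per example.
    grad_sum, norms = jax.lax.scan(step, zeros, (xs, ys))
    return grad_sum, norms
```

The summed gradient matches `jax.grad` of the summed batch loss, so it can feed the same optimizer update that manual gradient accumulation would.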