I'm trying to assess Flax and JAX for a project, to steer whether or not we want to use them.
I have a doc2vec corpus of about 150 million words, and I discovered that it converges much better when the batch sizes are very small (batch size <= 100). However, with torch (GPU) and such small batch sizes, most of the time is spent in the Python interpreter on flow control, even when the data and model live entirely on the GPU.
In other words, model training (gradient computation and updates) is fast on GPU in torch, and I've eliminated almost all CPU <-> GPU data transfer, but my wall-clock times are way too slow because the flow control runs in the Python interpreter.
My question is whether Flax/JAX and the underlying primitives can help with this kind of speedup across batches (not just within a batch). A quick "maybe"/"definitely not" would help, as would pointers to which features of these tools address this problem.
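For concreteness, here is a minimal sketch of the pattern I have in mind: using `jax.jit` together with `jax.lax.scan` to compile the whole loop over many small batches into a single device call, so Python is re-entered once per epoch rather than once per batch. The loss function and shapes below are hypothetical stand-ins (not my actual doc2vec objective), and `optax` is assumed for the optimizer.

```python
import jax
import jax.numpy as jnp
import optax  # optimizer library commonly paired with flax/jax

# Hypothetical toy objective standing in for a doc2vec loss.
def loss_fn(params, batch):
    inputs, targets = batch
    preds = inputs @ params["w"]
    return jnp.mean((preds - targets) ** 2)

optimizer = optax.sgd(1e-2)

def train_step(carry, batch):
    params, opt_state = carry
    grads = jax.grad(loss_fn)(params, batch)
    updates, opt_state = optimizer.update(grads, opt_state)
    params = optax.apply_updates(params, updates)
    return (params, opt_state), None

@jax.jit
def train_epoch(params, opt_state, batches):
    # lax.scan compiles the loop over the leading "step" axis of `batches`,
    # so there is no per-batch Python dispatch inside the epoch.
    (params, opt_state), _ = jax.lax.scan(
        train_step, (params, opt_state), batches
    )
    return params, opt_state

# Usage: stack many small batches into arrays with a leading step axis.
key = jax.random.PRNGKey(0)
params = {"w": jax.random.normal(key, (64, 8))}
opt_state = optimizer.init(params)
batches = (
    jnp.zeros((1000, 100, 64)),  # 1000 steps of batch size 100 (inputs)
    jnp.zeros((1000, 100, 8)),   # matching targets
)
params, opt_state = train_epoch(params, opt_state, batches)
```

The obvious trade-off is that the data for each scanned chunk must already sit on the device as stacked arrays, but if that's acceptable, is this roughly the intended way to eliminate per-batch interpreter overhead?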