How to use nn.scan and nn.BatchNorm in Flax? #2067
-
Hi, I'm trying to use `nn.scan` together with `nn.BatchNorm`. This works as expected:

```python
from flax import linen as nn
import jax
from jax import random, numpy as jnp


class MLP(nn.Module):
    @nn.compact
    def __call__(self, c, x):
        h = nn.Dense(features=10)(x)
        h = nn.BatchNorm(use_running_average=False)(h)
        y = nn.Dense(features=1)(h)
        return c, y


xs = jnp.zeros((10, 2))
p = MLP().init(random.PRNGKey(1), (), xs[0])
scan_mlp = nn.scan(
    MLP,
    variable_carry='batch_stats',
    variable_broadcast='params',
    split_rngs={'params': False},
)()
(cs, ys), variables = scan_mlp.apply(p, (), xs, mutable='batch_stats')
```

But if I wrap the scan inside another module:

```python
from flax import linen as nn
import jax
from jax import random, numpy as jnp


class MLP(nn.Module):
    @nn.compact
    def __call__(self, c, x):
        h = nn.Dense(features=10)(x)
        h = nn.BatchNorm(use_running_average=False)(h)
        y = nn.Dense(features=1)(h)
        return c, y


class ScanMLP(nn.Module):
    @nn.compact
    def __call__(self, c, xs):
        scan = nn.scan(
            MLP,
            variable_carry='batch_stats',
            variable_broadcast='params',
            split_rngs={'params': False},
        )
        return scan()(c, xs)


xs = jnp.zeros((10, 2))
scan_mlp = ScanMLP()
p = scan_mlp.init(random.PRNGKey(1), (), xs)
(cs, ys), variables = scan_mlp.apply(p, (), xs, mutable='batch_stats')
```

the second version raises an error at init time. Any idea why these two similar pieces of code behave differently?
Replies: 5 comments
-
The issue here is that JAX doesn't support a carry that changes structure within the loop. For `variable_carry='batch_stats'`, the batch stats don't exist yet during `init`: they are created inside the scanned body, so the carry's pytree structure differs between the loop's input and its output, which `jax.lax.scan` rejects. In your first snippet you initialize the plain `MLP` outside the scan, so the variable structure is already fixed by the time the scan runs.
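A minimal sketch (not from the original reply) of the underlying JAX constraint: `jax.lax.scan` requires the carry returned by the body to have exactly the same pytree structure, shapes, and dtypes as the carry passed in.

```python
import jax
import jax.numpy as jnp

def body_ok(carry, x):
    # Carry keeps the same structure ({'sum': scalar}) every step: fine.
    return {'sum': carry['sum'] + x}, x * 2

carry, ys = jax.lax.scan(body_ok, {'sum': jnp.float32(0)}, jnp.arange(3.0))

def body_bad(carry, x):
    # Adding a new key changes the carry's pytree structure, so scan
    # raises a TypeError about mismatched carry structure.
    return {'sum': carry['sum'] + x, 'extra': x}, x * 2

# jax.lax.scan(body_bad, {'sum': jnp.float32(0)}, jnp.arange(3.0))  # TypeError
```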
-
@denisyarats For now this works:

```python
from flax import linen as nn
import jax
from jax import random, numpy as jnp


class MLP(nn.Module):
    @nn.compact
    def __call__(self, c, x):
        h = nn.Dense(features=10)(x)
        h = nn.BatchNorm(use_running_average=False)(h)
        y = nn.Dense(features=1)(h)
        return c, y


class ScanMLP(nn.Module):
    @nn.compact
    def __call__(self, c, xs):
        scan = nn.scan(
            MLP,
            variable_carry="batch_stats",
            variable_broadcast="params",
            split_rngs={"params": False},
        )
        # During init, run the unscanned module once so all variables get
        # created with a fixed structure; only scan during apply. Both
        # branches use the same name so the variable trees line up.
        is_initializing = "batch_stats" not in self.variables
        if is_initializing:
            return MLP(name="MLP")(c, xs)
        else:
            return scan(name="MLP")(c, xs)


xs = jnp.zeros((10, 2))
scan_mlp = ScanMLP()
p = scan_mlp.init(random.PRNGKey(1), (), xs)
print(jax.tree_map(lambda x: x.shape, p))
(cs, ys), updates = scan_mlp.apply(p, (), xs, mutable="batch_stats")
p = p.copy(updates)
print(jax.tree_map(lambda x: x.shape, p))
```

@jheek per #652 it would be nice to have an easier mechanism to know if we are inside `init`.
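As a follow-up sketch: newer Flax releases expose `self.is_initializing()` on `nn.Module`, which (assuming your version has it) makes the check above less ad hoc:

```python
# Sketch assuming a Flax version that provides Module.is_initializing();
# otherwise fall back to the "batch_stats" membership check above.
class ScanMLP(nn.Module):
    @nn.compact
    def __call__(self, c, xs):
        scan = nn.scan(
            MLP,
            variable_carry="batch_stats",
            variable_broadcast="params",
            split_rngs={"params": False},
        )
        if self.is_initializing():
            return MLP(name="MLP")(c, xs)
        return scan(name="MLP")(c, xs)
```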
-
Thanks @jheek and @cgarciae, verified this on my end as well. The trick is to not initialize through `nn.scan`, and to give the scanned and unscanned branches the same name (`name="MLP"` above) so the variable trees match.
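For reference, here is roughly what the `print(jax.tree_map(...))` calls in the workaround above should show for this example (input features 2, hidden 10, output 1); this is a sketch, and the exact nesting may vary across Flax versions:

```python
# Roughly, print(jax.tree_map(lambda x: x.shape, p)) should show:
# {
#     'params': {
#         'MLP': {
#             'Dense_0':     {'kernel': (2, 10), 'bias': (10,)},
#             'BatchNorm_0': {'scale': (10,),    'bias': (10,)},
#             'Dense_1':     {'kernel': (10, 1), 'bias': (1,)},
#         }
#     },
#     'batch_stats': {
#         'MLP': {'BatchNorm_0': {'mean': (10,), 'var': (10,)}},
#     },
# }
```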
-
Thanks @cgarciae for the workaround. I ran into this issue today, and while I understand why it happens from JAX's perspective, from a Flax user's perspective it feels like a bug. When the module being scanned is more complicated and depends on more inputs, it leads to a lot of code inside the module to handle whether it's being scanned or not. Looking forward to this feature from @jheek.
-
@cgarciae, for Flax, is this still the solution? I'm getting the same issue with the following scan code: