nnx.value_and_grad fails to update gradient #4678
Unanswered
LIANHAN3920 asked this question in Q&A
Replies: 1 comment, 1 reply
-
  I think the batch data should also be passed to `loss_fn`?

  ```python
  @jit
  def train_step(state: TrainState, batch):
      def loss_fn(params, batch):
          x, y = batch
          model = nnx.merge(state.graphdef, params)
          preds = model(x)
          loss = jnp.mean((preds - y) ** 2)
          return loss

      # grads = jax.grad(loss_fn)(state.params, batch)  # <-- working
      loss, grads = nnx.value_and_grad(loss_fn)(state.params, batch)  # <-- not working
      state = state.apply_gradients(grads=grads)
      return state
  ```
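As a sanity check outside of Flax NNX, the same calling pattern with plain `jax.value_and_grad` does forward extra positional arguments (the batch) to the loss function and returns `(loss, grads)`, with gradients identical to `jax.grad`. This is a minimal pure-JAX sketch, not the original model; the linear-model parameters and shapes here are made up for illustration:

```python
import jax
import jax.numpy as jnp

def loss_fn(params, batch):
    # Same structure as in the train step above: unpack the batch,
    # run a forward pass, and return a scalar MSE loss.
    x, y = batch
    preds = x @ params["w"] + params["b"]
    return jnp.mean((preds - y) ** 2)

params = {"w": jnp.zeros((3, 1)), "b": jnp.zeros((1,))}
batch = (jnp.ones((4, 3)), jnp.ones((4, 1)))

# value_and_grad returns both the loss and the gradients in one pass;
# the batch is simply forwarded to loss_fn as a second positional arg.
loss, grads = jax.value_and_grad(loss_fn)(params, batch)

# jax.grad alone returns only the gradients; they should match exactly.
grads_only = jax.grad(loss_fn)(params, batch)
```

If this pure-JAX version updates correctly but the `nnx.value_and_grad` version does not, the difference lies in how the NNX transform treats its arguments, not in the loss computation itself.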
-
  In this code, when I use nnx.value_and_grad to compute the gradient of the loss function, the parameters fail to update at every iteration, but jax.grad works.