Some surprises when adding Gemma2-2b to flax/examples/gemma #4567

Looking forward to getting to grips with `nnx`, I went about implementing the apparently missing `gemma2_2b` model.

After coming up with a suitable `TransformerConfig`, I was surprised that I couldn't get a forward pass to run, so I added 'Grouped-Query Attention' (mirroring what can be seen in the GDM `gemma` library).
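For anyone unfamiliar with the idea, here is a minimal sketch of grouped-query attention in plain JAX. It is not the code from either repository: the function name, the argument layout and the `1/sqrt(head_dim)` scaling are my own assumptions for illustration (the real config may use a different query scaling), and it leaves out the KV cache and everything else a real attention module needs.

```python
import jax
import jax.numpy as jnp


def grouped_query_attention(q, k, v, mask):
  """Illustrative GQA. q: [B, T, N, H]; k, v: [B, S, K, H]; mask: [B, T, S] bool.

  N query heads share K key/value heads; N must be divisible by K.
  """
  b, t, n, h = q.shape
  num_kv_heads = k.shape[2]
  assert n % num_kv_heads == 0, 'query heads must split evenly over KV heads'
  group = n // num_kv_heads

  # Regroup the query heads so that each KV head serves `group` query heads.
  q = q.reshape(b, t, num_kv_heads, group, h)

  # Assumed scaling: 1/sqrt(head_dim).
  logits = jnp.einsum('btkgh,bskh->btkgs', q, k) * h ** -0.5
  # Masked-out positions get a large negative logit before the softmax.
  logits = jnp.where(mask[:, :, None, None, :], logits, -1e30)
  probs = jax.nn.softmax(logits, axis=-1)

  out = jnp.einsum('btkgs,bskh->btkgh', probs, v)
  # Merge the (KV head, group) axes back into N query heads.
  return out.reshape(b, t, n, h)
```

With this head layout, the result should match ordinary multi-head attention applied to keys and values that have each been repeated `group` times along the head axis, which makes for a simple sanity check.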

Still no dice: although I checked that all the params were being set correctly, garbage logits were being returned. So I laboriously went through each layer, checking for numerical differences, only to find that in [`modules.Block`](https://github.com/google/flax/blob/main/examples/gemma/modules.py#L338-L358), the sub-modules are clearly applied in the wrong order (look at the `pre_ffw_norm` and the `post_attn_norm`, and the residual paths, for instance):

```python
class Block(nnx.Module):
  # ...
  def __call__(self, x, ...):
    inputs_normalized = self.pre_attention_norm(x)
    cache, attn_output = self.attn(
        inputs_normalized,
        segment_pos,
        cache,
        attn_mask,
    )
    attn_output += x
    residual = attn_output
    attn_output = self.pre_ffw_norm(attn_output)

    if self.use_post_attn_norm:
      attn_output = self.post_attn_norm(attn_output)
    self.sow_config.maybe_sow_rs_after_attention(attn_output, self)

    outputs = self.mlp(attn_output)
    if self.use_post_ffw_norm:
      outputs = self.post_ffw_norm(outputs)
    outputs = residual + outputs
    self.sow_config.maybe_sow_rs_after_ffw(outputs, self)
    return cache, outputs
```
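For comparison, here is a sketch of the ordering I would expect, based on my reading of the GDM `gemma` library: each sub-layer is wrapped as pre-norm, sub-layer, post-norm, and only then added to the residual. The function and its callable arguments are hypothetical stand-ins for illustration (cache, masks and sowing are left out), not the actual fix.

```python
import jax.numpy as jnp


def gemma2_block_order(x, attn, mlp, pre_attn_norm, post_attn_norm,
                       pre_ffw_norm, post_ffw_norm):
  """Sketch only: every argument after `x` is a hypothetical array -> array callable."""
  # Attention sub-layer: pre-norm, attention, post-norm, then the residual add.
  attn_output = attn(pre_attn_norm(x))
  attn_output = post_attn_norm(attn_output)
  attn_output = attn_output + x

  # Feed-forward sub-layer: the same pattern, with `attn_output` as the residual.
  outputs = mlp(pre_ffw_norm(attn_output))
  outputs = post_ffw_norm(outputs)
  return outputs + attn_output
```

The differences from the quoted code are that `post_attn_norm` is applied before the first residual add (rather than after `pre_ffw_norm`), and that `pre_ffw_norm` only feeds the MLP.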

After fixing this up, the model's final `logits` come out pretty close to the GDM `gemma` ones (small errors seem to creep in along the way due to bfloat16/float16). This is good news, though it is slightly worrying that this example appears to have been *unusable* this whole time, since its actual functionality as an implementation of Gemma models does not appear to have been tested...
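The layer-by-layer check was roughly along the lines of the sketch below. The helper name, the dict-of-activations layout and the `atol=1e-2` tolerance are assumptions for illustration; how the intermediate activations are collected from each implementation is not shown.

```python
import jax.numpy as jnp


def report_max_abs_diff(ours, reference, atol=1e-2):
  """Compares two {layer_name: activation} dicts (a hypothetical layout).

  Returns {layer_name: (max_abs_diff, within_tolerance)} for names present in
  both dicts, in the insertion order of `ours`.
  """
  report = {}
  for name, value in ours.items():
    if name not in reference:
      continue
    # Compare in float32 so that the comparison itself adds no bfloat16 rounding.
    a = jnp.asarray(value, dtype=jnp.float32)
    b = jnp.asarray(reference[name], dtype=jnp.float32)
    diff = float(jnp.max(jnp.abs(a - b)))
    report[name] = (diff, diff <= atol)
  return report
```

The first layer whose difference jumps well beyond the tolerance is usually the place to start looking.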

Please let me know if you'd like a PR to cover (at minimum) adding a working `gemma2_2b`.