
Training DeepSeek V3 with the latest master branch: setting DeepseekV3MoE as a leaf module raises an error #7800


Open
Han-Huaqiao opened this issue Apr 22, 2025 · 1 comment
Labels: bug (Something isn't working), pending (This problem is yet to be addressed)

Comments

@Han-Huaqiao

Reminder

  • I have read the above rules and searched the existing issues.

System Info

llama-factory master branch, transformers==4.51.3

Reproduction

Put your message here.

Others

When training with the latest master branch, the following code block:

if model_type in ["kimi_vl", "deepseek_v3"]:
    check_version("transformers>=4.51.1")
    from transformers.models.deepseek_v3.modeling_deepseek_v3 import DeepseekV3MoE

    _set_z3_leaf_modules(model, [DeepseekV3MoE])

fails with the error below. How should this be fixed?

[rank0]: Traceback (most recent call last):
[rank0]:   File "/project/LLaMA-Factory/./src/train.py", line 28, in <module>
[rank0]:     main()
[rank0]:   File "/project/LLaMA-Factory/./src/train.py", line 19, in main
[rank0]:     run_exp()
[rank0]:   File "/project/LLaMA-Factory/src/llamafactory/train/tuner.py", line 107, in run_exp
[rank0]:     _training_function(config={"args": args, "callbacks": callbacks})
[rank0]:   File "/project/LLaMA-Factory/src/llamafactory/train/tuner.py", line 69, in _training_function
[rank0]:     run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
[rank0]:   File "/project/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 52, in run_sft
[rank0]:     model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train)
[rank0]:   File "/project/LLaMA-Factory/src/llamafactory/model/loader.py", line 175, in load_model
[rank0]:     patch_model(model, tokenizer, model_args, is_trainable, add_valuehead)
[rank0]:   File "/project/LLaMA-Factory/src/llamafactory/model/patcher.py", line 170, in patch_model
[rank0]:     add_z3_leaf_module(model)
[rank0]:   File "/project/LLaMA-Factory/src/llamafactory/model/model_utils/moe.py", line 61, in add_z3_leaf_module
[rank0]:     _set_z3_leaf_modules(model, [DeepseekV3MoE])
[rank0]:   File "/project/LLaMA-Factory/src/llamafactory/model/model_utils/moe.py", line 33, in _set_z3_leaf_modules
[rank0]:     set_z3_leaf_modules(model, leaf_modules)
[rank0]:   File "/root/miniconda/envs/python310_torch25_cuda/lib/python3.10/site-packages/deepspeed/utils/z3_leaf_module.py", line 81, in set_z3_leaf_modules
[rank0]:     return _do_set_z3_leaf_modules(model, leaf_module_classes, True)
[rank0]:   File "/root/miniconda/envs/python310_torch25_cuda/lib/python3.10/site-packages/deepspeed/utils/z3_leaf_module.py", line 65, in _do_set_z3_leaf_modules
[rank0]:     raise ValueError(f'No modules of type {leaf_module_classes} found in model {model}')
[rank0]: ValueError: No modules of type [<class 'transformers.models.deepseek_v3.modeling_deepseek_v3.DeepseekV3MoE'>] found in model DeepseekV3ForCausalLM(
[rank0]:   (model): DeepseekV3Model(
[rank0]:     (embed_tokens): Embedding(129280, 7168)
[rank0]:     (layers): ModuleList(
[rank0]:       (0-2): 3 x DeepseekV3DecoderLayer(
[rank0]:         (self_attn): DeepseekV3FlashAttention2(
[rank0]:           (q_a_proj): Linear()
[rank0]:           (q_a_layernorm): DeepseekV3RMSNorm()
[rank0]:           (q_b_proj): Linear()
[rank0]:           (kv_a_proj_with_mqa): Linear()
[rank0]:           (kv_a_layernorm): DeepseekV3RMSNorm()
[rank0]:           (kv_b_proj): Linear()
[rank0]:           (o_proj): Linear()
[rank0]:           (rotary_emb): DeepseekV3YarnRotaryEmbedding()
[rank0]:         )
[rank0]:         (mlp): DeepseekV3MLP(
[rank0]:           (gate_proj): Linear()
[rank0]:           (up_proj): Linear()
[rank0]:           (down_proj): Linear()
[rank0]:           (act_fn): SiLU()
[rank0]:         )
[rank0]:         (input_layernorm): DeepseekV3RMSNorm()
[rank0]:         (post_attention_layernorm): DeepseekV3RMSNorm()
[rank0]:       )
[rank0]:       (3-60): 58 x DeepseekV3DecoderLayer(
[rank0]:         (self_attn): DeepseekV3FlashAttention2(
[rank0]:           (q_a_proj): Linear()
[rank0]:           (q_a_layernorm): DeepseekV3RMSNorm()
[rank0]:           (q_b_proj): Linear()
[rank0]:           (kv_a_proj_with_mqa): Linear()
[rank0]:           (kv_a_layernorm): DeepseekV3RMSNorm()
[rank0]:           (kv_b_proj): Linear()
[rank0]:           (o_proj): Linear()
[rank0]:           (rotary_emb): DeepseekV3YarnRotaryEmbedding()
[rank0]:         )
[rank0]:         (mlp): DeepseekV3MoE(
[rank0]:           (experts): ModuleList(
[rank0]:             (0-255): 256 x DeepseekV3MLP(
[rank0]:               (gate_proj): Linear()
[rank0]:               (up_proj): Linear()
[rank0]:               (down_proj): Linear()
[rank0]:               (act_fn): SiLU()
[rank0]:             )
[rank0]:           )
[rank0]:           (gate): MoEGate()
[rank0]:           (shared_experts): DeepseekV3MLP(
[rank0]:             (gate_proj): Linear()
[rank0]:             (up_proj): Linear()
[rank0]:             (down_proj): Linear()
[rank0]:             (act_fn): SiLU()
[rank0]:           )
[rank0]:         )
[rank0]:         (input_layernorm): DeepseekV3RMSNorm()
[rank0]:         (post_attention_layernorm): DeepseekV3RMSNorm()
[rank0]:       )
[rank0]:     )
[rank0]:     (norm): DeepseekV3RMSNorm()
[rank0]:   )
[rank0]:   (lm_head): Linear()
[rank0]: )
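
The module tree printed in the error hints at the root cause: classes such as DeepseekV3FlashAttention2, DeepseekV3YarnRotaryEmbedding, and MoEGate come from the checkpoint's remote-code modeling file, not from the native transformers implementation. When the model is loaded with trust_remote_code=True, its DeepseekV3MoE is a dynamically created class that is not identical to transformers.models.deepseek_v3.modeling_deepseek_v3.DeepseekV3MoE, so DeepSpeed's identity-based lookup finds no match. A minimal sketch of a name-based workaround follows (set_z3_leaf_modules_by_name is a hypothetical helper, not LLaMA-Factory's actual fix):

# Hedged sketch of a workaround: register ZeRO-3 leaf modules by class
# *name*, so the dynamically generated DeepseekV3MoE from the remote
# modeling file is matched as well. The helper name is an assumption.
from deepspeed.utils import set_z3_leaf_modules


def set_z3_leaf_modules_by_name(model, class_names: list[str]) -> None:
    # Collect the concrete classes of every module whose class name matches,
    # then delegate to DeepSpeed's identity-based API.
    leaf_classes = {
        module.__class__
        for module in model.modules()
        if module.__class__.__name__ in class_names
    }
    if not leaf_classes:
        raise ValueError(f"No modules named {class_names} found in model.")
    set_z3_leaf_modules(model, list(leaf_classes))


# In moe.py this would replace the identity-based call:
# set_z3_leaf_modules_by_name(model, ["DeepseekV3MoE"])

Alternatively, loading the checkpoint without trust_remote_code, so that the native transformers 4.51 DeepseekV3 classes are used, should let the original identity-based call succeed.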
@Han-Huaqiao added the bug and pending labels on Apr 22, 2025
@li199959

I've also encountered the same problem. Have you solved it?
