llama-factory master branch, transformers==4.51.3
When training on the latest master, the following code block raises an error. How should I fix it?

```python
if model_type in ["kimi_vl", "deepseek_v3"]:
    check_version("transformers>=4.51.1")
    from transformers.models.deepseek_v3.modeling_deepseek_v3 import DeepseekV3MoE

    _set_z3_leaf_modules(model, [DeepseekV3MoE])
```

The error is as follows:
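For context, what the DeepSpeed call ultimately does is roughly this (a toy sketch, not DeepSpeed's actual code; the `_z3_leaf` attribute name is an assumption): it walks the model, flags every module whose class is in the given list, and raises if nothing matched, which is exactly the `ValueError` in the traceback. Note the match is by class *identity*, so the class you import must be the same object as the one the model was built from.

```python
def set_leaf_flag(model, leaf_classes):
    """Toy sketch of DeepSpeed's leaf-module marking.

    Flags every submodule whose class is in ``leaf_classes`` and raises
    ``ValueError`` when no module matches (the failure seen above).
    The ``_z3_leaf`` attribute name is an assumption for illustration.
    """
    matched = []
    for module in model.modules():
        if module.__class__ in leaf_classes:
            module._z3_leaf = True
            matched.append(module)
    if not matched:
        raise ValueError(f"No modules of type {leaf_classes} found in model")
    return matched
```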
```
[rank0]: Traceback (most recent call last):
[rank0]:   File "/project/LLaMA-Factory/./src/train.py", line 28, in <module>
[rank0]:     main()
[rank0]:   File "/project/LLaMA-Factory/./src/train.py", line 19, in main
[rank0]:     run_exp()
[rank0]:   File "/project/LLaMA-Factory/src/llamafactory/train/tuner.py", line 107, in run_exp
[rank0]:     _training_function(config={"args": args, "callbacks": callbacks})
[rank0]:   File "/project/LLaMA-Factory/src/llamafactory/train/tuner.py", line 69, in _training_function
[rank0]:     run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
[rank0]:   File "/project/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 52, in run_sft
[rank0]:     model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train)
[rank0]:   File "/project/LLaMA-Factory/src/llamafactory/model/loader.py", line 175, in load_model
[rank0]:     patch_model(model, tokenizer, model_args, is_trainable, add_valuehead)
[rank0]:   File "/project/LLaMA-Factory/src/llamafactory/model/patcher.py", line 170, in patch_model
[rank0]:     add_z3_leaf_module(model)
[rank0]:   File "/project/LLaMA-Factory/src/llamafactory/model/model_utils/moe.py", line 61, in add_z3_leaf_module
[rank0]:     _set_z3_leaf_modules(model, [DeepseekV3MoE])
[rank0]:   File "/project/LLaMA-Factory/src/llamafactory/model/model_utils/moe.py", line 33, in _set_z3_leaf_modules
[rank0]:     set_z3_leaf_modules(model, leaf_modules)
[rank0]:   File "/root/miniconda/envs/python310_torch25_cuda/lib/python3.10/site-packages/deepspeed/utils/z3_leaf_module.py", line 81, in set_z3_leaf_modules
[rank0]:     return _do_set_z3_leaf_modules(model, leaf_module_classes, True)
[rank0]:   File "/root/miniconda/envs/python310_torch25_cuda/lib/python3.10/site-packages/deepspeed/utils/z3_leaf_module.py", line 65, in _do_set_z3_leaf_modules
[rank0]:     raise ValueError(f'No modules of type {leaf_module_classes} found in model {model}')
[rank0]: ValueError: No modules of type [<class 'transformers.models.deepseek_v3.modeling_deepseek_v3.DeepseekV3MoE'>] found in model DeepseekV3ForCausalLM(
[rank0]:   (model): DeepseekV3Model(
[rank0]:     (embed_tokens): Embedding(129280, 7168)
[rank0]:     (layers): ModuleList(
[rank0]:       (0-2): 3 x DeepseekV3DecoderLayer(
[rank0]:         (self_attn): DeepseekV3FlashAttention2(
[rank0]:           (q_a_proj): Linear()
[rank0]:           (q_a_layernorm): DeepseekV3RMSNorm()
[rank0]:           (q_b_proj): Linear()
[rank0]:           (kv_a_proj_with_mqa): Linear()
[rank0]:           (kv_a_layernorm): DeepseekV3RMSNorm()
[rank0]:           (kv_b_proj): Linear()
[rank0]:           (o_proj): Linear()
[rank0]:           (rotary_emb): DeepseekV3YarnRotaryEmbedding()
[rank0]:         )
[rank0]:         (mlp): DeepseekV3MLP(
[rank0]:           (gate_proj): Linear()
[rank0]:           (up_proj): Linear()
[rank0]:           (down_proj): Linear()
[rank0]:           (act_fn): SiLU()
[rank0]:         )
[rank0]:         (input_layernorm): DeepseekV3RMSNorm()
[rank0]:         (post_attention_layernorm): DeepseekV3RMSNorm()
[rank0]:       )
[rank0]:       (3-60): 58 x DeepseekV3DecoderLayer(
[rank0]:         (self_attn): DeepseekV3FlashAttention2(
[rank0]:           (q_a_proj): Linear()
[rank0]:           (q_a_layernorm): DeepseekV3RMSNorm()
[rank0]:           (q_b_proj): Linear()
[rank0]:           (kv_a_proj_with_mqa): Linear()
[rank0]:           (kv_a_layernorm): DeepseekV3RMSNorm()
[rank0]:           (kv_b_proj): Linear()
[rank0]:           (o_proj): Linear()
[rank0]:           (rotary_emb): DeepseekV3YarnRotaryEmbedding()
[rank0]:         )
[rank0]:         (mlp): DeepseekV3MoE(
[rank0]:           (experts): ModuleList(
[rank0]:             (0-255): 256 x DeepseekV3MLP(
[rank0]:               (gate_proj): Linear()
[rank0]:               (up_proj): Linear()
[rank0]:               (down_proj): Linear()
[rank0]:               (act_fn): SiLU()
[rank0]:             )
[rank0]:           )
[rank0]:           (gate): MoEGate()
[rank0]:           (shared_experts): DeepseekV3MLP(
[rank0]:             (gate_proj): Linear()
[rank0]:             (up_proj): Linear()
[rank0]:             (down_proj): Linear()
[rank0]:             (act_fn): SiLU()
[rank0]:           )
[rank0]:         )
[rank0]:         (input_layernorm): DeepseekV3RMSNorm()
[rank0]:         (post_attention_layernorm): DeepseekV3RMSNorm()
[rank0]:       )
[rank0]:     )
[rank0]:     (norm): DeepseekV3RMSNorm()
[rank0]:   )
[rank0]:   (lm_head): Linear()
[rank0]: )
```
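The class names in the dump (`DeepseekV3FlashAttention2`, `DeepseekV3YarnRotaryEmbedding`, `MoEGate`) look like the checkpoint's own remote modeling code rather than the built-in `transformers.models.deepseek_v3` implementation, so the `DeepseekV3MoE` imported from transformers would be a *different class object* from the one the model actually contains, and the identity check finds nothing. If that diagnosis is right, one possible workaround is to select leaf classes by class *name* instead of by import (a sketch under that assumption; `find_leaf_classes` is a hypothetical helper, not part of LLaMA-Factory or DeepSpeed):

```python
def find_leaf_classes(model, class_name: str) -> set:
    """Walk model.modules() and collect every concrete class whose
    __name__ matches, regardless of which module it was defined in.
    This also matches classes coming from a checkpoint's remote
    modeling file loaded via trust_remote_code=True."""
    return {type(m) for m in model.modules() if type(m).__name__ == class_name}

# Usage sketch (unverified against this setup):
# from deepspeed.utils import set_z3_leaf_modules
# for cls in find_leaf_classes(model, "DeepseekV3MoE"):
#     set_z3_leaf_modules(model, [cls])
```

Alternatively, if the checkpoint loads correctly through transformers' native DeepSeek-V3 implementation (without remote code), the original import-based check should match as-is.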
I've also encountered the same problem. Have you solved it?