@@ -16,7 +16,7 @@ pip install -r requirements.txt

You will also need to run the following to install flash attention:
```
- pip install flash-attn --no-build-isolation
+ pip install flash-attn==2.1.0 --no-build-isolation
```

> For flash attention, make sure that the following command returns 0:
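
As a quick sanity check that the pinned `flash-attn` wheel installed correctly (illustrative only, and not necessarily the check the README refers to above), the import and version can be verified with:

```bash
# Illustrative check, not from this PR: a successful import exits with 0,
# a broken install (e.g. ABI mismatch with the local torch/CUDA) does not.
python -c "import flash_attn; print(flash_attn.__version__)"
echo $?   # expect 0, with the version printed as 2.1.0
```
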
@@ -52,27 +52,30 @@ By default the scripts assume the model is at ```./llama-v2-fused-qkv```
Run:
```bash
accelerate launch --config_file configs/default_config.yaml scripts/train.py \
- --model_name meta-llama/Llama-2-70b-hf \
- --dataset_name "tau/scrolls" --dataset_config_name "gov_report" \
+ --dataset_path "./dataset" \
+ --model_path "/software/users/ihubara/lora_clean/llama-v2-fused-qkv" \
--max_seq_len 8192 \
--bf16 True \
- --logging_steps 1 \
- --eval_steps 22 \
- --output_dir "/tmp/llama-70b" \
+ --logging_steps 24 \
+ --eval_steps 48 \
+ --output_dir "./results/llama-70b_scrolls_gov_report_r16_$1" \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 1 \
- --dataset_text_field "input" \
--lr_scheduler_type "cosine" \
- --learning_rate 1e-3 \
- --warmup_ratio 0.03 \
+ --learning_rate 4e-4 \
+ --weight_decay 0.0001 \
+ --warmup_ratio 0 \
+ --max_grad_norm 0.3 \
--use_gradient_checkpointing True \
+ --target_eval_loss 0.925 \
--use_peft_lora True \
--lora_r 16 \
--lora_alpha 32 \
--lora_dropout 0.1 \
- --max_steps 440 \
+ --max_steps 1024 \
--use_flash_attn \
- --lora_target_modules "q_proj,v_proj,k_proj,o_proj"
+ --seed 1234 \
+ --lora_target_modules "qkv_proj,o_proj"
```

where the Accelerate config file is [this one](https://github.com/regisss/lora/blob/main/configs/default_config.yaml).
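
Note that the updated `--output_dir` interpolates `$1`, so the command is evidently meant to be saved in a shell script whose first positional argument tags the run. A small usage sketch (the script name `run_70b.sh` is an assumption, not part of this PR):

```bash
# Hypothetical usage: if the launch command above is saved as run_70b.sh,
# "$1" inside --output_dir expands to the first argument, giving each run
# its own results directory.
bash run_70b.sh run1
# outputs are written under ./results/llama-70b_scrolls_gov_report_r16_run1
```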