TinyR1 Team
We introduce our first-generation reasoning model, Tiny-R1-32B-Preview, which outperforms the 70B model Deepseek-R1-Distill-Llama-70B and nearly matches the full R1 model in math.
We applied supervised fine-tuning (SFT) to Deepseek-R1-Distill-Qwen-32B in three target domains (mathematics, code, and science) using the 360-LLaMA-Factory training framework, producing three domain-specific models. Questions from open-source data served as seeds, and the responses for the mathematics, coding, and science tasks were generated by R1. We then used the Mergekit tool from the Arcee team to combine the three models into Tiny-R1-32B-Preview, which demonstrates strong overall performance. For more technical details, please refer to our technical report.
Model | Math (AIME 2024) | Coding (LiveCodeBench) | Science (GPQA-Diamond) |
---|---|---|---|
Deepseek-R1-Distill-Qwen-32B | 72.6 | 57.2 | 62.1 |
Deepseek-R1-Distill-Llama-70B | 70.0 | 57.5 | 65.2 |
Deepseek-R1 | 79.8 | 65.9 | 71.5 |
Tiny-R1-32B-Preview (Ours) | 78.1 | 61.6 | 65.0 |
All scores are reported as pass@1. For AIME 2024 we sample 16 responses per question, and for GPQA-Diamond we sample 4; in both cases we report the accuracy averaged over all samples for a more stable evaluation.
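As a rough illustration of the averaging described above, the sketch below computes pass@1 as the mean per-sample accuracy over several sampled responses per question. The function name `averaged_pass_at_1` and the toy data are hypothetical and are not taken from the evaluation code.

```python
from statistics import mean


def averaged_pass_at_1(correct_by_problem):
    """Average pass@1 (in percent) over problems with several samples each.

    correct_by_problem: one inner list per problem, holding one boolean per
    sampled response (True if that response was judged correct).
    """
    # Per-problem accuracy is the fraction of sampled responses that are correct.
    per_problem = [
        mean(1.0 if ok else 0.0 for ok in samples)
        for samples in correct_by_problem
    ]
    # The reported score is the mean of the per-problem accuracies.
    return 100.0 * mean(per_problem)


# Hypothetical toy data: 2 problems, 4 sampled responses each.
toy = [
    [True, True, False, True],    # 3 of 4 correct
    [False, False, True, False],  # 1 of 4 correct
]
score = averaged_pass_at_1(toy)
print(score)
```

Averaging over several samples reduces the variance that a single sampled response would introduce at a non-zero temperature.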
We merged the three models, each trained separately on one domain, into a single model. The comparison results are shown below.
Model | Math (AIME 2024) | Coding (LiveCodeBench) | Science (GPQA-Diamond) |
---|---|---|---|
Math-Model | 73.1 | - | - |
Code-Model | - | 63.4 | - |
Science-Model | - | - | 64.5 |
Merged-Model (Tiny-R1-32B-Preview) | 78.1 | 61.6 | 65.0 |
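To give an intuition for what merging parameters means, the sketch below shows the simplest possible form: uniform averaging of parameters with matching names. This is only an illustration on toy Python lists. It is not claimed to be the merge method configured in the Mergekit script, and the function name `average_parameters` and the toy values are hypothetical.

```python
def average_parameters(state_dicts):
    """Uniformly average parameters that share a name across several models."""
    names = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != names:
            raise ValueError("all models must share the same parameter names")
    merged = {}
    for name in names:
        # Group the i-th value of this parameter from every model together.
        columns = zip(*(sd[name] for sd in state_dicts))
        merged[name] = [sum(vals) / len(state_dicts) for vals in columns]
    return merged


# Hypothetical toy "models" with a single two-element parameter each.
math_model = {"layer.weight": [1.0, 2.0]}
code_model = {"layer.weight": [3.0, 4.0]}
science_model = {"layer.weight": [5.0, 6.0]}

merged = average_parameters([math_model, code_model, science_model])
print(merged)
```

Real merges operate on full tensors, and Mergekit offers several merge strategies beyond plain averaging.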
For multi-node training, please first fill in the train/hostfile file. For single-node training, this step is not required.
Note
About hostfile: Each line in the hostfile specifies a node, formatted as <hostname> slots=<num_slots>, where <hostname> is the name of the node and <num_slots> is the number of GPUs available on that node. Here is an example:
worker-0 slots=8
worker-1 slots=8
For more details, please refer to the DeepSpeed official documentation.
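Before launching a multi-node run, it can help to sanity-check the hostfile. The following is a minimal sketch that parses the `<hostname> slots=<num_slots>` format and counts the total number of GPUs. The helper `parse_hostfile` is hypothetical and not part of this repository or of DeepSpeed, and skipping blank lines and `#` comment lines is an assumption of this sketch.

```python
import re

# One node per line: "<hostname> slots=<num_slots>".
HOSTFILE_LINE = re.compile(r"^(?P<host>\S+)\s+slots=(?P<slots>\d+)$")


def parse_hostfile(text):
    """Return {hostname: num_slots} parsed from hostfile text."""
    nodes = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Assumption: blank lines and '#' comments are ignored.
        if not line or line.startswith("#"):
            continue
        match = HOSTFILE_LINE.match(line)
        if match is None:
            raise ValueError(
                f"line {lineno}: expected '<hostname> slots=<num_slots>', got {line!r}"
            )
        nodes[match["host"]] = int(match["slots"])
    return nodes


example = "worker-0 slots=8\nworker-1 slots=8\n"
nodes = parse_hostfile(example)
total_gpus = sum(nodes.values())
print(nodes, total_gpus)
```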
To install the required dependencies, run:
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple/ --trusted-host pypi.tuna.tsinghua.edu.cn
Hint: Replace BASE_MODEL with the actual path to the base model, e.g., "/model/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B".
BASE_MODEL="/path/to/base-model/"
bash train/run.sh \
--model $BASE_MODEL \
--data-id-path "data/open-r1-math-default-0223.json" \
--output-dir "model_output/branch-math-model" \
--model-max-length 16384 \
--learning-rate 1e-5 \
--lr-scheduler-type constant_with_warmup \
--num-train-epochs 5 \
--save-steps 200 \
--gradient-accumulation-steps 3 \
--template qwen \
--packing_type "packing"
BASE_MODEL="/path/to/base-model/"
bash train/run.sh \
--model $BASE_MODEL \
--data-id-path "data/OpenThoughts-science-with-wrong5k-r1,s1_science_3k-r1,s1_1k-r1" \
--output-dir "model_output/branch-science-model" \
--model-max-length 16384 \
--learning-rate 1e-5 \
--lr-scheduler-type cosine \
--num-train-epochs 5 \
--save-steps 200 \
--gradient-accumulation-steps 1 \
--packing_type "neatpacking" \
--template qwen
BASE_MODEL="/path/to/base-model/"
bash train/run.sh \
--model $BASE_MODEL \
--data-id-path "data/openthoughts-16kseq-0218.json" \
--output-dir "model_output/branch-code-model" \
--model-max-length 16384 \
--learning-rate 1e-5 \
--lr-scheduler-type constant_with_warmup \
--num-train-epochs 15 \
--save-steps 200 \
--gradient-accumulation-steps 3 \
--packing_type "neatpacking" \
--template qwen
To reproduce the merged qihoo360/TinyR1-32B-Preview model, use the script below.
git clone https://github.com/TinyR1-32B-Preview.git
cd TinyR1-32B-Preview/mergekit/
pip install -e .
If you encounter the error:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
you can resolve it by following these steps:
Update the package list and install the virtual environment package:
apt-get update -y
apt-get install python3-venv -y
Create and activate a virtual environment:
python3.10 -m venv eval
source eval/bin/activate
After activating the virtual environment, reinstall the required packages. This approach isolates your Python environment from the global packages, thereby preventing dependency conflicts.
sh sh/tinyr1_merge.sh [/path/to/math-model] [/path/to/science-model] [/path/to/code-model] [/path/to/output-model-dir]
The following parameters are mandatory:
- [/path/to/math-model]: the path to the math domain model that has been fine-tuned via SFT.
- [/path/to/science-model]: the path to the science domain model that has been fine-tuned via SFT.
- [/path/to/code-model]: the path to the code domain model that has been fine-tuned via SFT.
- [/path/to/output-model-dir]: the path where the fused model will be saved.
We test the resulting models on three kinds of benchmarks: Math Reasoning, Code Reasoning, and Scientific Reasoning.
Math Reasoning
- AIME24
- AIME25
Scientific Reasoning
- GPQA-Diamond
Code Reasoning
- LiveCodeBench (2408-2502)
The evaluation code is modified from Qwen2.5-Math. In our evaluation, we set the temperature to 0.6, the top-p to 0.95, and max_tokens to 32768. We provide an example for reproducing our results in math_evaluation.
The system prompt for evaluation is set to:
Please reason step by step, and put your final answer within \\boxed{{}}.
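Because the math prompt asks for the final answer inside `\boxed{...}`, scoring a response typically starts by pulling that answer out. The sketch below shows one way to extract the contents of the last `\boxed{...}` with brace matching. The helper `extract_last_boxed` is hypothetical and is not the extraction logic of the actual evaluation code.

```python
def extract_last_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are just inside the opening brace of \boxed{
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace found; nested braces are preserved.
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # braces never balanced


sample = r"Thus the result is \boxed{\frac{1}{2}}."
answer = extract_last_boxed(sample)
print(answer)
```

Counting braces rather than using a simple regular expression keeps nested LaTeX such as `\frac{1}{2}` intact.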
The evaluation code is modified from FuseO1-Preview. In our evaluation, we set the temperature to 0.6 and max_tokens to 32768. We provide an example for reproducing our results in science_evaluation.
The system prompt for evaluation is set to:
You are a helpful and harmless assistant. You should think step-by-step.
The evaluation code is modified from FuseO1-Preview. In our evaluation, we set the temperature to 0.6, the top-p to 0.95, and max_tokens to 32768. We provide an example for reproducing our results in code_lcb_evaluation.
The system prompt for evaluation is set to:
A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>.
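With this system prompt, a response is expected to wrap its reasoning in `<think>` tags and its answer in `<answer>` tags. A minimal sketch of splitting such a response follows. The helper `split_response` is hypothetical, and real model outputs may omit or truncate tags, which this sketch handles only by returning `None`.

```python
import re

# Non-greedy matches across newlines for each tagged section.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def split_response(text):
    """Return (reasoning, answer); each is None if its tags are missing."""
    think = THINK_RE.search(text)
    answer = ANSWER_RE.search(text)
    return (
        think.group(1).strip() if think else None,
        answer.group(1).strip() if answer else None,
    )


# Hypothetical response text for illustration only.
demo = "<think> 2 + 2 is 4 </think> <answer> 4 </answer>"
reasoning, final_answer = split_response(demo)
print(reasoning, "|", final_answer)
```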
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "qihoo360/TinyR1-32B-Preview"

# Load the model and tokenizer; dtype and device placement are chosen automatically.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Raw string so that LaTeX backslashes (e.g. \frac) are not read as Python escapes.
prompt = r"Please reason step by step, and put your final answer within \boxed{}. Solve the integral: \[I = \int \frac{x^2}{(x+1)^3} \,dx\]"
messages = [
    {"role": "user", "content": prompt}
]

# Render the chat template as text, ending with the assistant generation prompt.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=4000
)
# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
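The example above generates with the model's default decoding settings. To approximate the math and code evaluation settings (temperature 0.6, top-p 0.95, max_tokens 32768), the sampling options could be collected as below and passed to `model.generate`. This is a sketch: `sampling_kwargs` is a hypothetical name, and mapping max_tokens to the `max_new_tokens` argument of Transformers is an assumption.

```python
# Sampling settings taken from the evaluation description.
# Assumption: the evaluation's max_tokens corresponds to max_new_tokens here.
sampling_kwargs = {
    "do_sample": True,        # enable sampling instead of greedy decoding
    "temperature": 0.6,
    "top_p": 0.95,
    "max_new_tokens": 32768,
}

# Usage with the objects defined in the example above:
# generated_ids = model.generate(**model_inputs, **sampling_kwargs)
print(sampling_kwargs)
```

Note that `max_new_tokens=32768` allows much longer reasoning traces than the 4000 tokens used in the quick example, at a correspondingly higher generation cost.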
@misc{tinyr1proj,
title={TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation},
author={TinyR1 Team},
year={2025},
url={https://arxiv.org/abs/2503.04872},
}