Highlights
GRPO support for trl and verl trainers
Oumi now supports GRPO training for both the trl and verl libraries! This allows you to run GRPO training with no/low code using Oumi's configs. You can also benefit from other features of the Oumi platform, such as custom evaluation and launching remote jobs.
Running GRPO training in Oumi is as simple as:
- Create a reward function and register it to Oumi's reward function registry using `@register("<my_reward_fn>", RegistryType.REWARD_FUNCTION)`.
- Create a dataset class to process your HF dataset into the format needed for your target framework, and register it to Oumi's dataset registry using `@register_dataset("@hf-org-name/my-dataset-name")`.
- Create an Oumi training config with your model, dataset, reward function, and hyperparameters. For specific details on setting up the config for GRPO, see our documentation.
- Launch the training job locally using the `oumi train` CLI, or launch a remote job using the `oumi launch` CLI.
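As a sketch of the first step, a trl-style GRPO reward function receives the sampled completions for a prompt and returns one score per completion. The function name and scoring heuristic below are hypothetical, purely for illustration; in Oumi you would add the `@register(...)` decorator shown in the comment (import path may vary by version).

```python
# Hypothetical reward function for GRPO training with trl. In Oumi, you
# would register it with:
#   @register("concise_reward", RegistryType.REWARD_FUNCTION)
def concise_reward(completions: list[str], **kwargs) -> list[float]:
    """Toy heuristic: reward well-terminated, reasonably short answers."""
    scores = []
    for text in completions:
        score = 0.0
        if text.strip().endswith((".", "!", "?")):
            score += 0.5  # answer ends cleanly
        score += max(0.0, 1.0 - len(text) / 2000.0)  # prefer brevity
        scores.append(score)
    return scores
```

Once registered, the config can refer to the function by its registered name, and the trainer calls it on each batch of sampled completions.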
For an end-to-end example using Oumi + trl, check out our notebook walkthrough. For verl, check out our multi-modal Geometry3K config. Finally, check out our blog post for more information.
Models built with Oumi: HallOumi and CoALM
We’re proud to announce the release of two models built with Oumi: HallOumi and CoALM! Both of these were trained on Oumi, and we provide recipes to reproduce their training from scratch.
- 🧀 HallOumi: A truly open-source claim verification (hallucination detection) model developed by Oumi, outperforming Claude Sonnet, OpenAI o1, DeepSeek R1, Llama 405B, and Gemini Pro at only 8B parameters. Check out the Oumi recipe to train the model here.
- 🤖 CoALM: Conversational Agentic Language Model (CoALM) is a unified approach that integrates both conversational and agentic capabilities. It includes an instruction tuning dataset and three trained models (8B, 70B, 405B). The project was a partnership between the ConvAI Lab at UIUC and Oumi, and the paper was accepted to ACL. Check out the Oumi recipes to train the models here.
New model support: Llama 4, Qwen3, Falcon H1, and more
We’ve added support for many recent models to Oumi, with tested recipes that work out-of-the-box!
- Vision Language Models
- Text-to-text LLMs
Support for Slurm and Frontier clusters
At Oumi, we want to unify and simplify the process of running jobs on remote clusters. We have now added support for launching jobs on Slurm clusters, and on Frontier, a supercomputer at the Oak Ridge Leadership Computing Facility.
What's Changed
- [bugfix] Allow prerelease when building docker image by @oelachqar in #1753
- Update link to Oumi banner image in README by @wizeng23 in #1752
- docs: add a badge and link to the social network Twitter by @Radovenchyk in #1751
- Support OLCF (Oak Ridge Leadership Computing Facility) Frontier HPC cluster in Oumi launcher by @nikg4 in #1721
- Judge API V2 | Core Functionality by @kaisopos in #1717
- Update `oumi distributed torchrun` to fall back to `oumi train -c cfg.yaml ...` on a single node with 1 GPU by @nikg4 in #1755
- deps: Upgrade verl to 0.4.0 by @wizeng23 in #1749
- add DCVLR logo to readme by @penfever in #1754
- Judge API V2 | Few-Shots by @kaisopos in #1746
- Update infer.md to fix a broken link by @ryan-arman in #1756
- Judge API V2 | minor nit by @kaisopos in #1757
- [Evaluation] Disabling flaky MMMU test by @kaisopos in #1758
- Automatically tail SkyPilot logs by @wizeng23 in #1761
- Enable vLLM for trl GRPO jobs by @wizeng23 in #1760
- Judge API V2 | Implement CLI by @kaisopos in #1759
- Updates to Oumi news for May, June by @stefanwebb in #1763
- Additional news items by @stefanwebb in #1764
- Judge API V2 | Support for built-in judges by @kaisopos in #1762
- [bug] safetensors v0.6.0rc0 is causing a regression, prevent upgrading by @oelachqar in #1772
- [verl] Support resuming from checkpoint by @wizeng23 in #1766
- Upgrade accelerate and peft by @wizeng23 in #1774
- [tiny] Pin flash-attn version by @wizeng23 in #1775
- Pin the version of lm_eval to prevent a breaking change in the 4.9 release by @taenin in #1777
- Update inference to resume from temporary result file when possible by @jgreer013 in #1734
- [tiny] Fix gradient checkpointing for Oumi trainer by @wizeng23 in #1778
- [tiny] Remove `use_liger` argument by @wizeng23 in #1779
- Judge API V2 | Merge Judge and Inference configs by @kaisopos in #1776
Full Changelog: v0.1.14...v0.2.0