Highlights
GRPO support for trl and verl trainers
Oumi now supports GRPO training for both the trl and verl libraries! This allows you to run GRPO training with no/low code using Oumi's configs. You can also benefit from other features of the Oumi platform, such as custom evaluation and launching remote jobs.
Running GRPO training in Oumi is as simple as:
- Create a reward function and register it to Oumi's reward function registry using `@register("<my_reward_fn>", RegistryType.REWARD_FUNCTION)`.
- Create a dataset class to process your HF dataset into the format needed for your target framework, and register it to Oumi's dataset registry using `@register_dataset("@hf-org-name/my-dataset-name")`.
- Create an Oumi training config with your model, dataset, reward function, and hyperparameters. For specific details on setting up the config for GRPO, see our documentation.
- Launch the training job locally using the `oumi train` CLI, or launch a remote job using the `oumi launch` CLI.
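As a sketch of the first step, a trl-style GRPO reward function receives the sampled completions for a prompt and returns one score per completion. The function name and scoring heuristic below are hypothetical, purely for illustration; in Oumi you would add the `@register(...)` decorator shown in the comment (import path may vary by version).

```python
# Hypothetical reward function for GRPO training with trl. In Oumi, you
# would register it with:
#   @register("concise_reward", RegistryType.REWARD_FUNCTION)
def concise_reward(completions: list[str], **kwargs) -> list[float]:
    """Toy heuristic: reward well-terminated, reasonably short answers."""
    scores = []
    for text in completions:
        score = 0.0
        if text.strip().endswith((".", "!", "?")):
            score += 0.5  # answer ends cleanly
        score += max(0.0, 1.0 - len(text) / 2000.0)  # prefer brevity
        scores.append(score)
    return scores
```

Once registered, the config can refer to the function by its registered name, and the trainer calls it on each batch of sampled completions.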
For an end-to-end example using Oumi + trl, check out our notebook walkthrough. For verl, check out our multi-modal Geometry3K config. Finally, check out our blog post for more information.
Models built with Oumi: HallOumi and CoALM
We’re proud to announce the release of two models built with Oumi: HallOumi and CoALM! Both of these were trained on Oumi, and we provide recipes to reproduce their training from scratch.
- 🧀 HallOumi: A truly open-source claim verification (hallucination detection) model developed by Oumi, outperforming Claude Sonnet, OpenAI o1, DeepSeek R1, Llama 405B, and Gemini Pro at only 8B parameters. Check out the Oumi recipe to train the model here.
- 🤖 CoALM: Conversational Agentic Language Model (CoALM) is a unified approach that integrates both conversational and agentic capabilities. It includes an instruction tuning dataset and three trained models (8B, 70B, 405B). The project was a partnership between the ConvAI Lab at UIUC and Oumi, and the paper was accepted to ACL. Check out the Oumi recipes to train the models here.
New model support: Llama 4, Qwen3, Falcon H1, and more
We’ve added support for many recent models to Oumi, with tested recipes that work out-of-the-box!
- Vision Language Models
- Text-to-text LLMs
Support for Slurm and Frontier clusters
At Oumi, we want to unify and simplify the process of running jobs on remote clusters. We have now added support for launching jobs on Slurm clusters, and on Frontier, a supercomputer at the Oak Ridge Leadership Computing Facility.
What's Changed
- [bugfix] Allow prerelease when building docker image by @oelachqar in #1753
- Update link to Oumi banner image in README by @wizeng23 in #1752
- docs: add a badge and link to the social network Twitter by @Radovenchyk in #1751
- Support OLCF (Oak Ridge Leadership Computing Facility) Frontier HPC cluster in Oumi launcher by @nikg4 in #1721
- Judge API V2 | Core Functionality by @kaisopos in #1717
- Update `oumi distributed torchrun` to fall back to `oumi train -c cfg.yaml ...` on a single node with 1 GPU by @nikg4 in #1755
- deps: Upgrade verl to 0.4.0 by @wizeng23 in #1749
- add DCVLR logo to readme by @penfever in #1754
- Judge API V2 | Few-Shots by @kaisopos in #1746
- Update infer.md to fix a broken link by @ryan-arman in #1756
- Judge API V2 | minor nit by @kaisopos in #1757
- [Evaluation] Disabling flaky MMMU test by @kaisopos in #1758
- Automatically tail SkyPilot logs by @wizeng23 in #1761
- Enable vLLM for trl GRPO jobs by @wizeng23 in #1760
- Judge API V2 | Implement CLI by @kaisopos in #1759
- Updates to Oumi news for May, June by @stefanwebb in #1763
- Additional news items by @stefanwebb in #1764
- Judge API V2 | Support for built-in judges by @kaisopos in #1762
- [bug] safetensors v0.6.0rc0 is causing a regression, prevent upgrading by @oelachqar in #1772
- [verl] Support resuming from checkpoint by @wizeng23 in #1766
- Upgrade accelerate and peft by @wizeng23 in #1774
- [tiny] Pin flash-attn version by @wizeng23 in #1775
- Pin the version of lm_eval to prevent a breaking change in the 4.9 release by @taenin in #1777
- Update inference to resume from temporary result file when possible by @jgreer013 in #1734
- [tiny] Fix gradient checkpointing for Oumi trainer by @wizeng23 in #1778
- [tiny] Remove `use_liger` argument by @wizeng23 in #1779
- Judge API V2 | Merge Judge and Inference configs by @kaisopos in #1776
Full Changelog: v0.1.14...v0.2.0