Awesome-Offline-Model-Based-Optimization

πŸ“° Must-Read Papers on Offline Model-Based Optimization πŸ”₯

This repository collects important papers for our survey "Offline Model-Based Optimization: Comprehensive Review", authored by Minsu Kim, Jiayao Gu, Ye Yuan, Taeyoung Yun, Zixuan Liu, Yoshua Bengio, and Can Chen.


  • πŸ’»: links to the code
  • πŸ“–: links to the bibtex

Latest Updates

  • [2025/03/23] Our survey is now publicly accessible: see our arXiv preprint at https://arxiv.org/abs/2503.17286!
  • [2025/03/04] First release of Awesome-Offline-Model-Based-Optimization!

πŸ” Table of Contents

🌟 What is Offline Model-Based Optimization?

In offline optimization, the goal is to discover a new design, denoted by $\boldsymbol{x}^*$, that maximizes the objective(s) $\boldsymbol{f}(\boldsymbol{x})$. This is achieved using an offline dataset $\mathcal{D}$, which consists of $N$ designs paired with their property labels. In particular, the dataset is given by $\mathcal{D} = \{(\boldsymbol{x}_i, \boldsymbol{y}_i)\}_{i=1}^{N}$, where each design vector $\boldsymbol{x}_i$ belongs to a design space $\mathcal{X} \subseteq \mathbb{R}^d$, and each property label $\boldsymbol{y}_i \in \mathbb{R}^m$ contains the corresponding $m$ objective values for that design. The function $\boldsymbol{f}: \mathcal{X} \rightarrow \mathbb{R}^m$ maps a design to its $m$-dimensional objective value vector.

In offline single-objective optimization, only one objective is considered (i.e., $m=1$). For instance, the design $\boldsymbol{x}$ might represent a neural network architecture, with $f(\boldsymbol{x})$ denoting the network's accuracy on a given dataset. Offline multi-objective optimization extends the framework to simultaneously address multiple objectives. In this setting, the goal is to find solutions that balance competing objectives effectively. For instance, when designing a neural architecture, one might seek to achieve both high accuracy and high efficiency.
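To make the setting concrete, below is a minimal, self-contained sketch (not taken from the survey) of the single-objective case: a toy quadratic stands in for the unknown objective $f$, scikit-learn's MLPRegressor stands in for the surrogate, and a simple random-search hill climb proposes a new design using only the offline data. All dataset shapes and hyperparameters here are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical offline dataset D = {(x_i, y_i)}_{i=1..N}: N designs in R^d
# with scalar labels (single-objective case, m = 1). A toy quadratic stands
# in for the unknown objective f, which is never queried after this point.
rng = np.random.default_rng(0)
N, d = 500, 8
X = rng.uniform(-1.0, 1.0, size=(N, d))                # designs x_i
y = -np.sum(X**2, axis=1) + 0.1 * rng.normal(size=N)   # noisy labels y_i

# Surrogate modeling: fit an approximation f_hat of f from the offline data.
surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                         random_state=0)
surrogate.fit(X, y)

# Propose a new design x* by random-search hill climbing on the surrogate,
# starting from the best design in the dataset. Naively trusting f_hat far
# from the data exploits its errors, which is why offline MBO methods add
# conservatism, adaptation, or generative priors.
x_star = X[np.argmax(y)].copy()
for _ in range(200):
    candidate = np.clip(x_star + 0.05 * rng.normal(size=d), -1.0, 1.0)
    if surrogate.predict(candidate[None])[0] > surrogate.predict(x_star[None])[0]:
        x_star = candidate

print("proposed design x*:", np.round(x_star, 3))
```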

We review recent benchmarks, highlighting key tasks and evaluation metrics. We then categorize existing approaches into two key areas: surrogate modeling, which emphasizes accurate function approximation in out-of-distribution regions, and generative modeling, which explores high-dimensional design spaces to identify high-performing designs.
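For contrast, here is an equally small caricature of the generative-modeling route under the same toy setup: rather than climbing a surrogate, we fit a simple distribution to the highest-scoring designs and sample new candidates from it. The diagonal Gaussian is purely an illustrative stand-in for the VAEs, GANs, diffusion models, and GFlowNets covered below.

```python
import numpy as np

# Same toy offline dataset as above; here a caricature of the generative
# route: fit a simple distribution to the highest-scoring designs and sample
# new candidates, instead of hill-climbing a surrogate.
rng = np.random.default_rng(1)
N, d = 500, 8
X = rng.uniform(-1.0, 1.0, size=(N, d))
y = -np.sum(X**2, axis=1) + 0.1 * rng.normal(size=N)

# "Condition" on high performance by keeping the top 10% of designs.
elite = X[np.argsort(y)[-N // 10:]]

# The generative model is a diagonal Gaussian fit to the elite set; the
# methods surveyed below replace it with VAEs, GANs, autoregressive models,
# diffusion models, flow matching, EBMs, or GFlowNets.
mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-3
candidates = np.clip(rng.normal(mu, sigma, size=(64, d)), -1.0, 1.0)
print("sampled candidate designs:", candidates.shape)
```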

πŸ”— Benchmark

Task

Synthetic Function

Real-World System

Scientific Design

Machine Learning Model

Evaluation Metric

Usefulness

Diversity

Novelty

Stability

🎯 Surrogate Modeling

Auxiliary Loss

Data-Driven Adaptation

Collaborative Ensembling

Generative Model Integration

πŸ€” Generative Modeling

Variational Autoencoder (VAE)

Generative Adversarial Network (GAN)

Autoregressive Model

Diffusion Model

Flow Matching

Energy-Based Model

Control by Generative Flow Network (GFlowNet)

Citation

If you find our survey useful for your research, please consider citing our work:

```bibtex
@misc{kim2025offline,
      title={Offline Model-Based Optimization: Comprehensive Review},
      author={Minsu Kim and Jiayao Gu and Ye Yuan and Taeyoung Yun and Zixuan Liu and Yoshua Bengio and Can Chen},
      year={2025},
      eprint={2503.17286},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2503.17286},
}
```
