Jackpot: Optimal Budgeted Rejection Sampling for Extreme Actor–Policy Mismatch RL

Official repository for Jackpot.

Authors:
Zhuoming Chen*, Hongyi Liu*, Yang Zhou*, Haizhong Zheng, Beidi Chen
Carnegie Mellon University
(* = equal contribution; listed alphabetically by last name)

📄 Download Paper (PDF): jpt.pdf

Abstract

Reinforcement learning (RL) for large language models (LLMs) remains expensive, largely because rollout generation is costly. Decoupling rollout generation from policy optimization (e.g., using a more efficient model to generate rollouts) could yield substantial efficiency gains, but doing so introduces a severe distribution mismatch that destabilizes learning. We propose Jackpot, a framework that uses Optimal Budgeted Rejection Sampling (OBRS) to directly reduce the discrepancy between the rollout model and the evolving policy. Jackpot combines a principled OBRS procedure, a unified training objective that jointly updates the policy and rollout models, and an efficient system implementation enabled by top-k probability estimation and batch-level bias correction. Our theoretical analysis shows that OBRS consistently moves the rollout distribution closer to the target distribution under a controllable acceptance budget. Empirically, Jackpot substantially improves training stability over importance-sampling baselines and achieves performance comparable to on-policy RL when training Qwen3-8B-Base for up to 300 update steps. Taken together, our results show that OBRS-based alignment brings us a step closer to practical and effective decoupling of rollout generation from policy optimization in RL for LLMs.
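The abstract does not spell out the acceptance rule itself. As a minimal sketch, assuming the standard budgeted rejection-sampling rule (accept a draw x from the rollout distribution q with probability min(1, p(x) / (k q(x))), where p is the target policy and the scale k is chosen so that the expected acceptance rate equals the budget), the toy example below shows how accepted samples end up closer to the target. Accepted samples follow a distribution proportional to min(q(x), p(x)/k). All helper names (`obrs_accept_prob`, `solve_k`, `accepted_distribution`, etc.) and the 4-token toy distributions are hypothetical; this is not the repository's implementation and omits the top-k probability estimation and batch-level bias correction described in the abstract.

```python
import numpy as np


def obrs_accept_prob(p: np.ndarray, q: np.ndarray, k: float) -> np.ndarray:
    """Acceptance probability min(1, p(x) / (k * q(x))) for a draw x ~ q."""
    return np.minimum(1.0, p / (k * q))


def acceptance_rate(p: np.ndarray, q: np.ndarray, k: float) -> float:
    """Expected fraction of draws from q that are accepted: sum_x min(q(x), p(x)/k)."""
    return float(np.sum(q * obrs_accept_prob(p, q, k)))


def solve_k(p: np.ndarray, q: np.ndarray, budget: float, iters: int = 100) -> float:
    """Bisection for the scale k whose acceptance rate matches the budget.

    The rate is non-increasing in k, tends to 1 as k -> 0, and is at most 1/k,
    so the solution lies in (0, 1/budget].
    """
    assert 0.0 < budget <= 1.0
    lo, hi = 0.0, 1.0 / budget
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if acceptance_rate(p, q, mid) > budget:
            lo = mid  # accepting more than the budget allows: increase k
        else:
            hi = mid  # at or under budget: try a smaller k
    return hi


def accepted_distribution(p: np.ndarray, q: np.ndarray, k: float) -> np.ndarray:
    """Distribution of accepted samples, proportional to min(q(x), p(x)/k)."""
    unnorm = q * obrs_accept_prob(p, q, k)
    return unnorm / unnorm.sum()


def total_variation(a: np.ndarray, b: np.ndarray) -> float:
    return 0.5 * float(np.abs(a - b).sum())


# Hypothetical toy vocabulary: p plays the target policy, q the rollout model.
p = np.array([0.5, 0.3, 0.15, 0.05])
q = np.array([0.1, 0.2, 0.3, 0.4])
assert np.all(q > 0)  # this sketch assumes q assigns mass to every outcome

print(f"no rejection: TV(q, p)={total_variation(q, p):.3f}")  # 0.500
for budget in (0.5, 0.3):
    k = solve_k(p, q, budget)
    r = accepted_distribution(p, q, k)
    # budget 0.5 -> k=1.000, TV=0.300; budget 0.3 -> k=2.500, TV=0.167
    print(f"budget={budget:.1f}  k={k:.3f}  TV(r, p)={total_variation(r, p):.3f}")
```

In this toy case, a smaller acceptance budget (rejecting more rollout samples) pulls the accepted distribution closer to the target, which is the trade-off the "controllable acceptance budget" in the abstract refers to.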

Bibliography

If you find our work helpful, please consider citing it using the following BibTeX entry.

@misc{jackpot2025github,
  title        = {Jackpot: Optimal Budgeted Rejection Sampling for Extreme Actor-Policy Mismatch RL},
  author       = {Liu, Hongyi and Chen, Zhuoming and Zhou, Yang and Zheng, Haizhong and Chen, Beidi},
  howpublished = {\url{https://github.com/Infini-AI-Lab/jackpot.git}},
  note         = {Official GitHub repository},
  year         = {2025}
}
