VLARLKit

An elegant and researcher-friendly RL library for Vision-Language-Action (VLA) models.

GitHub repo: VLARLKit.

VLARLKit is a PyTorch-based reinforcement learning library tailored for Vision-Language-Action (VLA) models. It is designed to be friendly and convenient for researchers, with the following features:

- Simple and clear implementation: the policy, rollout, runner, and model layers are cleanly separated, so the code is easy to read, modify, and extend.
- Environment-decoupled architecture: environments run as independent processes connected via ZMQ, which eliminates dependency conflicts between different benchmark simulators.
- Async off-policy training: data collection runs alongside model updates without blocking them.
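
To make the async off-policy idea above concrete, here is a minimal sketch that uses only the Python standard library. It is not VLARLKit's implementation or API: `ReplayBuffer`, `collect`, `train`, and `run_async_demo` are hypothetical names, a thread stands in for the separate environment processes, and the transitions are dummy tuples. It only illustrates the pattern of a collector filling a shared buffer while a learner samples from it without waiting for a rollout to finish.

```python
import random
import threading
import time
from collections import deque


class ReplayBuffer:
    """Thread-safe bounded buffer shared by the collector and the learner (hypothetical)."""

    def __init__(self, capacity):
        self._data = deque(maxlen=capacity)  # oldest transitions are dropped when full
        self._lock = threading.Lock()

    def add(self, transition):
        with self._lock:
            self._data.append(transition)

    def sample(self, batch_size):
        """Return a random batch, or None if the buffer holds too few transitions."""
        with self._lock:
            if len(self._data) < batch_size:
                return None
            return random.sample(list(self._data), batch_size)

    def __len__(self):
        with self._lock:
            return len(self._data)


def collect(buffer, stop_event):
    """Stand-in rollout worker: pushes dummy transitions until told to stop."""
    step = 0
    while not stop_event.is_set():
        # Placeholder (observation, action, reward); a real worker would query an environment.
        buffer.add((step, random.random(), random.random()))
        step += 1
        time.sleep(0.001)  # yield so the learner thread can run


def train(buffer, num_updates, batch_size):
    """Stand-in learner: samples batches while collection continues in the background."""
    updates = 0
    while updates < num_updates:
        batch = buffer.sample(batch_size)
        if batch is None:
            time.sleep(0.001)  # buffer is still warming up
            continue
        # A real learner would compute a loss and step an optimizer here.
        updates += 1
    return updates


def run_async_demo(num_updates=20, batch_size=8):
    """Run the collector and learner concurrently; return (updates done, buffer size)."""
    buffer = ReplayBuffer(capacity=1000)
    stop_event = threading.Event()
    collector = threading.Thread(target=collect, args=(buffer, stop_event), daemon=True)
    collector.start()
    try:
        updates = train(buffer, num_updates, batch_size)
    finally:
        stop_event.set()
        collector.join()
    return updates, len(buffer)
```

Calling `run_async_demo()` performs the requested number of learner updates while the collector keeps appending transitions. In a setup with environments in separate processes, the `buffer.add` call would presumably be fed by messages received from those processes rather than by a local thread.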

Supported components

RL Algorithms
- Proximal Policy Optimization (PPO) (on-policy)
- Diffusion Steering via Reinforcement Learning (DSRL) (off-policy)
- RL Token (RLT) (off-policy)

Base Models
- π₀.₅

Benchmarks
- LIBERO
- ManiSkill