Publications

* indicates equal contribution.

Preprints

Papers

2023

ICML

Model-Bellman inconsistency for model-based offline reinforcement learning | [ Link Code ]

Yihao Sun*, Jiaji Zhang*, Chengxing Jia, Haoxin Lin, Junyin Ye, and Yang Yu.

In Proceedings of the 40th International Conference on Machine Learning (ICML’23). 2023.

ECAI

Model-based reinforcement learning with multi-step plan value estimation | [ Link Code ]

Haoxin Lin*, Yihao Sun*, Jiaji Zhang, and Yang Yu.

In Proceedings of the 26th European Conference on Artificial Intelligence (ECAI’23). 2023.

2024

AAAI

Episodic return decomposition by difference of implicitly assigned sub-trajectory reward | [ Link Code ]

Haoxin Lin, Hongqiu Wu, Jiaji Zhang, Yihao Sun, Junyin Ye, and Yang Yu.

In Proceedings of the 38th Annual AAAI Conference on Artificial Intelligence (AAAI’24). 2024.

ICLR

Flow to better: Offline preference-based reinforcement learning via preferred trajectory generation | [ Link Code ]

Zhilong Zhang*, Yihao Sun*, Junyin Ye, Tianshuo Liu, Jiaji Zhang, and Yang Yu.

In Proceedings of the 12th International Conference on Learning Representations (ICLR’24). 2024.

ICML

Policy-conditioned environment models are more generalizable | [ Link Code ]

Ruifeng Chen*, Xiong-Hui Chen*, Yihao Sun, Siyuan Xiao, Minhui Li, and Yang Yu.

In Proceedings of the 41st International Conference on Machine Learning (ICML’24). 2024.

NeurIPS

Provably and practically efficient adversarial imitation learning with general function approximation | [ Link Code ]

Tian Xu, Zhilong Zhang, Ruishuo Chen, Yihao Sun, and Yang Yu.

In Advances in Neural Information Processing Systems 37 (NeurIPS’24). 2024.

2025

ICLR

Any-step dynamics model improves future predictions for online and offline reinforcement learning | [ Link Code ]

Haoxin Lin, Yu-Yan Xu, Yihao Sun, Zhilong Zhang, Yi-Chen Li, Chengxing Jia, Junyin Ye, Jiaji Zhang, and Yang Yu.

In Proceedings of the 13th International Conference on Learning Representations (ICLR’25). 2025.

ICML

Improving Reward Model Generalization from Adversarial Process Enhanced Preferences | [ Link Code ]

Zhilong Zhang, Tian Xu, Xinghao Du, Xingchen Cao, Yihao Sun, and Yang Yu.

In Proceedings of the 42nd International Conference on Machine Learning (ICML’25). 2025.

2026

ICLR

ADM-v2: Pursuing Full-Horizon Roll-out in Dynamics Models for Offline Policy Learning and Evaluation | [ Link Code ]

Haoxin Lin, Siyuan Xiao, Yi-Chen Li, Zhilong Zhang, Yihao Sun, Chengxing Jia, and Yang Yu.

In Proceedings of the 14th International Conference on Learning Representations (ICLR’26). 2026.

ICLR

Hierarchical Value-Decomposed Offline Reinforcement Learning for Whole-Body Control | [ Link Code ]

Zhilong Zhang et al.

In Proceedings of the 14th International Conference on Learning Representations (ICLR’26). 2026.