* indicates equal contribution.
Preprints
Papers
2023
-
Model-Bellman inconsistency for model-based offline reinforcement learning
| [
Link
Code
]
Yihao Sun*,
Jiaji Zhang*,
Chengxing Jia,
Haoxin Lin,
Junyin Ye,
and Yang Yu.
In Proceedings of the 40th International Conference on Machine Learning (ICML’23).
2023.
-
Model-based reinforcement learning with multi-step plan value estimation
| [
Link
Code
]
Haoxin Lin*,
Yihao Sun*,
Jiaji Zhang,
and Yang Yu.
In Proceedings of the 26th European Conference on Artificial Intelligence (ECAI’23).
2023.
2024
-
Episodic return decomposition by difference of implicitly assigned sub-trajectory reward
| [
Link
Code
]
Haoxin Lin,
Hongqiu Wu,
Jiaji Zhang,
Yihao Sun,
Junyin Ye,
and Yang Yu.
In Proceedings of the 38th Annual AAAI Conference on Artificial Intelligence (AAAI’24).
2024.
-
Flow to better: Offline preference-based reinforcement learning via preferred trajectory generation
| [
Link
Code
]
Zhilong Zhang*,
Yihao Sun*,
Junyin Ye,
Tianshuo Liu,
Jiaji Zhang,
and Yang Yu.
In Proceedings of the 12th International Conference on Learning Representations (ICLR’24).
2024.
-
Policy-conditioned environment models are more generalizable
| [
Link
Code
]
Ruifeng Chen*,
Xiong-Hui Chen*,
Yihao Sun,
Siyuan Xiao,
Minhui Li,
and Yang Yu.
In Proceedings of the 41st International Conference on Machine Learning (ICML’24).
2024.
-
Provably and practically efficient adversarial imitation learning with general function approximation
| [
Link
Code
]
Tian Xu,
Zhilong Zhang,
Ruishuo Chen,
Yihao Sun,
and Yang Yu.
In Advances in Neural Information Processing Systems 37 (NeurIPS’24).
2024.
2025
-
Any-step dynamics model improves future predictions for online and offline reinforcement learning
| [
Link
Code
]
Haoxin Lin,
Yu-Yan Xu,
Yihao Sun,
Zhilong Zhang,
Yi-Chen Li,
Chengxing Jia,
Junyin Ye,
Jiaji Zhang,
and Yang Yu.
In Proceedings of the 13th International Conference on Learning Representations (ICLR’25).
2025.
-
Improving Reward Model Generalization from Adversarial Process Enhanced Preferences
| [
Link
Code
]
Zhilong Zhang,
Tian Xu,
Xinghao Du,
Xingchen Cao,
Yihao Sun,
and Yang Yu.
In Proceedings of the 42nd International Conference on Machine Learning (ICML’25).
2025.
2026
-
ADM-v2: Pursuing Full-Horizon Roll-out in Dynamics Models for Offline Policy Learning and Evaluation
| [
Link
Code
]
Haoxin Lin,
Siyuan Xiao,
Yi-Chen Li,
Zhilong Zhang,
Yihao Sun,
Chengxing Jia,
and Yang Yu.
In Proceedings of the 14th International Conference on Learning Representations (ICLR’26).
2026.
-
Hierarchical Value-Decomposed Offline Reinforcement Learning for Whole-Body Control
| [
Link
Code
]
Zhilong Zhang et al.
In Proceedings of the 14th International Conference on Learning Representations (ICLR’26).
2026.