Publications

* indicates equal contribution.

Preprints

          Papers

          2023
          1. ICML
            Model-Bellman inconsistency for model-based offline reinforcement learning | [ Link Code ]
            Yihao Sun*, Jiaji Zhang*, Chengxing Jia, Haoxin Lin, Junyin Ye, and Yang Yu.
            In Proceedings of the 40th International Conference on Machine Learning (ICML’23). 2023.
          2. ECAI
            Model-based reinforcement learning with multi-step plan value estimation | [ Link Code ]
            Haoxin Lin*, Yihao Sun*, Jiaji Zhang, and Yang Yu.
            In Proceedings of the 26th European Conference on Artificial Intelligence (ECAI’23). 2023.
          2024
          1. AAAI
            Episodic return decomposition by difference of implicitly assigned sub-trajectory reward | [ Link Code ]
            Haoxin Lin, Hongqiu Wu, Jiaji Zhang, Yihao Sun, Junyin Ye, and Yang Yu.
            In Proceedings of the 38th Annual AAAI Conference on Artificial Intelligence (AAAI’24). 2024.
          2. ICLR
            Flow to better: Offline preference-based reinforcement learning via preferred trajectory generation | [ Link Code ]
            Zhilong Zhang*, Yihao Sun*, Junyin Ye, Tianshuo Liu, Jiaji Zhang, and Yang Yu.
            In Proceedings of the 12th International Conference on Learning Representations (ICLR’24). 2024.
          3. ICML
            Policy-conditioned environment models are more generalizable | [ Link Code ]
            Ruifeng Chen*, Xiong-Hui Chen*, Yihao Sun, Siyuan Xiao, Minhui Li, and Yang Yu.
            In Proceedings of the 41st International Conference on Machine Learning (ICML’24). 2024.
          4. NeurIPS
            Provably and practically efficient adversarial imitation learning with general function approximation | [ Link Code ]
            Tian Xu, Zhilong Zhang, Ruishuo Chen, Yihao Sun, and Yang Yu.
            In Advances in Neural Information Processing Systems 37 (NeurIPS’24). 2024.
          2025
          1. ICLR
            Any-step dynamics model improves future predictions for online and offline reinforcement learning | [ Link Code ]
            Haoxin Lin, Yu-Yan Xu, Yihao Sun, Zhilong Zhang, Yi-Chen Li, Chengxing Jia, Junyin Ye, Jiaji Zhang, and Yang Yu.
            In Proceedings of the 13th International Conference on Learning Representations (ICLR’25). 2025.
          2. ICML
            Improving Reward Model Generalization from Adversarial Process Enhanced Preferences | [ Link Code ]
            Zhilong Zhang, Tian Xu, Xinghao Du, Xingchen Cao, Yihao Sun, and Yang Yu.
            In Proceedings of the 42nd International Conference on Machine Learning (ICML’25). 2025.
          2026
          1. ICLR
            ADM-v2: Pursuing Full-Horizon Roll-out in Dynamics Models for Offline Policy Learning and Evaluation | [ Link Code ]
            Haoxin Lin, Siyuan Xiao, Yi-Chen Li, Zhilong Zhang, Yihao Sun, Chengxing Jia, and Yang Yu.
            In Proceedings of the 14th International Conference on Learning Representations (ICLR’26). 2026.
          2. ICLR
            Hierarchical Value-Decomposed Offline Reinforcement Learning for Whole-Body Control | [ Link Code ]
            Zhilong Zhang et al.
            In Proceedings of the 14th International Conference on Learning Representations (ICLR’26). 2026.