I'm currently a third-year Ph.D. candidate in the Department of Automation at Tsinghua University. Prior to this, I earned my B.Eng. degree from the School of Automation Science and Engineering, Xi'an Jiaotong University, in 2023. I have worked closely with Shuai Zhang and Zhengqi Wen, and I am fortunate to collaborate with Zheng Lian, Haoran Luo, Zhengxi Lu, and Fan Zhang.
My research interests focus on LLM / MLLM reasoning and agentic reinforcement learning. From 2024 to early 2025, my work mainly studied high-quality and high-efficiency reasoning for LLMs. My current research investigates post-training techniques for general agents, including agent skills, on-policy distillation (OPD), and reinforcement learning (RL).
Collaboration: I am looking for motivated collaborators interested in the topics above. If you would like to explore these directions together, feel free to contact me. Undergraduate and MSc students are also welcome! 🌱
🔥 News
- 2026.06: 🔥🔥 OPID, TACO, and Orchestra-o1 were released. OPID was featured as 🤗 HF Daily Paper #3!
- 2026.05: 🔥🔥 Introducing Maestro, RobotEQ, and SDAR. SDAR was featured as 🤗 HF Daily Paper #2!
- 2026.05: ⏱️ Attending VALSE 2026 in Wuhan 🇨🇳
- 2026.04: 🔥🔥 Our new work SKILL0 was released and featured as 🤗 HF Daily Paper #2!
- 2026.04: Six papers were accepted by ACL 2026, including one oral and best paper candidate (Double). See you in San Diego 🇺🇸!
- 2026.01: Attending AAAI 2026 in Singapore 🇸🇬
- 2025.11: Two papers were accepted by AAAI 2026 (one oral presentation).
- 2025.11: Attending EMNLP 2025 in Suzhou 🇨🇳
- 2025.05: NoiserBench appeared at ACL 2025.
Publications
* Equal contribution. † Corresponding author.
🤖 Agentic Post-Training

SPARK: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning
Jinyang Wu*, Shuo Yang*, Changpeng Yang, Yuhao Shen, Shuai Zhang, Zhengqi Wen, Jianhua Tao†
- We propose a policy-aware branching framework that allocates exploration budget to critical decision states, improving sample efficiency for long-horizon agentic RL.

Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning
Jinyang Wu*, Guocheng Zhai, Ruihan Jin, Jiahao Yuan, Yuhao Shen, Shuai Zhang, Zhengqi Wen, Jianhua Tao†
- We introduce a dual-path framework for dynamic tool usage, combining cluster-based routing with RL-based multi-step routing for cross-domain reasoning.

Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles
Jinyang Wu, Guocheng Zhai, Ruihan Jin, Yuhao Shen, Zhengxi Lu, Fan Zhang, Haoran Luo, Zheng Lian, Zhengqi Wen, Jianhua Tao
- We formulate model-skill orchestration as a sequential decision process and train a lightweight policy to compose frozen expert models and skills for multimodal tasks.

OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning
Shuo Yang*, Jinyang Wu*, Zhengxi Lu, Yuhao Shen, Fan Zhang, Lang Feng, Shuai Zhang, Haoran Luo, Zheng Lian, Zhengqi Wen, Jianhua Tao
- We extract hierarchical skill supervision from completed on-policy trajectories and convert hindsight skill signals into dense token-level advantages for agent training.

SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization
Zhengxi Lu, Zhiyuan Yao, Jinyang Wu, Chengcheng Han, Qi Gu, Xunliang Cai, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen
- We study skill internalization for agents, gradually withdrawing runtime skill context so that the policy internalizes reusable behaviors in its parameters.
- Preprint TACO: Tool-Augmented Credit Optimization for Agentic Tool Use, Mingkuan Feng*, Jinyang Wu*†, Hao Gu, Fangrui Lv, Ruihan Jin, Chuyuan Zhang, Zhengqi Wen, Jianhua Tao
- Preprint SDAR: Self-Distilled Agentic Reinforcement Learning, Zhengxi Lu, Zhiyuan Yao, Zhuowen Han, Zi-Han Wang, Jinyang Wu, Qi Gu, Xunliang Cai, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen [Code]
- Preprint OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions, Fangzhi Xu, Hang Yan, Qiushi Sun, Jinyang Wu, Zixian Huang, Muye Huang, Jingyang Gong, Zichen Ding, Kanzhi Cheng, Yian Wang, Xinyu Che, Zeyi Sun, Jian Zhang, Zhangyue Yin, Haoran Luo, Xuanjing Huang, Ben Kao, Jun Liu, Qika Lin [Code]
- Preprint Orchestra-o1: Omnimodal Agent Orchestration, Fan Zhang, Vireo Zhang, Shengju Qian, Haoxuan Li, Hao Wu, Jinyang Wu, Donghao Zhou, Zhihong Zhu, Zheng Lian, Xin Wang, Pheng-Ann Heng
- Preprint RobotEQ: Transitioning from Passive Intelligence to Active Intelligence in Embodied AI, Kuofei Fang, Xinyi Che, Haomin Ouyang, Shufan Zhang, Xuehao Wang, Qi Liu, Liyi Liu, Chenqi Zhang, Wenxi Cai, Wenyu Dai, Jinyang Wu, Fan Zhang, Haoyu Chen, Bin He, Zheng Lian
🧠 LLM Reasoning

Beyond Examples: Towards Automated Thought-level In-Context Reasoning for Large Language Models
Jinyang Wu*, Mingkuan Feng*, Shuai Zhang, Feihu Che, Zhengqi Wen, Chonghua Liao, Ling Yang, Haoran Luo, Zheng Lian, Jianhua Tao
- We shift in-context reasoning from example-level imitation to reusable thought patterns, enabling automated and efficient reasoning guidance.

TemplateRL: Structured Template-Guided Reinforcement Learning for LLM Reasoning
Jinyang Wu*, Chonghua Liao, Mingkuan Feng, Shuai Zhang, Zhengqi Wen, Haoran Luo, Ling Yang, Huazhe Xu, Jianhua Tao
- We augment policy optimization with structured templates, improving high-quality rollout generation and stabilizing RL training for reasoning.

AStar: Boosting Multimodal Reasoning with Automated Structured Thinking
Jinyang Wu, Mingkuan Feng, Guocheng Zhai, Shuai Zhang†, Zheng Lian, Fangrui Lv, Pengpeng Shao, Ruihan Jin, Zhengqi Wen, Jianhua Tao†
- We build a training-free structured thinking method for multimodal reasoning, retrieving reusable thought cards at test time to guide MLLMs.

Two-Stage Regularization-Based Structured Pruning for LLMs
Mingkuan Feng*, Jinyang Wu*, Siyuan Liu, Shuai Zhang, Ruihan Jin, Feihu Che, Pengpeng Shao, Zhengqi Wen, Jianhua Tao
- We introduce a two-stage regularization strategy for structured LLM pruning, preserving more knowledge while reducing model depth without retraining.

Double: Breaking the Acceleration Limit via Double Retrieval Speculative Parallelism
Yuhao Shen, Tianyu Liu, Junyi Shen, Jinyang Wu, Quan Kong, Li Huan, Cong Wang
- We bridge speculative decoding and retrieval-based guidance to push inference acceleration beyond conventional parallel speculative decoding limits.

Jinyang Wu, Shuai Zhang†, Feihu Che, Mingkuan Feng, Pengpeng Shao, Jianhua Tao†
- We define a linguistic taxonomy of RAG noise and build NoiserBench to study when retrieval noise harms or surprisingly helps LLM reasoning.
- ICLR 2026 Exploring Knowledge Purification in Multi-Teacher Knowledge Distillation for LLMs, Ruihan Jin, Pengpeng Shao, Zhengqi Wen, Jinyang Wu†, Mingkuan Feng, Shuo Yang, Chu Yuan Zhang, Jianhua Tao
- ICLR 2026 Attend to the Active: Structure-Aware Dynamic Attention in LLMs for Compositional Instruction Following, Fangrui Lv, Yulei Qin, Ruixin Hong, Liang Jian, Jinyang Wu, Ke Li, Xing Sun, Changshui Zhang
- AAAI 2026 From Imitation to Discrimination: Toward A Generalized Curriculum Advantage Mechanism Enhancing Cross-Domain Reasoning Tasks, Changpeng Yang, Jinyang Wu, Yuchen Liu, Shuai Zhang, Yang Li, Qiliang Liang, Hongzhen Wang, Shuai Nie, Jiaming Xu, Runyu Shi, Ying Huang, Guoquan Zhang
- EMNLP 2025 Findings RadialRouter: Structured Representation for Efficient and Robust Large Language Models Routing, Ruihan Jin, Pengpeng Shao, Zhengqi Wen, Jinyang Wu, Mingkuan Feng, Shuai Zhang, Jianhua Tao
- Preprint DReSS: Data-driven Regularized Structured Streamlining for Large Language Models, Mingkuan Feng, Jinyang Wu, Shuai Zhang, Pengpeng Shao, Ruihan Jin, Zhengqi Wen, Jianhua Tao, Feihu Che
- Preprint AffectGPT-RL: Revealing Roles of Reinforcement Learning in Open-Vocabulary Emotion Recognition, Zheng Lian, Fan Zhang, Lan Chen, Yazhou Zhang, Rui Liu, Jinyang Wu, Haoyu Chen, Xiaobai Li, Xiaojiang Peng, Bin He, Jianhua Tao
🧬 Biomedical AI
- Briefings in Bioinformatics 2023 KGETCDA: An Efficient Representation Learning Framework Based on Knowledge Graph Encoder from Transformer for Predicting circRNA-Disease Associations, Jinyang Wu, Zhiwei Ning, Yidong Ding, Ying Wang, Qinke Peng, Laiyi Fu
- IEEE JBHI 2023 BertNDA: A Model Based on Graph-BERT and Multi-Scale Information Fusion for ncRNA-Disease Association Prediction, Zhiwei Ning, Jinyang Wu, Yidong Ding, Ying Wang, Qinke Peng, Laiyi Fu
👨‍🏫 Teaching
- Teaching Assistant, Affective Computing, graduate course.
- Teaching Assistant, Intelligent Speech Processing, undergraduate interdisciplinary innovation training course.
Honors and Awards
- 2026: Outstanding Teaching Assistant, Tsinghua University.
- 2026: Merit Student, Tsinghua University.
- 2025: First-Class Scholarship of Tsinghua University.
- 2024: Second-Class Scholarship of Tsinghua University.
- 2023: Outstanding Graduate, Shaanxi Province.
- 2023: Outstanding Graduate, Xi'an Jiaotong University.
- 2022: First Prize, Chinese Mathematics Competitions.
- 2021: National Scholarship, Ministry of Education of China.
Education
- 2023.09 - now: Ph.D. student in Pattern Recognition and Machine Learning, Tsinghua University.
- 2019.09 - 2023.06: B.Eng. in Automation, Xi'an Jiaotong University.
🧑‍⚖️ Academic Service
- Conference Reviewer: ICLR 2026, ICML 2026, NeurIPS 2026, ARR 2026, AAAI 2026, ECCV 2026, AAAI 2027.
- Journal Reviewer: Expert Systems With Applications (ESWA), ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM).
💬 Invited Talks
- 2025.05: I gave an invited talk on in-context reasoning at the 7th Beijing Universities Artificial Intelligence Academic Forum.
- 2025.12: I gave an invited talk on memory usage, hosted by the Metaverse Technical Committee of the Chinese Association for Artificial Intelligence.