I’m currently a 3rd-year Ph.D. candidate in the Department of Automation at Tsinghua University. Prior to this, I earned my B.Eng. degree from the School of Automation Science and Engineering, Xi’an Jiaotong University, in 2023. I worked closely with Shuai Zhang and Zhengqi Wen. I am fortunate to collaborate with Zheng Lian, Haoran Luo, Zhengxi Lu, and Fan Zhang.

My research interests focus on LLM / MLLM reasoning and agentic reinforcement learning. From 2024 to early 2025, my work focused mainly on high-quality and high-efficiency reasoning for LLMs. My current research investigates post-training techniques for general agents, including agent skills, on-policy distillation (OPD), and reinforcement learning (RL).

๐Ÿˆ Collaboration: I am looking for motivated collaborators interested in the above topics. If you would like to explore these directions together, feel free to contact me. UG/MSc students are also welcomed! ๐ŸŒฑ

🔥 News

  • 2026.06: 🔥🔥 OPID, TACO, and Orchestra-o1 were released. OPID was featured as 🤗 HF Daily Paper #3!
  • 2026.05: 🔥🔥 Introducing Maestro, RobotEQ, and SDAR. SDAR was featured as 🤗 HF Daily Paper #2!
  • 2026.05: ⛱️ Attending VALSE 2026 in Wuhan 🇨🇳
  • 2026.04: 🔥🔥 Our new work SKILL0 was released and featured as 🤗 HF Daily Paper #2!
  • 2026.04: 🎉🎉 Six papers were accepted by ACL 2026, including one oral and best paper candidate (Double). See you in San Diego 🇺🇸!
  • 2026.01: 🏖️🥂 Attending AAAI 2026 in Singapore 🇸🇬
  • 2025.11: 🎉🎉 Two papers were accepted by AAAI 2026 (one oral presentation).
  • 2025.11: 🎉🎉 Attending EMNLP 2025 in Suzhou 🇨🇳
  • 2025.05: 🎉🎉 NoiserBench appeared at ACL 2025.

๐Ÿ“ Publications

* Equal contribution. † Corresponding author.

🤖 Agentic Post Training

ACL 2026
SPARK

SPARK: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning

Jinyang Wu*, Shuo Yang*, Changpeng Yang, Yuhao Shen, Shuai Zhang, Zhengqi Wen, Jianhua Tao†

[Paper] | [Code]

  • We propose a policy-aware branching framework that allocates exploration budget to critical decision states, improving sample efficiency for long-horizon agentic RL.
ACL 2026
Atlas

Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning

Jinyang Wu*, Guocheng Zhai, Ruihan Jin, Jiahao Yuan, Yuhao Shen, Shuai Zhang, Zhengqi Wen, Jianhua Tao†

[Paper]

  • We introduce a dual-path framework for dynamic tool usage, combining cluster-based routing with RL-based multi-step routing for cross-domain reasoning.
Preprint
Maestro

Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles

Jinyang Wu, Guocheng Zhai, Ruihan Jin, Yuhao Shen, Zhengxi Lu, Fan Zhang, Haoran Luo, Zheng Lian, Zhengqi Wen, Jianhua Tao

[Paper] | [Code]

  • We formulate model-skill orchestration as a sequential decision process and train a lightweight policy to compose frozen expert models and skills for multimodal tasks.
Preprint
OPID

OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning

Shuo Yang*, Jinyang Wu*, Zhengxi Lu, Yuhao Shen, Fan Zhang, Lang Feng, Shuai Zhang, Haoran Luo, Zheng Lian, Zhengqi Wen, Jianhua Tao

[Paper] | [Code]

  • We extract hierarchical skill supervision from completed on-policy trajectories and convert hindsight skill signals into dense token-level advantages for agent training.
Preprint
SKILL0

SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization

Zhengxi Lu, Zhiyuan Yao, Jinyang Wu, Chengcheng Han, Qi Gu, Xunliang Cai, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen

[Paper] | [Code]

  • We study skill internalization for agents, gradually withdrawing runtime skill context so that the policy internalizes reusable behaviors in its parameters.

🧠 LLM Reasoning

ACL 2026
ThoughtICR

Beyond Examples: Towards Automated Thought-level In-Context Reasoning for Large Language Models

Jinyang Wu*, Mingkuan Feng*, Shuai Zhang, Feihu Che, Zhengqi Wen, Chonghua Liao, Ling Yang, Haoran Luo, Zheng Lian, Jianhua Tao

[Paper] | [Code]

  • We shift in-context reasoning from example-level imitation to reusable thought patterns, enabling automated and efficient reasoning guidance.
ACL 2026
TemplateRL

TemplateRL: Structured Template-Guided Reinforcement Learning for LLM Reasoning

Jinyang Wu*, Chonghua Liao, Mingkuan Feng, Shuai Zhang, Zhengqi Wen, Haoran Luo, Ling Yang, Huazhe Xu, Jianhua Tao

[Paper]

  • We augment policy optimization with structured templates, improving high-quality rollout generation and stabilizing RL training for reasoning.
AAAI 2026 Oral
AStar

AStar: Boosting Multimodal Reasoning with Automated Structured Thinking

Jinyang Wu, Mingkuan Feng, Guocheng Zhai, Shuai Zhang†, Zheng Lian, Fangrui Lv, Pengpeng Shao, Ruihan Jin, Zhengqi Wen, Jianhua Tao†

[Paper]

  • We build a training-free structured thinking method for multimodal reasoning, retrieving reusable thought cards at test time to guide MLLMs.
ACL 2026
TRSP

Two-Stage Regularization-Based Structured Pruning for LLMs

Mingkuan Feng*, Jinyang Wu*, Siyuan Liu, Shuai Zhang, Ruihan Jin, Feihu Che, Pengpeng Shao, Zhengqi Wen, Jianhua Tao

[Paper]

  • We introduce a two-stage regularization strategy for structured LLM pruning, preserving more knowledge while reducing model depth without retraining.
ACL 2026 Oral & Best Paper Candidate
Double

Double: Breaking the Acceleration Limit via Double Retrieval Speculative Parallelism

Yuhao Shen, Tianyu Liu, Junyi Shen, Jinyang Wu, Quan Kong, Li Huan, Cong Wang

[Paper]

  • We bridge speculative decoding and retrieval-based guidance to push inference acceleration beyond conventional parallel speculative decoding limits.
ACL 2025
NoiserBench

Pandora’s Box or Aladdin’s Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language Models

Jinyang Wu, Shuai Zhang†, Feihu Che, Mingkuan Feng, Pengpeng Shao, Jianhua Tao†

[Paper] | [Code]

  • We define a linguistic taxonomy of RAG noise and build NoiserBench to study when retrieval noise harms or surprisingly helps LLM reasoning.

🧬 Biomedical AI

👨‍🏫 Teaching

  • Teaching Assistant, Affective Computing, graduate course.
  • Teaching Assistant, Intelligent Speech Processing, undergraduate interdisciplinary innovation training course.

🎖 Honors and Awards

  • 2026: Outstanding Teaching Assistant, Tsinghua University.
  • 2026: Merit Student, Tsinghua University.
  • 2025: First-Class Scholarship of Tsinghua University.
  • 2024: Second-Class Scholarship of Tsinghua University.
  • 2023: Outstanding Graduate, Shaanxi Province.
  • 2023: Outstanding Graduate, Xi’an Jiaotong University.
  • 2022: First Prize, Chinese Mathematics Competitions.
  • 2021: National Scholarship, Ministry of Education of China.

📖 Education

  • 2023.09 - now: Ph.D. student in Pattern Recognition and Machine Learning, Tsinghua University.
  • 2019.09 - 2023.06: B.Eng. in Automation, Xi’an Jiaotong University.

๐Ÿง‘โ€โš–๏ธ Academic Service

  • Conference Reviewer: ICLR 2026, ICML 2026, NeurIPS 2026, ARR 2026, AAAI 2026, ECCV 2026, AAAI 2027.
  • Journal Reviewer: Expert Systems With Applications (ESWA), ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM).

💬 Invited Talks

  • 2025.05: I gave an invited talk on in-context reasoning at the 7th Beijing Universities Artificial Intelligence Academic Forum.
  • 2025.12: I gave an invited talk on memory usage, hosted by the Metaverse Technical Committee of the Chinese Association for Artificial Intelligence.