About Me

I’m currently a 3rd-year Ph.D. candidate in the Department of Automation at Tsinghua University. Prior to this, I earned my B.Eng. degree from the School of Automation Science and Engineering, Xi’an Jiaotong University in 2023. I worked closely with Shuai Zhang and Zhengqi Wen. I am fortunate to collaborate with Zheng Lian, Haoran Luo, Zhengxi Lu, and Fan Zhang.

My research interests focus on LLM / MLLM reasoning, planning, and agentic reinforcement learning. From 2024 to early 2025, my work mainly focused on high-quality and efficient reasoning for LLMs. My current research investigates post-training techniques for general agents, including agent skills, on-policy distillation (OPD), and reinforcement learning (RL).

🐈 Collaboration: I am looking for motivated collaborators interested in the above topics. If you would like to explore these directions together, feel free to contact me. UG/MSc students are also welcome! 🌱

🔥 News

  • 2026.06 🚀 Released OPID, TACO, and Orchestra-o1. OPID was featured as 🤗 HF Daily Paper #3!
  • 2026.05 🎙️ Presented Maestro at AliStar Academic Open Day, Beijing 🇨🇳.
  • 2026.05 🚀 Released Maestro, RobotEQ, SDAR, and AffectGPT-RL. SDAR was featured as 🤗 HF Daily Paper #2!
  • 2026.05 ✈️🏛️ Attending VALSE 2026 in Wuhan 🇨🇳.
  • 2026.04 🚀 Released SKILL0 on skill internalization, featured as 🤗 HF Daily Paper #2!
  • 2026.04 🎉 Six papers accepted to ACL 2026, including one oral presentation and best paper candidate (Double). See you in San Diego 🇺🇸!
  • 2026.02 🎉 Two papers accepted to ICLR 2026!
  • 2026.01 ✈️ Attended AAAI 2026 in Singapore 🇸🇬.
  • 2025.12 🎤 Attended the 2025 China Metaverse Conference in Wenzhou 🇨🇳.
  • 2025.11 🎉 Two papers accepted to AAAI 2026 (one oral presentation)!
  • 2025.11 ✈️📍 Attending EMNLP 2025 in Suzhou 🇨🇳.
  • 2025.05 🎉 NoiserBench accepted to ACL 2025!

📝 Selected Publications (Full List)

* Equal contribution. † Corresponding author.

🤖 Agentic Post Training

ACL 2026
SPARK

SPARK: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning

Jinyang Wu, Shuo Yang, Changpeng Yang, Yuhao Shen, Shuai Zhang, Zhengqi Wen, Jianhua Tao

Paper HF Code BIB

  • We propose a policy-aware branching framework that allocates exploration budget to critical decision states, improving sample efficiency for long-horizon agentic RL.
ACL 2026
Atlas

Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning

Jinyang Wu, Guocheng Zhai, Ruihan Jin, Jiahao Yuan, Yuhao Shen, Shuai Zhang, Zhengqi Wen, Jianhua Tao

Paper HF BIB

  • We introduce a dual-path framework for dynamic tool usage, combining cluster-based routing with RL-based multi-step routing for cross-domain reasoning.
Preprint
Maestro

Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles

Jinyang Wu, Guocheng Zhai, Ruihan Jin, Yuhao Shen, Zhengxi Lu, Fan Zhang, Haoran Luo, Zheng Lian, Zhengqi Wen, Jianhua Tao

Paper HF Code BIB

  • We formulate model-skill orchestration as a sequential decision process and train a lightweight policy to compose frozen expert models and skills for multimodal tasks.
Preprint
OPID

OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning

Shuo Yang*, Jinyang Wu*,†, Zhengxi Lu, Yuhao Shen, Fan Zhang, Lang Feng, Shuai Zhang, Haoran Luo, Zheng Lian, Zhengqi Wen, Jianhua Tao

Paper HF Code BIB

  • We extract hierarchical skill supervision from completed on-policy trajectories and convert hindsight skill signals into dense token-level advantages for agent training.
Preprint
SKILL0

SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization

Zhengxi Lu, Zhiyuan Yao, Jinyang Wu, Chengcheng Han, Qi Gu, Xunliang Cai, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen

Paper HF Code BIB

  • We study skill internalization for agents, gradually withdrawing runtime skill context so the policy can acquire reusable behaviors into its parameters.

🧠 LLM Reasoning

ACL 2026
ThoughtICR

Beyond Examples: Towards Automated Thought-level In-Context Reasoning for Large Language Models

Jinyang Wu, Mingkuan Feng, Shuai Zhang, Feihu Che, Zhengqi Wen, Chonghua Liao, Ling Yang, Haoran Luo, Zheng Lian, Jianhua Tao

Paper HF Code BIB

  • We shift in-context reasoning from example-level imitation to reusable thought patterns, enabling automated and efficient reasoning guidance.
ACL 2026
TemplateRL

TemplateRL: Structured Template-Guided Reinforcement Learning for LLM Reasoning

Jinyang Wu, Chonghua Liao, Mingkuan Feng, Shuai Zhang, Zhengqi Wen, Haoran Luo, Ling Yang, Huazhe Xu, Jianhua Tao

Paper HF BIB

  • We augment policy optimization with structured templates, improving high-quality rollout generation and stabilizing RL training for reasoning.
AAAI 2026
AStar

AStar: Boosting Multimodal Reasoning with Automated Structured Thinking

Jinyang Wu, Mingkuan Feng, Guocheng Zhai, Shuai Zhang, Zheng Lian, Fangrui Lv, Pengpeng Shao, Ruihan Jin, Zhengqi Wen, Jianhua Tao

Paper HF BIB AAAI 2026 Oral

  • We build a training-free structured thinking method for multimodal reasoning, retrieving reusable thought cards at test time to guide MLLMs.
ACL 2026
Double

Double: Breaking the Acceleration Limit via Double Retrieval Speculative Parallelism

Yuhao Shen, Tianyu Liu, Junyi Shen, Jinyang Wu, Quan Kong, Li Huan, Cong Wang

Paper HF Code BIB ACL 2026 Oral & Best Paper Candidate

  • We bridge speculative decoding and retrieval-based guidance to push inference acceleration beyond conventional parallel speculative decoding limits.
ACL 2025
NoiserBench

Pandora’s Box or Aladdin’s Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language Models

Jinyang Wu, Shuai Zhang, Feihu Che, Mingkuan Feng, Pengpeng Shao, Jianhua Tao

Paper HF Code BIB

  • We define a linguistic taxonomy of RAG noise and build NoiserBench to study when retrieval noise harms or surprisingly helps LLM reasoning.

🧬 Biomedical AI

👨‍🏫 Teaching

  • Teaching Assistant, Affective Computing, graduate course.
  • Teaching Assistant, Intelligent Speech Processing, undergraduate interdisciplinary innovation training course.

🎖 Honors and Awards

  • 2026: Outstanding Student Cadre, Tsinghua University.
  • 2026: Merit Student, Tsinghua University.
  • 2025: Outstanding Teaching Assistant, Tsinghua University.
  • 2025: First-Class Scholarship of Tsinghua University.
  • 2024: Second-Class Scholarship of Tsinghua University.
  • 2023: Outstanding Graduate, Shaanxi Province.
  • 2023: Outstanding Graduate, Xi’an Jiaotong University.
  • 2022: First Prize, Chinese Mathematics Competitions.
  • 2021: National Scholarship, Ministry of Education of China.

📖 Education

  • 2023.09 - now: Ph.D. student in Pattern Recognition and Machine Learning, Tsinghua University.
  • 2019.09 - 2023.06: B.Eng. in Automation, Xi’an Jiaotong University.

🧑‍⚖️ Academic Services

Journal Reviewer

  • Expert Systems With Applications (ESWA).
  • ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM).

Conference Reviewer

  • Top-tier ML/AI Conferences: NeurIPS (2025–2026), ICML (2026), ICLR (2026), AAAI (2026–2027).
  • Top-tier CV Conferences: CVPR (2026), ECCV (2026).
  • Top-tier NLP Conferences: ACL ARR (2026).

💬 Invited Talks

  • 2025.12: I gave an invited talk on memory usage, hosted by the Metaverse Technical Committee of the Chinese Association for Artificial Intelligence.
  • 2025.05: I gave an invited talk on in-context reasoning at the 7th Beijing Universities Artificial Intelligence Academic Forum.