I am currently a Researcher at Nex-AGI, working on LLM Agent research, with broader interests in MLLMs, Agent Memory, Agent RL, and Multi-Agent Systems. Previously, I was a Senior Research Scientist on the Multimodal Intelligence Team at Microsoft CoreAI, where I worked on OCR, Document Intelligence, RAG, and MLLM research, and led the development of the industry-leading Azure Layout API.

I received my Ph.D. degree in 2024 from the joint Ph.D. program between the University of Science and Technology of China (USTC) and Microsoft Research Asia (MSRA), under the supervision of Prof. Qiang Huo and Prof. Jun Du. During my Ph.D., I interned at DeepSeek, contributing to DeepSeek-OCR, DeepSeek-VL2, DeepSeek-V3, and DeepSeek-R1, and at MSRA, working on Microsoft OneOCR and Document Intelligence. I have published 10+ papers in top international AI journals and conferences, and one of my papers received the Best Paper Award at ICDAR 2021.

If you are interested in any form of academic collaboration, please feel free to email me at kaihu.kh@gmail.com. We are hiring interns! If you'd like to have a coffee chat, just reach out. I really enjoy connecting with all kinds of people! ☕😊✨

🔥 News

2026.05: 🎉 We open-sourced Nex-N2, featuring first-tier coding and agentic capabilities. Check out Nex-AGI for more details!
2026.04: 🎉 One paper is accepted by ACL 2026 Findings.
2025.11: 🎉 We open-sourced Nex, a full-stack AI Agent Platform that connects models, frameworks, data, and infrastructure end-to-end. Check out Nex-AGI for more details!
2025.03: 🎉 One paper is accepted by Pattern Recognition.
2024.12: 🎉 One paper is accepted by ICASSP 2025.
2024.11: 🎉 I'm thrilled to have had the opportunity to contribute to DeepSeek-OCR, DeepSeek-VL2, DeepSeek-V3, and DeepSeek-R1.
2024.04: 🎉 Four papers are accepted by ICDAR 2024.

🔥 More News

2024.03: 🎉 One paper is accepted by Pattern Recognition.
2023.12: 🎉 One paper is accepted by Pattern Recognition.
2023.04: 🎉 One paper is accepted by ICDAR 2023.
2022.11: 🎉 One paper is accepted by AAAI 2023.
2022: 😭 This year has been the hardest of my life. I sincerely hope everyone stays healthy and well. 🙏
2021.09: 🎉 Our ViBERTgrid won the Best Paper Award at ICDAR 2021!
2021.03: 🎉 One paper is accepted by ICDAR 2021.

💻 Experiences

2025.05-Now: Researcher, Nex-AGI, Shanghai, China.
2024.12-2025.05: Senior Research Scientist, Multimodal Intelligence Team, Microsoft CoreAI, Beijing, China (awaiting a work visa to relocate to Microsoft in Seattle).
2024.06-2024.11: Research Intern, Multimodal LLM Team, DeepSeek, Beijing, China.
2020.06-2024.06: Research Intern, Multimodal Interaction Group, Microsoft Research Asia, Beijing, China.
2018.07-2019.09: Research Intern, Multimodal Interaction Group, Microsoft Research Asia, Beijing, China.

📖 Education

2019.09-2024.12: Ph.D. in Information and Communication Engineering, University of Science and Technology of China, Hefei, Anhui, China.
2015.09-2019.06: Dual B.Eng. in Computer Science, University of Science and Technology of China, Hefei, Anhui, China.
2015.09-2019.06: B.S. in Physical Sciences, University of Science and Technology of China, Hefei, Anhui, China.

📝 Publications

✉️ means Corresponding Author; * means Equal Contribution

🤖 LLMs & MLLMs

ACL 2026 Findings MM-Doc-R1: Training Agents for Long Document Visual Question Answering through Multi-turn Reinforcement Learning, Jiahang Lin*, Kai Hu*, Binghai Wang, Yuhao Zhou, Zhiheng Xi, Honglin Guo, Shichun Liu, Junzhe Wang, Shihan Dou, Enyu Zhou, Hang Yan, Zhenhua Han✉️, Tao Gui✉️, Qi Zhang, Xuanjing Huang
arXiv 2025 Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction, Nex-AGI Team
ICASSP 2025 DocVideoQA: Towards Comprehensive Understanding of Document-Centric Videos through Question Answering, Haochen Wang, Kai Hu, Liangcai Gao✉️
arXiv 2025 (Cutting-edge Project) DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, DeepSeek AI
arXiv 2024 (Cutting-edge Project) DeepSeek-V3 Technical Report, DeepSeek AI
arXiv 2024 (Cutting-edge Project) DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding, Zhiyu Wu*, Xiaokang Chen*, Zizheng Pan*, Xingchao Liu*, Wen Liu*, Damai Dai, Huazuo Gao, Yiyang Ma, Chengyue Wu, Bingxuan Wang, Zhenda Xie, Yu Wu, Kai Hu, Jiawei Wang, Yaofeng Sun, Yukun Li, Yishi Piao, Kang Guan, Aixin Liu, Xin Xie, Yuxiang You, Kai Dong, Xingkai Yu, Haowei Zhang, Liang Zhao, Yisong Wang, Chong Ruan✉️
ICDAR 2024 DocTabQA: Answering Questions from Long Documents Using Tables, Haochen Wang, Kai Hu, Haoyu Dong, Liangcai Gao✉️

📄 Document Intelligence

Pattern Recognition 2025 (SCI Q1 Journal) UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis, Jiawei Wang✉️, Kai Hu, Qiang Huo
ICDAR 2024 (Oral) DLAFormer: An End-to-End Transformer For Document Layout Analysis, Jiawei Wang*✉️, Kai Hu*✉️, Qiang Huo
ICDAR 2024 (Oral) Dynamic Relation Transformer for Contextual Text Block Detection, Jiawei Wang*✉️, Shunchi Zhang*✉️, Kai Hu*✉️, Chixiang Ma, Zhuoyao Zhong, Lei Sun, Qiang Huo
ICDAR 2024 (Oral) UniVIE: A Unified Label Space Approach to Visual Information Extraction from Form-Like Documents, Kai Hu✉️, Jiawei Wang, Weihong Lin, Zhuoyao Zhong, Lei Sun, Qiang Huo
Pattern Recognition 2024 (SCI Q1 Journal) Mathematical formula detection in document images: A new dataset and a new approach, Kai Hu✉️, Zhuoyao Zhong, Lei Sun, Qiang Huo
Pattern Recognition 2024 (SCI Q1 Journal) Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis, Jiawei Wang✉️, Kai Hu, Zhuoyao Zhong, Lei Sun, Qiang Huo
ICDAR 2023 A Hybrid Approach to Document Layout Analysis for Heterogeneous Document Images, Zhuoyao Zhong✉️, Jiawei Wang, Haiqing Sun, Kai Hu, Erhan Zhang, Lei Sun, Qiang Huo
AAAI 2023 (Oral) A Question-Answering Approach to Key Value Pair Extraction from Form-like Document Images, Kai Hu*, Zhuoyuan Wu*, Zhuoyao Zhong✉️, Weihong Lin, Lei Sun, Qiang Huo
ICDAR 2021 (Best Paper Award) ViBERTgrid: A Jointly Trained Multi-modal 2D Document Representation for Key Information Extraction from Documents, Weihong Lin*✉️, Qifang Gao*, Lei Sun, Zhuoyao Zhong, Kai Hu, Qin Ren, Qiang Huo

📚 Academic Services

ICDAR Reviewer (2023, 2024, 2025, 2026)
Pattern Recognition Reviewer (2025)
AAAI Reviewer (2025)

🎖 Honors and Awards

2024.10: Outstanding Doctoral Graduate (Top 5%) 📍 USTC
2019-2024: Co-Developer of Microsoft Azure AI Document Intelligence API 📍 MSRA
2024: Microsoft Research Asia Star of Tomorrow Internship (Top 5%) 📍 MSRA
2021.09: ICDAR 2021 Best Paper Award (1/400+) 📍 MSRA
2019: Microsoft Research Asia Star of Tomorrow Internship (Top 5%) 📍 MSRA
2016-2018: National Inspirational Scholarship (Top 5%) 📍 USTC
2015-2019: Zhao Zhongyao Scholarship (Top 5%) 📍 USTC
2014: First Prize in the 31st National Physics Contest for High School Students 📍 Jiangxi, China

💬 Invited Talks

2024.08: Towards Universal Visual Information Extraction. Hosted by Microsoft.

Kai Hu (胡凯)