I am currently a Researcher at Nex-AGI, working on LLM Agent research. Before that, I was a Senior Research Scientist on the Multimodal Intelligence Team at Microsoft CoreAI, working on OCR, Document Intelligence (DI), RAG, MLLMs, and LLM Agents. My research interests include Document Intelligence, MLLMs, LLM Agents, Agent RL, and Multi-Agent Systems.
I obtained my Ph.D. in 2024 from the joint Ph.D. program between the University of Science and Technology of China (USTC) and Microsoft Research Asia (MSRA), under the supervision of Prof. Qiang Huo at MSRA and Prof. Jun Du at USTC. During my Ph.D. studies, I interned at DeepSeek, contributing to DeepSeek-OCR, DeepSeek-VL2, DeepSeek-V3, and DeepSeek-R1, and at MSRA, working on the Microsoft OneOCR and Document Intelligence projects. After completing my Ph.D., I worked as a Senior Research Scientist at Microsoft CoreAI, where I led the development of the industry-leading Azure Layout API. I have published 10+ papers in top international AI journals and conferences, and one of my papers received the Best Paper Award at ICDAR 2021.
If you are seeking any form of academic cooperation, please feel free to email me at kaihu.kh@gmail.com. We are hiring interns! If you’d like to have a coffee chat, please feel free to reach out. I really enjoy connecting with different people! ☕😊✨
🔥 News
- 2025.11: 🎉 We open-sourced Nex, a full-stack AI Agent Platform that connects models, frameworks, data, and infrastructure end-to-end. Check out Nex-AGI for more details!
- 2025.03: 🎉 One paper is accepted by Pattern Recognition.
- 2024.12: 🎉 One paper is accepted by ICASSP 2025.
- 2024.11: 🎉 I'm thrilled to have the opportunity to make some contributions to DeepSeek-OCR, DeepSeek-VL2, DeepSeek-V3 and DeepSeek-R1.
- 2024.04: 🎉 Four papers are accepted by ICDAR 2024.
- 2024.03: 🎉 One paper is accepted by Pattern Recognition.
- 2023.12: 🎉 One paper is accepted by Pattern Recognition.
- 2023.04: 🎉 One paper is accepted by ICDAR 2023.
- 2022.11: 🎉 One paper is accepted by AAAI 2023.
- 2022: 😭 This year has been the hardest of my life. I sincerely hope everyone stays healthy and well. 🙏
- 2021.09: 🎉 Our ViBERTgrid won the Best Paper Award of ICDAR 2021!
- 2021.03: 🎉 One paper is accepted by ICDAR 2021.
💻 Experiences
- 2025.05-Now: Researcher, Nex-AGI, Shanghai, China.
- 2024.12-2025.05: Senior Research Scientist, Multimodal Intelligence Team, Microsoft CoreAI, Beijing, China (Waiting for the work visa to go to Microsoft Seattle).
- 2024.06-2024.11: Research Intern, Multimodal LLM Team, DeepSeek, Beijing, China.
- 2020.06-2024.06: Research Intern, Multimodal Interaction Group, Microsoft Research Asia, Beijing, China.
- 2018.07-2019.09: Research Intern, Multimodal Interaction Group, Microsoft Research Asia, Beijing, China.
📖 Education
- 2019.09-2024.12: Ph.D. in Information and Communication Engineering, University of Science and Technology of China, Hefei, Anhui, China.
- 2015.09-2019.06: Dual B.Eng. in Computer Sciences, University of Science and Technology of China, Hefei, Anhui, China.
- 2015.09-2019.06: B.S. in Physical Sciences, University of Science and Technology of China, Hefei, Anhui, China.
📝 Publications
- ✉️ means Corresponding Author; * means Equal Contribution
🤖 LLMs & MLLMs
- arXiv 2025: Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction, Nex-AGI Team
- ICASSP 2025: DocVideoQA: Towards Comprehensive Understanding of Document-Centric Videos through Question Answering, Haochen Wang, Kai Hu, Liangcai Gao$^✉️$
- arXiv 2025 (Cutting-edge Project): DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, DeepSeek AI
- arXiv 2024 (Cutting-edge Project): DeepSeek-V3 Technical Report, DeepSeek AI
- arXiv 2024 (Cutting-edge Project): DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding, Zhiyu Wu*, Xiaokang Chen*, Zizheng Pan*, Xingchao Liu*, Wen Liu*, Damai Dai, Huazuo Gao, Yiyang Ma, Chengyue Wu, Bingxuan Wang, Zhenda Xie, Yu Wu, Kai Hu, Jiawei Wang, Yaofeng Sun, Yukun Li, Yishi Piao, Kang Guan, Aixin Liu, Xin Xie, Yuxiang You, Kai Dong, Xingkai Yu, Haowei Zhang, Liang Zhao, Yisong Wang, Chong Ruan$^✉️$
- ICDAR 2024: DocTabQA: Answering Questions from Long Documents Using Tables, Haochen Wang, Kai Hu, Haoyu Dong, Liangcai Gao$^✉️$
📄 Document Intelligence
- Pattern Recognition 2025 (SCI Q1 Journal): UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis, Jiawei Wang$^✉️$, Kai Hu, Qiang Huo
- ICDAR 2024 (Oral): DLAFormer: An End-to-End Transformer For Document Layout Analysis, Jiawei Wang*$^✉️$, Kai Hu*$^✉️$, Qiang Huo
- ICDAR 2024 (Oral): Dynamic Relation Transformer for Contextual Text Block Detection, Jiawei Wang*$^✉️$, Shunchi Zhang*$^✉️$, Kai Hu*$^✉️$, Chixiang Ma, Zhuoyao Zhong, Lei Sun, Qiang Huo
- ICDAR 2024 (Oral): UniVIE: A Unified Label Space Approach to Visual Information Extraction from Form-Like Documents, Kai Hu$^✉️$, Jiawei Wang, Weihong Lin, Zhuoyao Zhong, Lei Sun, Qiang Huo
- Pattern Recognition 2024 (SCI Q1 Journal): Mathematical formula detection in document images: A new dataset and a new approach, Kai Hu$^✉️$, Zhuoyao Zhong, Lei Sun, Qiang Huo
- Pattern Recognition 2024 (SCI Q1 Journal): Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis, Jiawei Wang$^✉️$, Kai Hu, Zhuoyao Zhong, Lei Sun, Qiang Huo
- ICDAR 2023: A Hybrid Approach to Document Layout Analysis for Heterogeneous Document Images, Zhuoyao Zhong$^✉️$, Jiawei Wang, Haiqing Sun, Kai Hu, Erhan Zhang, Lei Sun, Qiang Huo
- AAAI 2023 (Oral): A Question-Answering Approach to Key Value Pair Extraction from Form-like Document Images, Kai Hu*, Zhuoyuan Wu*, Zhuoyao Zhong$^✉️$, Weihong Lin, Lei Sun, Qiang Huo
- ICDAR 2021 (Best Paper Award): ViBERTgrid: A Jointly Trained Multi-modal 2D Document Representation for Key Information Extraction from Documents, Weihong Lin*$^✉️$, Qifang Gao*, Lei Sun, Zhuoyao Zhong, Kai Hu, Qin Ren, Qiang Huo
📚 Academic Services
- ICDAR Reviewer (2023, 2024, 2025, 2026)
- Pattern Recognition Reviewer (2025)
- AAAI Reviewer (2025)
🎖 Honors and Awards
- 2024.10: Outstanding Doctoral Graduate (Top 5%) 📍 USTC
- 2019-2024: Co-Developer of Microsoft Azure AI Document Intelligence API 📍 MSRA
- 2024: Microsoft Research Asia Star of Tomorrow Internship (Top 5%) 📍 MSRA
- 2021.09: ICDAR 2021 Best Paper Award (1/400+) 📍 MSRA
- 2019: Microsoft Research Asia Star of Tomorrow Internship (Top 5%) 📍 MSRA
- 2016-2018: National Inspirational Scholarship (Top 5%) 📍 USTC
- 2015-2019: Zhao Zhongyao Scholarship (Top 5%) 📍 USTC
- 2014: First Prize in the 31st National Physics Contest for High School Students 📍 Jiangxi, China
💬 Invited Talks
- 2024.08: Towards Universal Visual Information Extraction. Hosted by Microsoft.