I am currently a Researcher at Nex-AGI, working on LLM Agent research. Prior to that, I was a Senior Research Scientist on the Multimodal Intelligence Team at Microsoft CoreAI, working on OCR, Document Intelligence (DI), RAG, MLLMs, and LLM Agents. My research interests include Document Intelligence, MLLMs, LLM Agents, Agent RL, and Multi-Agent Systems.
I received my Ph.D. in 2024 from the joint Ph.D. program of the University of Science and Technology of China (USTC) and Microsoft Research Asia (MSRA), under the supervision of Prof. Qiang Huo at MSRA and Prof. Jun Du at USTC. During my Ph.D. studies, I interned at DeepSeek, contributing to DeepSeek OCR, DeepSeek VL2, DeepSeek V3, and DeepSeek R1, and at MSRA, working on the Microsoft OneOCR and Document Intelligence projects. After completing my Ph.D., I worked as a Senior Research Scientist at Microsoft CoreAI, where I led the development of the industry-leading Azure Layout API. I have published 10+ papers in top international AI journals and conferences, and one of my papers received the Best Paper Award at ICDAR 2021.
If you are interested in any form of academic collaboration, please feel free to email me at kaihu.kh@gmail.com. We are hiring interns! If you’d like to have a coffee chat, please feel free to reach out. I really enjoy connecting with different people! ☕😊✨