I’m a second-year PhD student in Computer Science at Northwestern University, advised by Manling Li.
I earned my master’s degree in the CSE Department at Shanghai Jiao Tong University, working with Mengyue Wu, and completed my B.E. (Honors) there as well.
My research spans multimodal learning, foundation models, and spatial reasoning.
Early on, I studied representation learning for depression detection.
I then focused on contrastive learning for robust, cross-modal representations.
Recently, I’ve been exploring cross-modal (“X-modality”) interaction, where X can be vision, audio, or other modalities. I have also been working on agentic training frameworks (e.g., RAGEN and VAGEN) for multi-turn reasoning.
My current primary focus is spatial cognition in foundation models.
For more information about me, please refer to my CV.
Please feel free to contact me via email.