Publications

2025

  1. NeurIPS
    Toward Human Deictic Gesture Target Estimation
    Cao, Xu, Virupaksha, Pranav, Lai, Bolin, Lee, Sangmin, Jia, Wenqi, Chen, Jintai, and Rehg, James M
    In Advances in Neural Information Processing Systems 2025
  2. NeurIPS
    Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs
    Shen, Yifan, Liu, Yuanzhe, Zhu, Jingyuan, Cao, Xu, Zhang, Xiaofeng, He, Yixiao, Ye, Wenming, Rehg, James M, and Lourentzou, Ismini
    In Advances in Neural Information Processing Systems 2025
  3. Preprint
    Agentic Large-Language-Model Systems in Medicine: A Systematic Review and Taxonomy
    Al Radi, Abdul Mohaimen, Cao, Xu, Yu, Fanyang, Liu, Yuyuan, Liu, Fengbei, Wang, Chong, Chen, Yuanhong, Chen, Jintai, Wang, Hu, Meng, Yanda, and others
    TechRxiv preprint techrxiv.175736231.12300949 2025
  4. Scientific Data
    TrialBench: Multi-modal AI-ready datasets for clinical trial prediction
    Chen, Jintai, Hu, Yaojun, Cai, Mingchen, Lu, Yingzhou, Wang, Yue, Cao, Xu, Lin, Miao, Xu, Hongxia, Wu, Jian, Cao, Xiao, and others
    Scientific Data 2025
  5. COLM
    What is the visual cognition gap between humans and multimodal LLMs?
    Cao, Xu, Shen, Yifan, Lai, Bolin, Ye, Wenqian, Ma, Yunsheng, Heintz, Joerg, Chen, Jintai, Huang, Meihuan, Cao, Jianguo, and Rehg, James M
    In Conference on Language Modeling 2025
  6. ICCV
    Proxy-Bridged Game Transformer for Multi-Person Highly Interactive Extreme Motion Prediction
    Fang, Yanwen, Jia, Wenqi, Cao, Xu, Jiang, Peng-Tao, Li, Guodong, and Chen, Jintai
    In IEEE/CVF International Conference on Computer Vision 2025
  7. IROS
    On-board vision-language models for personalized autonomous vehicle motion control: System design and real-world validation
    Cui, Can, Yang, Zichong, Zhou, Yupeng, Peng, Juntong, Park, Sung-Yeon, Zhang, Cong, Ma, Yunsheng, Cao, Xu, Ye, Wenqian, Feng, Yiheng, and others
    In IEEE/RSJ International Conference on Intelligent Robots and Systems 2025
  8. CVPR
    SocialGesture: Delving into Multi-person Gesture Understanding
    Cao, Xu, Virupaksha, Pranav, Jia, Wenqi, Lai, Bolin, Ryan, Fiona, Lee, Sangmin, and Rehg, James M
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition 2025
  9. ICLR
    Workshop on AI for children: Healthcare, psychology, education
    Cao, Xu, Chen, Jintai, Ye, Wenqian, Jojic, Ana, Owusu, Sheila Agyeiwaa, Li, Sheng, Coffee, Megan, Zhao, Sicheng, and Rehg, James Matthew
    In ICLR 2025 Workshop Proposals 2025
  10. Inform Fusion
    From Screens to Scenes: A Survey of Embodied AI in Healthcare
    Liu, Yihao, Cao, Xu, Chen, Tingting, Jiang, Yankai, You, Junjie, Wu, Minghua, Wang, Xiaosong, Feng, Mengling, Jin, Yaochu, and Chen, Jintai
    Information Fusion 2025

2024

  1. ML4H Proceedings
    MpoxVLM: A Vision-Language Model for Diagnosing Skin Lesions from Mpox Virus Infection
    Cao, Xu, Ye, Wenqian, Moise, Kenny, and Coffee, Megan
    In Machine Learning for Health Symposium 2024
  2. ML4H Proceedings
    EHRMamba: Towards Generalizable and Scalable Foundation Models for Electronic Health Records
    Fallahpour, Adibvafa, Alinoori, Mahshid, Ye, Wenqian, Cao, Xu, Afkanpour, Arash, and Krishnan, Amrit
    In Machine Learning for Health Symposium 2024
  3. EMNLP Findings
    Learning Autonomous Driving Tasks via Human Feedbacks with Large Language Models
    Ma, Yunsheng, Cao, Xu, Ye, Wenqian, Cui, Can, Mei, Kai, and Wang, Ziran
    In Findings of the Association for Computational Linguistics: EMNLP 2024
  4. Preprint
    Towards social AI: A survey on understanding social interactions
    Lee, Sangmin, Li, Minzhi, Lai, Bolin, Jia, Wenqi, Ryan, Fiona, Cao, Xu, Kara, Ozgur, Boote, Bikram, Shi, Weiyan, Yang, Diyi, and others
    arXiv preprint arXiv:2409.15316 2024
  5. CVPR
    MAPLM: A Real-World Large-Scale Vision-Language Benchmark for Map and Traffic Scene Understanding
    Cao, Xu, Zhou, Tong, Ma, Yunsheng, Ye, Wenqian, Cui, Can, Tang, Kun, Cao, Zhipeng, Liang, Kaizhao, Wang, Ziran, Rehg, James M, and others
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024
  6. CVPR
    LaMPilot: An open benchmark dataset for autonomous driving with language model programs
    Ma, Yunsheng, Cui, Can, Cao, Xu, Ye, Wenqian, Liu, Peiran, Lu, Juanwu, Abdelraouf, Amr, Gupta, Rohit, Han, Kyungtae, Bera, Aniket, and others
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024
  7. WACVW
    A survey on multimodal large language models for autonomous driving
    Cui, Can, Ma, Yunsheng, Cao, Xu, Ye, Wenqian, Zhou, Yang, Liang, Kaizhao, Chen, Jintai, Lu, Juanwu, Yang, Zichong, Liao, Kuei-Da, and others
    In IEEE/CVF Winter Conference on Applications of Computer Vision Workshops 2024
  8. WACVW
    Drive as you speak: Enabling human-like interaction with large language models in autonomous vehicles
    Cui, Can, Ma, Yunsheng, Cao, Xu, Ye, Wenqian, and Wang, Ziran
    In IEEE/CVF Winter Conference on Applications of Computer Vision Workshops 2024
  9. WACV
    MACP: Efficient model adaptation for cooperative perception
    Ma, Yunsheng, Lu, Juanwu, Cui, Can, Zhao, Sicheng, Cao, Xu, Ye, Wenqian, and Wang, Ziran
    In IEEE/CVF Winter Conference on Applications of Computer Vision 2024

2023

  1. ICASSP
    ViTASD: Robust vision transformer baselines for autism spectrum disorder facial diagnosis
    Cao, Xu, Ye, Wenqian, Sizikova, Elena, Bai, Xue, Coffee, Megan, Zeng, Hongwu, and Cao, Jianguo
    In IEEE International Conference on Acoustics, Speech, and Signal Processing 2023
  2. JCPP
    Commentary: Machine learning for autism spectrum disorder diagnosis – challenges and opportunities
    Cao, Xu, and Cao, Jianguo
    Journal of Child Psychology and Psychiatry 2023
  3. AI Magazine
    High-definition map automatic annotation system based on active learning
    Zheng, Chao, Cao, Xu, Tang, Kun, Cao, Zhipeng, Sizikova, Elena, Zhou, Tong, Li, Erlong, Liu, Ao, Zou, Shengtao, Yan, Xinrui, and others
    AI Magazine 2023
  4. UAI
    Mitigating transformer overconfidence via Lipschitz regularization
    Ye, Wenqian, Ma, Yunsheng, Cao, Xu, and Tang, Kun
    In Conference on Uncertainty in Artificial Intelligence 2023
  5. AAAI Oral in IAAI
    THMA: Tencent HD Map AI system for creating HD map annotations
    Tang, Kun, Cao, Xu, Cao, Zhipeng, Zhou, Tong, Li, Erlong, Liu, Ao, Zou, Shengtao, Liu, Chang, Mei, Shuqi, Sizikova, Elena, and others
    In AAAI Conference on Artificial Intelligence 2023

2022

  1. IJCAI Oral
    AggPose: Deep aggregation vision transformer for infant pose estimation
    Cao, Xu, Li, Xiaoye, Ma, Liya, Huang, Yi, Feng, Xuan, Chen, Zening, Zeng, Hongwu, and Cao, Jianguo
    In International Joint Conference on Artificial Intelligence 2022