Publications
2025
- NeurIPS. Toward Human Deictic Gesture Target Estimation. In Advances in Neural Information Processing Systems, 2025.
- NeurIPS. Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs. In Advances in Neural Information Processing Systems, 2025.
- Preprint. Agentic Large-Language-Model Systems in Medicine: A Systematic Review and Taxonomy. TechRxiv preprint techrxiv.175736231.12300949, 2025.
- Scientific Data. TrialBench: Multi-modal AI-ready datasets for clinical trial prediction. Scientific Data, 2025.
- COLM. What is the visual cognition gap between humans and multimodal LLMs? In Conference on Language Modeling, 2025.
- ICCV. Proxy-Bridged Game Transformer for Multi-Person Highly Interactive Extreme Motion Prediction. In IEEE/CVF International Conference on Computer Vision, 2025.
- IROS. On-board vision-language models for personalized autonomous vehicle motion control: System design and real-world validation. In IEEE/RSJ International Conference on Intelligent Robots and Systems, 2025.
- CVPR
- ICLR. Workshop on AI for children: Healthcare, psychology, education. In ICLR 2025 Workshop Proposals, 2025.
2024
- ML4H (Proceedings). MpoxVLM: A Vision-Language Model for Diagnosing Skin Lesions from Mpox Virus Infection. In Machine Learning for Health Symposium, 2024.
- ML4H (Proceedings). EHRMamba: Towards Generalizable and Scalable Foundation Models for Electronic Health Records. In Machine Learning for Health Symposium, 2024.
- EMNLP (Findings). Learning Autonomous Driving Tasks via Human Feedbacks with Large Language Models. In Findings of the Association for Computational Linguistics, 2024.
- Preprint. Towards social AI: A survey on understanding social interactions. arXiv preprint arXiv:2409.15316, 2024.
- CVPR. MAPLM: A Real-World Large-Scale Vision-Language Benchmark for Map and Traffic Scene Understanding. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024.
- CVPR. LaMPilot: An open benchmark dataset for autonomous driving with language model programs. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024.
- WACVW. A survey on multimodal large language models for autonomous driving. In IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, 2024.
- WACVW. Drive as you speak: Enabling human-like interaction with large language models in autonomous vehicles. In IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, 2024.
- WACV. MACP: Efficient model adaptation for cooperative perception. In IEEE/CVF Winter Conference on Applications of Computer Vision, 2024.
2023
- ICASSP. ViTASD: Robust vision transformer baselines for autism spectrum disorder facial diagnosis. In IEEE International Conference on Acoustics, Speech, and Signal Processing, 2023.
- JCPP. Commentary: Machine learning for autism spectrum disorder diagnosis–challenges and opportunities. Journal of Child Psychology and Psychiatry, 2023.
- AI Magazine. High-definition map automatic annotation system based on active learning. AI Magazine, 2023.
- UAI. Mitigating transformer overconfidence via Lipschitz regularization. In Conference on Uncertainty in Artificial Intelligence, 2023.
- AAAI (Oral in IAAI). THMA: Tencent HD map AI system for creating HD map annotations. In AAAI Conference on Artificial Intelligence, 2023.
2022
- IJCAI (Oral). AggPose: Deep aggregation vision transformer for infant pose estimation. In International Joint Conference on Artificial Intelligence, 2022.