What Data Do Robots Really Need?
Natural language processing and computer vision have recently undergone a paradigm shift toward data-driven intelligence, highlighted by the success of large language and vision models trained on massive internet-scale datasets. Robotics is at an analogous inflection point: progress in robot learning is increasingly bottlenecked not only by model architectures and compute, but also by the availability, quality, diversity, and structure of robot data. Yet unlike the digital world, the physical world still lacks an “Internet for Robots”: a shared, scalable ecosystem of data, tooling, and evaluation that can reliably support general-purpose physical intelligence.
This workshop will bring together researchers and practitioners to examine a core question: What kinds of data matter most for training robots, and at what scale is data “enough”? We will focus on the key aspects of robot data—data sources, collection paradigms, scaling laws, dataset composition, curation and weighting, evaluation protocols, and post-deployment data flywheels—highlighting both complementary perspectives and unresolved tensions. The workshop is designed to be highly discussion-driven, using short talks and panels to identify practical bottlenecks and propose actionable research directions.
We welcome submissions on topics including, but not limited to:
All accepted papers will be presented at an in-person poster session. A small number of selected papers will additionally give a 5-minute spotlight talk. We will also recognize outstanding work with a Best Paper Award, announced during the closing remarks. Camera-ready versions of all accepted papers will be made available on the workshop website.
| Time | Session |
|---|---|
| 9:00 - 9:05 | Opening Remarks |
| 9:05 - 9:35 | Invited Talk 1: Ashwin Balakrishna |
| 9:35 - 10:05 | Invited Talk 2: Haozhi Qi |
| 10:05 - 10:35 | Coffee Break and Poster Session |
| 10:35 - 11:05 | Invited Talk 3: Simar Kareer |
| 11:05 - 11:35 | Invited Talk 4: Hao Su |
| 11:35 - 11:50 | Spotlight Talks |
| 11:50 - 12:20 | Panel Discussion |
| 12:20 - 12:25 | Closing Remarks |
Hao Su is the "Haoqing" Distinguished Professor at Fudan University and the inaugural Dean of the Fudan Institute of General Physical Intelligence. Prior to Fudan, he was a tenured associate professor at UC San Diego. He holds a Ph.D. in Computer Science from Stanford and a Ph.D. in Mathematics from Beihang University. His twenty-year research arc has been guided by a single thesis: perceptual concepts are defined through physical interaction, and intelligence emerges from the closed loop between agent and environment. His work has built foundational infrastructure across every layer of that loop: from perception (ImageNet) to 3D spatial understanding (ShapeNet, PointNet, TensoRF, One-2-3-45), to physical simulation (SAPIEN, PlasticineLab), to embodied control (ManiSkill, TD-MPC). He has received the IEEE PAMI Young Researcher Award, the NSF CAREER Award, an ACM SIGGRAPH Outstanding Doctoral Dissertation Honorable Mention, and the 2025 ICBS Innovation Award. He served as Program Committee Co-Chair of CVPR 2025.
Ashwin Balakrishna is a Researcher at Physical Intelligence working on scalable robot data collection and evaluation pipelines for large-scale robot learning. Previously, he was a Senior Research Scientist at Google DeepMind on the Gemini Robotics team, where he worked on VLA post-training and improving instruction-following capabilities. He completed his PhD in Computer Science in the AUTOLAB at UC Berkeley and his bachelor's degree in Electrical Engineering at Caltech.
Haozhi Qi is a Member of Technical Staff at Amazon FAR and an incoming Assistant Professor at the University of Chicago. His research spans robotics and AI, with a particular focus on algorithms and systems for dexterous manipulation. He is a recipient of the Lotfi A. Zadeh Prize, was named a 2026 RSS Pioneer, and received the EECS Evergreen Award for Excellence in Undergraduate Research Mentoring.
Simar Kareer is a PhD student at Georgia Tech advised by Danfei Xu and Judy Hoffman. His research focuses on scaling robot learning from egocentric human experience, including EgoMimic and EgoVerse. He previously led Human2Robot at Physical Intelligence and is currently interning with NVIDIA GEAR. His work has been featured in the Meta AI Blog and The Washington Post.
Contact us at: rssworkshop2026-official@outlook.com