What Data Do Robots Really Need?
Natural language processing and computer vision have recently undergone a paradigm shift toward data-driven intelligence, highlighted by the success of large language and vision models trained on massive internet-scale datasets. Robotics is at an analogous inflection point: progress in robot learning is increasingly bottlenecked not only by model architectures and compute, but by the availability, quality, diversity, and structure of robot data. Yet unlike the digital world, the physical world still lacks an “Internet for Robots”—a shared, scalable ecosystem of data, tooling, and evaluation that can reliably support general-purpose physical intelligence.
This workshop will bring together researchers and practitioners to examine a core question: What kinds of data matter most for training robots, and at what scale is data “enough”? We will focus on the key aspects of robot data—data sources, collection paradigms, scaling laws, dataset composition, curation and weighting, evaluation protocols, and post-deployment data flywheels—highlighting both complementary perspectives and unresolved tensions. The workshop is designed to be highly discussion-driven, using short talks and panels to identify practical bottlenecks and propose actionable research directions.
We welcome submissions on topics including, but not limited to:
All accepted papers will be presented at an in-person poster session. A small number of selected papers will additionally give a 5-minute spotlight talk. We will also recognize outstanding work with a Best Paper Award, announced during the closing remarks. Camera-ready versions of all accepted papers will be made available on the workshop website.
| Time | Session |
| --- | --- |
| 9:00 - 9:05 | Opening Remarks |
| 9:05 - 9:35 | Invited Talk 1: Hao Su |
| 9:35 - 10:05 | Invited Talk 2: Ashwin Balakrishna |
| 10:05 - 10:35 | Coffee Break and Poster Session |
| 10:35 - 11:05 | Invited Talk 3: Haozhi Qi |
| 11:05 - 11:35 | Invited Talk 4: Simar Kareer |
| 11:35 - 11:50 | Spotlight Talks |
| 11:50 - 12:20 | Panel Discussion |
| 12:20 - 12:25 | Closing Remarks |
Hao Su is "Haoqing" Distinguished Professor at Fudan University and Inaugural Dean of the Fudan Institute of General Physical Intelligence. His research spans perception, 3D understanding, and embodied control, with foundational contributions including ImageNet, ShapeNet, PointNet, and the ManiSkill benchmark. He is a recipient of the IEEE PAMI Young Researcher Award and the NSF CAREER Award, and serves as Program Committee Co-Chair of CVPR 2025. He holds a Ph.D. in Computer Science from Stanford and a Ph.D. in Mathematics from Beihang University.
Ashwin Balakrishna is a Researcher at Physical Intelligence, where he builds scalable robot data collection and evaluation pipelines for large-scale robot learning. Previously, he was a Senior Research Scientist at Google DeepMind on the Gemini Robotics team, where he worked on VLA post-training and improving instruction-following capabilities. He did his PhD in Computer Science in the AUTOLAB at UC Berkeley and completed his bachelor's degree in Electrical Engineering at Caltech.
Haozhi Qi is a Member of Technical Staff at Amazon FAR and an incoming Assistant Professor at the University of Chicago. His research spans robotics and AI, with a particular focus on algorithms and systems for dexterous manipulation. He is a recipient of the Lotfi A. Zadeh Prize, was named an RSS Pioneer 2026, and received the EECS Evergreen Award for Excellence in Undergraduate Research Mentoring.
Simar Kareer is a PhD student at Georgia Tech advised by Danfei Xu and Judy Hoffman. His research focuses on scaling robot learning from egocentric human experience, including EgoMimic and EgoVerse. He previously led Human2Robot at Physical Intelligence and is currently interning with NVIDIA GEAR. His work has been featured in the Meta AI Blog and The Washington Post.
Contact us at: rssworkshop2026-official@outlook.com