Data-Centric Robotics

What Data Do Robots Really Need?

RSS 2026 Workshop, Sydney, Australia, July 17, 2026

Overview

Natural language processing and computer vision have recently undergone a paradigm shift toward data-driven intelligence, highlighted by the success of large language and vision models trained on massive internet-scale datasets. Robotics is at an analogous inflection point: progress in robot learning is increasingly bottlenecked not only by model architectures and compute, but by the availability, quality, diversity, and structure of robot data. Yet unlike the digital world, the physical world still lacks an “Internet for Robots”—a shared, scalable ecosystem of data, tooling, and evaluation that can reliably support general-purpose physical intelligence.

This workshop will bring together researchers and practitioners to examine a core question: What kinds of data matter most for training robots, and at what scale is data “enough”? We will focus on the key aspects of robot data—data sources, collection paradigms, scaling laws, dataset composition, curation and weighting, evaluation protocols, and post-deployment data flywheels—highlighting both complementary perspectives and unresolved tensions. The workshop is designed to be highly discussion-driven, using short talks and panels to identify practical bottlenecks and propose actionable research directions.

Call for Papers

Topics of Interest

We welcome submissions on topics including, but not limited to:

  • Data Sources and Philosophies: Web data offers semantic knowledge, simulation offers scale, and real-world interaction offers grounding. How do these sources complement or conflict, and how do we combine them?
  • Scalability vs. Quality in Data Collection: Teleoperation gives high-fidelity demonstrations but is hard to scale; human videos and wearable devices are abundant but lack explicit action labels. How do we balance scale and fidelity, and weight sources?
  • Closing the Loop—Learning After Deployment: Can a deployed robot learn from its own experience to correct failures? We explore RL, online adaptation, and data flywheels for continuous improvement.
  • Data Evaluation, Analysis, and Interpretability: What makes a “good” training example, and how do we select, filter, or weight data? We seek benchmarks and tools that reveal how data shapes model behavior.

Submission Guidelines

  • Submission portal: all papers must be submitted through our OpenReview portal.
  • Page limit: there is no strict page limit, but we recommend 4–8 pages, excluding references and appendix.
  • Format: submissions must follow the official RSS paper template and style, and must be properly anonymized for double-blind review.
  • Dual submission policy: we welcome work that is previously unpublished, currently under review elsewhere, or recently published. Accepted papers will be listed on the workshop website but are non-archival and will not appear in formal proceedings.

Presentation & Awards

All accepted papers will be presented at an in-person poster session. A small number of selected papers will additionally be presented as 5-minute spotlight talks. We will also recognize outstanding work with a Best Paper Award, announced during the closing remarks. Camera-ready versions of all accepted papers will be made available on the workshop website.

Workshop Schedule (Tentative)

Location: University of Technology Sydney, Sydney, Australia

Date: Friday, July 17, 2026 (Morning Session)

9:00 - 9:05 Opening Remarks
9:05 - 9:35 Invited Talk 1: Hao Su
9:35 - 10:05 Invited Talk 2: Ashwin Balakrishna
10:05 - 10:35 Coffee Break and Poster Session
10:35 - 11:05 Invited Talk 3: Haozhi Qi
11:05 - 11:35 Invited Talk 4: Simar Kareer
11:35 - 11:50 Spotlight Talks
11:50 - 12:20 Panel Discussion
12:20 - 12:25 Closing Remarks

Speakers

Hao Su
Fudan University

Hao Su is "Haoqing" Distinguished Professor at Fudan University and Inaugural Dean of the Fudan Institute of General Physical Intelligence. His research spans perception, 3D understanding, and embodied control, with foundational contributions including ImageNet, ShapeNet, PointNet, and the ManiSkill benchmark. He is a recipient of the IEEE PAMI Young Researcher Award and the NSF CAREER Award, and served as Program Committee Co-Chair of CVPR 2025. He holds a Ph.D. in Computer Science from Stanford and a Ph.D. in Mathematics from Beihang University.

Ashwin Balakrishna
Physical Intelligence

Ashwin Balakrishna is a Researcher at Physical Intelligence, where he builds scalable robot data collection and evaluation pipelines for large-scale robot learning. Previously, he was a Senior Research Scientist at Google DeepMind on the Gemini Robotics team, where he worked on VLA post-training and on improving instruction-following capabilities. He completed his PhD in Computer Science in the AUTOLAB at UC Berkeley and his bachelor's degree in Electrical Engineering at Caltech.

Haozhi Qi
Amazon Frontier AI & Robotics (FAR)

Haozhi Qi is a Member of Technical Staff at Amazon FAR and an incoming Assistant Professor at the University of Chicago. His research spans robotics and AI, with a particular focus on algorithms and systems for dexterous manipulation. He is a recipient of the Lotfi A. Zadeh Prize, was named an RSS Pioneer 2026, and received the EECS Evergreen Award for Excellence in Undergraduate Research Mentoring.

Simar Kareer
Georgia Tech

Simar Kareer is a PhD student at Georgia Tech advised by Danfei Xu and Judy Hoffman. His research focuses on scaling robot learning from egocentric human experience, including EgoMimic and EgoVerse. He previously led Human2Robot at Physical Intelligence and is currently interning with NVIDIA GEAR. His work has been featured on the Meta AI Blog and in The Washington Post.

Organizers

Contact Us

Contact us at: rssworkshop2026-official@outlook.com