RSS RoboEval 2025

About
The scientific community has made immense progress in developing methods and systems that achieve impressive performance on public benchmarks. However, these systems still struggle with generalization, robustness, safety, and reliability when deployed in real-world settings (e.g., factories, construction sites, residential homes). Beyond optimizing algorithms and methods for deployment and transferability, this workshop asks: "Should we also optimize our benchmarks to be more representative assessments of good real-world behavior?" and "Should we take more care in assessing the current level of robot capabilities, so that we know when significant and readily deployable advancements are made?" We assert that these questions point to critical challenges in evaluating robots in the real world, which we organize into three categories:

Evaluations and Progress. How do we ensure that evaluations are designed to drive meaningful advancements in the field without creating barriers that stifle progress?
Accessibility and Relevance. Should benchmarks emphasize the full complexity of real-world deployment challenges, or should they be simplified to encourage broader participation and replicability across diverse research contexts?
Alignment Across Stakeholders. What do different stakeholders need from evaluations? Should benchmarks address the needs of academia, industry, and policy simultaneously, or should they be developed to fit each group's needs individually?

Intended audience. To address the above questions and challenges, the workshop will bring together keynote speakers from different communities (e.g., academia, industry, policy), facilitate debates on pressing issues relating to real-world robot evaluation, and host a paper track prioritizing evaluation metrics and results focused on achieving deployability and generalization of robotic systems. In addition to the organizers, the presenters, panelists, and technical program committee are drawn from the following (sub-)communities: Robot Learning, Computer Vision, Natural Language Processing, Embodied AI, Formal Benchmarking, and Public Policy. We likewise intend to attract an audience from these diverse sub-communities to contribute to compelling discussions.