1st Workshop on Evaluating Robots for the Real World: Aligning Academia, Industry, and Policymakers (RoboEval)
Proposed Workshop @ Robotics: Science and Systems Conference (RSS 2025) - June 25 - Los Angeles, USA
About
There has been immense progress from the scientific community in developing methods and systems that boast impressive
performance on public benchmarks. However, systems still struggle with generalization, robustness, safety, and
reliability when deployed in real-world settings (e.g., factories, construction sites, residential homes). Beyond
optimizing algorithms and methods for deployment and transferability, this workshop raises questions such as
"Should we also optimize our benchmarks to be more representative assessments of good real-world behavior?"
and "Should we take more care in assessing the current level of robot capabilities, so that we know when significant and
readily-deployable advancements are made?" We assert that these questions point to critical challenges in evaluating
robots in the real world, which we organize into three categories:
Evaluations and Progress. How do we ensure that evaluations drive meaningful advancements in the field without
creating barriers that stifle progress?
Accessibility and Relevance. Should benchmarks emphasize the full complexity of real-world deployment challenges, or
should they be simplified to encourage broader participation and replicability across diverse research contexts?
Alignment Across Stakeholders. What do different stakeholders need from evaluations? Should benchmarks address
the needs of academia, industry, and policy simultaneously, or should they be developed to fit each stakeholder's needs?
Intended audience. To address the above questions and challenges, the workshop will bring together keynote speakers from
different communities (e.g., academia, industry, policy), facilitate debates on pressing issues relating to real-world
robot evaluation, and host a paper track prioritizing evaluation metrics and results focused on achieving deployability
and generalization of robotic systems. In addition to the organizers, the presenters, panelists, and technical program
committee are drawn from the following (sub-)communities: Robot Learning, Computer Vision, Natural Language Processing,
Embodied AI, Formal Benchmarking, and Public Policy. We likewise intend to attract an audience from these diverse
sub-communities to contribute to compelling discussions.