sDSPL - Towards a benchmark for general-purpose task evaluation in domestic service robots

Abstract: Although the skills required to solve isolated robotics problems have recently reached remarkable performance, we propose evaluating such individual solutions in fully integrated robot systems tested in real daily situations, like those presented at international robotics competitions. The simulation Domestic Standard Platform League (sDSPL), which uses the HSR simulator developed for the World Robot Summit, arises from the need to standardise and spread research on Domestic Service Robots by providing a setting where a series of solutions can be tested on a general-purpose task in a standard domestic environment. This approach has proven successful at several international competitions, namely the RoboCup Japan Open, the Mexican Tournament of Robotics, and RoboCup 2021.


I. INTRODUCTION
The presence of service robots in domestic environments has been increasing recently and, in consequence, so have the familiarity of users with them and the difficulty of the tasks that users might request. Therefore, standardisation of full-task performance (in contrast to the evaluation of individual skills) is necessary to compare different software and hardware implementations; furthermore, complex scenarios in natural places are preferable to controlled laboratory spaces. In this context, we propose international robot competitions as a benchmark for general-purpose service-robot performance, and we offer a standard simulation platform to evaluate domestic tasks.
The rest of this paper is organized as follows. Section II reviews the use and performance evaluation of service robots, and Section III presents the Tidy-Up task as in the World Robot Summit (WRS) and RoboCup international competitions. Section IV expands on robot evaluation at a competition level. The paper ends by outlining our conclusions.

II. RELATED WORK
In [1], the authors presented a novel navigation system for service robots: they trained an Attention Branch Network [2] to learn visual cues for navigation based on human behaviours and semantic information from global path planning [3], and they tested it on a service robot in an indoor environment. While the robot is able to navigate autonomously by extracting an attention map that allows it to generate a series of behaviours to traverse the path based on visual information, the study did not include dynamic information, which is to be expected in real home environments.
Other applications using service robots include object manipulation; for example, in [4] a robot initialises its motion planning system using previous experience in consistent situations. Similarly, in [5] and [6], the authors present a system where a service robot is able to manipulate unknown doors after receiving a single user instruction. Although those systems showed high performance, their limited scope of application and their need for user feedback make them hard to extend to general-purpose service robots.
* contreras.luis@lab.tamagawa.ac.jp. • Presenter
A common object manipulation approach that has proven effective uses the current RGBD information to run a series of simulations and obtain the best grasping strategy; however, dealing with partial information due to view occlusions can result in non-optimal solutions. Furthermore, when the task consists of manipulating dozens or even hundreds of objects, the solution becomes inefficient. Recently, a test platform for several manipulation applications was presented in [7]; it uses a simulator with a robot arm fixed on a surface, an overhead RGBD camera, and an eye-in-hand monocular camera, and it features 100 different tasks providing proprioceptive and visual observations.
As we can see, although several applications have been developed, they are mostly task-specific and depend heavily on the experimental setup. It is in this context that the use of robot competitions as a benchmark for robot systems' performance has been proposed [8], where several skills can be objectively evaluated in human-like scenarios to identify trends and challenges in the area, as in [9] and [10]. The most recent work towards this goal is Habitat 2.0 [11], where the authors present a simulated home environment in which the robot can perform several tasks; however, no specific rules and regulations are provided. Furthermore, the authors of this last study express that their "experiments suggest that complex, multi-step tasks such as setting the [...]".
Moreover, differences in performance between laboratory and competition setups become apparent in competition scenarios, where teams have few chances to test their solutions. In the particular task of service robots attending users in a restaurant, while [12] and [13] developed a system able to navigate in complex scenarios, their system was tested in laboratory setups with few participants and without dynamic obstacles. On the other hand, [14] developed a navigation system able to work in unknown environments and with dynamic obstacles at a competition level, where pedestrians occasionally interfere and interact with the robot in unexpected ways that the robot has to deal with while solving the given task.
Similarly, in the Tidy-Up task, where a robot has to clean a room and place the objects in their correct locations, [15] has shown high performance in object recognition and manipulation; however, it depends heavily on external camera devices and on high-performance external computing. On the other hand, in [16], [17], the proposed system operates autonomously in the same task and follows standard rules and regulations, like those proposed in WRS and RoboCup, to evaluate its performance.

III. TIDY UP
Following the rules presented at the WRS Partner Robot Challenge (Real Space) (https://worldrobotsummit.org/), we propose the Tidy-Up task to evaluate robot performance. It consists of taking objects from incorrect locations to a predetermined deposit and then, when requested, providing a person within a group with some food from a shelf, while avoiding obstacles during navigation. It uses the YCB Object and Model Set [18], which consists of objects commonly found in home and office environments that vary in shape, material, size, texture, weight, etc.
While promoting smart solutions, the key performance indicator is based on a 4S philosophy (Speed, Smooth, Stable, and Safe) and uses a compact field that can be easily set up anywhere to allow for continuous evaluation in a variety of research activities [19]. The rulebook rewards actions like opening the drawers, depositing objects softly and in free spaces and, in some cases, placing objects in a specific orientation. It discourages actions such as dropping or hitting objects and furniture, false deliveries, and delivering an object to an occupied space, any of which might prevent teams from getting the full score for a specific action.
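The reward and penalty structure described above can be illustrated with a minimal sketch. The event names and every point value below are hypothetical placeholders chosen for illustration; they are not taken from the WRS rulebook, which should be consulted for the actual scoring.

```python
# Hypothetical event names and point values, NOT the WRS rulebook's.
BONUS = {"open_drawer": 10, "correct_delivery": 10, "correct_orientation": 5}
PENALTY = {"dropped_object": 5, "collision": 10, "false_delivery": 5}


def score_run(events):
    """Add the bonus or subtract the penalty for each event in a run."""
    total = 0
    for event in events:
        if event in BONUS:
            total += BONUS[event]
        elif event in PENALTY:
            total -= PENALTY[event]
        else:
            raise KeyError(f"unknown event: {event}")
    return total


# Two correct deliveries and an opened drawer, offset by one collision.
events = ["open_drawer", "correct_delivery", "collision", "correct_delivery"]
print(score_run(events))
```

Keeping bonuses and penalties in separate tables makes it easy to see how a single discouraged action can cancel the gain from a rewarded one.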
To show the variety of solutions aimed at solving the same task, we present a few of the proposals for the Tidy-Up task from some of the participating teams (a complete overview of team approaches to the same task can be seen in their Poster Presentation at https:). In [17], eR@sers Team uses a system where the robot constantly updates its belief by gathering spatial, visual, and contact information in the low-level behaviours during the object manipulation process, while Hibikino-Musashi Team in [20] presents object detection and recognition systems based on data augmentation, using 3D scans of the objects and automatic annotations at different scales, orientations, and backgrounds.
Similarly, in [21] OIT-RITS Team presents their object detection and recognition strategy in cluttered scenes where they first segment an image using edge information and then form groups of segments with similar color information to generate an object view from similar partial views and their corresponding 2D position in the image. Then, they feed a recognition system with the generated view of the object.
Finally, AISL-TUT Team in [22] presents a system that reacts to unexpected situations. They handle grasping errors using contact sensors in the hand and navigation errors using the laser in the base: when an obstacle is in the way to a target location, they use the upper camera to find it and perform an action to remove it and free the path. However, they do not consider changes in the environment during the manipulation process.

IV. VIRTUAL COMPETITION PERFORMANCE
As mentioned before, the simulation DSPL is largely based on WRS; therefore, some rules might not be applicable to a simulated environment, especially those limited by real physical and mechanical principles (http://humansupportrobot.org/robocup2021-dspl-simulation/). The sDSPL has evolved from the WRS Real Space to a virtual setup through several local tournaments, as follows.
In [23], a simulator that uses Toyota's Human Support Robot (HSR) [24] and that fits the current WRS rulebook was developed; this system was tested at the RoboCup Japan Open 2020 (https://www.robocup.or.jp/japanopen2020b-en/) and the results can be seen at https://bit.ly/3hqOigF. The next step was adapting the WRS rules to a RoboCup competition format, divided into several tasks and stages, while still using the proposed simulator system; this format was first tested at the Mexican Tournament of Robotics 2021 (https://www.femexrobotica.org/tmr2021/en/portfolio-item/robocup-standard-platform-league/), and the results can be seen at https://bit.ly/3qQfvwI.

A. RoboCup Worldwide 2021
Finally, to meet the needs arising from the current global situation, this system was tested in a worldwide scenario that allowed teams to participate remotely. The format in the RoboCup competition (https://athome.robocup.org/home-virtual-2021/) consisted of two stages:
• Stage I
  - Clean Up (5 min)
  - Go and Get It (5 min)
• Stage II
  - Clean Up (15 min)
The proposed platform is flexible enough to be used with several robot models, which enables its use in several leagues. In particular, the RoboCup was divided into three leagues:
• Domestic Standard Platform League (DSPL)
• Social Standard Platform League (SSPL)
• Open Platform League (OPL)
In the DSPL and SSPL, teams use the HSR as the standard robot platform to solve a given task, as shown in Figure 1.
On the other hand, in OPL, teams are allowed to modify the provided architecture or use their own; to illustrate this, an example template using the robot TIAGo [25] is provided in the simulator. Figure 2 shows two different OPL robot models.
Figure 2: OPL robot models: a) TIAGo [25], an off-the-shelf service robot, and b) robot Justina (as in [26]), a custom service robot for research.
A template using the HSR can be found at https://github.com/devrt/robocup-at-home-2021-challenge and, for the TIAGo robot, at https://github.com/devrt/robocup-at-home-2021-opl-challenge. A complete overview of robot models provided by OPL teams can be seen at https://bit.ly/3qF617f. Using either a standard robot (DSPL or SSPL) or a custom robot model (OPL) brings different challenges: while a standard platform allows a research group to focus on the algorithmic part and consistently compare different proposals in the same setup, an open platform allows the group to overcome the physical limitations of a fixed model. Table I shows the performance in the different leagues when solving the same problem. We can observe that using a standard platform permits teams to focus on the tasks at hand, allowing them to obtain the highest performance; however, once the limitations of the system are reached, it also allows all teams to imagine new solutions to improve the platform. An overview of the research output from different groups in all categories can be seen in the RoboCup's Open Demonstrations at https://bit.ly/3AjV0fT, while the results of all participating teams in the different leagues can be seen at https://athome.robocup.org/rc2021/.

B. Technical Challenge
In addition to a competition format, but still promoting full-task performance as a baseline for robot evaluation, we propose a technical challenge that consists of completing the Tidy-Up task as in the World Robot Summit for real robots. It is a 20-minute test where Task 1 (Clean Up, 15 min) and then Task 2 (Go and Get It, 5 min) are evaluated as a single continuous test. The score is the one that the system provides (i.e. no manual inspection is performed to add or remove points); the mean or median performance over several runs is considered in order to account for any errors and inconsistencies in the simulator and the scoring system.
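As a minimal sketch of the aggregation described above, the per-run scores reported by the scoring system can be summarised by their mean and median. The function name and the example scores are hypothetical and serve only as an illustration.

```python
from statistics import mean, median


def summarise_runs(scores):
    """Return the number of runs with the mean and median of their scores."""
    if not scores:
        raise ValueError("at least one run is required")
    return {"runs": len(scores), "mean": mean(scores), "median": median(scores)}


# Hypothetical scores from five runs; one run failed early and scored 0.
summary = summarise_runs([120, 110, 0, 130, 115])
print(summary)
```

In this made-up example the single failed run pulls the mean well below the median, which is why reporting both can help to separate a system's typical behaviour from occasional simulator or scoring inconsistencies.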
During the competition, teams are invited to register their repositories to participate, and all tests are run in the cloud (i.e. all commands are provided in a Dockerfile without any further user intervention). For research, however, the test can be run locally, and users can report their results after several runs with the same or different seeds, which should be provided so that the experiments can be replicated.
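The role of the seeds in replication can be sketched as follows. Here `run_trial` is a hypothetical stand-in that returns a placeholder score; it does not call the actual simulator or scoring system, and the report layout is only one possible assumption about how results might be shared.

```python
import json
import random


def run_trial(seed):
    """Stand-in for one simulated run; NOT the real simulator or scorer."""
    rng = random.Random(seed)   # all randomness in the trial derives from the seed
    return rng.randint(0, 100)  # placeholder score


def evaluate(seeds):
    """Run one trial per seed and keep each seed next to its score."""
    return [{"seed": s, "score": run_trial(s)} for s in seeds]


report = evaluate([1, 2, 3])
print(json.dumps(report, indent=2))
```

Because every score is stored together with the seed that produced it, another group could rerun the same seeds and compare the results run by run.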
The command to locally randomise the objects in the arena, depending on the robot used, is:

Table II shows the best results per task (Clean Up and Go and Get It) as well as the best Technical Challenge team among all leagues. It can be observed that the use of a standard platform, like the HSR, allows teams to focus on solving the task at hand without limiting innovation.
Robot competitions are held only a few times a year, and it can be hard for a single research group to participate in all of them to test its latest developments. In addition to them, therefore, we propose automatic scoring systems as service robot benchmarks; this approach might allow teams to consistently measure their progress and objectively compare their results with those of other research groups, which, in consequence, will stimulate research on full service robots and not only on the individual skills that they integrate. The constant improvement of the simulators and the increasing scope and difficulty of the tasks should be informed by a deep analysis of real robot competitions as the state of the art for this kind of system, while real robot competitions should in turn consider the results on these benchmarks to broaden their scope.

V. CONCLUSIONS
We propose a simulation Domestic Standard Platform League (sDSPL) as a benchmark to evaluate the performance of service robots while executing a task (in contrast to benchmarks focused on evaluating individual skills). We believe that a standard system can boost research in two main respects: first, using the same robot and setup, a research group can consistently test several baselines for a given task; second, using a standard robot architecture that solves a given task as a reference, a research group can evaluate the impact of any hardware modification or even assess its own custom robot models.