Suppose you are at your desk looking at some objects on it. You don't know the precise distance from your eye to any particular object in meters. However, you can immediately reach out and touch any of them. Instead of the meter, your knowledge of distance is encoded in unknown but embodied units of action. In contrast, standard approaches in robotics assume calibration to the meter, so that separate vision and control processes can be interfaced. Consequently, robots are precisely manufactured and calibrated, resulting in expensive systems available in only a few configurations.
In response, we propose Embodied Visuomotor Representation, a framework that allows distance to be measured by a robot's own actions and thus minimizes dependence on calibrated 3D sensors and physical models. Using it, we demonstrate that a robot without knowledge of its size, environmental scale, or its own strength can become capable of touching and clearing obstacles after several seconds of operation. Similarly, we demonstrate in simulation that an agent, without knowledge of its mass or strength, can jump a gap of unknown size after performing a few test oscillations. These experiments parallel bee and gerbil behavior, respectively.
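To make the idea of measuring distance in units of action concrete, here is a minimal 1-D thought experiment in code. All quantities, dynamics, and variable names are illustrative assumptions, not the paper's implementation: an agent with unknown mass observes position only through an unknown visual scale, yet after one short test push it can express the target distance in its own action units and reach it exactly.

```python
import math

# Illustrative 1-D agent (hypothetical, not the paper's method).
# The agent never learns m, alpha, or the metric distance d_true;
# it only combines scale-free vision with its own test action.
m, alpha = 2.7, 0.31           # mass and visual scale, unknown to the agent
d_true = 5.0                   # true metric distance to the target, unknown

def observe(x):
    """Vision: position known only up to the unknown scale alpha."""
    return alpha * x

# Test action: a constant unit push from rest for dt seconds.
u_test, dt = 1.0, 0.5
x_after = 0.5 * (u_test / m) * dt**2          # true displacement (hidden)
k = observe(x_after) / (0.5 * u_test * dt**2) # learned gain: alpha / m

# Target distance re-expressed in embodied action units (equals m * d_true):
y_goal = observe(d_true)
action_units = y_goal / k

# Plan: push with u = 1 from rest for T seconds so 0.5*(u/m)*T^2 = d_true.
T = math.sqrt(2.0 * action_units / 1.0)
x_final = 0.5 * (1.0 / m) * T**2
print(abs(x_final - d_true) < 1e-9)   # the agent reaches the target
```

The point of the sketch is that the unknown visual scale `alpha` and the unknown mass `m` cancel when observations are divided by the response to the agent's own action, paralleling the test oscillations used before the jump in simulation.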
(A): The classic sense-plan-act architecture used in robotics, with visual-inertial odometry (VIO) assumed for state estimation. (B): An architecture based on Embodied Visuomotor Representation. Compared to sense-plan-act, the embodied approach includes an additional internal feedback connection (red arrow) that allows it to develop an embodied sense of scale without access to calibrated sensors such as an IMU. The resulting measurements naturally yield stable closed-loop control. In contrast, the sense-plan-act loop's stability depends on accurate calibration of the sensors to an external scale.
Movie 1: Uncalibrated Touching and Clearing
Movie 2: Uncalibrated Jumping
Levi Burner, Cornelia Fermüller, Yiannis Aloimonos.