A team of researchers in the Allen School’s Robotics and State Estimation Lab earned the award for Best Robotic Vision Paper at the recent IEEE International Conference on Robotics and Automation (ICRA 2017). The winning paper, “Self-supervised Visual Descriptor Learning for Dense Correspondence,” represents a significant step forward in robot learning by providing a framework for enabling robots to understand their local environments without human intervention.
For robots to operate safely and effectively in dynamic environments, they must be able to recognize objects and scenes they have encountered on previous occasions and apply that knowledge to their current situation. The process by which robots develop this human-like perception of their environment is known as correspondence estimation, which typically requires either image descriptors that have been engineered by hand or the expensive collection of vast quantities of labeled training data to take advantage of deep learning techniques.
The team from the RSE-Lab — Allen School Ph.D. student Tanner Schmidt, former postdoc and current affiliate professor Richard Newcombe of Oculus, and professor Dieter Fox — came up with an alternative that automates the generation of training data and enables robots to learn the visual features of a scene in a self-supervised way. Leveraging dense mapping techniques such as KinectFusion and DynamicFusion, the researchers were able to generate correspondence labels from raw RGB-D video data. They then used the resulting labels and a contrastive loss to train a fully convolutional network to produce dense visual descriptors from novel images that remain consistent despite variations in pose, viewpoint, and lighting conditions. This work, which represents the state of the art in descriptor learning, will be useful for researchers tackling a number of important problems in robot vision, including tracking, mapping, and object recognition.
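To give a flavor of the training signal involved, the following is a minimal sketch of a pixelwise contrastive loss of the kind described above: descriptors of corresponding pixels are pulled together, while descriptors of non-corresponding pixels are pushed apart up to a margin. The function name, margin value, and use of plain NumPy are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def contrastive_loss(desc_a, desc_b, is_match, margin=0.5):
    """Contrastive loss for one pair of pixel descriptors (illustrative sketch).

    desc_a, desc_b: 1-D descriptor vectors for two pixels.
    is_match: True if the pixels were labeled as corresponding
              (e.g. by a dense mapping system providing the labels).
    """
    d = np.linalg.norm(desc_a - desc_b)
    if is_match:
        # Corresponding pixels: penalize any distance between descriptors.
        return d ** 2
    # Non-corresponding pixels: penalize only if closer than the margin.
    return max(0.0, margin - d) ** 2
```

Summed over many sampled pixel pairs, a loss of this shape is what drives the network toward descriptors that are consistent across viewpoints.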
Another Allen School paper, “SE3-Nets: Learning Rigid Body Motion using Deep Neural Networks” by Ph.D. student Arunkumar Byravan and Fox, was a finalist for the same award. That paper describes SE3-Nets, which are deep neural networks for modeling and predicting the motion of objects subject to applied force. Byravan and Fox demonstrated that SE3-Nets are able to learn scene dynamics from limited real-world data, generalize across different scenes, and more consistently predict object motion compared to traditional flow networks.
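The core modeling idea behind SE3-Nets is that a scene's motion can be decomposed into a small number of rigid-body (SE(3)) transforms, with each point's motion given by a mask-weighted blend of those transforms. The sketch below illustrates that decomposition in plain NumPy; the function names and the use of soft blending weights are assumptions for illustration, not the paper's network architecture.

```python
import numpy as np

def blend_rigid_motions(points, masks, rotations, translations):
    """Blend k rigid-body transforms over a point cloud (illustrative sketch).

    points: (N, 3) array of 3D points.
    masks: (N, k) per-point weights assigning points to k rigidly moving parts.
    rotations: list of k (3, 3) rotation matrices.
    translations: list of k (3,) translation vectors.
    Returns the (N, 3) predicted positions after motion.
    """
    out = np.zeros_like(points)
    for k in range(len(rotations)):
        # Apply the k-th rigid transform to every point...
        moved = points @ rotations[k].T + translations[k]
        # ...and weight the result by each point's membership in part k.
        out += masks[:, k:k + 1] * moved
    return out
```

With binary masks this reduces to segmenting the scene into rigid parts and moving each part with its own transform, which is the structure SE3-Nets learn from data.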
As Fox points out, vision is becoming an increasingly important area of robotics, and our showing at ICRA demonstrates that we are at the forefront of this exciting line of research. Way to go, team!