- In collaboration with researchers from Synthesis AI and Columbia University, researchers at Google have developed a new method that enables robots to detect transparent objects.
- The new algorithm can detect transparent objects and produce accurate 3D models of them from RGB-D camera data.
- With a robot parallel-jaw gripper arm, the grasping success rate on transparent objects improved from 12% to 74%, and with suction from 64% to 86%.
Optical sensors such as cameras and lidar are fundamental parts of a modern robotics platform, but they share a common flaw: transparent objects such as glass containers tend to confuse them. That's because most algorithms analyzing data from those sensors assume all surfaces are Lambertian, meaning they reflect light evenly in all directions and from all angles. Transparent objects violate this assumption: they both refract and reflect light, rendering depth data invalid or riddled with noise.
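The Lambertian assumption mentioned above can be illustrated with a minimal sketch: under it, a pixel's brightness depends only on the surface normal and the light direction, never on the viewing angle, which is exactly what breaks down for refractive, reflective glass (function and variable names here are illustrative, not from any real depth-sensing codebase):

```python
import numpy as np

def lambertian_intensity(albedo, normal, light_dir):
    """Lambertian (diffuse) shading: brightness is proportional to the
    cosine of the angle between the surface normal and the light
    direction, so the same point looks equally bright from every
    viewpoint. Transparent surfaces do not obey this model."""
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)
    return albedo * max(0.0, float(np.dot(n, l)))

# A surface facing the light head-on reflects at its full albedo.
print(lambertian_intensity(0.8, np.array([0.0, 0.0, 1.0]),
                           np.array([0.0, 0.0, 1.0])))  # 0.8
```

Depth algorithms that bake in this model interpret sensor readings as if every surface scattered light this way, which is why glass yields invalid or noisy depth.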
To solve this issue, a team of Google researchers collaborated with Columbia University and Synthesis AI, a data generation platform for computer vision, to develop ClearGrasp. It's an algorithm capable of estimating accurate 3D data of transparent objects from RGB (red, green, blue) images and, importantly, it uses AI to reconstruct the depth of transparent objects and generalizes to objects unseen during training.
Training sophisticated AI models usually requires large datasets, the researchers note, and because no corpus of transparent objects existed, they created their own, containing more than 50,000 photorealistic renders with corresponding depth, edges, surface normals (which represent surface curvature), and more. Each image shows up to five transparent objects, either on a flat ground plane or inside a tote, with various backgrounds and lighting. A separate set of 286 real-world images with corresponding ground-truth depth serves as a test set.
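The per-image labels the article lists can be pictured as one record per render; the field names below are illustrative only and do not reflect the released dataset's actual file layout:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class RenderSample:
    """One synthetic training example, holding the labels described in
    the article (names are hypothetical, for illustration)."""
    rgb: np.ndarray              # H x W x 3 photorealistic render
    depth: np.ndarray            # H x W ground-truth depth map
    edges: np.ndarray            # H x W occlusion/edge map
    surface_normals: np.ndarray  # H x W x 3 per-pixel unit normals

sample = RenderSample(rgb=np.zeros((8, 8, 3)),
                      depth=np.zeros((8, 8)),
                      edges=np.zeros((8, 8)),
                      surface_normals=np.zeros((8, 8, 3)))
```

Grouping the labels this way reflects the supervision each of ClearGrasp's networks needs: normals, boundaries, and depth are all paired with the same RGB render.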
ClearGrasp comprises three networks altogether: one to estimate surface normals, one for occlusion boundaries (depth discontinuities), and one that masks transparent objects. The mask removes all pixels belonging to transparent objects so that their correct depths can be filled in: an optimization module extends the surface's depth using the predicted surface normals to guide the shape of the reconstruction, while the predicted occlusion boundaries help maintain the separation between distinct objects.
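As a rough sketch of that flow (the three networks are passed in as black-box callables; this is a hypothetical outline, not the actual ClearGrasp code, and it stops short of the global optimization step):

```python
import numpy as np

def cleargrasp_flow_sketch(rgb, raw_depth,
                           predict_normals, predict_boundaries, predict_mask):
    """Illustrative pipeline: run the three per-pixel predictions, then
    invalidate raw depth wherever a pixel belongs to a transparent
    object, leaving those holes for a normals-guided optimization to
    fill (boundaries keep neighboring objects separated)."""
    normals = predict_normals(rgb)        # per-pixel surface orientation
    boundaries = predict_boundaries(rgb)  # depth discontinuities
    mask = predict_mask(rgb)              # True where a pixel is transparent

    depth = raw_depth.astype(float).copy()
    depth[mask] = np.nan                  # drop the unreliable readings

    return depth, normals, boundaries
```

The key design point the article describes is that the sensor's depth is kept where it is trustworthy and only reconstructed inside the transparent-object mask.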
In experiments, the researchers trained the models not only on their custom dataset but also on real indoor scenes from the open-source Matterport3D and ScanNet corpora. They say that ClearGrasp managed to reconstruct depth for transparent objects with much higher fidelity than the baseline methods, and that its output depth could be used directly as input to manipulation algorithms that rely on images. When using a robot with a parallel-jaw gripper arm, the grasping success rate on transparent objects improved markedly, growing from a baseline of 12% to 74%; with suction, it rose from 64% to 86%.
“ClearGrasp can benefit robotic manipulation by incorporating it into our pick and place robot’s control system, where we observe significant improvements in the grasping success rate of transparent plastic objects.”
-Andy Zeng, Google research scientist
Enabling machines to better sense transparent surfaces would not only improve safety but could also open up a range of new interactions in unstructured applications, from sorting plastics for recycling, to navigating indoor environments or generating AR visualizations on glass tabletops.
Limitations & Future Work:
One limitation is that the synthetic data does not feature accurate caustics, owing to the limits of rendering with traditional path-tracing algorithms. As a result, the models sometimes mistake bright caustics coupled with shadows for independent transparent objects. Despite these drawbacks, the work on ClearGrasp shows that synthetic data remains a viable approach for achieving competent results with learning-based depth reconstruction methods. A promising direction for future work is improving domain transfer to real-world images by generating renders with physically correct caustics and surface imperfections such as fingerprints.
With ClearGrasp, Google AI demonstrates that high-quality renders can be used to successfully train models that perform well in the real world. Their data is ready to drive further research on data-driven perception algorithms for transparent objects.