We consider the problem of grasping novel objects, specifically, objects that are being seen for the first time through vision. Grasping a
previously unknown object, one for which a 3-d model is not available,
is a challenging problem. Furthermore, even if given a model,
one still has to decide where to grasp the object. We present a learning
algorithm that neither requires nor tries to build a 3-d model of
the object. Given two (or more) images of an object, our algorithm
attempts to identify a few points in each image corresponding to good
locations at which to grasp the object. This sparse set of points is
then triangulated to obtain a 3-d location at which to attempt a grasp.
This is in contrast to standard dense stereo, which tries to triangulate
every single point in an image (and often fails to return a good 3-d
model). Our algorithm for identifying grasp locations from an image
is trained by means of supervised learning, using synthetic images for
the training set. We demonstrate this approach on two robotic manipulation
platforms. Our algorithm successfully grasps a wide variety
of objects, such as plates, tape rolls, jugs, cellphones, keys, screwdrivers,
staplers, a thick coil of wire, a strangely shaped power horn
and others, none of which were seen in the training set. We also apply
our method to the task of unloading items from dishwashers.
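The abstract does not say how the sparse grasp points are triangulated. The sketch below uses one standard option, midpoint triangulation: each image point is back-projected to a ray, and the returned 3-d point is the midpoint of the shortest segment joining the two rays. It assumes calibrated pinhole cameras with known intrinsics (`fx`, `fy`, `cx`, `cy`) and poses (`R`, `t`, with `x_cam = R x_world + t`), and it assumes the learned classifier has already picked one matching grasp point in each image. `PinholeCamera` and `triangulate_midpoint` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def add(a: Vec3, b: Vec3) -> Vec3:
    return tuple(x + y for x, y in zip(a, b))


def sub(a: Vec3, b: Vec3) -> Vec3:
    return tuple(x - y for x, y in zip(a, b))


def scale(a: Vec3, s: float) -> Vec3:
    return tuple(x * s for x in a)


def matvec(M: Mat3, v: Vec3) -> Vec3:
    return tuple(dot(tuple(row), v) for row in M)


def transpose(M: Mat3) -> Mat3:
    return [list(col) for col in zip(*M)]


@dataclass
class PinholeCamera:
    """Hypothetical calibrated camera: x_cam = R @ x_world + t."""
    fx: float
    fy: float
    cx: float
    cy: float
    R: Mat3
    t: Vec3

    def center(self) -> Vec3:
        # Camera center in world coordinates: C = -R^T t.
        return scale(matvec(transpose(self.R), self.t), -1.0)

    def ray(self, u: float, v: float) -> Vec3:
        # Back-project pixel (u, v) to a world-frame ray direction R^T K^-1 [u, v, 1].
        d_cam = ((u - self.cx) / self.fx, (v - self.cy) / self.fy, 1.0)
        return matvec(transpose(self.R), d_cam)

    def project(self, X: Vec3) -> Tuple[float, float]:
        # Project a world point to pixel coordinates.
        Xc = add(matvec(self.R, X), self.t)
        return (self.fx * Xc[0] / Xc[2] + self.cx, self.fy * Xc[1] / Xc[2] + self.cy)


def triangulate_midpoint(cam1: PinholeCamera, p1: Tuple[float, float],
                         cam2: PinholeCamera, p2: Tuple[float, float]) -> Vec3:
    """Midpoint of the shortest segment between the two back-projected rays."""
    c1, c2 = cam1.center(), cam2.center()
    d1, d2 = cam1.ray(*p1), cam2.ray(*p2)
    w = sub(c1, c2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; cannot triangulate")
    # Minimize |c1 + s*d1 - (c2 + t*d2)|^2 over the ray parameters s and t.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    q1 = add(c1, scale(d1, s))
    q2 = add(c2, scale(d2, t))
    return scale(add(q1, q2), 0.5)


if __name__ == "__main__":
    I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    cam1 = PinholeCamera(500.0, 500.0, 320.0, 240.0, I3, (0.0, 0.0, 0.0))
    # Second camera shifted 0.1 units along x (center at (0.1, 0, 0)).
    cam2 = PinholeCamera(500.0, 500.0, 320.0, 240.0, I3, (-0.1, 0.0, 0.0))
    grasp_point = (0.05, 0.02, 0.5)
    p1, p2 = cam1.project(grasp_point), cam2.project(grasp_point)
    print(triangulate_midpoint(cam1, p1, cam2, p2))
```

Because only a handful of points per image are triangulated, a simple closed-form method like this is cheap. With noisy pixel estimates the two rays will not intersect exactly, which is why the midpoint of their closest approach is returned rather than an exact intersection.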