Abstract:
To improve the synergy between pushing and grasping in cluttered environments and to strengthen the network's ability to perceive object locations and the relative positions among objects, a pushing-and-grasping collaborative network based on object positional information was proposed for robotic grasping in clutter. The network employed two fully convolutional networks to infer the locations and directions of grasping and pushing actions, respectively, from visual observations. A coordinate attention module aggregated features along the two spatial directions, capturing long-range dependencies along the horizontal direction while preserving object positional information along the vertical direction; the resulting attention maps over the pushing and grasping location features improved the accuracy of the network's inference of action positions. Object dispersion was introduced to measure, from a global perspective, how spread out the objects in the environment are, and a pushing reward function based on object dispersion was designed to improve the quality of pushing actions. In simulation experiments, the network achieved a grasping success rate of 75.1% and an action efficiency of 73.2%; in real-world experiments, it achieved 80.1% and 76.2%, respectively.
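To illustrate the direction-wise feature aggregation described above, the following is a minimal sketch of a coordinate attention block, assuming a PyTorch implementation; the channel count, reduction ratio, and layer names are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal sketch of a coordinate attention block (assumed PyTorch implementation);
# hyperparameters and layer names are illustrative, not the paper's exact design.
import torch
import torch.nn as nn


class CoordinateAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        # Pool along each spatial axis separately so positional information is preserved
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # aggregate over width  -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # aggregate over height -> (B, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Direction-aware feature aggregation along the two spatial axes
        feat_h = self.pool_h(x)                      # (B, C, H, 1)
        feat_w = self.pool_w(x).permute(0, 1, 3, 2)  # (B, C, W, 1)
        # Joint encoding of the two directional descriptors
        y = torch.cat([feat_h, feat_w], dim=2)       # (B, C, H+W, 1)
        y = self.act(self.bn1(self.conv1(y)))
        # Split back into one attention map per spatial direction
        attn_h, attn_w = torch.split(y, [h, w], dim=2)
        attn_h = torch.sigmoid(self.conv_h(attn_h))                      # (B, C, H, 1)
        attn_w = torch.sigmoid(self.conv_w(attn_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        # Reweight the input features with the two positional attention maps
        return x * attn_h * attn_w
```

In this sketch the two sigmoid-activated attention maps reweight the input feature map, so positional cues along each axis stay separate until the final multiplication.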
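The abstract does not give the exact definition of object dispersion or of the pushing reward, so the sketch below shows only one plausible reading, assuming dispersion is the mean distance of object centroids from their common centroid and that a push is rewarded in proportion to the dispersion gain it produces; the function names and the scale parameter are hypothetical.

```python
# Hedged sketch of a dispersion-based pushing reward; the paper's actual metric
# and reward coefficients may differ.
import numpy as np


def object_dispersion(centroids: np.ndarray) -> float:
    """Mean distance of object centroids from their overall centroid (global spread)."""
    if len(centroids) < 2:
        return 0.0
    center = centroids.mean(axis=0)
    return float(np.linalg.norm(centroids - center, axis=1).mean())


def push_reward(before: np.ndarray, after: np.ndarray, scale: float = 1.0) -> float:
    """Reward a push in proportion to the increase in object dispersion it causes."""
    gain = object_dispersion(after) - object_dispersion(before)
    return scale * gain if gain > 0 else 0.0
```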