Zero-Shot Robotic Grasping from Local Surface Geometry

Abstract

Robotic grasping of previously unseen objects remains a central challenge in manipulation, particularly in unstructured environments where object models, semantic labels, or prior pose information are unavailable. Although learning-based grasping systems have achieved strong performance, many rely on large annotated datasets, global object representations, or simulation-to-real transfer pipelines, which can limit scalability and generalization. This thesis presents RCVGrasp, a zero-shot grasp prediction framework grounded solely in local surface geometry. Instead of modelling global object shape or identity, it represents a parallel-jaw grasp as a pair of local three-dimensional surface patches corresponding to the gripper contact regions. Grasp feasibility is formulated as a binary classification problem over patch pairs, enabling contact-centric reasoning that is independent of object category, appearance, or semantic context. The model is trained entirely on procedurally generated synthetic data derived from geometric constraints, without using object meshes, CAD models, RGB input, or sim-to-real adaptation.
RCVGrasp is evaluated across multiple progressively realistic settings: synthetic patch-pair classification, zero-shot transfer to complete 3D object meshes from standard benchmarks, cross-dataset agreement analysis with existing grasp datasets, and real-world robotic grasp execution using stereo-reconstructed point clouds. The framework achieves high classification accuracy on unseen synthetic data, demonstrates strong geometric generalization to previously unseen object models through dense patch-pair evaluation and candidate filtering, and shows consistent agreement with established grasp datasets. Furthermore, it successfully executes grasps on physical objects without retraining, validating its applicability under real-world sensing and deployment conditions. These results show that local surface geometry alone provides a robust and transferable prior for parallel-jaw grasp feasibility. By decoupling grasp reasoning from object identity and global modelling, this work establishes a lightweight, interpretable, and scalable alternative to object-dependent grasping methods for zero-shot robotic manipulation.
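
The patch-pair formulation described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not specify the geometric constraints or the classifier, so the example assumes a simple antipodal rule (both patch normals roughly parallel to the line joining the two contacts, and a contact distance within the gripper opening) as a stand-in labelling function. The function names, the 0.08 m gripper width, and the 15 degree tolerance are all hypothetical.

```python
import numpy as np


def extract_patch(points, center, radius):
    """Return the points of an (N, 3) cloud lying within `radius` of `center`."""
    dists = np.linalg.norm(points - center, axis=1)
    return points[dists <= radius]


def patch_normal(patch):
    """Estimate a unit surface normal as the direction of least variance (PCA)."""
    centered = patch - patch.mean(axis=0)
    # eigh returns eigenvalues in ascending order, so column 0 is the
    # eigenvector with the smallest variance, i.e. the normal direction.
    _, vecs = np.linalg.eigh(centered.T @ centered)
    return vecs[:, 0]


def antipodal_label(patch_a, patch_b, max_width=0.08, max_angle_deg=15.0):
    """Assumed geometric rule: label a patch pair 1 (feasible) or 0 (infeasible).

    A pair counts as feasible when the contacts fit inside the gripper opening
    and both patch normals are nearly parallel to the grasp axis.
    """
    center_a, center_b = patch_a.mean(axis=0), patch_b.mean(axis=0)
    axis = center_b - center_a
    width = np.linalg.norm(axis)
    if width == 0 or width > max_width:
        return 0
    axis = axis / width
    cos_tol = np.cos(np.radians(max_angle_deg))
    # The sign of a PCA normal is ambiguous, so compare absolute alignment.
    aligned_a = abs(patch_normal(patch_a) @ axis) >= cos_tol
    aligned_b = abs(patch_normal(patch_b) @ axis) >= cos_tol
    return int(aligned_a and aligned_b)
```

In a learned pipeline, a rule of this kind would only supply labels for synthetic patch pairs; the classifier itself would take the two patches as input and predict feasibility directly.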

Keywords

Robotics, Computer Vision, Machine Learning, Deep Learning, Zero-Shot, Robotic Grasping

Creative Commons license

Except where otherwise noted, this item's license is described as Attribution 4.0 International