Zero-Shot Robotic Grasping from Local Surface Geometry
Abstract
Robotic grasping of previously unseen objects remains a central challenge in manipulation, particularly in unstructured environments where object models, semantic labels, or prior pose information are unavailable. Although learning-based grasping systems have achieved strong performance, many rely on large annotated datasets, global object representations, or simulation-to-real transfer pipelines, which can limit scalability and generalization. This thesis presents RCVGrasp, a zero-shot grasp prediction framework grounded solely in local surface geometry. Instead of modelling global object shape or identity, RCVGrasp represents a parallel-jaw grasp as a pair of local three-dimensional surface patches corresponding to the gripper contact regions. Grasp feasibility is then formulated as a binary classification problem over patch pairs, enabling contact-centric reasoning that is independent of object category, appearance, or semantic context. The model is trained entirely on procedurally generated synthetic data derived from geometric constraints, without using object meshes, CAD models, RGB input, or sim-to-real adaptation. RCVGrasp is evaluated across multiple progressively realistic settings: synthetic patch-pair classification, zero-shot transfer to complete 3D object meshes from standard benchmarks, cross-dataset agreement analysis with existing grasp datasets, and real-world robotic grasp execution using stereo-reconstructed point clouds.
The framework achieves high classification accuracy on unseen synthetic data, demonstrates strong geometric generalization to previously unseen object models through dense patch-pair evaluation and candidate filtering, and shows consistent agreement with established grasp datasets. It also executes grasps successfully on physical objects without retraining, validating its applicability under real-world sensing and deployment conditions. These results show that local surface geometry alone provides a robust and transferable prior for parallel-jaw grasp feasibility. By decoupling grasp reasoning from object identity and global modelling, this work establishes a lightweight, interpretable, and scalable alternative to object-dependent grasping methods for zero-shot robotic manipulation.
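The abstract describes a grasp as a pair of local surface patches around the two gripper contacts, but it does not say how those patches are extracted. As an illustration only, the following NumPy sketch assumes spherical neighbourhoods of a fixed radius around each contact, random resampling to a fixed number of points, and a grasp-centred frame whose x-axis is the closing direction. The names `grasp_frame` and `extract_patch_pair` and the parameters `radius` and `n_points` are hypothetical; they are not RCVGrasp's actual implementation. The resulting `(2, n_points, 3)` array is the kind of input a binary patch-pair classifier could consume.

```python
import numpy as np


def grasp_frame(c1, c2):
    """Hypothetical helper: grasp-centred frame from two contact points.

    The x-axis points along the closing direction (c1 -> c2). The y- and
    z-axes are an arbitrary orthonormal completion, so rotation about the
    closing axis is NOT canonicalised in this sketch.
    """
    c1 = np.asarray(c1, dtype=float)
    c2 = np.asarray(c2, dtype=float)
    x = c2 - c1
    width = np.linalg.norm(x)
    if width == 0:
        raise ValueError("contact points must be distinct")
    x /= width
    # Choose a helper vector that is not (nearly) parallel to x.
    helper = np.array([0.0, 0.0, 1.0]) if abs(x[2]) < 0.9 else np.array([1.0, 0.0, 0.0])
    y = np.cross(helper, x)
    y /= np.linalg.norm(y)
    z = np.cross(x, y)  # right-handed: x cross y = z
    R = np.stack([x, y, z])  # rows are frame axes, so R @ (p - origin) maps world -> grasp
    origin = 0.5 * (c1 + c2)
    return R, origin, width


def extract_patch_pair(points, c1, c2, radius=0.01, n_points=64, rng=None):
    """Hypothetical patch-pair extraction: one local patch per contact point."""
    rng = np.random.default_rng() if rng is None else rng
    points = np.asarray(points, dtype=float)
    R, origin, _ = grasp_frame(c1, c2)
    patches = []
    for c in (c1, c2):
        dist = np.linalg.norm(points - np.asarray(c, dtype=float), axis=1)
        patch = points[dist <= radius]
        if len(patch) == 0:
            raise ValueError("no points within radius of a contact")
        # Resample to a fixed size; sample with replacement only if the patch is small.
        idx = rng.choice(len(patch), size=n_points, replace=len(patch) < n_points)
        # Express the patch in the grasp frame (row-vector form of R @ (p - origin)).
        patches.append((patch[idx] - origin) @ R.T)
    return np.stack(patches)  # shape (2, n_points, 3)
```

For two parallel faces 4 cm apart with contacts at their centres, each patch in the grasp frame lies at x = -0.02 and x = +0.02 respectively. Expressing both patches in a shared grasp frame is one plausible way to keep the representation independent of the object's global pose, which matches the abstract's emphasis on contact-centric reasoning.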

