Robotic grasp planning based on raw sensory data is difficult due to occlusion and incomplete scene geometry. Often a single sensory modality does not provide enough context for reliable planning: a single depth image cannot reveal the occluded regions of an object, and tactile information is spatially very sparse. We are building a deep convolutional neural network (CNN) that combines 3D vision and tactile information to perform shape completion of an object seen from only a single view, and to plan stable grasps on the completed models.

This multi-modal system will use tactile and RGBD information to form a more complete object model for grasping. The project draws heavily on simulated vision and tactile data to create rich training sets. The DSI scholar will help create the simulated data sets and will use ML frameworks (e.g., Caffe, Theano) to develop the CNN architecture. The contributions of this work include: an open-source dataset for training a shape completion system using both tactile and RGBD information; a framework for integrating multi-modal sensory data to reason about object geometry; and results comparing completed object models built from depth only versus combined depth and tactile information.
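To make the multi-modal fusion concrete, below is a minimal, hypothetical sketch of how a single-view depth observation and sparse tactile contacts might be encoded as two channels of one voxel grid and fed to a small 3D CNN that predicts a completed occupancy grid. The voxel resolution, layer sizes, and the Keras (channels-last) API are illustrative assumptions, not the project's actual architecture or toolchain.

    # Illustrative sketch only: fuse depth and tactile observations as two
    # channels of a voxel grid and predict completed occupancy with a 3D CNN.
    import numpy as np
    from keras.models import Model
    from keras.layers import Input, Conv3D, MaxPooling3D, UpSampling3D

    GRID = 40  # assumed voxel resolution (hypothetical)

    # Channel 0: occupancy observed from the single depth view.
    # Channel 1: sparse voxels touched by the tactile sensor.
    inputs = Input(shape=(GRID, GRID, GRID, 2))

    x = Conv3D(16, (3, 3, 3), activation='relu', padding='same')(inputs)
    x = MaxPooling3D(pool_size=(2, 2, 2))(x)
    x = Conv3D(32, (3, 3, 3), activation='relu', padding='same')(x)
    x = UpSampling3D(size=(2, 2, 2))(x)
    # Per-voxel probability that the voxel belongs to the completed object.
    completed = Conv3D(1, (3, 3, 3), activation='sigmoid', padding='same')(x)

    model = Model(inputs, completed)
    model.compile(optimizer='adam', loss='binary_crossentropy')

    # Toy usage with random stand-in data (a real run would use the
    # simulated depth/tactile training set described above).
    depth_voxels = np.random.rand(1, GRID, GRID, GRID, 1) > 0.9
    tactile_voxels = np.random.rand(1, GRID, GRID, GRID, 1) > 0.99
    x_train = np.concatenate([depth_voxels, tactile_voxels], axis=-1).astype('float32')
    y_train = (np.random.rand(1, GRID, GRID, GRID, 1) > 0.5).astype('float32')
    model.fit(x_train, y_train, epochs=1, verbose=0)

Encoding tactile contacts as a second voxel channel lets the same convolutional filters see both modalities in a shared spatial frame, which is one simple way for sparse touch data to refine regions the depth camera cannot observe.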

Faculty Advisor

  • Professor Peter Allen.
  • Department/School: Computer Science/SEAS.
  • Location: 6LW1 CEPSR.

Project timeline

  • Start date: 06/01/2018
  • End date: 08/15/2018
  • Number of hours per week of research expected: ~40

Candidate requirements

  • Skill sets: knowledge of Python; basic machine learning.
  • Student eligibility (as of Spring 2018): freshman, sophomore, junior, senior, or master’s students
  • International students on F1 or J1 visas: eligible
  • Other comments: must have an interest in robotics.