Humans draw on their vast life experience of touching things to make inferences about the objects in their environment; this capability enables one to make haptic judgments before actually touching things. For example, it is effortless to select the correct grip force for picking up a delicate object, or to choose a g...
University of Pennsylvania, Philadelphia, USA, August 2018, Department of Electrical and Systems Engineering (phdthesis)
Autonomous robots need to efficiently walk over varied surfaces and grasp diverse objects. We hypothesize that the association between how such surfaces look and how they physically feel during contact can be learned from a database of matched haptic and visual data recorded from various end-effectors' interactions with hundreds of real-world surfaces. Testing this hypothesis required the creation of a new multimodal sensing apparatus, the collection of a large multimodal dataset, and development of a machine-learning pipeline.
This thesis begins by describing the design and construction of the Portable Robotic Optical/Tactile ObservatioN PACKage (PROTONPACK, or Proton for short), an untethered handheld sensing device that emulates the capabilities of the human senses of vision and touch. Its sensory modalities include RGBD vision, egomotion, contact force, and contact vibration. Three interchangeable end-effectors (a steel tooling ball, an OptoForce three-axis force sensor, and a SynTouch BioTac artificial fingertip) allow for different material properties at the contact point and provide additional tactile data.
We then detail the calibration process for the motion and force sensing systems, as well as several proof-of-concept surface discrimination experiments that demonstrate the reliability of the device and the utility of the data it collects. This thesis then presents a large-scale dataset of multimodal surface interaction recordings, including 357 unique surfaces such as furniture, fabrics, outdoor fixtures, and items from several private and public material sample collections. Each surface was touched with one, two, or three end-effectors, comprising approximately one minute per end-effector of tapping and dragging at various forces and speeds. We hope that the larger community of robotics researchers will find broad applications for the published dataset.
Lastly, we demonstrate an algorithm that learns to estimate haptic surface properties given visual input. Surfaces were rated on hardness, roughness, stickiness, and temperature by the human experimenter and by a pool of purely visual observers. Then we trained an algorithm to perform the same task as well as infer quantitative properties calculated from the haptic data. Overall, the task of predicting haptic properties from vision alone proved difficult for both humans and computers, but a hybrid algorithm using a deep neural network and a support vector machine achieved a correlation between expected and actual regression output between approximately ρ = 0.3 and ρ = 0.5 on previously unseen surfaces.
Work-in-progress paper (3 pages) presented at the IEEE Haptics Symposium, San Francisco, USA, March 2018 (misc)
Human children typically experience their surroundings both visually and haptically, providing ample opportunities to learn rich cross-sensory associations. To thrive in human environments and interact with the real world, robots also need to build models of these cross-sensory associations; current advances in machine learning should make it possible to infer models from large amounts of data. We previously built a visuo-haptic sensing device, the Proton Pack, and are using it to collect a large database of matched multimodal data from tool-surface interactions. As a benchmark to compare with machine learning performance, we conducted a human subject study (n = 84) on estimating haptic surface properties (here: hardness, roughness, friction, and warmness) from images. Using a 100-surface subset of our database, we showed images to study participants and collected 5635 ratings of the four haptic properties, which we compared with ratings made by the Proton Pack operator and with physical data recorded using motion, force, and vibration sensors. Preliminary results indicate weak correlation between participant and operator ratings, but potential for matching up certain human ratings (particularly hardness and roughness) with features from the literature.
In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages: 439-445, Singapore, May 2017 (inproceedings)
The Portable Robotic Optical/Tactile ObservatioN PACKage (PROTONPACK, or Proton for short) is a new handheld visuo-haptic sensing system that records surface interactions. We previously demonstrated system calibration and a classification task using external motion tracking. This paper details improvements in surface classification performance and removal of the dependence on external motion tracking, necessary before embarking on our goal of gathering a vast surface interaction dataset. Two experiments were performed to refine data collection parameters. After adjusting the placement and filtering of the Proton's high-bandwidth accelerometers, we recorded interactions between two differently-sized steel tooling ball end-effectors (diameter 6.35 and 9.525 mm) and five surfaces. Using features based on normal force, tangential force, end-effector speed, and contact vibration, we trained multi-class SVMs to classify the surfaces using 50 ms chunks of data from each end-effector. Classification accuracies of 84.5% and 91.5% respectively were achieved on unseen test data, an improvement over prior results. In parallel, we pursued on-board motion tracking, using the Proton's camera and fiducial markers. Motion tracks from the external and onboard trackers agree within 2 mm and 0.01 rad RMS, and the accuracy decreases only slightly to 87.7% when using onboard tracking for the 9.525 mm end-effector. These experiments indicate that the Proton 2 is ready for portable data collection.
Workshop paper (5 pages) presented at the AAAI Spring Symposium on Interactive Multi-Sensory Object Perception for Embodied Agents, Stanford, USA, March 2017 (misc)
The Proton Pack is a portable visuo-haptic surface interaction recording device that will be used to collect a vast multimodal dataset, intended for robots to use as part of an approach to understanding the world around them. In order to collect a useful dataset, we want to pick a suitable interaction duration for each surface, noting the tradeoff between data collection resources and completeness of data. One interesting approach frames the data collection process as an online learning problem, building an incremental surface model and using that model to decide when there is enough data. Here we examine how to do such online surface modeling and when to stop collecting data, using kinetic friction as a first domain in which to apply online modeling.
Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments and to use this understanding to design future systems