Institute Talks

Artificial haptic intelligence for human-machine systems

IS Colloquium
  • 24 October 2018 • 11:00–12:00
  • Veronica J. Santos
  • 5H7 at MPI-IS in Stuttgart

The functionality of artificial manipulators could be enhanced by artificial “haptic intelligence” that enables the identification of object features via touch for semi-autonomous decision-making and/or display to a human operator. This could be especially useful when complementary sensory modalities, such as vision, are unavailable. I will highlight past and present work to enhance the functionality of artificial hands in human-machine systems. I will describe efforts to develop multimodal tactile sensor skins, and to teach robots how to haptically perceive salient geometric features such as edges and fingertip-sized bumps and pits using machine learning techniques. I will describe the use of reinforcement learning to teach robots goal-based policies for a functional contour-following task: the closure of a ziplock bag. Our Contextual Multi-Armed Bandits approach tightly couples robot actions to the tactile and proprioceptive consequences of the actions, and selects future actions based on prior experiences, the current context, and a functional task goal. Finally, I will describe current efforts to develop real-time capabilities for the perception of tactile directionality, and to develop models for haptically locating objects buried in granular media. Real-time haptic perception and decision-making capabilities could be used to advance semi-autonomous robot systems and reduce the cognitive burden on human teleoperators of devices ranging from wheelchair-mounted robots to explosive ordnance disposal robots.
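
For readers unfamiliar with the bandit framing used above, the minimal sketch below shows the general idea of a contextual bandit learner that stores a value estimate per context-action pair and trades off exploration against exploitation. It is purely illustrative and not the speaker's implementation; the contexts, actions, and reward function are hypothetical stand-ins.

```python
# Illustrative epsilon-greedy contextual bandit (not the speaker's method).
import numpy as np

rng = np.random.default_rng(0)
n_contexts, n_actions, epsilon = 4, 3, 0.1
q = np.zeros((n_contexts, n_actions))       # estimated value of each action per context
counts = np.zeros((n_contexts, n_actions))  # visit counts for incremental averaging

def reward(context, action):
    # Hypothetical stand-in for task progress measured from tactile/proprioceptive feedback.
    return float(action == context % n_actions) + 0.1 * rng.standard_normal()

for step in range(5000):
    context = rng.integers(n_contexts)           # e.g. the current tactile/proprioceptive context
    if rng.random() < epsilon:
        action = rng.integers(n_actions)         # explore
    else:
        action = int(np.argmax(q[context]))      # exploit prior experience
    r = reward(context, action)
    counts[context, action] += 1
    q[context, action] += (r - q[context, action]) / counts[context, action]

print(np.argmax(q, axis=1))  # learned best action per context
```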

Organizers: Katherine Kuchenbecker

Artificial haptic intelligence for human-machine systems

IS Colloquium
  • 25 October 2018 • 11:00
  • Veronica J. Santos
  • N2.025 at MPI-IS in Tübingen

The functionality of artificial manipulators could be enhanced by artificial “haptic intelligence” that enables the identification of object features via touch for semi-autonomous decision-making and/or display to a human operator. This could be especially useful when complementary sensory modalities, such as vision, are unavailable. I will highlight past and present work to enhance the functionality of artificial hands in human-machine systems. I will describe efforts to develop multimodal tactile sensor skins, and to teach robots how to haptically perceive salient geometric features such as edges and fingertip-sized bumps and pits using machine learning techniques. I will describe the use of reinforcement learning to teach robots goal-based policies for a functional contour-following task: the closure of a ziplock bag. Our Contextual Multi-Armed Bandits approach tightly couples robot actions to the tactile and proprioceptive consequences of the actions, and selects future actions based on prior experiences, the current context, and a functional task goal. Finally, I will describe current efforts to develop real-time capabilities for the perception of tactile directionality, and to develop models for haptically locating objects buried in granular media. Real-time haptic perception and decision-making capabilities could be used to advance semi-autonomous robot systems and reduce the cognitive burden on human teleoperators of devices ranging from wheelchair-mounted robots to explosive ordnance disposal robots.

Organizers: Katherine Kuchenbecker, Adam Spiers

A fine-grained perspective onto object interactions

Talk
  • 30 October 2018 • 10:30–11:30
  • Dima Damen
  • N3.022 (Aquarium)

This talk aims to argue for a fine-grained perspective onto human-object interactions, from video sequences. I will present approaches for understanding ‘what’ objects one interacts with during daily activities, ‘when’ we should label the temporal boundaries of interactions, ‘which’ semantic labels one can use to describe such interactions and ‘who’ is better when contrasting people performing the same interaction. I will detail my group’s latest works on sub-topics related to: (1) assessing action ‘completion’ – when an interaction is attempted but not completed [BMVC 2018], (2) determining skill or expertise from video sequences [CVPR 2018] and (3) finding unequivocal semantic representations for object interactions [ongoing work]. I will also introduce EPIC-KITCHENS 2018, the recently released largest dataset of object interactions in people’s homes, recorded using wearable cameras. The dataset includes 11.5M frames fully annotated with objects and actions, based on unique annotations from the participants narrating their own videos, thus reflecting true intention. Three open challenges are now available on object detection, action recognition and action anticipation [http://epic-kitchens.github.io].

Organizers: Mohamed Hassan

TBA

IS Colloquium
  • 28 January 2019 • 15:00–16:00
  • Florian Marquardt

Organizers: Matthias Bauer

  • Prof. Christian Wallraven
  • MPI-IS Stuttgart, Heisenbergstr. 3, Room 5H7

Already starting at birth, humans integrate information from several sensory modalities in order to form a representation of the environment - such as when a baby explores, manipulates, and interacts with objects. The combination of visual and touch information is one of the most fundamental sensory integration processes, as touch information (such as body-relative size, shape, texture, material, temperature, and weight) can easily be linked to the visual image, thereby providing a grounding for later visual-only recognition. Previous research on such integration processes has so far mainly focused on low-level object properties (such as curvature or surface granularity), so little is known about how humans actually form a high-level multisensory representation of objects. Here, I will review research from our lab that investigates how the human brain processes shape using input from vision and touch. Using a large variety of novel, 3D-printed shapes, we were able to show that touch is actually as good at shape processing as vision, suggesting a common, multisensory representation of shape. We next conducted a series of imaging experiments (using anatomical, functional, and white-matter analyses) that chart the brain networks that process this shape representation. I will conclude the talk with a brief medley of other haptics-related research in the lab, including robot learning, braille, and haptic face recognition.

Organizers: Katherine Kuchenbecker


  • Haliza Mat Husin
  • MPI-IS Stuttgart, Heisenbergstr. 3, Room 2P4

Background: Pre-pregnancy obesity and inadequate maternal weight gain during pregnancy can lead to adverse effects in the newborn, and also to metabolic, cardiovascular and even neurological diseases later in the offspring’s life. Heart activity can be used as a proxy for the activity of the autonomic nervous system (ANS). The aim of this study is to evaluate the effect of pre-pregnancy weight, maternal weight gain and maternal metabolism on the ANS of the fetus in healthy pregnancies.

Organizers: Katherine Kuchenbecker


Appearance Modeling for 4D Multi-view Representations

Talk
  • 15 December 2017 • 12:00–12:45
  • Vagia Tsiminaki
  • PS Seminar Room (N3.022)

The emergence of multi-view capture systems has yielded a tremendous amount of video sequences. The task of capturing spatio-temporal models from real world imagery (4D modeling) should arguably benefit from this wealth of visual information. In order to achieve highly realistic representations, both geometry and appearance need to be modeled in high precision. Yet, even with the great progress in geometric modeling, the appearance aspect has not been fully explored and visual quality can still be improved. I will explain how we can optimally exploit the redundant visual information of the captured video sequences and provide a temporally coherent, super-resolved, view-independent appearance representation. I will further discuss how to exploit the interdependency of both geometry and appearance as separate modalities to enhance visual perception, and finally how to decompose appearance representations into intrinsic components (shading & albedo) and super-resolve them jointly to allow for more realistic renderings.

Organizers: Despoina Paschalidou


Sum-Product Networks for Probabilistic Modeling

Talk
  • 06 December 2017 • 15:00–15:45
  • Robert Peharz
  • AGBS seminar room

Probabilistic modeling is the method of choice when it comes to reasoning under uncertainty. However, one of the main practical downsides of probabilistic models is that inference, i.e. the process of using the model to answer statistical queries, is notoriously hard in general. This led to a common folklore that probabilistic models which allow exact inference are necessarily simplistic and undermodel any practical task. In this talk, I will present sum-product networks (SPNs), a recently proposed architecture representing a rich and expressive class of probability distributions, which also allows exact and efficient computation of many inference tasks. I will discuss representational properties, inference routines and learning approaches in SPNs. Furthermore, I will provide some examples of practical applications using SPNs.
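
To make the tractability claim concrete, here is a minimal hand-built SPN over two binary variables (illustrative only; the sum-node weights and leaf parameters are made up). A single bottom-up pass evaluates a joint probability, and marginalizing a variable only requires setting its leaves to 1.

```python
# Minimal sum-product network over two binary variables X1, X2 (illustrative only).

# Leaf distributions: P(Xi = 1) for the two mixture components.
p_x1_a, p_x1_b = 0.8, 0.3
p_x2_a, p_x2_b = 0.6, 0.1

def leaf(p_one, value):
    # Returns P(X = value); value=None marginalizes the variable out (leaf evaluates to 1).
    if value is None:
        return 1.0
    return p_one if value == 1 else 1.0 - p_one

def spn(x1, x2):
    # Two product nodes over disjoint variable scopes, mixed by a sum node with weights summing to 1.
    prod_a = leaf(p_x1_a, x1) * leaf(p_x2_a, x2)
    prod_b = leaf(p_x1_b, x1) * leaf(p_x2_b, x2)
    return 0.7 * prod_a + 0.3 * prod_b

# Exact inference by a single bottom-up pass:
print(spn(1, 0))     # joint probability P(X1=1, X2=0)
print(spn(1, None))  # marginal P(X1=1), obtained by marginalizing the X2 leaves
```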


Reconstructing and Perceiving Humans in Motion

Talk
  • 30 November 2017 • 15:00
  • Dr. Gerard Pons-Moll

For man-machine interaction it is crucial to develop models of humans that look and move indistinguishably from real humans. Such virtual humans will be key for application areas such as computer vision, medicine and psychology, virtual and augmented reality and special effects in movies. Currently, digital models typically lack realistic soft tissue and clothing or require time-consuming manual editing of physical simulation parameters. Our hypothesis is that better and more realistic models of humans and clothing can be learned directly from real measurements coming from 4D scans, images and depth and inertial sensors. We combine statistical machine learning techniques and physics based simulation to create realistic models from data. We then use such models to extract information out of incomplete and noisy sensor data from monocular video, depth or IMUs. I will give an overview of a selection of projects conducted in Perceiving Systems in which we build realistic models of human pose, shape, soft-tissue and clothing. I will also present some of our recent work on 3D reconstruction of people models from monocular video, real-time fusion and online human body shape estimation from depth data, and recovery of human pose in the wild from video and IMUs. I will conclude the talk by outlining the next challenges in building digital humans and perceiving them from sensory data.

Organizers: Melanie Feldhofer


  • Professor Brent Gillespie
  • MPI-IS Stuttgart, Heisenbergstr. 3, Werner-Köster-Hörsaal 2R 4 and broadcast

Relative to most robots and other machines, the human body is soft, its actuators compliant, and its control quite forgiving. But having a body that bends under load seems like a bad set-up for motor dexterity: the brain is faced with controlling more rather than fewer degrees of freedom. Undeniably, though, the soft body approach leads to superior solutions. Robots are putzes by comparison! While de-putzifying robots (perhaps by making them softer) is an endeavor I will discuss to some degree, in this talk I will focus on the design of robots intended to work cooperatively with humans, using physical interaction and haptic feedback in the axis of control. I will propose a backdrivable robot with forgiving control as a teammate for humans, with the aim of meeting pressing needs in rehabilitation robotics and semi-autonomous driving. In short, my lab is working to create alternatives to the domineering robot who wants complete control. Giving up complete control leads to “slacking” and loss of therapeutic benefit in rehabilitation, and to loss of vigilance and potential for disaster in driving. Cooperative or shared control is premised on the idea that two heads, especially two heads with complementary capabilities, are better than one. But the two heads must agree on a goal and a motor plan. How can one agent read the motor intent of another using only physical interaction signals? A few old-school control principles from biology and engineering come to the rescue. One key is provided by von Holst and Mittelstaedt’s famous Reafference Principle, published in 1950 to describe how a hierarchically organized neural control system distinguishes what they called reafference from exafference (roughly: expected from unexpected). A second key is provided by Francis and Wonham’s Internal Model Principle, published in 1976 and considered an enabler for the disk drive industry. If we extend the Reafference Principle with model-based control and use the Internal Model Principle to treat predictable exogenous (exafferent) signals, then we arrive at a theory that I will argue puts us into position to extract motor intent and thereby enable effective control sharing between humans and robots. To support my arguments I will present results from a series of experiments in which we asked human participants to move expected and unexpected loads, to track predictable and unpredictable reference signals, to exercise with self-assist and other-assist, and to share control over a simulated car with an automation system.

Organizers: Katherine Kuchenbecker


  • Christoph Mayer
  • S2 Seminar Room (S 2.014)

Variational image processing translates image processing tasks into optimisation problems. The practical success of this approach depends on the type of optimisation problem and on the properties of the ensuing algorithm. A recent breakthrough was to realise that old first-order optimisation algorithms based on operator splitting are particularly suited for modern data analysis problems. Operator splitting techniques decouple complex optimisation problems into many smaller and simpler sub-problems. In this talk I will revisit the variational segmentation problem and a common family of algorithms to solve such optimisation problems. I will show that operator splitting leads to a divide-and-conquer strategy that allows us to derive simple and massively parallel updates suitable for GPU implementations. The technique decouples the likelihood from the prior term and allows the use of a data-driven model that estimates the likelihood from data, for example using deep learning. Using a different decoupling strategy together with general consensus optimisation leads to fully distributed algorithms especially suitable for large-scale segmentation problems. Motivating applications are 3D yeast-cell reconstruction and segmentation of histology data.
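
As a toy stand-in for this decoupling idea (not the speaker's method; the problem and parameters below are invented for illustration), the following sketch applies ADMM, a classic operator-splitting scheme, to 1D total-variation denoising: the data (likelihood) sub-problem and the prior sub-problem are solved separately, and the prior step reduces to element-wise, trivially parallel soft-thresholding.

```python
# Illustrative ADMM (operator splitting) for 1D total-variation denoising.
import numpy as np

rng = np.random.default_rng(0)
n, lam, rho = 200, 1.0, 2.0
clean = np.repeat([0.0, 2.0, -1.0, 1.0], n // 4)   # piecewise-constant "segments"
y = clean + 0.3 * rng.standard_normal(n)           # noisy observation (likelihood term)

# Forward-difference operator D, so the prior term is lam * ||D x||_1.
D = (np.eye(n, k=1) - np.eye(n))[:-1]

x = y.copy()
z = D @ x
u = np.zeros_like(z)
A = np.eye(n) + rho * D.T @ D                      # x-update system matrix

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

for _ in range(200):
    x = np.linalg.solve(A, y + rho * D.T @ (z - u))  # data (likelihood) sub-problem
    z = soft_threshold(D @ x + u, lam / rho)         # prior sub-problem, element-wise/parallel
    u = u + D @ x - z                                # dual update

print(np.round(x[::50], 2))  # roughly recovers the four piecewise-constant levels
```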

Organizers: Benjamin Coors


Learning Complex Robot-Environment Interactions

Talk
  • 26 October 2017 • 11:00–12:15
  • Jens Kober
  • AMD meeting room

The acquisition and self-improvement of novel motor skills are among the most important problems in robotics. Reinforcement learning and imitation learning are two different but complementary machine learning approaches commonly used for learning motor skills.

Organizers: Dieter Büchler


Modern Optimization for Structured Machine Learning

IS Colloquium
  • 23 October 2017 • 11:15–12:15
  • Simon Lacoste-Julien
  • IS Lecture Hall

Machine learning has become a popular application domain for modern optimization techniques, pushing its algorithmic frontier. The need for large-scale optimization algorithms which can handle millions of dimensions or data points, typical for the big data era, has brought a resurgence of interest in first-order algorithms, making us revisit the venerable stochastic gradient method [Robbins-Monro 1951] as well as the Frank-Wolfe algorithm [Frank-Wolfe 1956]. In this talk, I will review recent improvements on these algorithms which can exploit the structure of modern machine learning approaches. I will explain why the Frank-Wolfe algorithm has become so popular lately, and present a surprising tweak on the stochastic gradient method which yields a fast linear convergence rate. Motivating applications will include weakly supervised video analysis and structured prediction problems.
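
For readers unfamiliar with Frank-Wolfe, the minimal sketch below (illustrative only; the improvements discussed in the talk build on this basic scheme) minimizes a quadratic over the probability simplex using only a linear minimization oracle and convex combinations, so the iterates stay feasible without any projection.

```python
# Minimal Frank-Wolfe sketch: minimize ||Ax - b||^2 over the probability simplex.
import numpy as np

rng = np.random.default_rng(0)
m, n = 30, 10
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

x = np.ones(n) / n                    # start at the simplex barycentre
for k in range(200):
    grad = 2 * A.T @ (A @ x - b)      # gradient of the objective
    s = np.zeros(n)
    s[np.argmin(grad)] = 1.0          # linear minimization oracle: best simplex vertex
    gamma = 2.0 / (k + 2.0)           # classical step size
    x = (1 - gamma) * x + gamma * s   # convex update keeps x on the simplex

print(np.round(x, 3), x.sum())        # sparse feasible iterate
```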

Organizers: Philipp Hennig


  • Arunkumar Byravan
  • AMD meeting room

The ability to predict how an environment changes based on forces applied to it is fundamental for a robot to achieve specific goals. Traditionally in robotics, this problem is addressed through the use of pre-specified models or physics simulators, taking advantage of prior knowledge of the problem structure. While these models are general and have broad applicability, they depend on accurate estimation of model parameters such as object shape, mass, friction, etc. On the other hand, learning-based methods such as Predictive State Representations or more recent deep learning approaches have looked at learning these models directly from raw perceptual information in a model-free manner. These methods operate on raw data without any intermediate parameter estimation, but lack the structure and generality of model-based techniques. In this talk, I will present some work that tries to bridge the gap between these two paradigms by proposing a specific class of deep visual dynamics models (SE3-Nets) that explicitly encode strong physical and 3D geometric priors (specifically, rigid body dynamics) in their structure. As opposed to traditional deep models that reason about dynamics/motion at a pixel level, we show that the physical priors implicit in our network architectures enable them to reason about dynamics at the object level: our network learns to identify objects in the scene and to predict rigid body rotation and translation per object. I will present results on applying our deep architectures to two specific problems: 1) modeling scene dynamics, where the task is to predict future depth observations given the current observation and an applied action, and 2) real-time visuomotor control of a Baxter manipulator based only on raw depth data. We show that: 1) our proposed architectures significantly outperform baseline deep models on dynamics modelling, and 2) our architectures perform comparably or better than baseline models for visuomotor control while operating at camera rates (30Hz) and relying on far less information.
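
The output structure of such object-level dynamics models can be sketched as follows. This is a toy numpy example, not the actual SE3-Nets architecture: the masks, rotations and translations are hard-coded here, whereas the networks predict them from raw depth and the applied action; only the final step, moving each object mask by its own rigid transform and blending the results into the predicted next frame, is shown.

```python
# Toy sketch of the object-level prediction step (illustrative only).
import numpy as np

def rotation_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

points = np.random.default_rng(0).standard_normal((100, 3))  # current point cloud
masks = np.zeros((2, 100))                                    # soft assignment of points to 2 objects
masks[0, :50] = 1.0
masks[1, 50:] = 1.0

# Per-object rigid motions (in the network these would be action-conditioned predictions).
rotations = [rotation_z(0.1), rotation_z(-0.2)]
translations = [np.array([0.05, 0.0, 0.0]), np.array([0.0, -0.1, 0.0])]

predicted = np.zeros_like(points)
for R, t, m in zip(rotations, translations, masks):
    predicted += m[:, None] * (points @ R.T + t)              # blend per-object rigid motions

print(predicted[:3])  # predicted next-frame positions of the first few points
```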

Organizers: Franzi Meier