Institute Talks

Learning to Act with Confidence

Talk
  • 23 October 2018 • 12:00–13:00
  • Andreas Krause
  • MPI-IS Tübingen, N0.002

Actively acquiring decision-relevant information is a key capability of intelligent systems, and plays a central role in the scientific process. In this talk I will present research from my group on this topic at the intersection of statistical learning, optimization and decision making. In particular, I will discuss how statistical confidence bounds can guide data acquisition in a principled way to make effective and reliable decisions in a variety of complex domains. I will also discuss several applications, ranging from autonomously guiding wetlab experiments in protein function optimization to safe exploration in robotics.
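
As a generic illustration of how statistical confidence bounds can guide which data point to acquire next, the following sketch implements a GP-UCB-style acquisition loop; the RBF kernel, the exploration weight beta, and the toy objective are assumptions for illustration, not the specific algorithms presented in the talk.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    """Squared-exponential kernel between two sets of 1-D points."""
    d = A[:, None] - B[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(X, y, Xstar, noise=1e-3):
    """Gaussian-process posterior mean and std at candidate points Xstar."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xstar)
    Kss = rbf_kernel(Xstar, Xstar)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(Kss) - np.sum(v ** 2, axis=0)
    return mu, np.sqrt(np.maximum(var, 0.0))

def ucb_acquire(X, y, candidates, beta=2.0):
    """Pick the candidate with the largest upper confidence bound."""
    mu, sigma = gp_posterior(X, y, candidates)
    return candidates[np.argmax(mu + beta * sigma)]

# Toy loop: sequentially query the point the confidence bound deems most promising.
rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x) + 0.1 * rng.standard_normal(np.shape(x))
X, y = np.array([0.1, 0.9]), f(np.array([0.1, 0.9]))
grid = np.linspace(0.0, 1.0, 200)
for _ in range(10):
    x_next = ucb_acquire(X, y, grid)
    X, y = np.append(X, x_next), np.append(y, f(x_next))
```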

Control Systems for a Surgical Robot on the Space Station

IS Colloquium
  • 23 October 2018 • 16:30–17:30
  • Chris Macnab
  • MPI-IS Stuttgart, Heisenbergstr. 3, Room 2P4

As part of a proposed design for a surgical robot on the space station, my research group has been asked to look at controls that can provide literally surgical precision. Due to excessive time delay, we envision a system with a local model being controlled by a surgeon while the remote system on the space station follows along in a safe manner. Two of the major design considerations that come into play for the low-level feedback loops on the remote side are 1) the harmonic drives in a robot will cause excessive vibrations in a micro-gravity environment unless active damping strategies are employed and 2) when interacting with a human tissue environment the robot must apply smooth control signals that result in precise positions and forces. Thus, we envision intelligent strategies that utilize nonlinear, adaptive, neural-network, and/or fuzzy control theory as the most suitable. However, space agencies, or their engineering sub-contractors, typically provide gain and phase margin characteristics as requirements to the engineers involved in a control system design, which are normally associated with PID or other traditional linear control schemes. We are currently endeavouring to create intelligent controls that have guaranteed gain and phase margins using the Cerebellar Model Articulation Controller.
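
For background, the Cerebellar Model Articulation Controller mentioned above is essentially a sparse tile-coding network whose weights are adapted online and whose output augments a conventional feedback loop. The sketch below shows a minimal one-joint version combined with PD feedback; the toy plant, gains, and LMS-style update rule are assumptions for illustration and not the guaranteed-margin design discussed in the talk.

```python
import numpy as np

class CMAC:
    """Minimal tile-coding CMAC: overlapping 1-D tilings over the reference signal."""
    def __init__(self, n_tilings=8, n_tiles=16, lo=-1.0, hi=1.0):
        self.n_tilings, self.n_tiles = n_tilings, n_tiles
        self.lo, self.hi = lo, hi
        self.w = np.zeros((n_tilings, n_tiles))

    def _active(self, x):
        x = np.clip((x - self.lo) / (self.hi - self.lo), 0.0, 0.999)
        # Each tiling is offset by a fraction of one tile width.
        offsets = np.arange(self.n_tilings) / (self.n_tilings * self.n_tiles)
        return np.minimum(((x + offsets) * self.n_tiles).astype(int), self.n_tiles - 1)

    def output(self, x):
        idx = self._active(x)
        return self.w[np.arange(self.n_tilings), idx].sum()

    def adapt(self, x, error, rate=0.05):
        """LMS-style weight update driven by the tracking error."""
        idx = self._active(x)
        self.w[np.arange(self.n_tilings), idx] += rate * error / self.n_tilings

# One-joint tracking loop: PD feedback plus adaptive CMAC feedforward (all values assumed).
cmac, kp, kd, dt = CMAC(), 20.0, 2.0, 0.001
q, qd = 0.0, 0.0
for k in range(5000):
    t = k * dt
    q_ref = 0.5 * np.sin(2 * np.pi * t)          # desired joint position
    e = q_ref - q
    u = kp * e - kd * qd + cmac.output(q_ref)    # control torque
    cmac.adapt(q_ref, e)
    qdd = u - 5.0 * np.sin(q)                    # toy rigid-link dynamics (assumed)
    qd += qdd * dt
    q += qd * dt
```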

Organizers: Katherine Kuchenbecker

Artificial haptic intelligence for human-machine systems

IS Colloquium
  • 24 October 2018 • 11:00–12:00
  • Veronica J. Santos
  • 5H7 at MPI-IS in Stuttgart

The functionality of artificial manipulators could be enhanced by artificial “haptic intelligence” that enables the identification of object features via touch for semi-autonomous decision-making and/or display to a human operator. This could be especially useful when complementary sensory modalities, such as vision, are unavailable. I will highlight past and present work to enhance the functionality of artificial hands in human-machine systems. I will describe efforts to develop multimodal tactile sensor skins, and to teach robots how to haptically perceive salient geometric features such as edges and fingertip-sized bumps and pits using machine learning techniques. I will describe the use of reinforcement learning to teach robots goal-based policies for a functional contour-following task: the closure of a ziplock bag. Our Contextual Multi-Armed Bandits approach tightly couples robot actions to the tactile and proprioceptive consequences of the actions, and selects future actions based on prior experiences, the current context, and a functional task goal. Finally, I will describe current efforts to develop real-time capabilities for the perception of tactile directionality, and to develop models for haptically locating objects buried in granular media. Real-time haptic perception and decision-making capabilities could be used to advance semi-autonomous robot systems and reduce the cognitive burden on human teleoperators of devices ranging from wheelchair-mounted robots to explosive ordnance disposal robots.
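
To make the contextual multi-armed bandit formulation concrete, the sketch below shows a generic LinUCB-style learner that couples a context vector (standing in for tactile and proprioceptive features) to a discrete set of actions; the feature dimension, action set, and reward signal are placeholders, not the actual sensing or task pipeline from this work.

```python
import numpy as np

class LinUCB:
    """Linear UCB contextual bandit: one ridge-regression model per action."""
    def __init__(self, n_actions, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_actions)]    # per-arm Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_actions)]  # per-arm reward sums

    def select(self, context):
        """Pick the action whose upper confidence bound on reward is largest."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            mean = theta @ context
            bonus = self.alpha * np.sqrt(context @ A_inv @ context)
            scores.append(mean + bonus)
        return int(np.argmax(scores))

    def update(self, action, context, reward):
        self.A[action] += np.outer(context, context)
        self.b[action] += reward * context

# Toy loop: contexts stand in for tactile/proprioceptive features, actions for small
# gripper motions along the bag seam (all hypothetical).
rng = np.random.default_rng(0)
bandit = LinUCB(n_actions=4, dim=6)
for _ in range(200):
    ctx = rng.standard_normal(6)
    a = bandit.select(ctx)
    reward = float(ctx[a] > 0)   # placeholder task-progress signal
    bandit.update(a, ctx, reward)
```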

Organizers: Katherine Kuchenbecker

Artificial haptic intelligence for human-machine systems

IS Colloquium
  • 25 October 2018 • 11:00
  • Veronica J. Santos
  • N2.025 at MPI-IS in Tübingen

The functionality of artificial manipulators could be enhanced by artificial “haptic intelligence” that enables the identification of object features via touch for semi-autonomous decision-making and/or display to a human operator. This could be especially useful when complementary sensory modalities, such as vision, are unavailable. I will highlight past and present work to enhance the functionality of artificial hands in human-machine systems. I will describe efforts to develop multimodal tactile sensor skins, and to teach robots how to haptically perceive salient geometric features such as edges and fingertip-sized bumps and pits using machine learning techniques. I will describe the use of reinforcement learning to teach robots goal-based policies for a functional contour-following task: the closure of a ziplock bag. Our Contextual Multi-Armed Bandits approach tightly couples robot actions to the tactile and proprioceptive consequences of the actions, and selects future actions based on prior experiences, the current context, and a functional task goal. Finally, I will describe current efforts to develop real-time capabilities for the perception of tactile directionality, and to develop models for haptically locating objects buried in granular media. Real-time haptic perception and decision-making capabilities could be used to advance semi-autonomous robot systems and reduce the cognitive burden on human teleoperators of devices ranging from wheelchair-mounted robots to explosive ordnance disposal robots.

Organizers: Katherine Kuchenbecker Adam Spiers

TBA

IS Colloquium
  • 28 January 2019 • 15:00–16:00
  • Florian Marquardt

Organizers: Matthias Bauer

  • Karl Abson
  • MRZ Seminar Room

Motion capture and data-driven technologies have come very far over the past few years. In terms of human capture, the high volume of research devoted to this area has led to very impressive results. Human motion can now be captured in real time, which in the creative sector enables blockbuster films such as Avatar. Similarly, in the medical sector these techniques can be used to diagnose, analyse performance, and avoid invasive procedures in tasks such as deformity correction. There is, however, very little research on motion capture of animals. While the technology for capturing animal motion exists, the method used is inefficient, unreliable, and limited, as much manual work is required to turn blocked-out motions into acceptable results. The major question is how to move forward with a suitable procedure: do we extend the life of marker-based capture, or do we move towards the holy grail of markerless tracking? In this talk we look at a possible solution suited to both paths, based on physically based simulation techniques. We believe such techniques could help marker-based capture cross the uncanny valley and also prove useful for markerless tracking.


Discriminative Non-blind Deblurring

Talk
  • 03 June 2013 • 13:00:00
  • Uwe Schmidt
  • MRZ seminar

Non-blind deblurring is an integral component of blind approaches for removing image blur due to camera shake. Even though learning-based deblurring methods exist, they have been limited to the generative case and are computationally expensive. To date, manually defined models are thus most widely used, though they limit the attainable restoration quality. We address this gap by proposing a discriminative approach for non-blind deblurring. One key challenge is that the blur kernel in use at test time is not known in advance. To address this, we analyze existing approaches that use half-quadratic regularization. From this analysis, we derive a discriminative model cascade for image deblurring. Our cascade model consists of a Gaussian CRF at each stage, based on the recently introduced regression tree fields. We train our model by loss minimization and use synthetically generated blur kernels to generate training data. Our experiments show that the proposed approach is efficient and yields state-of-the-art restoration quality on images corrupted with synthetic and real blur.
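
As background on the half-quadratic regularization analyzed in this work, the sketch below shows a generic half-quadratic splitting deconvolution with an assumed TV-like prior and FFT-based quadratic updates; it illustrates the optimization structure only and is not the regression-tree-field cascade proposed in the talk.

```python
import numpy as np

def psf2otf(psf, shape):
    """Pad and circularly shift a blur kernel, then take its FFT."""
    pad = np.zeros(shape)
    pad[:psf.shape[0], :psf.shape[1]] = psf
    for axis, size in enumerate(psf.shape):
        pad = np.roll(pad, -(size // 2), axis=axis)
    return np.fft.fft2(pad)

def hqs_deblur(y, kernel, lam=0.01, beta=1.0, iters=20):
    """Half-quadratic splitting for non-blind deblurring with a TV-like prior:
    min_x ||k*x - y||^2 + lam*||Dx||_1, with auxiliary variables z ~ Dx."""
    K = psf2otf(kernel, y.shape)
    Dx = psf2otf(np.array([[1.0, -1.0]]), y.shape)     # horizontal gradient filter
    Dy = psf2otf(np.array([[1.0], [-1.0]]), y.shape)   # vertical gradient filter
    Ky = np.conj(K) * np.fft.fft2(y)
    denom_data = np.abs(K) ** 2
    denom_prior = np.abs(Dx) ** 2 + np.abs(Dy) ** 2
    x = y.copy()
    for _ in range(iters):
        # z-step: soft-threshold the image gradients (closed form for the L1 prior).
        gx = np.real(np.fft.ifft2(Dx * np.fft.fft2(x)))
        gy = np.real(np.fft.ifft2(Dy * np.fft.fft2(x)))
        zx = np.sign(gx) * np.maximum(np.abs(gx) - lam / beta, 0)
        zy = np.sign(gy) * np.maximum(np.abs(gy) - lam / beta, 0)
        # x-step: quadratic subproblem solved exactly in the Fourier domain.
        rhs = Ky + beta * (np.conj(Dx) * np.fft.fft2(zx) + np.conj(Dy) * np.fft.fft2(zy))
        x = np.real(np.fft.ifft2(rhs / (denom_data + beta * denom_prior + 1e-8)))
        beta *= 2.0   # continuation on the penalty weight
    return x
```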


Interactive Variational Shape Modeling

Talk
  • 27 May 2013 • 11:15:00
  • Olga Sorkine-Hornung
  • Max Planck Haus Lecture Hall

Irregular triangle meshes are a powerful digital shape representation: they are flexible and can represent virtually any complex shape; they are efficiently rendered by graphics hardware; they are the standard output of 3D acquisition and routinely used as input to simulation software. Yet irregular meshes are difficult to model and edit because they lack a higher-level control mechanism. In this talk, I will survey a series of research results on surface modeling with meshes and show how high-quality shapes can be manipulated in a fast and intuitive manner. I will outline the current challenges in intelligent and more user-friendly modeling metaphors and will attempt to suggest possible directions for future work in this area.
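
One standard variational tool in this family is Laplacian-based mesh deformation: solve a sparse linear system that preserves differential coordinates while softly constraining user-selected handle vertices. The sketch below uses a uniform graph Laplacian and least-squares handle constraints; it is a minimal illustration, not necessarily one of the specific methods surveyed in the talk.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def uniform_laplacian(n_vertices, faces):
    """Graph Laplacian of a triangle mesh (uniform edge weights)."""
    rows, cols = [], []
    for f in faces:
        for i in range(3):
            a, b = f[i], f[(i + 1) % 3]
            rows += [a, b]
            cols += [b, a]
    W = sp.coo_matrix((np.ones(len(rows)), (rows, cols)),
                      shape=(n_vertices, n_vertices)).tocsr()
    W.data[:] = 1.0                        # binarize duplicate edge entries
    deg = np.asarray(W.sum(axis=1)).ravel()
    return sp.diags(deg) - W

def laplacian_edit(V, faces, handle_ids, handle_pos, w=10.0):
    """Deform the mesh so handles reach handle_pos while preserving Laplacian coordinates."""
    n = len(V)
    L = uniform_laplacian(n, faces)
    delta = L @ V                          # differential coordinates of the rest shape
    C = sp.coo_matrix((np.full(len(handle_ids), w),
                       (np.arange(len(handle_ids)), handle_ids)),
                      shape=(len(handle_ids), n))
    A = sp.vstack([L, C]).tocsr()
    newV = np.zeros_like(V)
    for d in range(3):                     # least-squares solve per coordinate
        rhs = np.concatenate([delta[:, d], w * handle_pos[:, d]])
        newV[:, d] = spla.lsqr(A, rhs)[0]
    return newV
```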


3D vision in a changing world

Talk
  • 17 May 2013 • 09:15:00
  • Andrew Fitzgibbon
  • MPH Lecture Hall

3D reconstruction from images has been a tremendous success story of computer vision, with city-scale reconstruction now a reality. However, these successes apply almost exclusively to a static world, where the only motion is that of the camera. Even with the advent of real-time depth cameras, full 3D modelling of dynamic scenes lags behind the rigid-scene case, and for many objects of interest (e.g. animals moving in natural environments), depth sensing remains challenging. In this talk, I will discuss a range of recent work on modelling nonrigid real-world 3D shape from 2D images, for example building generic animal models from internet photo collections. While the state of the art depends heavily on dense point tracks from textured surfaces, it is rare to find suitably textured surfaces: most animals are limited in texture (think of dogs, cats, cows, horses, …). I will show how this assumption can be relaxed by incorporating the strong constraints given by the object's silhouette.
 


  • Gerard Pons-Moll
  • MPH Lecture Hall

Significant progress has been made over the last few years in estimating people's shape and motion from video, yet the problem remains unsolved. This is especially true in uncontrolled environments, such as the street or the office, where background clutter and occlusions make the problem even more challenging.
The goal of our research is to develop computational methods that enable human pose estimation from video and inertial sensors in indoor and outdoor environments. Specifically, I will focus on one of our past projects, in which we introduce a hybrid human motion capture system that combines video input with sparse inertial sensor input. Employing a particle-based optimization scheme, our idea is to use orientation cues derived from the inertial input to sample particles from the manifold of valid poses. Additionally, we introduce a novel sensor noise model, based on the von Mises-Fisher distribution, to account for uncertainties. In this way, orientation constraints are naturally fulfilled and the number of particles needed can be kept very small. More generally, our method can be used to sample poses that fulfill arbitrary orientation or positional kinematic constraints. In the experiments, we show that our system can track even highly dynamic motions in an outdoor environment with changing illumination, background clutter, and shadows.
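
The von Mises-Fisher distribution used as the sensor noise model has a simple density on the unit sphere; the sketch below evaluates it and uses it to weight orientation hypotheses, with the concentration parameter and the toy directions chosen arbitrarily for illustration.

```python
import numpy as np

def vmf_logpdf(x, mu, kappa):
    """Log density of the von Mises-Fisher distribution on the unit sphere S^2.
    x, mu: unit 3-vectors; kappa: concentration (larger = less sensor noise)."""
    log_norm = np.log(kappa) - np.log(4 * np.pi) - np.log(np.sinh(kappa))
    return log_norm + kappa * float(np.dot(mu, x))

def weight_particles(particle_dirs, measured_dir, kappa=50.0):
    """Weight pose hypotheses by how well their predicted limb orientation
    matches the inertial measurement under vMF sensor noise."""
    logw = np.array([vmf_logpdf(measured_dir, d, kappa) for d in particle_dirs])
    w = np.exp(logw - logw.max())          # stabilize before normalizing
    return w / w.sum()

# Toy example: three hypothesized limb axes vs. one measured orientation (made up).
dirs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7071, 0.7071, 0.0]])
meas = np.array([0.7071, 0.7071, 0.0])
print(weight_particles(dirs, meas))
```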


What Makes Big Visual Data Hard?

Talk
  • 29 April 2013 • 13:15:00
  • Alexei Efros
  • MPH Lecture Hall

There are an estimated 3.5 trillion photographs in the world, of which 10% have been taken in the past 12 months. Facebook alone reports 6 billion photo uploads per month. Every minute, 72 hours of video are uploaded to YouTube. Cisco estimates that in the next few years, visual data (photos and video) will account for over 85% of total internet traffic. Yet, we currently lack effective computational methods for making sense of all this mass of visual data. Unlike easily indexed content, such as text, visual content is not routinely searched or mined; it's not even hyperlinked. Visual data is Internet's "digital dark matter" [Perona,2010] -- it's just sitting there!

In this talk, I will first discuss some of the unique challenges that make Big Visual Data difficult compared to other types of content. In particular, I will argue that the central problem is the lack of a good measure of similarity for visual data. I will then present some of our recent work that aims to address this challenge in the context of visual matching, image retrieval and visual data mining. As an application of the latter, we used Google Street View data for an entire city in an attempt to answer that age-old question which has been vexing poets (and poets-turned-geeks): "What makes Paris look like Paris?"


  • Cristobal Curio

Studying the interface between artificial and biological vision has long been a heavily promoted area of research. It seems promising that cognitive science can provide new ideas to interface computer vision and human perception, yet no established design principles exist. In the first part of my talk I am going to introduce the novel concept of 'object detectability'. Object detectability refers to a measure of how likely a human observer is to be visually aware of the location and presence of specific object types in a complex, dynamic, urban scene.

We have shown a proof of concept of how to maximize human observers' scene awareness in a dynamic driving context. Nonlinear functions mapping a combined feature vector of human gaze and visual features to object detectabilities are learnt from experimental samples. We obtain object detectabilities through a detection experiment simulating a proxy task of distracted real-world driving. In order to specifically enhance overall pedestrian detectability in a dynamic scene, the sum of individual detectability predictors defines a complex cost function that we seek to optimize with respect to human gaze. Results show significantly increased human scene awareness in hazardous test situations when comparing optimized gaze with random fixation. Thus, our approach can potentially help a driver save reaction time and resolve a risky maneuver. In our framework, the remarkable ability of the human visual system to detect specific objects in the periphery has been implicitly characterized by our perceptual detectability task and has thus been taken into account.
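
The gaze optimization described above, choosing the fixation that maximizes the summed predicted detectability of all objects in the scene, can be sketched as follows; the detectability predictor here is a simple stand-in function of eccentricity and a salience feature, not the nonlinear functions learned in the study.

```python
import numpy as np

def detectability(gaze, obj_pos, obj_feature, weights=(2.0, 0.5)):
    """Stand-in detectability predictor: decays with eccentricity from the gaze
    point and grows with an object-salience feature (purely illustrative)."""
    ecc = np.linalg.norm(gaze - obj_pos)
    return 1.0 / (1.0 + weights[0] * ecc) + weights[1] * obj_feature

def best_gaze(objects, candidate_gazes):
    """Pick the gaze point maximizing summed predicted detectability over all
    pedestrians/objects in the scene."""
    def cost(g):
        return sum(detectability(g, p, f) for p, f in objects)
    scores = [cost(g) for g in candidate_gazes]
    return candidate_gazes[int(np.argmax(scores))]

# Hypothetical scene: (position, salience feature) pairs and a grid of gaze candidates.
objects = [(np.array([0.2, 0.5]), 0.8), (np.array([0.7, 0.4]), 0.3)]
grid = [np.array([x, y]) for x in np.linspace(0, 1, 21) for y in np.linspace(0, 1, 21)]
print(best_gaze(objects, grid))
```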

The framework may provide a foundation for future work to determine what kind of information a Computer Vision system should process reliably, e.g. certain pose or motion features, in order to optimally alert a driver in time-critical situations. Dynamic image data was taken from the Caltech Pedestrian database. I will conclude with a brief overview of recent work, including a new circular output random regression forest for continuous object viewpoint estimation and a novel learning-based, monocular odometry approach based on robust LVMs and sensorimotor learning, offering stable 3D information integration. Last but not least, I present results of a perception experiment to quantify emotion in estimated facial movement synergy components that can be exploited to control emotional content of 3D avatars in a perceptually meaningful way.

This work was done in particular with David Engel (now a Post-Doc at M.I.T.), Christian Herdtweck (a PhD student at MPI Biol. Cybernetics), and in collaboration with Prof. Martin A. Giese and Dr. Enrico Chiovetto, Center for Integrated Neuroscience, Tübingen.


  • Oisin Mac Aodha

We present a supervised learning based method to estimate a per-pixel confidence for optical flow vectors. Regions of low texture and pixels close to occlusion boundaries are known to be difficult for optical flow algorithms. Using a spatiotemporal feature vector, we estimate if a flow algorithm is likely to fail in a given region.

Our method is not restricted to any specific class of flow algorithm and does not make any scene-specific assumptions. By automatically learning this confidence we can combine the output of several flow fields computed by different algorithms, selecting the best-performing algorithm per pixel. Our optical flow confidence measure allows one to achieve better overall results by discarding the most troublesome pixels. We illustrate the effectiveness of our method on four different optical flow algorithms over a variety of real and synthetic sequences. For algorithm selection, we achieve the top overall results on a large test set, and at times even surpass the results of the best algorithm among the candidates.
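
A minimal version of this per-pixel algorithm selection is sketched below: one confidence classifier is trained per flow algorithm on per-pixel features, and at test time each pixel is assigned the algorithm with the highest predicted confidence. The random forest, feature dimensionality, and stand-in data are assumptions, not the actual features or learner used in this work.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_confidence_model(features, flow_is_correct):
    """Learn P(flow vector is correct | per-pixel spatiotemporal features)."""
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(features, flow_is_correct)
    return model

def select_per_pixel(feature_maps, models):
    """For each pixel, pick the flow algorithm whose predicted confidence is highest.
    feature_maps: (H, W, D) feature array; models: one trained model per algorithm."""
    H, W, D = feature_maps.shape
    flat = feature_maps.reshape(-1, D)
    conf = np.stack([m.predict_proba(flat)[:, 1] for m in models], axis=1)
    return conf.argmax(axis=1).reshape(H, W)

# Toy usage with random stand-in data (real features would be image/flow statistics).
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 8))
models = [train_confidence_model(X, rng.integers(0, 2, 1000)) for _ in range(3)]
labels = select_per_pixel(rng.standard_normal((32, 32, 8)), models)
```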


  • Andreas Müller

Semantic image segmentation is the task of assigning semantic labels to the pixels of a natural image. It is an important step towards general scene understanding and has lately received much attention in the computer vision community. Detailed annotations of images have been found helpful for solving this task, but obtaining accurate and consistent annotations still proves difficult on a large scale. One possible way forward is to work with partial supervision and latent variable models to infer semantic annotations from the data during training.

The talk will present two approaches working with partial supervision for image segmentation. The first uses an efficient multi-instance formulation to obtain object class segmentations when trained on class labels alone. The second uses a latent CRF formulation to extract object parts based on object class segmentation.
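
The multi-instance idea, that an image is a bag of pixels and carries a class label if at least one pixel belongs to that class, can be written as a simple pooled loss; the noisy-OR pooling below is a generic illustration rather than the specific formulation used in this work.

```python
import numpy as np

def bag_probability(pixel_scores):
    """Multi-instance pooling: the image contains the class if at least one
    pixel does. Noisy-OR over per-pixel sigmoid scores."""
    p_pixel = 1.0 / (1.0 + np.exp(-pixel_scores))
    return 1.0 - np.prod(1.0 - p_pixel)

def mil_loss(pixel_scores, image_has_class):
    """Negative log-likelihood of the image-level label under the bag model."""
    p = np.clip(bag_probability(pixel_scores), 1e-7, 1 - 1e-7)
    return -np.log(p) if image_has_class else -np.log(1.0 - p)

# Toy check: an image labeled positive with one confident pixel gets low loss.
scores = np.array([-5.0, -4.0, 6.0, -3.0])
print(mil_loss(scores, image_has_class=True))
print(mil_loss(scores, image_has_class=False))
```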


From Particle Stereo to Scene Stereo

Talk
  • 25 February 2013
  • Carsten Rother

In this talk I will present two lines of research which are both applied to the problem of stereo matching. The first line of research tries to make progress on the very traditional problem of stereo matching. In BMVC 11 we presented the PatchmatchStereo work which achieves surprisingly good results with a simple energy function consisting of unary terms only. As optimization engine we used the PatchMatch method, which was designed for image editing purposes. In BMVC 12 we extended this work by adding to the energy function the standard pairwise smoothness terms. The main contribution of this work is the optimization technique, which we call PatchMatch-BeliefPropagation (PMBP). It is a special case of max-product Particle Belief Propagation, with a new sampling schema motivated by Patchmatch.

The method may be suitable for many energy minimization problems in computer vision which have a non-convex, continuous, and potentially high-dimensional label space. The second line of research combines the problem of stereo matching with that of object extraction in the scene. We show that both tasks can be solved jointly, boosting the performance of each individual task. In particular, stereo matching improves since objects have to obey physical properties, e.g. they are not allowed to fly in the air. Object extraction improves, as expected, since we have additional information about depth in the scene.
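
The core PatchMatch idea referenced here, random initialization followed by propagation of good neighboring hypotheses and local random refinement, is sketched below for fronto-parallel disparities; the actual PatchMatch Stereo work uses slanted support planes, and PMBP adds pairwise terms, so this is only a simplified illustration.

```python
import numpy as np

def match_cost(left, right, y, x, d, win=3):
    """Sum of absolute differences over a small window for disparity d."""
    h, w = left.shape
    xs = np.clip(np.arange(x - win, x + win + 1), 0, w - 1)
    ys = np.clip(np.arange(y - win, y + win + 1), 0, h - 1)
    xr = np.clip(xs - int(round(d)), 0, w - 1)
    return float(np.abs(left[np.ix_(ys, xs)] - right[np.ix_(ys, xr)]).sum())

def patchmatch_stereo(left, right, max_disp=32, iters=3):
    """PatchMatch-style disparity estimation: random init, spatial propagation,
    and random refinement. Fronto-parallel version for illustration only."""
    h, w = left.shape
    rng = np.random.default_rng(0)
    disp = rng.uniform(0, max_disp, size=(h, w))
    for it in range(iters):
        rows = range(h) if it % 2 == 0 else range(h - 1, -1, -1)
        for y in rows:
            for x in (range(w) if it % 2 == 0 else range(w - 1, -1, -1)):
                best = match_cost(left, right, y, x, disp[y, x])
                # Propagation: try the neighbors' current disparity hypotheses.
                for ny, nx in ((y - 1, x), (y, x - 1), (y + 1, x), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w:
                        c = match_cost(left, right, y, x, disp[ny, nx])
                        if c < best:
                            best, disp[y, x] = c, disp[ny, nx]
                # Random refinement: perturb with a shrinking search radius.
                radius = max_disp / (2 ** (it + 1))
                cand = np.clip(disp[y, x] + rng.uniform(-radius, radius), 0, max_disp)
                if match_cost(left, right, y, x, cand) < best:
                    disp[y, x] = cand
    return disp
```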