Institute Talks

Automatic Understanding of the Visual World

Talk
  • 26 April 2018 • 11:00 - 12:00
  • Dr. Cordelia Schmid
  • N3.022

One of the central problems of artificial intelligence is machine perception, i.e., the ability to understand the visual world based on input from sensors such as cameras. In this talk, I will present recent progress with respect to data generation using weak annotations, motion information and synthetic data. I will also discuss our recent results for action recognition, where human tubes and tubelets have proven successful. Our tubelets move away from state-of-the-art frame-based approaches and improve classification and localization by relying on joint information from several frames. I will also show how to extend this type of method to weakly supervised learning of actions, which allows us to scale to large amounts of data with sparse manual annotation. Furthermore, I will discuss several recent extensions, including 3D pose estimation.

Organizers: Ahmed Osman

Constructing Artificial Characters - Traditional versus Deep Learning Approaches

Talk
  • 27 April 2018 • 16:30 - 17:30
  • JP Lewis
  • PS Aquarium, 3rd floor, north, MPI-IS

Over the past 15 years computer graphics characters have progressed to the point where they are occasionally indistinguishable from videos of real humans. Nevertheless, truly believable and photoreal characters generally require large teams of people and considerable time to construct. Is the field continuing to make progress, or have we reached an asymptote? Can deep learning replace traditional approaches to character construction? We will consider perspectives on these questions drawn from nearly two decades of research and algorithm development for character animation.

Organizers: Michael Black

Inference with Kernel Embeddings

Talk
  • 22 May 2017 • 11:00 - 12:15
  • Dino Sejdinovic

Kernel embeddings of distributions and the Maximum Mean Discrepancy (MMD), the resulting distance between distributions, are useful tools for fully nonparametric hypothesis testing and for learning on distributional inputs. I will give an overview of this framework and present some of its recent applications within the context of approximate Bayesian inference. Further, I will discuss a recent modification of MMD which aims to encode invariance to additive symmetric noise and leads to learning on distributions that is robust to distributional covariate shift, e.g., where measurement noise on the training data differs from that on the testing data.
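
As a rough illustration of the MMD mentioned above, the following sketch computes the standard biased estimate of the squared MMD between two one-dimensional samples. The Gaussian kernel, the bandwidth of 1.0, the sample sizes and the function names are assumptions chosen for the example; it does not implement the noise-invariant modification discussed in the talk.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars (bandwidth is an assumed default)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Biased estimate of the squared MMD: mean k(x,x') + mean k(y,y') - 2 mean k(x,y)."""
    k_xx = sum(kernel(a, b) for a in xs for b in xs) / (len(xs) ** 2)
    k_yy = sum(kernel(a, b) for a in ys for b in ys) / (len(ys) ** 2)
    k_xy = sum(kernel(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy


# Two samples from the same distribution and one from a shifted distribution.
rng = random.Random(0)
same_a = [rng.gauss(0.0, 1.0) for _ in range(200)]
same_b = [rng.gauss(0.0, 1.0) for _ in range(200)]
shifted = [rng.gauss(2.0, 1.0) for _ in range(200)]

mmd_same = mmd_squared(same_a, same_b)      # small: same underlying distribution
mmd_shifted = mmd_squared(same_a, shifted)  # larger: the means differ
```

In a two-sample test, a statistic like `mmd_shifted` would be compared against a threshold, for instance one obtained by permuting the pooled samples.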

Organizers: Philipp Hennig


Learning to segment moving objects

Talk
  • 19 May 2017 • 14:00 - 15:00
  • Cordelia Schmid
  • Greenhouse (PS)

This talk addresses the task of segmenting moving objects in unconstrained videos. We introduce a novel two-stream neural network with an explicit memory module to achieve this. The two streams of the network encode spatial and temporal features in a video sequence respectively, while the memory module captures the evolution of objects over time. The module to build a “visual memory” in video, i.e., a joint representation of all the video frames, is realized with a convolutional recurrent unit learned from a small number of training video sequences. Given video frames as input, our approach first assigns each pixel an object or background label obtained with an encoder-decoder network that takes optical flow as input and is trained on synthetic data. Next, a “visual memory” specific to the video is acquired automatically without any manually annotated frames. The visual memory is implemented with convolutional gated recurrent units, which allow spatial information to be propagated over time. We evaluate our method extensively on two benchmarks, the DAVIS and Freiburg-Berkeley motion segmentation datasets, and show state-of-the-art results. This is joint work with K. Alahari and P. Tokmakov.

Organizers: Osman Ulusoy


  • Dr. Raj Madhavan
  • N2.025 (AMD seminar room - 2nd floor)

Many of the existing Robotics & Automation (R&A) technologies are at a sufficient level of maturity and are widely accepted by the academic (and to a lesser extent by the industrial) community after having undergone the scientific rigor and peer reviews that accompany such works. I believe that most of the past and current research and development efforts in robotics and automation have been squarely aimed at increasing the Standard of Living (SoL) in developed economies where housing, running water, transportation, schools, and access to healthcare, to name a few, are taken for granted. Humanitarian R&A, on the other hand, can be taken to mean technologies that can make a fundamental difference in people’s lives by alleviating their suffering in times of need, such as during natural or man-made disasters or in pockets of the population where the most basic needs of humanity are not met, thus improving their Quality of Life (QoL) and not just SoL. My current work focuses on the applied use of robotics and automation technologies for the benefit of under-served and under-developed communities by working closely with them to develop solutions that showcase the effectiveness of R&A solutions in domains that strike a chord with the beneficiaries. This is made possible by bringing together researchers, practitioners from industry, academia, local governments, and various entities such as the IEEE Robotics and Automation Society’s Special Interest Group on Humanitarian Technology (RAS-SIGHT), NGOs, and NPOs across the globe. I will share some of my efforts and thoughts on challenges that need to be taken into consideration, including the sustainability of developed solutions. I will also outline my recent efforts in the technology and public policy domains with emphasis on socio-economic, cultural, privacy, and security issues in developing and developed economies.

Organizers: Ludovic Righetti


Biquadratic Forms and Semi-Definite Relaxations

Talk
  • 11 May 2017 • 10:30 - 11:00
  • Carolin Schmitt
  • PS Green House

I'll present my master's thesis "Biquadratic Forms and Semi-Definite Relaxations". It is about biquadratic optimization programs (which are NP-hard in general) and examines a condition under which there exists an algorithm that finds a solution to every instance of the problem in polynomial time. I'll present a counterexample showing that this is not possible in general and address the question of what happens if further knowledge about the variables over which we optimise is applied.

Organizers: Fatma Güney


Graph Decomposition Problems in Image Analysis

Talk
  • 08 May 2017 • 11:00 - 12:00
  • Björn Andres
  • N3.022

A large part of image analysis is about breaking things into pieces. Decompositions of a graph are a mathematical abstraction of the possible outcomes. This talk is about optimization problems whose feasible solutions define decompositions of a graph. One example is the correlation clustering problem whose feasible solutions relate one-to-one to the decompositions of a graph, and whose objective function puts a cost or reward on neighboring nodes ending up in distinct components. This talk shows applications of this problem and proposed generalizations to diverse image analysis tasks. It sketches algorithms for finding feasible solutions for large instances in practice, solutions that are often superior in the metrics of application-specific benchmarks. It also sketches algorithms for finding lower bounds and points to new findings and open problems of polyhedral geometry in this context.
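
The correlation clustering objective described above can be sketched on a toy instance. The example below is only an illustration: the four-node complete graph, the edge costs, and the function names are made up, and the exhaustive search is feasible only for tiny graphs, unlike the large-scale algorithms the talk refers to. A complete graph is assumed so that every partition of the nodes is a valid decomposition.

```python
from itertools import product


def cut_cost(edge_costs, labels):
    """Sum the costs of all edges whose endpoints lie in distinct components."""
    return sum(c for (u, v), c in edge_costs.items() if labels[u] != labels[v])


def best_decomposition(num_nodes, edge_costs):
    """Exhaustively search all node labelings and keep the cheapest one."""
    best_labels, best_cost = None, float("inf")
    for labels in product(range(num_nodes), repeat=num_nodes):
        cost = cut_cost(edge_costs, labels)
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels, best_cost


# Hypothetical costs: a positive value penalizes separating the two nodes,
# a negative value rewards separating them.
edge_costs = {
    (0, 1): 2.0, (0, 2): 1.0, (1, 2): 1.5,
    (0, 3): -2.0, (1, 3): -1.0, (2, 3): -0.5,
}

best_labels, best_cost = best_decomposition(4, edge_costs)
```

On this instance the cheapest decomposition keeps nodes 0, 1 and 2 in one component and puts node 3 in its own component, since that cuts exactly the edges with negative cost.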

Organizers: Christoph Lassner


  • Rahul Chaudhari and David Gueorguiev
  • N2.025

Colloquium on haptics: Two guests of the department "Haptic Intelligence" (Dept. Kuchenbecker) will each give a short talk this Friday (May 5) in Tübingen. The talks will be broadcast to Stuttgart, room 2 P4.


Learning from Synthetic Humans

Talk
  • 04 May 2017 • 15:00 - 16:00
  • Gul Varol
  • N3.022 (Greenhouse)

Estimating human pose, shape, and motion from images and video is a fundamental challenge with many applications. Recent advances in 2D human pose estimation use large amounts of manually labeled training data for learning convolutional neural networks (CNNs). Such data is time-consuming to acquire and difficult to extend. Moreover, manual labeling of 3D pose, depth and motion is impractical. In this work we present SURREAL: a new large-scale dataset with synthetically generated but realistic images of people rendered from 3D sequences of human motion capture data. We generate more than 6 million frames together with ground truth pose, depth maps, and segmentation masks. We show that CNNs trained on our synthetic dataset allow for accurate human depth estimation and human part segmentation in real RGB images. Our results and the new dataset open up new possibilities for advancing person analysis using cheap and large-scale synthetic data.

Organizers: Dimitris Tzionas


  • Sylvain Calinon
  • N2.025

Human-centric robotic applications often require the robots to learn new skills by interacting with the end-users. From a machine learning perspective, the challenge is to acquire skills from only a few interactions, with strong generalization demands. It requires: 1) the development of intuitive active learning interfaces to acquire meaningful demonstrations; 2) the development of models that can exploit the structure and geometry of the acquired data in an efficient way; 3) the development of adaptive control techniques that can exploit the learned task variations and coordination patterns. The developed models often need to serve several purposes (recognition, prediction, online synthesis), and be compatible with different learning strategies (imitation, emulation, exploration). For the reproduction of skills, these models need to be enriched with force and impedance information to enable human-robot collaboration and to generate safe and natural movements. I will present an approach combining model predictive control and statistical learning of movement primitives in multiple coordinate systems. The proposed approach will be illustrated in various applications, with robots either close to us (robot for dressing assistance), part of us (prosthetic hand with EMG and tactile sensing), or far from us (teleoperation of bimanual robot in deep water).

Organizers: Ludovic Righetti


Multi-contact locomotion control for legged robots

Talk
  • 25 April 2017 • 11:00 - 12:30
  • Dr. Andrea Del Prete
  • N2.025 (AMD seminar room - 2nd floor)

This talk will survey recent work to achieve multi-contact locomotion control of humanoid and legged robots. I will start by presenting some results on robust optimization-based control. We exploited robust optimization techniques, either stochastic or worst-case, to improve the robustness of Task-Space Inverse Dynamics (TSID), a well-known control framework for legged robots. We modeled uncertainties in the joint torques, and we immunized the constraints of the system to any of the realizations of these uncertainties. We also applied the same methodology to ensure the balance of the robot despite bounded errors in its inertial parameters. Extensive simulations in a realistic environment show that the proposed robust controllers greatly outperform the classic one. Then I will present preliminary results on a new capturability criterion for legged robots in multi-contact. "N-step capturability" is the ability of a system to come to a stop by taking N or fewer steps. Simplified models to compute N-step capturability already exist and are widely used, but they are limited to locomotion on flat terrains. We propose a new efficient algorithm to compute 0-step capturability for a robot in arbitrary contact scenarios. Finally, I will present our recent efforts to transfer the above-mentioned techniques to the real humanoid robot HRP-2, on which we recently implemented joint torque control.
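
To make the notion of 0-step capturability concrete, here is a minimal sketch of the kind of simplified flat-terrain model the abstract mentions, namely the linear inverted pendulum, reduced to one dimension. It is not the multi-contact algorithm proposed in the talk. The function names, the center-of-mass height and the support interval are assumptions made for the example.

```python
import math

GRAVITY = 9.81  # m/s^2


def capture_point(com_pos, com_vel, com_height):
    """Instantaneous capture point of the linear inverted pendulum (1D)."""
    omega = math.sqrt(GRAVITY / com_height)  # natural frequency of the pendulum
    return com_pos + com_vel / omega


def is_zero_step_capturable(com_pos, com_vel, com_height, support_min, support_max):
    """In this model, the robot can stop without stepping if the capture
    point lies inside the support interval of the foot."""
    cp = capture_point(com_pos, com_vel, com_height)
    return support_min <= cp <= support_max


# Hypothetical numbers: center of mass 0.8 m high, foot spanning [-0.10, 0.15] m.
slow = is_zero_step_capturable(0.0, 0.3, 0.8, -0.10, 0.15)  # True
fast = is_zero_step_capturable(0.0, 1.0, 0.8, -0.10, 0.15)  # False
```

With the slower forward velocity the capture point stays under the foot, while the faster one pushes it beyond the toe, so in this model a step would be needed.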

Organizers: Ludovic Righetti


  • Philipp Berens
  • tba

The retina in the eye performs complex computations to transmit only behaviourally relevant information about our visual environment to the brain. These computations are implemented by numerous different cell types that form complex circuits. New experimental and computational methods make it possible to study the cellular diversity of the retina in detail – the goal of obtaining a complete list of all the cell types in the retina and, thus, its “building blocks”, is within reach. I will review our recent contributions in this area, showing how analyzing multimodal datasets from electron microscopy and functional imaging can yield insights into the cellular organization of retinal circuits.

Organizers: Philipp Hennig