

2020


Event-triggered Learning

Solowjow, F., Trimpe, S.

Automatica, 117, Elsevier, July 2020 (article)


arXiv PDF DOI Project Page [BibTex]



Bayesian Optimization in Robot Learning - Automatic Controller Tuning and Sample-Efficient Methods

Marco-Valle, A.

University of Tübingen, June 2020 (thesis)

Abstract
The problem of designing controllers to regulate dynamical systems has been studied by engineers for millennia. Yet suboptimal performance still lingers in many closed loops as an unavoidable side effect of manually tuning controller parameters, and industrial settings remain skeptical about data-driven methods that allow one to learn controller parameters automatically. In the context of robotics, machine learning (ML) keeps growing in influence, increasing autonomy and adaptability, for example by helping to automate controller tuning. However, data-hungry ML methods, such as standard reinforcement learning, require a large number of experimental samples, which is prohibitive in robotics, as hardware can deteriorate and break. This brings about the following question: Can manual controller tuning in robotics be automated by using data-efficient machine learning techniques? In this thesis, we tackle this question by exploring Bayesian optimization (BO), a data-efficient ML framework, to reduce the human effort and side effects of manual controller tuning while keeping the number of experimental samples low. We focus on robotic systems, providing thorough theoretical results that aim to increase data-efficiency, as well as demonstrations on real robots. Specifically, we present four main contributions. We first consider using BO to replace manual tuning on robotic platforms. To this end, we parametrize the design weights of a linear quadratic regulator (LQR) and learn its parameters using an information-efficient BO algorithm. The algorithm uses Gaussian processes (GPs) to model the unknown performance objective. The GP model is used by BO to suggest controller parameters that are expected to increase the information about the optimal parameters, measured as a gain in entropy. The resulting “automatic LQR tuning” framework is demonstrated on two robotic platforms: a robot arm balancing an inverted pole and a humanoid robot performing a squatting task. In both cases, an existing controller is automatically improved in a handful of experiments without human intervention. BO compensates for data scarcity by means of the GP, a probabilistic model that encodes prior assumptions about the unknown performance objective. Incorrect or non-informed assumptions usually have negative consequences, such as a higher number of robot experiments, poor tuning performance, or reduced sample-efficiency. The second to fourth contributions presented herein attempt to alleviate this issue. The second contribution proposes to include the robot simulator in the learning loop as an additional information source for automatic controller tuning. While a real robot experiment generally entails high costs (e.g., it requires preparation and takes time), simulations are cheaper to obtain (e.g., they can be computed faster). However, because the simulator is an imperfect model of the robot, its information is biased and can harm the learning performance. To address this problem, we propose “simu-vs-real”, a principled multi-fidelity BO algorithm that trades off cheap but inaccurate information from simulations against expensive but accurate physical experiments in a cost-effective manner. The resulting algorithm is demonstrated on a cart-pole system, where simulations and real experiments are alternated, thus sparing many real evaluations.
The third contribution explores how to adapt the expressiveness of the probabilistic prior to the control problem at hand. To this end, the mathematical structure of LQR controllers is leveraged and embedded into the GP by means of the kernel function. Specifically, we propose two different “LQR kernel” designs that retain the flexibility of Bayesian nonparametric learning. Simulated results indicate that the LQR kernel yields better performance than non-informed kernel choices when used for controller learning with BO. Finally, the fourth contribution addresses the problem of handling controller failures, which are typically unavoidable in practice while learning from data, especially if non-conservative solutions are expected. Although controller failures are generally problematic (e.g., the robot has to be emergency-stopped), they are also a rich information source about what should be avoided. We propose “failures-aware excursion search”, a novel algorithm for Bayesian optimization under black-box constraints in which failures are limited in number. Our results on numerical benchmarks indicate that allowing a confined number of failures reveals better optima than state-of-the-art methods. The first contribution of this thesis, “automatic LQR tuning”, is among the first to apply BO to real robots. While it demonstrated automatic controller learning from few experimental samples, it also revealed several important challenges, such as the need for higher sample-efficiency, which opened relevant research directions that we addressed through several methodological contributions. Summarizing, we proposed “simu-vs-real”, a novel BO algorithm that includes the simulator as an additional information source; an “LQR kernel” design that learns faster than standard choices; and “failures-aware excursion search”, a new BO algorithm for constrained black-box optimization problems where the number of failures is limited.
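To make the tuning loop concrete, here is a minimal sketch of BO-based LQR weight tuning. It is illustrative only: `run_experiment` is a hypothetical stand-in for deploying the controller on hardware (a synthetic quadratic cost keeps the script runnable), and the lower-confidence-bound acquisition is a simpler substitute for the entropy-based criterion used in the thesis.

```python
# Illustrative sketch only: a generic BO loop for LQR weight tuning.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def run_experiment(theta):
    # Hypothetical stand-in for: build an LQR with design weights
    # diag(10**theta), run it on the robot, measure the closed-loop cost.
    return float(np.sum((theta - 0.7) ** 2))

def lcb(gp, candidates, beta=2.0):
    mu, sigma = gp.predict(candidates, return_std=True)
    return mu - beta * sigma  # optimistic score for minimization

rng = np.random.default_rng(0)
dim = 2                                          # two LQR design weights
X = rng.uniform(-2.0, 2.0, size=(3, dim))        # initial experiments
y = np.array([run_experiment(x) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(20):                              # "a handful of experiments"
    gp.fit(X, y)
    cand = rng.uniform(-2.0, 2.0, size=(1000, dim))
    x_next = cand[np.argmin(lcb(gp, cand))]      # most promising weights
    X = np.vstack([X, x_next])
    y = np.append(y, run_experiment(x_next))

print("best weights (log scale):", X[np.argmin(y)])
```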


Repository (Universitätsbibliothek) - University of Tübingen PDF DOI [BibTex]


Data-efficient Auto-tuning with Bayesian Optimization: An Industrial Control Study

Neumann-Brosig, M., Marco, A., Schwarzmann, D., Trimpe, S.

IEEE Transactions on Control Systems Technology, 28(3):730-740, May 2020 (article)

Abstract
Bayesian optimization is proposed for the automatic learning of optimal controller parameters from experimental data. A probabilistic description (a Gaussian process) is used to model the unknown function from controller parameters to a user-defined cost. The probabilistic model is updated with data obtained by testing a set of parameters on the physical system and evaluating the cost. In order to learn quickly, the Bayesian optimization algorithm selects the next parameters to evaluate in a systematic way, for example, by maximizing information gain about the optimum. The algorithm thus iteratively finds the globally optimal parameters with only a few experiments. Taking throttle valve control as a representative industrial control example, the proposed auto-tuning method is shown to outperform manual calibration: it consistently achieves better performance with a low number of experiments. The proposed auto-tuning framework is flexible and can handle different control structures and objectives.
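The information-gain criterion mentioned in the abstract (entropy search) is involved to implement; expected improvement, sketched below, is a simpler acquisition that plays the same role of scoring candidate controller parameters from the GP posterior (minimization version, not the paper's method).

```python
# Expected improvement as a simple stand-in for the paper's
# information-gain acquisition; scores candidates from GP mean/std.
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best_cost, xi=0.01):
    """EI at candidate parameters, given GP posterior mean/std."""
    sigma = np.maximum(sigma, 1e-12)          # guard against zero std
    z = (best_cost - mu - xi) / sigma
    return (best_cost - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)
```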


arXiv (PDF) DOI Project Page [BibTex]



Automatic Discovery of Interpretable Planning Strategies

Skirzyński, J., Becker, F., Lieder, F.

May 2020 (article) Submitted

Abstract
When making decisions, people often overlook critical information or are overly swayed by irrelevant information. A common approach to mitigate these biases is to provide decision-makers, especially professionals such as medical doctors, with decision aids, such as decision trees and flowcharts. Designing effective decision aids is a difficult problem. We propose that recently developed reinforcement learning methods for discovering clever heuristics for good decision-making can be partially leveraged to assist human experts in this design process. One of the biggest remaining obstacles to leveraging these methods for improving human decision-making is that the policies they learn are opaque to people. To solve this problem, we introduce AI-Interpret: a general method for transforming idiosyncratic policies into simple and interpretable descriptions. Our algorithm combines recent advances in imitation learning and program induction with a new clustering method for identifying a large subset of demonstrations that can be accurately described by a simple, high-performing decision rule. We evaluate our new AI-Interpret algorithm and employ it to translate information-acquisition policies discovered through metalevel reinforcement learning. The results of three large behavioral experiments showed that providing the decision rules as flowcharts significantly improved people’s planning strategies and decisions across three different classes of sequential decision problems. Furthermore, a series of ablation studies confirmed that our AI-Interpret algorithm was critical to the discovery of interpretable decision rules and that it is ready to be applied to other reinforcement learning problems. We conclude that the methods and findings presented in this article are an important step towards leveraging automatic strategy discovery to improve human decision-making.

re

Automatic Discovery of Interpretable Planning Strategies Code [BibTex]


Advancing Rational Analysis to the Algorithmic Level

Lieder, F., Griffiths, T. L.

Behavioral and Brain Sciences, 43, E27, March 2020 (article)

Abstract
The commentaries raised questions about normativity, human rationality, cognitive architectures, cognitive constraints, and the scope of resource-rational analysis (RRA). We respond to these questions and clarify that RRA is a methodological advance that extends the scope of rational modeling to understanding cognitive processes, why they differ between people, why they change over time, and how they could be improved.


Advancing rational analysis to the algorithmic level DOI [BibTex]



Learning to Overexert Cognitive Control in a Stroop Task

Bustamante, L., Lieder, F., Musslick, S., Shenhav, A., Cohen, J.

February 2020, Laura Bustamante and Falk Lieder contributed equally to this publication. (article) In revision

Abstract
How do people learn when to allocate how much cognitive control to which task? According to the Learned Value of Control (LVOC) model, people learn to predict the value of alternative control allocations from features of a given situation. This suggests that people may generalize the value of control learned in one situation to other situations with shared features, even when the demands for cognitive control are different. This makes the intriguing prediction that what a person learned in one setting could, under some circumstances, cause them to misestimate the need for, and potentially overexert, control in another setting, even if this harms their performance. To test this prediction, we had participants perform a novel variant of the Stroop task in which, on each trial, they could choose to either name the color (more control-demanding) or read the word (more automatic). However, only one of these tasks was rewarded on each trial; the rewarded task changed from trial to trial and could be predicted by one or more of the stimulus features (the color and/or the word). Participants first learned colors that predicted the rewarded task. Then they learned words that predicted the rewarded task. In the third part of the experiment, we tested how these learned feature associations transferred to novel stimuli with some overlapping features. The stimulus-task-reward associations were designed so that, for certain combinations of stimuli, the transfer of learned feature associations would incorrectly predict that the more highly rewarded task was color naming, which would require the exertion of control, even though the actually rewarded task was word reading and therefore did not require the engagement of control. Our results demonstrated that participants overexerted control for these stimuli, providing support for the feature-based learning mechanism described by the LVOC model.


Learning to Overexert Cognitive Control in a Stroop Task DOI [BibTex]



Sliding Mode Control with Gaussian Process Regression for Underwater Robots

Lima, G. S., Trimpe, S., Bessa, W. M.

Journal of Intelligent & Robotic Systems, January 2020 (article)


DOI [BibTex]



Hierarchical Event-triggered Learning for Cyclically Excited Systems with Application to Wireless Sensor Networks

Beuchert, J., Solowjow, F., Raisch, J., Trimpe, S., Seel, T.

IEEE Control Systems Letters, 4(1):103-108, January 2020 (article)


arXiv PDF DOI Project Page [BibTex]



Toward a Formal Theory of Proactivity

Lieder, F., Iwama, G.

January 2020 (article) Submitted

Abstract
Beyond merely reacting to their environment and impulses, people have the remarkable capacity to proactively set and pursue their own goals. But the extent to which they leverage this capacity varies widely across people and situations. The goal of this article is to make the mechanisms and variability of proactivity more amenable to rigorous experiments and computational modeling. We proceed in three steps. First, we develop and validate a mathematically precise behavioral measure of proactivity and reactivity that can be applied across a wide range of experimental paradigms. Second, we propose a formal definition of proactivity and reactivity, and develop a computational model of proactivity in the AX Continuous Performance Task (AX-CPT). Third, we develop and test a computational-level theory of meta-control over proactivity in the AX-CPT that identifies three distinct meta-decision-making problems: intention setting, resolving response conflict between intentions and automaticity, and deciding whether to recall context and intentions into working memory. People's response frequencies in the AX-CPT were remarkably well captured by a mixture of the predictions of our models of proactive and reactive control. Empirical data from an experiment varying the incentives and contextual load of an AX-CPT confirmed the predictions of our meta-control model of individual differences in proactivity. Our results suggest that proactivity can be understood in terms of computational models of meta-control. Our model makes additional empirically testable predictions. Future work will extend our models from proactive control in the AX-CPT to proactive goal creation and goal pursuit in the real world.


Toward a formal theory of proactivity DOI Project Page [BibTex]


Control-guided Communication: Efficient Resource Arbitration and Allocation in Multi-hop Wireless Control Systems

Baumann, D., Mager, F., Zimmerling, M., Trimpe, S.

IEEE Control Systems Letters, 4(1):127-132, January 2020 (article)


arXiv PDF DOI [BibTex]



Analytical classical density functionals from an equation learning network

Lin, S., Martius, G., Oettel, M.

The Journal of Chemical Physics, 152(2):021102, 2020, arXiv preprint https://arxiv.org/abs/1910.12752 (article)


Preprint PDF DOI [BibTex]



Spatial Scheduling of Informative Meetings for Multi-Agent Persistent Coverage

Haksar, R. N., Trimpe, S., Schwager, M.

IEEE Robotics and Automation Letters, 2020 (article) Accepted


DOI [BibTex]



Safe and Fast Tracking on a Robot Manipulator: Robust MPC and Neural Network Control

Nubert, J., Koehler, J., Berenz, V., Allgöwer, F., Trimpe, S.

IEEE Robotics and Automation Letters, 2020 (article) Accepted

Abstract
Fast feedback control and safety guarantees are essential in modern robotics. We present an approach that achieves both by combining novel robust model predictive control (MPC) with function approximation via (deep) neural networks (NNs). The result is a new approach for complex tasks with nonlinear, uncertain, and constrained dynamics, as are common in robotics. Specifically, we leverage recent results in MPC research to propose a new robust setpoint tracking MPC algorithm, which achieves reliable and safe tracking of a dynamic setpoint while guaranteeing stability and constraint satisfaction. The presented robust MPC scheme constitutes a one-layer approach that unifies the often separated planning and control layers by directly computing the control command based on a reference and possibly obstacle positions. As a separate contribution, we show how the computation time of the MPC can be drastically reduced by approximating the MPC law with a NN controller. The NN is trained and validated on offline samples of the MPC, yielding statistical guarantees, and used in place of the MPC at run time. Our experiments on a state-of-the-art robot manipulator are the first to show that both the proposed robust and approximate MPC schemes scale to real-world robotic systems.
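A rough sketch of the NN approximation step described above, under stated assumptions: `solve_mpc` is a hypothetical oracle for the robust MPC law (a smooth toy law keeps the script runnable), and the network size, sample counts, and validation check are placeholders, not the paper's setup.

```python
# Sketch: imitate an expensive MPC law with a small NN regressor.
import numpy as np
from sklearn.neural_network import MLPRegressor

def solve_mpc(z):
    # Hypothetical stand-in for: solve the robust MPC problem for the
    # stacked input z (state + reference), return the control command.
    return float(np.tanh(z[:2].sum()))

rng = np.random.default_rng(0)
Z = rng.uniform(-1.0, 1.0, size=(5000, 10))      # offline MPC inputs
U = np.array([solve_mpc(z) for z in Z])          # expensive MPC labels

nn = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=1000)
nn.fit(Z[:4500], U[:4500])                       # train the imitator offline

# Hold-out check; the paper derives statistical guarantees from such
# validation samples, whereas this is only an empirical error.
err = np.max(np.abs(nn.predict(Z[4500:]) - U[4500:]))
print(f"max validation error: {err:.4f}")
```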


arXiv PDF DOI [BibTex]


2016


A New Perspective and Extension of the Gaussian Filter

Wüthrich, M., Trimpe, S., Garcia Cifuentes, C., Kappler, D., Schaal, S.

The International Journal of Robotics Research, 35(14):1731-1749, December 2016 (article)

Abstract
The Gaussian Filter (GF) is one of the most widely used filtering algorithms; instances include the Extended Kalman Filter, the Unscented Kalman Filter, and the Divided Difference Filter. The GF represents the belief of the current state by a Gaussian distribution, whose mean is an affine function of the measurement. We show that this representation can be too restrictive to accurately capture the dependencies in systems with nonlinear observation models, and we investigate how the GF can be generalized to alleviate this problem. To this end, we view the GF as the solution to a constrained optimization problem. From this new perspective, the GF is seen as a special case of a much broader class of filters, obtained by relaxing the constraint on the form of the approximate posterior. On this basis, we outline some conditions which potential generalizations have to satisfy in order to maintain the computational efficiency of the GF. We propose one concrete generalization which corresponds to the standard GF using a pseudo measurement instead of the actual measurement. Extending an existing GF implementation in this manner is trivial. Nevertheless, we show that this small change can have a major impact on the estimation accuracy.
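The structural point of the abstract can be restated compactly; the notation below is chosen here for illustration and is not the paper's:

```latex
% GF posterior: a Gaussian whose mean is affine in the measurement y_t,
%   \mu_t = a + B\,y_t   (e.g. the Kalman form with gain K).
% The proposed generalization keeps this structure but replaces y_t
% by a pseudo measurement \varphi(y_t):
\[
  p(x_t \mid y_t) \approx \mathcal{N}\big(x_t;\ \mu_t,\ \Sigma_t\big),
  \qquad
  \mu_t = a + B\,\varphi(y_t).
\]
```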


PDF DOI Project Page [BibTex]



Event-based Sampling for Reducing Communication Load in Realtime Human Motion Analysis by Wireless Inertial Sensor Networks

Laidig, D., Trimpe, S., Seel, T.

Current Directions in Biomedical Engineering, 2(1):711-714, De Gruyter, 2016 (article)


PDF DOI [BibTex]


2012


The Balancing Cube: A Dynamic Sculpture as Test Bed for Distributed Estimation and Control

Trimpe, S., D’Andrea, R.

IEEE Control Systems Magazine, 32(6):48-75, December 2012 (article)


DOI [BibTex]



Burn-in, bias, and the rationality of anchoring

Lieder, F., Griffiths, T. L., Goodman, N. D.

Advances in Neural Information Processing Systems 25, pages: 2699-2707, 2012 (article)

Abstract
Bayesian inference provides a unifying framework for addressing problems in machine learning, artificial intelligence, and robotics, as well as the problems facing the human mind. Unfortunately, exact Bayesian inference is intractable in all but the simplest models. Therefore minds and machines have to approximate Bayesian inference. Approximate inference algorithms can achieve a wide range of time-accuracy tradeoffs, but what is the optimal tradeoff? We investigate time-accuracy tradeoffs using the Metropolis-Hastings algorithm as a metaphor for the mind's inference algorithm(s). We find that reasonably accurate decisions are possible long before the Markov chain has converged to the posterior distribution, i.e. during the period known as burn-in. Therefore the strategy that is optimal subject to the mind's bounded processing speed and opportunity costs may perform so few iterations that the resulting samples are biased towards the initial value. The resulting cognitive process model provides a rational basis for the anchoring-and-adjustment heuristic. The model's quantitative predictions are tested against published data on anchoring in numerical estimation tasks. Our theoretical and empirical results suggest that the anchoring bias is consistent with approximate Bayesian inference.
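A toy Metropolis-Hastings run illustrates the burn-in effect discussed above: stopped after a few iterations, the estimate stays biased toward the initial value (the "anchor"). The target, step size, and iteration counts are arbitrary choices for this sketch, not the paper's.

```python
# Minimal Metropolis-Hastings sampler demonstrating burn-in bias.
import numpy as np

def metropolis_hastings(log_p, x0, n_iter, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x, samples = x0, []
    for _ in range(n_iter):
        proposal = x + step * rng.standard_normal()
        if np.log(rng.uniform()) < log_p(proposal) - log_p(x):
            x = proposal                       # accept the move
        samples.append(x)
    return np.array(samples)

log_p = lambda x: -0.5 * (x - 10.0) ** 2       # target: N(10, 1)
anchor = 0.0                                   # initial value far from the mode
few = metropolis_hastings(log_p, anchor, n_iter=5).mean()
many = metropolis_hastings(log_p, anchor, n_iter=5000)[1000:].mean()
print(few, many)   # the few-iteration estimate stays near the anchor
```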


link (url) [BibTex]



Variants of guided self-organization for robot control

Martius, G., Herrmann, J.

Theory in Biosciences, 131(3):129-137, Springer Berlin/Heidelberg, 2012 (article)


link (url) DOI [BibTex]



Deep Graph Matching via Blackbox Differentiation of Combinatorial Solvers

Rolinek, M., Swoboda, P., Zietlow, D., Paulus, A., Musil, V., Martius, G.

arXiv (article)

Abstract
Building on recent progress at the intersection of combinatorial optimization and deep learning, we propose an end-to-end trainable architecture for deep graph matching that contains unmodified combinatorial solvers. Using heavily optimized combinatorial solvers together with some improvements in architecture design, we advance the state of the art on deep graph matching benchmarks for keypoint correspondence. In addition, we highlight the conceptual advantages of incorporating solvers into deep learning architectures, such as the possibility of post-processing with a strong multi-graph matching solver or the indifference to changes in the training setting. Finally, we propose two new challenging experimental setups.
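The blackbox-differentiation rule the title refers to (introduced in prior work by the same group, Vlastelica et al., ICLR 2020, "Differentiation of Blackbox Combinatorial Solvers") can be sketched as follows; this is a schematic of the published recipe, with `solver` standing in for any costs-to-discrete-solution routine and `lam` its interpolation hyperparameter.

```python
# Schematic: call an unmodified combinatorial solver in the forward
# pass; build a gradient in the backward pass from one extra solver
# call on costs perturbed along the incoming gradient.
import torch

class BlackboxSolver(torch.autograd.Function):
    @staticmethod
    def forward(ctx, costs, solver, lam):
        y = solver(costs.detach())             # unmodified combinatorial solver
        ctx.save_for_backward(costs, y)
        ctx.solver, ctx.lam = solver, lam
        return y

    @staticmethod
    def backward(ctx, grad_output):
        costs, y = ctx.saved_tensors
        perturbed = costs + ctx.lam * grad_output   # shift costs along incoming grad
        y_perturbed = ctx.solver(perturbed)
        grad_costs = -(y - y_perturbed) / ctx.lam   # finite-difference gradient
        return grad_costs, None, None               # no grads for solver, lam
```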


arXiv [BibTex]