2020


A little damping goes a long way: a simulation study of how damping influences task-level stability in running

Heim, S., Millard, M., Mouel, C. L., Badri-Spröwitz, A.

Biology Letters, 16(9), September 2020 (article)

Abstract
It is currently unclear if damping plays a functional role in legged locomotion, and simple models often do not include damping terms. We present a new model with a damping term that is isolated from other parameters: that is, the damping term can be adjusted without retuning other model parameters for nominal motion. We systematically compare how increased damping affects stability in the face of unexpected ground-height perturbations. Unlike most studies, we focus on task-level stability: instead of observing whether trajectories converge towards a nominal limit-cycle, we quantify the ability to avoid falls using a recently developed mathematical measure. This measure allows trajectories to be compared quantitatively instead of only being separated into a binary classification of ‘stable’ or ‘unstable’. Our simulation study shows that increased damping contributes significantly to task-level stability; however, this benefit quickly plateaus after only a small amount of damping. These results suggest that the low intrinsic damping values observed experimentally may have stability benefits and are not simply minimized for energetic reasons. All Python code and data needed to generate our results are available open source.

dlg ics

link (url) DOI [BibTex]



Optimal To-Do List Gamification

Stojcheski, J., Felso, V., Lieder, F.

arXiv, August 2020 (techreport)

Abstract
What should I work on first? What can wait until later? Which projects should I prioritize and which tasks are not worth my time? These are challenging questions that many people face every day. People’s intuitive strategy is to prioritize their immediate experience over the long-term consequences. This leads to procrastination and the neglect of important long-term projects in favor of seemingly urgent tasks that are less important. Optimal gamification strives to help people overcome these problems by incentivizing each task by a number of points that communicates how valuable it is in the long-run. Unfortunately, computing the optimal number of points with standard dynamic programming methods quickly becomes intractable as the number of a person’s projects and the number of tasks required by each project increase. Here, we introduce and evaluate a scalable method for identifying which tasks are most important in the long run and incentivizing each task according to its long-term value. Our method makes it possible to create to-do list gamification apps that can handle the size and complexity of people’s to-do lists in the real world.

re

link (url) Project Page [BibTex]


Event-triggered Learning

Solowjow, F., Trimpe, S.

Automatica, 117, Elsevier, July 2020 (article)

ics

arXiv PDF DOI Project Page [BibTex]



Bayesian Optimization in Robot Learning - Automatic Controller Tuning and Sample-Efficient Methods

Marco-Valle, A.

University of Tübingen, June 2020 (thesis)

Abstract
The problem of designing controllers to regulate dynamical systems has been studied by engineers during the past millennia. Ever since, suboptimal performance lingers in many closed loops as an unavoidable side effect of manually tuning the parameters of the controllers. Nowadays, industrial settings remain skeptical about data-driven methods that allow one to automatically learn controller parameters. In the context of robotics, machine learning (ML) keeps growing its influence on increasing autonomy and adaptability, for example to aid automating controller tuning. However, data-hungry ML methods, such as standard reinforcement learning, require a large number of experimental samples, prohibitive in robotics, as hardware can deteriorate and break. This brings about the following question: Can manual controller tuning, in robotics, be automated by using data-efficient machine learning techniques? In this thesis, we tackle the question above by exploring Bayesian optimization (BO), a data-efficient ML framework, to buffer the human effort and side effects of manual controller tuning, while retaining a low number of experimental samples. We focus this work in the context of robotic systems, providing thorough theoretical results that aim to increase data-efficiency, as well as demonstrations in real robots. Specifically, we present four main contributions. We first consider using BO to replace manual tuning in robotic platforms. To this end, we parametrize the design weights of a linear quadratic regulator (LQR) and learn its parameters using an information-efficient BO algorithm. Such an algorithm uses Gaussian processes (GPs) to model the unknown performance objective. The GP model is used by BO to suggest controller parameters that are expected to increment the information about the optimal parameters, measured as a gain in entropy. 
The resulting “automatic LQR tuning” framework is demonstrated on two robotic platforms: A robot arm balancing an inverted pole and a humanoid robot performing a squatting task. In both cases, an existing controller is automatically improved in a handful of experiments without human intervention. BO compensates for data scarcity by means of the GP, which is a probabilistic model that encodes prior assumptions about the unknown performance objective. Usually, incorrect or non-informed assumptions have negative consequences, such as a higher number of robot experiments, poor tuning performance or reduced sample-efficiency. The second to fourth contributions presented herein attempt to alleviate this issue. The second contribution proposes to include the robot simulator into the learning loop as an additional information source for automatic controller tuning. While doing a real robot experiment generally entails high associated costs (e.g., it requires preparation and takes time), simulations are cheaper to obtain (e.g., they can be computed faster). However, because the simulator is an imperfect model of the robot, its information is biased and could have negative repercussions in the learning performance. To address this problem, we propose “simu-vs-real”, a principled multi-fidelity BO algorithm that trades off cheap, but inaccurate information from simulations with expensive and accurate physical experiments in a cost-effective manner. The resulting algorithm is demonstrated on a cart-pole system, where simulations and real experiments are alternated, thus sparing many real evaluations. The third contribution explores how to adapt the expressiveness of the probabilistic prior to the control problem at hand. To this end, the mathematical structure of LQR controllers is leveraged and embedded into the GP, by means of the kernel function. Specifically, we propose two different “LQR kernel” designs that retain the flexibility of Bayesian nonparametric learning. 
Simulated results indicate that the LQR kernel yields better performance than non-informed kernel choices when used for controller learning with BO. Finally, the fourth contribution specifically addresses the problem of handling controller failures, which are typically unavoidable in practice while learning from data, especially if non-conservative solutions are expected. Although controller failures are generally problematic (e.g., the robot has to be emergency-stopped), they are also a rich information source about what should be avoided. We propose “failures-aware excursion search”, a novel algorithm for Bayesian optimization under black-box constraints, where failures are limited in number. Our results in numerical benchmarks indicate that by allowing a confined number of failures, better optima are revealed as compared with state-of-the-art methods. The first contribution of this thesis, “automatic LQR tuning”, is among the first to apply BO to real robots. While it demonstrated automatic controller learning from few experimental samples, it also revealed several important challenges, such as the need for higher sample-efficiency, which opened relevant research directions that we addressed through several methodological contributions. Summarizing, we proposed “simu-vs-real”, a novel BO algorithm that includes the simulator as an additional information source, an “LQR kernel” design that learns faster than standard choices and “failures-aware excursion search”, a new BO algorithm for constrained black-box optimization problems, where the number of failures is limited.

ics

Repository (Universitätsbibliothek) - University of Tübingen PDF DOI [BibTex]


Data-efficient Auto-tuning with Bayesian Optimization: An Industrial Control Study

Neumann-Brosig, M., Marco, A., Schwarzmann, D., Trimpe, S.

IEEE Transactions on Control Systems Technology, 28(3):730-740, May 2020 (article)

Abstract
Bayesian optimization is proposed for automatic learning of optimal controller parameters from experimental data. A probabilistic description (a Gaussian process) is used to model the unknown function from controller parameters to a user-defined cost. The probabilistic model is updated with data, which is obtained by testing a set of parameters on the physical system and evaluating the cost. In order to learn fast, the Bayesian optimization algorithm selects the next parameters to evaluate in a systematic way, for example, by maximizing information gain about the optimum. The algorithm thus iteratively finds the globally optimal parameters with only few experiments. Taking throttle valve control as a representative industrial control example, the proposed auto-tuning method is shown to outperform manual calibration: it consistently achieves better performance with a low number of experiments. The proposed auto-tuning framework is flexible and can handle different control structures and objectives.

ics

arXiv (PDF) DOI Project Page [BibTex]



Automatic Discovery of Interpretable Planning Strategies

Skirzyński, J., Becker, F., Lieder, F.

Machine Learning Journal, May 2020 (article) Submitted

Abstract
When making decisions, people often overlook critical information or are overly swayed by irrelevant information. A common approach to mitigate these biases is to provide decisionmakers, especially professionals such as medical doctors, with decision aids, such as decision trees and flowcharts. Designing effective decision aids is a difficult problem. We propose that recently developed reinforcement learning methods for discovering clever heuristics for good decision-making can be partially leveraged to assist human experts in this design process. One of the biggest remaining obstacles to leveraging the aforementioned methods for improving human decision-making is that the policies they learn are opaque to people. To solve this problem, we introduce AI-Interpret: a general method for transforming idiosyncratic policies into simple and interpretable descriptions. Our algorithm combines recent advances in imitation learning and program induction with a new clustering method for identifying a large subset of demonstrations that can be accurately described by a simple, high-performing decision rule. We evaluate our new AI-Interpret algorithm and employ it to translate information-acquisition policies discovered through metalevel reinforcement learning. The results of three large behavioral experiments showed that the provision of decision rules as flowcharts significantly improved people’s planning strategies and decisions across three different classes of sequential decision problems. Furthermore, a series of ablation studies confirmed that our AI-Interpret algorithm was critical to the discovery of interpretable decision rules and that it is ready to be applied to other reinforcement learning problems. We conclude that the methods and findings presented in this article are an important step towards leveraging automatic strategy discovery to improve human decision-making.

re

Automatic Discovery of Interpretable Planning Strategies The code for our algorithm and the experiments is available Project Page [BibTex]


Advancing Rational Analysis to the Algorithmic Level

Lieder, F., Griffiths, T. L.

Behavioral and Brain Sciences, 43, E27, March 2020 (article)

Abstract
The commentaries raised questions about normativity, human rationality, cognitive architectures, cognitive constraints, and the scope of resource-rational analysis (RRA). We respond to these questions and clarify that RRA is a methodological advance that extends the scope of rational modeling to understanding cognitive processes, why they differ between people, why they change over time, and how they could be improved.

re

Advancing rational analysis to the algorithmic level DOI [BibTex]



Learning to Overexert Cognitive Control in a Stroop Task

Bustamante, L., Lieder, F., Musslick, S., Shenhav, A., Cohen, J.

February 2020, Laura Bustamante and Falk Lieder contributed equally to this publication. (article) In revision

Abstract
How do people learn when to allocate how much cognitive control to which task? According to the Learned Value of Control (LVOC) model, people learn to predict the value of alternative control allocations from features of a given situation. This suggests that people may generalize the value of control learned in one situation to other situations with shared features, even when the demands for cognitive control are different. This makes the intriguing prediction that what a person learned in one setting could, under some circumstances, cause them to misestimate the need for, and potentially over-exert, control in another setting, even if this harms their performance. To test this prediction, we had participants perform a novel variant of the Stroop task in which, on each trial, they could choose to either name the color (more control-demanding) or read the word (more automatic). However, only one of these tasks was rewarded; the rewarded task changed from trial to trial and could be predicted by one or more of the stimulus features (the color and/or the word). Participants first learned colors that predicted the rewarded task. Then they learned words that predicted the rewarded task. In the third part of the experiment, we tested how these learned feature associations transferred to novel stimuli with some overlapping features. The stimulus-task-reward associations were designed so that for certain combinations of stimuli the transfer of learned feature associations would incorrectly predict that the more highly rewarded task would be color naming, which would require the exertion of control, even though the actually rewarded task was word reading and therefore did not require the engagement of control. Our results demonstrated that participants over-exerted control for these stimuli, providing support for the feature-based learning mechanism described by the LVOC model.

re

Learning to Overexert Cognitive Control in a Stroop Task DOI [BibTex]



Sliding Mode Control with Gaussian Process Regression for Underwater Robots

Lima, G. S., Trimpe, S., Bessa, W. M.

Journal of Intelligent & Robotic Systems, January 2020 (article)

ics

DOI [BibTex]



Hierarchical Event-triggered Learning for Cyclically Excited Systems with Application to Wireless Sensor Networks

Beuchert, J., Solowjow, F., Raisch, J., Trimpe, S., Seel, T.

IEEE Control Systems Letters, 4(1):103-108, January 2020 (article)

ics

arXiv PDF DOI Project Page [BibTex]



Toward a Formal Theory of Proactivity

Lieder, F., Iwama, G.

January 2020 (article) Submitted

Abstract
Beyond merely reacting to their environment and impulses, people have the remarkable capacity to proactively set and pursue their own goals. But the extent to which they leverage this capacity varies widely across people and situations. The goal of this article is to make the mechanisms and variability of proactivity more amenable to rigorous experiments and computational modeling. We proceed in three steps. First, we develop and validate a mathematically precise behavioral measure of proactivity and reactivity that can be applied across a wide range of experimental paradigms. Second, we propose a formal definition of proactivity and reactivity, and develop a computational model of proactivity in the AX Continuous Performance Task (AX-CPT). Third, we develop and test a computational-level theory of meta-control over proactivity in the AX-CPT that identifies three distinct meta-decision-making problems: intention setting, resolving response conflict between intentions and automaticity, and deciding whether to recall context and intentions into working memory. People's response frequencies in the AX-CPT were remarkably well captured by a mixture between the predictions of our models of proactive and reactive control. Empirical data from an experiment varying the incentives and contextual load of an AX-CPT confirmed the predictions of our meta-control model of individual differences in proactivity. Our results suggest that proactivity can be understood in terms of computational models of meta-control. Our model makes additional empirically testable predictions. Future work will extend our models from proactive control in the AX-CPT to proactive goal creation and goal pursuit in the real world.

re

Toward a formal theory of proactivity DOI Project Page [BibTex]


Control-guided Communication: Efficient Resource Arbitration and Allocation in Multi-hop Wireless Control Systems

Baumann, D., Mager, F., Zimmerling, M., Trimpe, S.

IEEE Control Systems Letters, 4(1):127-132, January 2020 (article)

ics

arXiv PDF DOI [BibTex]



Wireless Control for Smart Manufacturing: Recent Approaches and Open Challenges

Baumann, D., Mager, F., Wetzker, U., Thiele, L., Zimmerling, M., Trimpe, S.

Proceedings of the IEEE, 2020 (article) Accepted

ics

arXiv DOI [BibTex]



Spatial Scheduling of Informative Meetings for Multi-Agent Persistent Coverage

Haksar, R. N., Trimpe, S., Schwager, M.

IEEE Robotics and Automation Letters, 2020 (article) Accepted

ics

DOI [BibTex]



Safe and Fast Tracking on a Robot Manipulator: Robust MPC and Neural Network Control

Nubert, J., Koehler, J., Berenz, V., Allgöwer, F., Trimpe, S.

IEEE Robotics and Automation Letters, 2020 (article) Accepted

Abstract
Fast feedback control and safety guarantees are essential in modern robotics. We present an approach that achieves both by combining novel robust model predictive control (MPC) with function approximation via (deep) neural networks (NNs). The result is a new approach for complex tasks with nonlinear, uncertain, and constrained dynamics as are common in robotics. Specifically, we leverage recent results in MPC research to propose a new robust setpoint tracking MPC algorithm, which achieves reliable and safe tracking of a dynamic setpoint while guaranteeing stability and constraint satisfaction. The presented robust MPC scheme constitutes a one-layer approach that unifies the often separated planning and control layers, by directly computing the control command based on a reference and possibly obstacle positions. As a separate contribution, we show how the computation time of the MPC can be drastically reduced by approximating the MPC law with a NN controller. The NN is trained and validated from offline samples of the MPC, yielding statistical guarantees, and used in lieu thereof at run time. Our experiments on a state-of-the-art robot manipulator are the first to show that both the proposed robust and approximate MPC schemes scale to real-world robotic systems.

am ics

arXiv PDF DOI [BibTex]



Event-triggered Learning for Linear Quadratic Control

Schlüter, H., Solowjow, F., Trimpe, S.

IEEE Transactions on Automatic Control, 2020 (article) Accepted

ics

arXiv [BibTex]
