Header logo is


2010


no image
Reinforcement learning of full-body humanoid motor skills

Stulp, F., Buchli, J., Theodorou, E., Schaal, S.

In Humanoid Robots (Humanoids), 2010 10th IEEE-RAS International Conference on, pages: 405-410, December 2010, clmc (inproceedings)

Abstract
Applying reinforcement learning to humanoid robots is challenging because humanoids have a large number of degrees of freedom and state and action spaces are continuous. Thus, most reinforcement learning algorithms would become computationally infeasible and require a prohibitive amount of trials to explore such high-dimensional spaces. In this paper, we present a probabilistic reinforcement learning approach, which is derived from the framework of stochastic optimal control and path integrals. The algorithm, called Policy Improvement with Path Integrals (PI2), has a surprisingly simple form, has no open tuning parameters besides the exploration noise, is model-free, and performs numerically robustly in high dimensional learning problems. We demonstrate how PI2 is able to learn full-body motor skills on a 34-DOF humanoid robot. To demonstrate the generality of our approach, we also apply PI2 in the context of variable impedance control, where both planned trajectories and gain schedules for each joint are optimized simultaneously.

am

link (url) [BibTex]

2010


link (url) [BibTex]


no image
Relative Entropy Policy Search

Peters, J., Mülling, K., Altun, Y.

In Proceedings of the Twenty-Fourth National Conference on Artificial Intelligence, pages: 1607-1612, (Editors: Fox, M. , D. Poole), AAAI Press, Menlo Park, CA, USA, Twenty-Fourth National Conference on Artificial Intelligence (AAAI-10), July 2010 (inproceedings)

Abstract
Policy search is a successful approach to reinforcement learning. However, policy improvements often result in the loss of information. Hence, it has been marred by premature convergence and implausible solutions. As first suggested in the context of covariant policy gradients (Bagnell and Schneider 2003), many of these problems may be addressed by constraining the information loss. In this paper, we continue this path of reasoning and suggest the Relative Entropy Policy Search (REPS) method. The resulting method differs significantly from previous policy gradient approaches and yields an exact update step. It works well on typical reinforcement learning benchmark problems.

am ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Reinforcement learning of motor skills in high dimensions: A path integral approach

Theodorou, E., Buchli, J., Schaal, S.

In Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages: 2397-2403, May 2010, clmc (inproceedings)

Abstract
Reinforcement learning (RL) is one of the most general approaches to learning control. Its applicability to complex motor systems, however, has been largely impossible so far due to the computational difficulties that reinforcement learning encounters in high dimensional continuous state-action spaces. In this paper, we derive a novel approach to RL for parameterized control policies based on the framework of stochastic optimal control with path integrals. While solidly grounded in optimal control theory and estimation theory, the update equations for learning are surprisingly simple and have no danger of numerical instabilities as neither matrix inversions nor gradient learning rates are required. Empirical evaluations demonstrate significant performance improvements over gradient-based policy learning and scalability to high-dimensional control problems. Finally, a learning experiment on a robot dog illustrates the functionality of our algorithm in a real-world scenario. We believe that our new algorithm, Policy Improvement with Path Integrals (PI2), offers currently one of the most efficient, numerically robust, and easy to implement algorithms for RL in robotics.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Inverse dynamics control of floating base systems using orthogonal decomposition

Mistry, M., Buchli, J., Schaal, S.

In Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages: 3406-3412, May 2010, clmc (inproceedings)

Abstract
Model-based control methods can be used to enable fast, dexterous, and compliant motion of robots without sacrificing control accuracy. However, implementing such techniques on floating base robots, e.g., humanoids and legged systems, is non-trivial due to under-actuation, dynamically changing constraints from the environment, and potentially closed loop kinematics. In this paper, we show how to compute the analytically correct inverse dynamics torques for model-based control of sufficiently constrained floating base rigid-body systems, such as humanoid robots with one or two feet in contact with the environment. While our previous inverse dynamics approach relied on an estimation of contact forces to compute an approximate inverse dynamics solution, here we present an analytically correct solution by using an orthogonal decomposition to project the robot dynamics onto a reduced dimensional space, independent of contact forces. We demonstrate the feasibility and robustness of our approach on a simulated floating base bipedal humanoid robot and an actual robot dog locomoting over rough terrain.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Fast, robust quadruped locomotion over challenging terrain

Kalakrishnan, M., Buchli, J., Pastor, P., Mistry, M., Schaal, S.

In Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages: 2665-2670, May 2010, clmc (inproceedings)

Abstract
We present a control architecture for fast quadruped locomotion over rough terrain. We approach the problem by decomposing it into many sub-systems, in which we apply state-of-the-art learning, planning, optimization and control techniques to achieve robust, fast locomotion. Unique features of our control strategy include: (1) a system that learns optimal foothold choices from expert demonstration using terrain templates, (2) a body trajectory optimizer based on the Zero-Moment Point (ZMP) stability criterion, and (3) a floating-base inverse dynamics controller that, in conjunction with force control, allows for robust, compliant locomotion over unperceived obstacles. We evaluate the performance of our controller by testing it on the LittleDog quadruped robot, over a wide variety of rough terrain of varying difficulty levels. We demonstrate the generalization ability of this controller by presenting test results from an independent external test team on terrains that have never been shown to us.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Policy learning algorithmis for motor learning (Algorithmen zum automatischen Erlernen von Motorfähigkigkeiten)

Peters, J., Kober, J., Schaal, S.

Automatisierungstechnik, 58(12):688-694, 2010, clmc (article)

Abstract
Robot learning methods which allow au- tonomous robots to adapt to novel situations have been a long standing vision of robotics, artificial intelligence, and cognitive sciences. However, to date, learning techniques have yet to ful- fill this promise as only few methods manage to scale into the high-dimensional domains of manipulator robotics, or even the new upcoming trend of humanoid robotics. If possible, scaling was usually only achieved in precisely pre-structured domains. In this paper, we investigate the ingredients for a general ap- proach policy learning with the goal of an application to motor skill refinement in order to get one step closer towards human- like performance. For doing so, we study two major components for such an approach, i. e., firstly, we study policy learning algo- rithms which can be applied in the general setting of motor skill learning, and, secondly, we study a theoretically well-founded general approach to representing the required control structu- res for task representation and execution.

am

link (url) [BibTex]


no image
A Bayesian approach to nonlinear parameter identification for rigid-body dynamics

Ting, J., DSouza, A., Schaal, S.

Neural Networks, 2010, clmc (article)

Abstract
For complex robots such as humanoids, model-based control is highly beneficial for accurate tracking while keeping negative feedback gains low for compliance. However, in such multi degree-of-freedom lightweight systems, conventional identification of rigid body dynamics models using CAD data and actuator models is inaccurate due to unknown nonlinear robot dynamic effects. An alternative method is data-driven parameter estimation, but significant noise in measured and inferred variables affects it adversely. Moreover, standard estimation procedures may give physically inconsistent results due to unmodeled nonlinearities or insufficiently rich data. This paper addresses these problems, proposing a Bayesian system identification technique for linear or piecewise linear systems. Inspired by Factor Analysis regression, we develop a computationally efficient variational Bayesian regression algorithm that is robust to ill-conditioned data, automatically detects relevant features, and identifies input and output noise. We evaluate our approach on rigid body parameter estimation for various robotic systems, achieving an error of up to three times lower than other state-of-the-art machine learning methods.

am

link (url) [BibTex]


no image
A first optimal control solution for a complex, nonlinear, tendon driven neuromuscular finger model

Theodorou, E. A., Todorov, E., Valero-Cuevas, F.

Proceedings of the ASME 2010 Summer Bioengineering Conference August 30-September 2, 2010, Naples, Florida, USA, 2010, clmc (article)

Abstract
In this work we present the first constrained stochastic op- timal feedback controller applied to a fully nonlinear, tendon driven index finger model. Our model also takes into account an extensor mechanism, and muscle force-length and force-velocity properties. We show this feedback controller is robust to noise and perturbations to the dynamics, while successfully handling the nonlinearities and high dimensionality of the system. By ex- tending prior methods, we are able to approximate physiological realism by ensuring positivity of neural commands and tendon tensions at all timesthus can, for the first time, use the optimal control framework to predict biologically plausible tendon tensions for a nonlinear neuromuscular finger model. METHODS 1 Muscle Model The rigid-body triple pendulum finger model with slightly viscous joints is actuated by Hill-type muscle models. Joint torques are generated by the seven muscles of the index fin-

am

PDF [BibTex]

PDF [BibTex]


no image
Are reaching movements planned in kinematic or dynamic coordinates?

Ellmer, A., Schaal, S.

In Abstracts of Neural Control of Movement Conference (NCM 2010), Naples, Florida, 2010, 2010, clmc (inproceedings)

Abstract
Whether human reaching movements are planned and optimized in kinematic (task space) or dynamic (joint or muscle space) coordinates is still an issue of debate. The first hypothesis implies that a planner produces a desired end-effector position at each point in time during the reaching movement, whereas the latter hypothesis includes the dynamics of the muscular-skeletal control system to produce a continuous end-effector trajectory. Previous work by Wolpert et al (1995) showed that when subjects were led to believe that their straight reaching paths corresponded to curved paths as shown on a computer screen, participants adapted the true path of their hand such that they would visually perceive a straight line in visual space, despite that they actually produced a curved path. These results were interpreted as supporting the stance that reaching trajectories are planned in kinematic coordinates. However, this experiment could only demonstrate that adaptation to altered paths, i.e. the position of the end-effector, did occur, but not that the precise timing of end-effector position was equally planned, i.e., the trajectory. Our current experiment aims at filling this gap by explicitly testing whether position over time, i.e. velocity, is a property of reaching movements that is planned in kinematic coordinates. In the current experiment, the velocity profiles of cursor movements corresponding to the participant's hand motions were skewed either to the left or to the right; the path itself was left unaltered. We developed an adaptation paradigm, where the skew of the velocity profile was introduced gradually and participants reported no awareness of any manipulation. Preliminary results indicate that the true hand motion of participants did not alter, i.e. there was no adaptation so as to counterbalance the introduced skew. However, for some participants, peak hand velocities were lowered for higher skews, which suggests that participants interpreted the manipulation as mere noise due to variance in their own movement. In summary, for a visuomotor transformation task, the hypothesis of a planned continuous end-effector trajectory predicts adaptation to a modified velocity profile. The current experiment found no systematic adaptation under such transformation, but did demonstrate an effect that is more in accordance that subjects could not perceive the manipulation and rather interpreted as an increase of noise.

am

[BibTex]

[BibTex]


no image
Optimality in Neuromuscular Systems

Theodorou, E. A., Valero-Cuevas, F.

In 32nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2010, clmc (inproceedings)

Abstract
Abstract? We provide an overview of optimal control meth- ods to nonlinear neuromuscular systems and discuss their lim- itations. Moreover we extend current optimal control methods to their application to neuromuscular models with realistically numerous musculotendons; as most prior work is limited to torque-driven systems. Recent work on computational motor control has explored the used of control theory and esti- mation as a conceptual tool to understand the underlying computational principles of neuromuscular systems. After all, successful biological systems regularly meet conditions for stability, robustness and performance for multiple classes of complex tasks. Among a variety of proposed control theory frameworks to explain this, stochastic optimal control has become a dominant framework to the point of being a standard computational technique to reproduce kinematic trajectories of reaching movements (see [12]) In particular, we demonstrate the application of optimal control to a neuromuscular model of the index finger with all seven musculotendons producing a tapping task. Our simu- lations include 1) a muscle model that includes force- length and force-velocity characteristics; 2) an anatomically plausible biomechanical model of the index finger that includes a tendi- nous network for the extensor mechanism and 3) a contact model that is based on a nonlinear spring-damper attached at the end effector of the index finger. We demonstrate that it is feasible to apply optimal control to systems with realistically large state vectors and conclude that, while optimal control is an adequate formalism to create computational models of neuro- musculoskeletal systems, there remain important challenges and limitations that need to be considered and overcome such as contact transitions, curse of dimensionality, and constraints on states and controls.

am

PDF [BibTex]

PDF [BibTex]


no image
Efficient learning and feature detection in high dimensional regression

Ting, J., D’Souza, A., Vijayakumar, S., Schaal, S.

Neural Computation, 22, pages: 831-886, 2010, clmc (article)

Abstract
We present a novel algorithm for efficient learning and feature selection in high- dimensional regression problems. We arrive at this model through a modification of the standard regression model, enabling us to derive a probabilistic version of the well-known statistical regression technique of backfitting. Using the Expectation- Maximization algorithm, along with variational approximation methods to overcome intractability, we extend our algorithm to include automatic relevance detection of the input features. This Variational Bayesian Least Squares (VBLS) approach retains its simplicity as a linear model, but offers a novel statistically robust â??black- boxâ? approach to generalized linear regression with high-dimensional inputs. It can be easily extended to nonlinear regression and classification problems. In particular, we derive the framework of sparse Bayesian learning, e.g., the Relevance Vector Machine, with VBLS at its core, offering significant computational and robustness advantages for this class of methods. We evaluate our algorithm on synthetic and neurophysiological data sets, as well as on standard regression and classification benchmark data sets, comparing it with other competitive statistical approaches and demonstrating its suitability as a drop-in replacement for other generalized linear regression techniques.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Stochastic Differential Dynamic Programming

Theodorou, E., Tassa, Y., Todorov, E.

In the proceedings of American Control Conference (ACC 2010) , 2010, clmc (article)

Abstract
We present a generalization of the classic Differential Dynamic Programming algorithm. We assume the existence of state- and control-dependent process noise, and proceed to derive the second-order expansion of the cost-to-go. Despite having quartic and cubic terms in the initial expression, we show that these vanish, leaving us with the same quadratic structure as standard DDP.

am

PDF [BibTex]

PDF [BibTex]


no image
Learning Policy Improvements with Path Integrals

Theodorou, E. A., Buchli, J., Schaal, S.

In International Conference on Artificial Intelligence and Statistics (AISTATS 2010), 2010, clmc (inproceedings)

Abstract
With the goal to generate more scalable algo- rithms with higher efficiency and fewer open parameters, reinforcement learning (RL) has recently moved towards combining classi- cal techniques from optimal control and dy- namic programming with modern learning techniques from statistical estimation the- ory. In this vein, this paper suggests the framework of stochastic optimal control with path integrals to derive a novel approach to RL with parametrized policies. While solidly grounded in value function estimation and optimal control based on the stochastic Hamilton-Jacobi-Bellman (HJB) equations, policy improvements can be transformed into an approximation problem of a path inte- gral which has no open parameters other than the exploration noise. The resulting algorithm can be conceived of as model- based, semi-model-based, or even model free, depending on how the learning problem is structured. Our new algorithm demon- strates interesting similarities with previous RL research in the framework of proba- bility matching and provides intuition why the slightly heuristically motivated proba- bility matching approach can actually per- form well. Empirical evaluations demon- strate significant performance improvements over gradient-based policy learning and scal- ability to high-dimensional control problems. We believe that Policy Improvement with Path Integrals (PI2) offers currently one of the most efficient, numerically robust, and easy to implement algorithms for RL based on trajectory roll-outs.

am

PDF [BibTex]

PDF [BibTex]


no image
Learning optimal control solutions: a path integral approach

Theodorou, E., Schaal, S.

In Abstracts of Neural Control of Movement Conference (NCM 2010), Naples, Florida, 2010, 2010, clmc (inproceedings)

Abstract
Investigating principles of human motor control in the framework of optimal control has had a long tradition in neural control of movement, and has recently experienced a new surge of investigations. Ideally, optimal control problems are addresses as a reinforcement learning (RL) problem, which would allow to investigate both the process of acquiring an optimal control solution as well as the solution itself. Unfortunately, the applicability of RL to complex neural and biomechanics systems has been largely impossible so far due to the computational difficulties that arise in high dimensional continuous state-action spaces. As a way out, research has focussed on computing optimal control solutions based on iterative optimal control methods that are based on linear and quadratic approximations of dynamical models and cost functions. These methods require perfect knowledge of the dynamics and cost functions while they are based on gradient and Newton optimization schemes. Their applicability is also restricted to low dimensional problems due to problematic convergence in high dimensions. Moreover, the process of computing the optimal solution is removed from the learning process that might be plausible in biology. In this work, we present a new reinforcement learning method for learning optimal control solutions or motor control. This method, based on the framework of stochastic optimal control with path integrals, has a very solid theoretical foundation, while resulting in surprisingly simple learning algorithms. It is also possible to apply this approach without knowledge of the system model, and to use a wide variety of complex nonlinear cost functions for optimization. We illustrate the theoretical properties of this approach and its applicability to learning motor control tasks for reaching movements and locomotion studies. We discuss its applicability to learning desired trajectories, variable stiffness control (co-contraction), and parameterized control policies. We also investigate the applicability to signal dependent noise control systems. We believe that the suggested method offers one of the easiest to use approaches to learning optimal control suggested in the literature so far, which makes it ideally suited for computational investigations of biological motor control.

am

[BibTex]

[BibTex]


no image
Learning control in robotics – trajectory-based opitimal control techniques

Schaal, S., Atkeson, C. G.

Robotics and Automation Magazine, 17(2):20-29, 2010, clmc (article)

Abstract
In a not too distant future, robots will be a natural part of daily life in human society, providing assistance in many areas ranging from clinical applications, education and care giving, to normal household environments [1]. It is hard to imagine that all possible tasks can be preprogrammed in such robots. Robots need to be able to learn, either by themselves or with the help of human supervision. Additionally, wear and tear on robots in daily use needs to be automatically compensated for, which requires a form of continuous self-calibration, another form of learning. Finally, robots need to react to stochastic and dynamic environments, i.e., they need to learn how to optimally adapt to uncertainty and unforeseen changes. Robot learning is going to be a key ingredient for the future of autonomous robots. While robot learning covers a rather large field, from learning to perceive, to plan, to make decisions, etc., we will focus this review on topics of learning control, in particular, as it is concerned with learning control in simulated or actual physical robots. In general, learning control refers to the process of acquiring a control strategy for a particular control system and a particular task by trial and error. Learning control is usually distinguished from adaptive control [2] in that the learning system can have rather general optimization objectivesâ??not just, e.g., minimal tracking errorâ??and is permitted to fail during the process of learning, while adaptive control emphasizes fast convergence without failure. Thus, learning control resembles the way that humans and animals acquire new movement strategies, while adaptive control is a special case of learning control that fulfills stringent performance constraints, e.g., as needed in life-critical systems like airplanes. Learning control has been an active topic of research for at least three decades. However, given the lack of working robots that actually use learning components, more work needs to be done before robot learning will make it beyond the laboratory environment. This article will survey some ongoing and past activities in robot learning to assess where the field stands and where it is going. We will largely focus on nonwheeled robots and less on topics of state estimation, as typically explored in wheeled robots [3]â??6], and we emphasize learning in continuous state-action spaces rather than discrete state-action spaces [7], [8]. We will illustrate the different topics of robot learning with examples from our own research with anthropomorphic and humanoid robots.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Learning, planning, and control for quadruped locomotion over challenging terrain

Kalakrishnan, M., Buchli, J., Pastor, P., Mistry, M., Schaal, S.

International Journal of Robotics Research, 30(2):236-258, 2010, clmc (article)

Abstract
We present a control architecture for fast quadruped locomotion over rough terrain. We approach the problem by decomposing it into many sub-systems, in which we apply state-of-the-art learning, planning, optimization, and control techniques to achieve robust, fast locomotion. Unique features of our control strategy include: (1) a system that learns optimal foothold choices from expert demonstration using terrain templates, (2) a body trajectory optimizer based on the Zero- Moment Point (ZMP) stability criterion, and (3) a floating-base inverse dynamics controller that, in conjunction with force control, allows for robust, compliant locomotion over unperceived obstacles. We evaluate the performance of our controller by testing it on the LittleDog quadruped robot, over a wide variety of rough terrains of varying difficulty levels. The terrain that the robot was tested on includes rocks, logs, steps, barriers, and gaps, with obstacle sizes up to the leg length of the robot. We demonstrate the generalization ability of this controller by presenting results from testing performed by an independent external test team on terrain that has never been shown to us.

am

link (url) Project Page [BibTex]

link (url) Project Page [BibTex]


no image
Constrained Accelerations for Controlled Geometric Reduction: Sagittal-Plane Decoupling for Bipedal Locomotion

Gregg, R., Righetti, L., Buchli, J., Schaal, S.

In 2010 10th IEEE-RAS International Conference on Humanoid Robots, pages: 1-7, IEEE, Nashville, USA, 2010 (inproceedings)

Abstract
Energy-shaping control methods have produced strong theoretical results for asymptotically stable 3D bipedal dynamic walking in the literature. In particular, geometric controlled reduction exploits robot symmetries to control momentum conservation laws that decouple the sagittal-plane dynamics, which are easier to stabilize. However, the associated control laws require high-dimensional matrix inverses multiplied with complicated energy-shaping terms, often making these control theories difficult to apply to highly-redundant humanoid robots. This paper presents a first step towards the application of energy-shaping methods on real robots by casting controlled reduction into a framework of constrained accelerations for inverse dynamics control. By representing momentum conservation laws as constraints in acceleration space, we construct a general expression for desired joint accelerations that render the constraint surface invariant. By appropriately choosing an orthogonal projection, we show that the unconstrained (reduced) dynamics are decoupled from the constrained dynamics. Any acceleration-based controller can then be used to stabilize this planar subsystem, including passivity-based methods. The resulting control law is surprisingly simple and represents a practical way to employ control theoretic stability results in robotic platforms. Simulated walking of a 3D compass-gait biped show correspondence between the new and original controllers, and simulated motions of a 16-DOF humanoid demonstrate the applicability of this method.

am mg

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Variable impedance control - a reinforcement learning approach

Buchli, J., Theodorou, E., Stulp, F., Schaal, S.

In Robotics Science and Systems (2010), Zaragoza, Spain, June 27-30, 2010, clmc (inproceedings)

Abstract
One of the hallmarks of the performance, versatility, and robustness of biological motor control is the ability to adapt the impedance of the overall biomechanical system to different task requirements and stochastic disturbances. A transfer of this principle to robotics is desirable, for instance to enable robots to work robustly and safely in everyday human environments. It is, however, not trivial to derive variable impedance controllers for practical high DOF robotic tasks. In this contribution, we accomplish such gain scheduling with a reinforcement learning approach algorithm, PI2 (Policy Improvement with Path Integrals). PI2 is a model-free, sampling based learning method derived from first principles of optimal control. The PI2 algorithm requires no tuning of algorithmic parameters besides the exploration noise. The designer can thus fully focus on cost function design to specify the task. From the viewpoint of robotics, a particular useful property of PI2 is that it can scale to problems of many DOFs, so that RL on real robotic systems becomes feasible. We sketch the PI2 algorithm and its theoretical properties, and how it is applied to gain scheduling. We evaluate our approach by presenting results on two different simulated robotic systems, a 3-DOF Phantom Premium Robot and a 6-DOF Kuka Lightweight Robot. We investigate tasks where the optimal strategy requires both tuning of the impedance of the end-effector, and tuning of a reference trajectory. The results show that we can use path integral based RL not only for planning but also to derive variable gain feedback controllers in realistic scenarios. Thus, the power of variable impedance control is made available to a wide variety of robotic systems and practical applications.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Inverse dynamics with optimal distribution of ground reaction forces for legged robot

Righetti, L., Buchli, J., Mistry, M., Schaal, S.

In Proceedings of the 13th International Conference on Climbing and Walking Robots (CLAWAR), pages: 580-587, Nagoya, Japan, sep 2010 (inproceedings)

Abstract
Contact interaction with the environment is crucial in the design of locomotion controllers for legged robots, to prevent slipping for example. Therefore, it is of great importance to be able to control the effects of the robots movements on the contact reaction forces. In this contribution, we extend a recent inverse dynamics algorithm for floating base robots to optimize the distribution of contact forces while achieving precise trajectory tracking. The resulting controller is algorithmically simple as compared to other approaches. Numerical simulations show that this result significantly increases the range of possible movements of a humanoid robot as compared to the previous inverse dynamics algorithm. We also present a simplification of the result where no inversion of the inertia matrix is needed which is particularly relevant for practical use on a real robot. Such an algorithm becomes interesting for agile locomotion of robots on difficult terrains where the contacts with the environment are critical, such as walking over rough or slippery terrain.

am mg

DOI [BibTex]

DOI [BibTex]

2007


no image
Towards Machine Learning of Motor Skills

Peters, J., Schaal, S., Schölkopf, B.

In Proceedings of Autonome Mobile Systeme (AMS), pages: 138-144, (Editors: K Berns and T Luksch), 2007, clmc (inproceedings)

Abstract
Autonomous robots that can adapt to novel situations has been a long standing vision of robotics, artificial intelligence, and cognitive sciences. Early approaches to this goal during the heydays of artificial intelligence research in the late 1980s, however, made it clear that an approach purely based on reasoning or human insights would not be able to model all the perceptuomotor tasks that a robot should fulfill. Instead, new hope was put in the growing wake of machine learning that promised fully adaptive control algorithms which learn both by observation and trial-and-error. However, to date, learning techniques have yet to fulfill this promise as only few methods manage to scale into the high-dimensional domains of manipulator robotics, or even the new upcoming trend of humanoid robotics, and usually scaling was only achieved in precisely pre-structured domains. In this paper, we investigate the ingredients for a general approach to motor skill learning in order to get one step closer towards human-like performance. For doing so, we study two ma jor components for such an approach, i.e., firstly, a theoretically well-founded general approach to representing the required control structures for task representation and execution and, secondly, appropriate learning algorithms which can be applied in this setting.

am ei

PDF DOI [BibTex]

2007


PDF DOI [BibTex]


no image
Reinforcement Learning for Optimal Control of Arm Movements

Theodorou, E., Peters, J., Schaal, S.

In Abstracts of the 37st Meeting of the Society of Neuroscience., Neuroscience, 2007, clmc (inproceedings)

Abstract
Every day motor behavior consists of a plethora of challenging motor skills from discrete movements such as reaching and throwing to rhythmic movements such as walking, drumming and running. How this plethora of motor skills can be learned remains an open question. In particular, is there any unifying computa-tional framework that could model the learning process of this variety of motor behaviors and at the same time be biologically plausible? In this work we aim to give an answer to these questions by providing a computational framework that unifies the learning mechanism of both rhythmic and discrete movements under optimization criteria, i.e., in a non-supervised trial-and-error fashion. Our suggested framework is based on Reinforcement Learning, which is mostly considered as too costly to be a plausible mechanism for learning com-plex limb movement. However, recent work on reinforcement learning with pol-icy gradients combined with parameterized movement primitives allows novel and more efficient algorithms. By using the representational power of such mo-tor primitives we show how rhythmic motor behaviors such as walking, squash-ing and drumming as well as discrete behaviors like reaching and grasping can be learned with biologically plausible algorithms. Using extensive simulations and by using different reward functions we provide results that support the hy-pothesis that Reinforcement Learning could be a viable candidate for motor learning of human motor behavior when other learning methods like supervised learning are not feasible.

am ei

[BibTex]

[BibTex]


no image
Reinforcement learning by reward-weighted regression for operational space control

Peters, J., Schaal, S.

In Proceedings of the 24th Annual International Conference on Machine Learning, pages: 745-750, ICML, 2007, clmc (inproceedings)

Abstract
Many robot control problems of practical importance, including operational space control, can be reformulated as immediate reward reinforcement learning problems. However, few of the known optimization or reinforcement learning algorithms can be used in online learning control for robots, as they are either prohibitively slow, do not scale to interesting domains of complex robots, or require trying out policies generated by random search, which are infeasible for a physical system. Using a generalization of the EM-base reinforcement learning framework suggested by Dayan & Hinton, we reduce the problem of learning with immediate rewards to a reward-weighted regression problem with an adaptive, integrated reward transformation for faster convergence. The resulting algorithm is efficient, learns smoothly without dangerous jumps in solution space, and works well in applications of complex high degree-of-freedom robots.

am ei

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Policy gradient methods for machine learning

Peters, J., Theodorou, E., Schaal, S.

In Proceedings of the 14th INFORMS Conference of the Applied Probability Society, pages: 97-98, Eindhoven, Netherlands, July 9-11, 2007, 2007, clmc (inproceedings)

Abstract
We present an in-depth survey of policy gradient methods as they are used in the machine learning community for optimizing parameterized, stochastic control policies in Markovian systems with respect to the expected reward. Despite having been developed separately in the reinforcement learning literature, policy gradient methods employ likelihood ratio gradient estimators as also suggested in the stochastic simulation optimization community. It is well-known that this approach to policy gradient estimation traditionally suffers from three drawbacks, i.e., large variance, a strong dependence on baseline functions and a inefficient gradient descent. In this talk, we will present a series of recent results which tackles each of these problems. The variance of the gradient estimation can be reduced significantly through recently introduced techniques such as optimal baselines, compatible function approximations and all-action gradients. However, as even the analytically obtainable policy gradients perform unnaturally slow, it required the step from ÔvanillaÕ policy gradient methods towards natural policy gradients in order to overcome the inefficiency of the gradient descent. This development resulted into the Natural Actor-Critic architecture which can be shown to be very efficient in application to motor primitive learning for robotics.

am ei

[BibTex]

[BibTex]


no image
Policy Learning for Motor Skills

Peters, J., Schaal, S.

In Proceedings of 14th International Conference on Neural Information Processing (ICONIP), pages: 233-242, (Editors: Ishikawa, M. , K. Doya, H. Miyamoto, T. Yamakawa), 2007, clmc (inproceedings)

Abstract
Policy learning which allows autonomous robots to adapt to novel situations has been a long standing vision of robotics, artificial intelligence, and cognitive sciences. However, to date, learning techniques have yet to fulfill this promise as only few methods manage to scale into the high-dimensional domains of manipulator robotics, or even the new upcoming trend of humanoid robotics, and usually scaling was only achieved in precisely pre-structured domains. In this paper, we investigate the ingredients for a general approach policy learning with the goal of an application to motor skill refinement in order to get one step closer towards human-like performance. For doing so, we study two major components for such an approach, i.e., firstly, we study policy learning algorithms which can be applied in the general setting of motor skill learning, and, secondly, we study a theoretically well-founded general approach to representing the required control structures for task representation and execution.

am ei

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Reinforcement learning for operational space control

Peters, J., Schaal, S.

In Proceedings of the 2007 IEEE International Conference on Robotics and Automation, pages: 2111-2116, IEEE Computer Society, ICRA, 2007, clmc (inproceedings)

Abstract
While operational space control is of essential importance for robotics and well-understood from an analytical point of view, it can be prohibitively hard to achieve accurate control in face of modeling errors, which are inevitable in complex robots, e.g., humanoid robots. In such cases, learning control methods can offer an interesting alternative to analytical control algorithms. However, the resulting supervised learning problem is ill-defined as it requires to learn an inverse mapping of a usually redundant system, which is well known to suffer from the property of non-convexity of the solution space, i.e., the learning system could generate motor commands that try to steer the robot into physically impossible configurations. The important insight that many operational space control algorithms can be reformulated as optimal control problems, however, allows addressing this inverse learning problem in the framework of reinforcement learning. However, few of the known optimization or reinforcement learning algorithms can be used in online learning control for robots, as they are either prohibitively slow, do not scale to interesting domains of complex robots, or require trying out policies generated by random search, which are infeasible for a physical system. Using a generalization of the EM-based reinforcement learning framework suggested by Dayan & Hinton, we reduce the problem of learning with immediate rewards to a reward-weighted regression problem with an adaptive, integrated reward transformation for faster convergence. The resulting algorithm is efficient, learns smoothly without dangerous jumps in solution space, and works well in applications of complex high degree-of-freedom robots.

am ei

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Using reward-weighted regression for reinforcement learning of task space control

Peters, J., Schaal, S.

In Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pages: 262-267, Honolulu, Hawaii, April 1-5, 2007, 2007, clmc (inproceedings)

Abstract
In this paper, we evaluate different versions from the three main kinds of model-free policy gradient methods, i.e., finite difference gradients, `vanilla' policy gradients and natural policy gradients. Each of these methods is first presented in its simple form and subsequently refined and optimized. By carrying out numerous experiments on the cart pole regulator benchmark we aim to provide a useful baseline for future research on parameterized policy search algorithms. Portable C++ code is provided for both plant and algorithms; thus, the results in this paper can be reevaluated, reused and new algorithms can be inserted with ease.

am ei

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark

Riedmiller, M., Peters, J., Schaal, S.

In Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pages: 254-261, ADPRL, 2007, clmc (inproceedings)

Abstract
In this paper, we evaluate different versions from the three main kinds of model-free policy gradient methods, i.e., finite difference gradients, `vanilla' policy gradients and natural policy gradients. Each of these methods is first presented in its simple form and subsequently refined and optimized. By carrying out numerous experiments on the cart pole regulator benchmark we aim to provide a useful baseline for future research on parameterized policy search algorithms. Portable C++ code is provided for both plant and algorithms; thus, the results in this paper can be reevaluated, reused and new algorithms can be inserted with ease.

am ei

PDF [BibTex]

PDF [BibTex]


no image
Uncertain 3D Force Fields in Reaching Movements: Do Humans Favor Robust or Average Performance?

Mistry, M., Theodorou, E., Hoffmann, H., Schaal, S.

In Abstracts of the 37th Meeting of the Society of Neuroscience, 2007, clmc (inproceedings)

am

PDF [BibTex]

PDF [BibTex]


no image
Applying the episodic natural actor-critic architecture to motor primitive learning

Peters, J., Schaal, S.

In Proceedings of the 2007 European Symposium on Artificial Neural Networks (ESANN), Bruges, Belgium, April 25-27, 2007, clmc (inproceedings)

Abstract
In this paper, we investigate motor primitive learning with the Natural Actor-Critic approach. The Natural Actor-Critic consists out of actor updates which are achieved using natural stochastic policy gradients while the critic obtains the natural policy gradient by linear regression. We show that this architecture can be used to learn the Òbuilding blocks of movement generationÓ, called motor primitives. Motor primitives are parameterized control policies such as splines or nonlinear differential equations with desired attractor properties. We show that our most modern algorithm, the Episodic Natural Actor-Critic outperforms previous algorithms by at least an order of magnitude. We demonstrate the efficiency of this reinforcement learning method in the application of learning to hit a baseball with an anthropomorphic robot arm.

am

link (url) [BibTex]

link (url) [BibTex]


no image
The new robotics - towards human-centered machines

Schaal, S.

HFSP Journal Frontiers of Interdisciplinary Research in the Life Sciences, 1(2):115-126, 2007, clmc (article)

Abstract
Research in robotics has moved away from its primary focus on industrial applications. The New Robotics is a vision that has been developed in past years by our own university and many other national and international research instiutions and addresses how increasingly more human-like robots can live among us and take over tasks where our current society has shortcomings. Elder care, physical therapy, child education, search and rescue, and general assistance in daily life situations are some of the examples that will benefit from the New Robotics in the near future. With these goals in mind, research for the New Robotics has to embrace a broad interdisciplinary approach, ranging from traditional mathematical issues of robotics to novel issues in psychology, neuroscience, and ethics. This paper outlines some of the important research problems that will need to be resolved to make the New Robotics a reality.

am

link (url) [BibTex]

link (url) [BibTex]


no image
A computational model of human trajectory planning based on convergent flow fields

Hoffman, H., Schaal, S.

In Abstracts of the 37st Meeting of the Society of Neuroscience, San Diego, CA, Nov. 3-7, 2007, clmc (inproceedings)

Abstract
A popular computational model suggests that smooth reaching movements are generated in humans by minimizing a difference vector between hand and target in visual coordinates (Shadmehr and Wise, 2005). To achieve such a task, the optimal joint accelerations may be pre-computed. However, this pre-planning is inflexible towards perturbations of the limb, and there is strong evidence that reaching movements can be modified on-line at any moment during the movement. Thus, next-state planning models (Bullock and Grossberg, 1988) have been suggested that compute the current control command from a function of the goal state such that the overall movement smoothly converges to the goal (see Shadmehr and Wise (2005) for an overview). So far, these models have been restricted to simple point-to-point reaching movements with (approximately) straight trajectories. Here, we present a computational model for learning and executing arbitrary trajectories that combines ideas from pattern generation with dynamic systems and the observation of convergent force fields, which control a frog leg after spinal stimulation (Giszter et al., 1993). In our model, we incorporate the following two observations: first, the orientation of vectors in a force field is invariant over time, but their amplitude is modulated by a time-varying function, and second, two force fields add up when stimulated simultaneously (Giszter et al., 1993). This addition of convergent force fields varying over time results in a virtual trajectory (a moving equilibrium point) that correlates with the actual leg movement (Giszter et al., 1993). Our next-state planner is a set of differential equations that provide the desired end-effector or joint accelerations using feedback of the current state of the limb. These accelerations can be interpreted as resulting from a damped spring that links the current limb position with a virtual trajectory. This virtual trajectory can be learned to realize any desired limb trajectory and velocity profile, and learning is efficient since the time-modulated sum of convergent force fields equals a sum of weighted basis functions (Gaussian time pulses). Thus, linear algebra is sufficient to compute these weights, which correspond to points on the virtual trajectory. During movement execution, the differential equation corrects automatically for perturbations and brings back smoothly the limb towards the goal. Virtual trajectories can be rescaled and added allowing to build a set of movement primitives to describe movements more complex than previously learned. We demonstrate the potential of the suggested model by learning and generating a wide variety of movements.

am

[BibTex]

[BibTex]


no image
A Computational Model of Arm Trajectory Modification Using Dynamic Movement Primitives

Mohajerian, P., Hoffmann, H., Mistry, M., Schaal, S.

In Abstracts of the 37st Meeting of the Society of Neuroscience, San Diego, CA, Nov 3-7, 2007, clmc (inproceedings)

Abstract
Several scientists used a double-step target-displacement protocol to investigate how an unexpected upcoming new target modifies ongoing discrete movements. Interesting observations are the initial direction of the movement, the spatial path of the movement to the second target, and the amplification of the speed in the second movement. Experimental data show that the above properties are influenced by the movement reaction time and the interstimulus interval between the onset of the first and second target. Hypotheses in the literature concerning the interpretation of the observed data include a) the second movement is superimposed on the first movement (Henis and Flash, 1995), b) the first movement is aborted and the second movement is planned to smoothly connect the current state of the arm with the new target (Hoff and Arbib, 1992), c) the second movement is initiated by a new control signal that replaces the first movement's control signal, but does not take the state of the system into account (Flanagan et al., 1993), and (d) the second movement is initiated by a new goal command, but the control structure stays unchanged, and feed-back from the current state is taken into account (Hoff and Arbib, 1993). We investigate target switching from the viewpoint of Dynamic Movement Primitives (DMPs). DMPs are trajectory planning units that are formalized as stable nonlinear attractor systems (Ijspeert et al., 2002). They are a useful framework for biological motor control as they are highly flexible in creating complex rhythmic and discrete behaviors that can quickly adapt to the inevitable perturbations of dynamically changing, stochastic environments. In this model, target switching is accomplished simply by updating the target input to the discrete movement primitive for reaching. The reaching trajectory in this model can be straight or take any other route; in contrast, the Hoff and Arbib (1993) model is restricted to straight reaching movement plans. In the present study, we use DMPs to reproduce in simulation a large number of target-switching experimental data from the literature and to show that online correction and the observed target switching phenomena can be accomplished by changing the goal state of an on-going DMP, without the need to switch to different movement primitives or to re-plan the movement. :

am

PDF [BibTex]

PDF [BibTex]


no image
Inverse dynamics control with floating base and constraints

Nakanishi, J., Mistry, M., Schaal, S.

In International Conference on Robotics and Automation (ICRA2007), pages: 1942-1947, Rome, Italy, April 10-14, 2007, clmc (inproceedings)

Abstract
In this paper, we address the issues of compliant control of a robot under contact constraints with a goal of using joint space based pattern generators as movement primitives, as often considered in the studies of legged locomotion and biological motor control. For this purpose, we explore inverse dynamics control of constrained dynamical systems. When the system is overconstrained, it is not straightforward to formulate an inverse dynamics control law since the problem becomes an ill-posed one, where infinitely many combinations of joint torques are possible to achieve the desired joint accelerations. The goal of this paper is to develop a general and computationally efficient inverse dynamics algorithm for a robot with a free floating base and constraints. We suggest an approximate way of computing inverse dynamics algorithm by treating constraint forces computed with a Lagrange multiplier method as simply external forces based on FeatherstoneÕs floating base formulation of inverse dynamics. We present how all the necessary quantities to compute our controller can be efficiently extracted from FeatherstoneÕs spatial notation of robot dynamics. We evaluate the effectiveness of the suggested approach on a simulated biped robot model.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Kernel carpentry for onlne regression using randomly varying coefficient model

Edakunni, N. U., Schaal, S., Vijayakumar, S.

In Proceedings of the 20th International Joint Conference on Artificial Intelligence, Hyderabad, India: Jan. 6-12, 2007, clmc (inproceedings)

Abstract
We present a Bayesian formulation of locally weighted learning (LWL) using the novel concept of a randomly varying coefficient model. Based on this, we propose a mechanism for multivariate non-linear regression using spatially localised linear models that learns completely independent of each other, uses only local information and adapts the local model complexity in a data driven fashion. We derive online updates for the model parameters based on variational Bayesian EM. The evaluation of the proposed algorithm against other state-of-the-art methods reveal the excellent, robust generalization performance beside surprisingly efficient time and space complexity properties. This paper, for the first time, brings together the computational efficiency and the adaptability of Õnon-competitiveÕ locally weighted learning schemes and the modeling guarantees of the Bayesian formulation.

am

link (url) [BibTex]

link (url) [BibTex]


no image
A robust quadruped walking gait for traversing rough terrain

Pongas, D., Mistry, M., Schaal, S.

In International Conference on Robotics and Automation (ICRA2007), pages: 1474-1479, Rome, April 10-14, 2007, 2007, clmc (inproceedings)

Abstract
Legged locomotion excels when terrains become too rough for wheeled systems or open-loop walking pattern generators to succeed, i.e., when accurate foot placement is of primary importance in successfully reaching the task goal. In this paper we address the scenario where the rough terrain is traversed with a static walking gait, and where for every foot placement of a leg, the location of the foot placement was selected irregularly by a planning algorithm. Our goal is to adjust a smooth walking pattern generator with the selection of every foot placement such that the COG of the robot follows a stable trajectory characterized by a stability margin relative to the current support triangle. We propose a novel parameterization of the COG trajectory based on the current position, velocity, and acceleration of the four legs of the robot. This COG trajectory has guaranteed continuous velocity and acceleration profiles, which leads to continuous velocity and acceleration profiles of the leg movement, which is ideally suited for advanced model-based controllers. Pitch, yaw, and ground clearance of the robot are easily adjusted automatically under any terrain situation. We evaluate our gait generation technique on the Little-Dog quadruped robot when traversing complex rocky and sloped terrains.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Bayesian Nonparametric Regression with Local Models

Ting, J., Schaal, S.

In Workshop on Robotic Challenges for Machine Learning, NIPS 2007, 2007, clmc (inproceedings)

am

[BibTex]

[BibTex]


no image
Task space control with prioritization for balance and locomotion

Mistry, M., Nakanishi, J., Schaal, S.

In IEEE International Conference on Intelligent Robotics Systems (IROS 2007), San Diego, CA: Oct. 29 Ð Nov. 2, 2007, clmc (inproceedings)

Abstract
This paper addresses locomotion with active balancing, via task space control with prioritization. The center of gravity (COG) and foot of the swing leg are treated as task space control points. Floating base inverse kinematics with constraints is employed, thereby allowing for a mobile platform suitable for locomotion. Different techniques of task prioritization are discussed and we clarify differences and similarities of previous suggested work. Varying levels of prioritization for control are examined with emphasis on singularity robustness and the negative effects of constraint switching. A novel controller for task space control of balance and locomotion is developed which attempts to address singularity robustness, while minimizing discontinuities created by constraint switching. Controllers are evaluated using a quadruped robot simulator engaging in a locomotion task.

am

link (url) [BibTex]

link (url) [BibTex]