

2012


Towards Multi-DOF model mediated teleoperation: Using vision to augment feedback

Willaert, B., Bohg, J., Van Brussel, H., Niemeyer, G.

In IEEE International Workshop on Haptic Audio Visual Environments and Games (HAVE), pages: 25-31, October 2012 (inproceedings)

Abstract
In this paper, we address some of the challenges that arise as model-mediated teleoperation is applied to systems with multiple degrees of freedom and multiple sensors. Specifically we use a system with position, force, and vision sensors to explore an environment geometry in two degrees of freedom. The inclusion of vision is proposed to alleviate the difficulties of estimating an increasing number of environment properties. Vision can furthermore increase the predictive nature of model-mediated teleoperation, by effectively predicting touch feedback before the slave is even in contact with the environment. We focus on the case of estimating the location and orientation of a local surface patch at the contact point between the slave and the environment. We describe the various information sources with their respective limitations and create a combined model estimator as part of a multi-d.o.f. model-mediated controller. An experiment demonstrates the feasibility and benefits of utilizing vision sensors in teleoperation.

am

DOI [BibTex]

Failure Recovery with Shared Autonomy

Sankaran, B., Pitzer, B., Osentoski, S.

In International Conference on Intelligent Robots and Systems, October 2012 (inproceedings)

Abstract
Building robots capable of long term autonomy has been a long standing goal of robotics research. Such systems must be capable of performing certain tasks with a high degree of robustness and repeatability. In the context of personal robotics, these tasks could range anywhere from retrieving items from a refrigerator or loading a dishwasher to setting up a dinner table. Given the complexity of tasks there are a multitude of failure scenarios that the robot can encounter, irrespective of whether the environment is static or dynamic. For a robot to be successful in such situations, it would need to know how to recover from failures or when to ask a human for help. This paper presents a novel shared autonomy behavioral executive to address these issues. We demonstrate how this executive combines generalized logic based recovery and human intervention to achieve continuous failure free operation. We tested the system over 250 trials of two different use case experiments. Our current algorithm drastically reduced human intervention from 26% to 4% on the first experiment and 46% to 9% on the second experiment. This system provides a new dimension to robot autonomy, where robots can exhibit long term failure free operation with minimal human supervision. We also discuss how the system can be generalized.

am

link (url) [BibTex]

Task-Based Grasp Adaptation on a Humanoid Robot

Bohg, J., Welke, K., León, B., Do, M., Song, D., Wohlkinger, W., Aldoma, A., Madry, M., Przybylski, M., Asfour, T., Marti, H., Kragic, D., Morales, A., Vincze, M.

In 10th IFAC Symposium on Robot Control, SyRoCo 2012, Dubrovnik, Croatia, September 5-7, 2012, pages: 779-786, September 2012 (inproceedings)

Abstract
In this paper, we present an approach towards autonomous grasping of objects according to their category and a given task. Recent advances in the field of object segmentation and categorization as well as task-based grasp inference have been leveraged by integrating them into one pipeline. This allows us to transfer task-specific grasp experience between objects of the same category. The effectiveness of the approach is demonstrated on the humanoid robot ARMAR-IIIa.

am

Video pdf DOI [BibTex]

Movement Segmentation and Recognition for Imitation Learning

Meier, F., Theodorou, E., Schaal, S.

In Fifteenth International Conference on Artificial Intelligence and Statistics, La Palma, Canary Islands, April 2012 (inproceedings)

am

link (url) [BibTex]

From Dynamic Movement Primitives to Associative Skill Memories

Pastor, P., Kalakrishnan, M., Meier, F., Stulp, F., Buchli, J., Theodorou, E., Schaal, S.

Robotics and Autonomous Systems, 2012 (article)

am

Project Page [BibTex]

Inverse dynamics with optimal distribution of contact forces for the control of legged robots

Righetti, L., Schaal, S.

In Dynamic Walking 2012, Pensacola, 2012 (inproceedings)

am

[BibTex]

Encoding of Periodic and their Transient Motions by a Single Dynamic Movement Primitive

Ernesti, J., Righetti, L., Do, M., Asfour, T., Schaal, S.

In 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012), pages: 57-64, IEEE, Osaka, Japan, November 2012 (inproceedings)

am mg

link (url) DOI [BibTex]

Learning Force Control Policies for Compliant Robotic Manipulation

Kalakrishnan, M., Righetti, L., Pastor, P., Schaal, S.

In ICML’12: Proceedings of the 29th International Conference on Machine Learning, pages: 49-50, Edinburgh, Scotland, 2012 (inproceedings)

am mg

[BibTex]

Quadratic programming for inverse dynamics with optimal distribution of contact forces

Righetti, L., Schaal, S.

In 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012), pages: 538-543, IEEE, Osaka, Japan, November 2012 (inproceedings)

Abstract
In this contribution we propose an inverse dynamics controller for a humanoid robot that exploits torque redundancy to minimize any combination of linear and quadratic costs in the contact forces and the commands. In addition the controller satisfies linear equality and inequality constraints in the contact forces and the commands such as torque limits, unilateral contacts or friction cones limits. The originality of our approach resides in the formulation of the problem as a quadratic program where we only need to solve for the control commands and where the contact forces are optimized implicitly. Furthermore, we do not need a structured representation of the dynamics of the robot (i.e. an explicit computation of the inertia matrix). It is in contrast with existing methods based on quadratic programs. The controller is then robust to uncertainty in the estimation of the dynamics model and the optimization is fast enough to be implemented in high bandwidth torque control loops that are increasingly available on humanoid platforms. We demonstrate properties of our controller with simulations of a human size humanoid robot.

am mg

link (url) DOI [BibTex]

Model-free reinforcement learning of impedance control in stochastic environments

Stulp, F., Buchli, J., Ellmer, A., Mistry, M., Theodorou, E. A., Schaal, S.

Autonomous Mental Development, IEEE Transactions on, 4(4):330-341, 2012 (article)

am

[BibTex]

Towards Associative Skill Memories

Pastor, P., Kalakrishnan, M., Righetti, L., Schaal, S.

In 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012), pages: 309-315, IEEE, Osaka, Japan, November 2012 (inproceedings)

Abstract
Movement primitives as a basis of movement planning and control have become a popular topic in recent years. The key idea of movement primitives is that a rather small set of stereotypical movements should suffice to create a large set of complex manipulation skills. An interesting side effect of stereotypical movement is that it also creates stereotypical sensory events, e.g., in terms of kinesthetic variables, haptic variables, or, if processed appropriately, visual variables. Thus, a movement primitive executed towards a particular object in the environment will associate a large number of sensory variables that are typical for this manipulation skill. These associations can be used to increase robustness towards perturbations, and they also allow failure detection and switching towards other behaviors. We call such movement primitives augmented with sensory associations Associative Skill Memories (ASM). This paper addresses how ASMs can be acquired by imitation learning and how they can create robust manipulation skills by determining subsequent ASMs online to achieve a particular manipulation goal. Evaluations for grasping and manipulation with a Barrett WAM/Hand illustrate our approach.

am mg

link (url) DOI [BibTex]

Template-based learning of grasp selection

Herzog, A., Pastor, P., Kalakrishnan, M., Righetti, L., Asfour, T., Schaal, S.

In 2012 IEEE International Conference on Robotics and Automation, pages: 2379-2384, IEEE, Saint Paul, USA, 2012 (inproceedings)

Abstract
The ability to grasp unknown objects is an important skill for personal robots, which has been addressed by many present and past research projects, but still remains an open problem. A crucial aspect of grasping is choosing an appropriate grasp configuration, i.e. the 6d pose of the hand relative to the object and its finger configuration. Finding feasible grasp configurations for novel objects, however, is challenging because of the huge variety in shape and size of these objects. Moreover, possible configurations also depend on the specific kinematics of the robotic arm and hand in use. In this paper, we introduce a new grasp selection algorithm able to find object grasp poses based on previously demonstrated grasps. Assuming that objects with similar shapes can be grasped in a similar way, we associate to each demonstrated grasp a grasp template. The template is a local shape descriptor for a possible grasp pose and is constructed using 3d information from depth sensors. For each new object to grasp, the algorithm then finds the best grasp candidate in the library of templates. The grasp selection is also able to improve over time using the information of previous grasp attempts to adapt the ranking of the templates. We tested the algorithm on two different platforms, the Willow Garage PR2 and the Barrett WAM arm which have very different hands. Our results show that the algorithm is able to find good grasp configurations for a large set of objects from a relatively small set of demonstrations, and does indeed improve its performance over time.
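
The selection-and-adaptation loop described in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the template is treated as a plain numeric vector, similarity is Euclidean distance, and the ranking is adapted with a simple additive penalty per recorded failure. The names `select_grasp`, `record_outcome`, and `failure_penalty` are hypothetical and do not come from the paper.

```python
import math


def select_grasp(candidate, library, failure_penalty=0.5):
    """Return the library entry whose template best matches `candidate`.

    Hypothetical scoring: Euclidean distance between descriptors plus a
    penalty for every failed attempt previously recorded for that entry.
    """
    def score(entry):
        distance = math.dist(candidate, entry["template"])
        return distance + failure_penalty * entry["failures"]

    return min(library, key=score)


def record_outcome(entry, success):
    """Adapt the ranking: a failed attempt makes the entry less attractive."""
    if not success:
        entry["failures"] += 1


# Two demonstrated grasps with toy 2-D descriptors (illustrative values).
library = [
    {"name": "side_grasp", "template": [0.1, 0.0], "failures": 0},
    {"name": "top_grasp", "template": [0.5, 0.0], "failures": 0},
]
first_choice = select_grasp([0.0, 0.0], library)
record_outcome(first_choice, success=False)
second_choice = select_grasp([0.0, 0.0], library)
```

In this toy run the closest template is chosen first; after one recorded failure its score rises above that of the alternative, so the second query returns the other grasp.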

am mg

link (url) DOI [BibTex]

Reinforcement Learning with Sequences of Motion Primitives for Robust Manipulation

Stulp, F., Theodorou, E., Schaal, S.

IEEE Transactions on Robotics, 2012 (article)

am

[BibTex]

Probabilistic depth image registration incorporating nonvisual information

Wüthrich, M., Pastor, P., Righetti, L., Billard, A., Schaal, S.

In 2012 IEEE International Conference on Robotics and Automation, pages: 3637-3644, IEEE, Saint Paul, USA, 2012 (inproceedings)

Abstract
In this paper, we derive a probabilistic registration algorithm for object modeling and tracking. In many robotics applications, such as manipulation tasks, nonvisual information about the movement of the object is available, which we will combine with the visual information. Furthermore we do not only consider observations of the object, but we also take space into account which has been observed to not be part of the object. Furthermore we are computing a posterior distribution over the relative alignment and not a point estimate as typically done in for example Iterative Closest Point (ICP). To our knowledge no existing algorithm meets these three conditions and we thus derive a novel registration algorithm in a Bayesian framework. Experimental results suggest that the proposed methods perform favorably in comparison to PCL [1] implementations of feature mapping and ICP, especially if nonvisual information is available.

am mg

link (url) DOI [BibTex]


2010


Reinforcement learning of full-body humanoid motor skills

Stulp, F., Buchli, J., Theodorou, E., Schaal, S.

In Humanoid Robots (Humanoids), 2010 10th IEEE-RAS International Conference on, pages: 405-410, December 2010, clmc (inproceedings)

Abstract
Applying reinforcement learning to humanoid robots is challenging because humanoids have a large number of degrees of freedom and state and action spaces are continuous. Thus, most reinforcement learning algorithms would become computationally infeasible and require a prohibitive amount of trials to explore such high-dimensional spaces. In this paper, we present a probabilistic reinforcement learning approach, which is derived from the framework of stochastic optimal control and path integrals. The algorithm, called Policy Improvement with Path Integrals (PI2), has a surprisingly simple form, has no open tuning parameters besides the exploration noise, is model-free, and performs numerically robustly in high dimensional learning problems. We demonstrate how PI2 is able to learn full-body motor skills on a 34-DOF humanoid robot. To demonstrate the generality of our approach, we also apply PI2 in the context of variable impedance control, where both planned trajectories and gain schedules for each joint are optimized simultaneously.

am

link (url) [BibTex]

Relative Entropy Policy Search

Peters, J., Mülling, K., Altun, Y.

In Proceedings of the Twenty-Fourth National Conference on Artificial Intelligence (AAAI-10), pages: 1607-1612, (Editors: Fox, M., Poole, D.), AAAI Press, Menlo Park, CA, USA, July 2010 (inproceedings)

Abstract
Policy search is a successful approach to reinforcement learning. However, policy improvements often result in the loss of information. Hence, it has been marred by premature convergence and implausible solutions. As first suggested in the context of covariant policy gradients (Bagnell and Schneider 2003), many of these problems may be addressed by constraining the information loss. In this paper, we continue this path of reasoning and suggest the Relative Entropy Policy Search (REPS) method. The resulting method differs significantly from previous policy gradient approaches and yields an exact update step. It works well on typical reinforcement learning benchmark problems.

am ei

PDF Web [BibTex]

Reinforcement learning of motor skills in high dimensions: A path integral approach

Theodorou, E., Buchli, J., Schaal, S.

In Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages: 2397-2403, May 2010, clmc (inproceedings)

Abstract
Reinforcement learning (RL) is one of the most general approaches to learning control. Its applicability to complex motor systems, however, has been largely impossible so far due to the computational difficulties that reinforcement learning encounters in high dimensional continuous state-action spaces. In this paper, we derive a novel approach to RL for parameterized control policies based on the framework of stochastic optimal control with path integrals. While solidly grounded in optimal control theory and estimation theory, the update equations for learning are surprisingly simple and have no danger of numerical instabilities as neither matrix inversions nor gradient learning rates are required. Empirical evaluations demonstrate significant performance improvements over gradient-based policy learning and scalability to high-dimensional control problems. Finally, a learning experiment on a robot dog illustrates the functionality of our algorithm in a real-world scenario. We believe that our new algorithm, Policy Improvement with Path Integrals (PI2), offers currently one of the most efficient, numerically robust, and easy to implement algorithms for RL in robotics.
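
The claim that the update needs neither matrix inversions nor gradient learning rates can be illustrated with a simplified sketch. This is not the full PI2 algorithm from the paper: it is an episodic, black-box simplification in which each rollout receives a single scalar cost, and the parameter update is the probability-weighted average of the sampled exploration noise. The function name, the cost normalization, and all default values (`n_rollouts`, `noise_std`, `temperature`) are assumptions chosen for illustration.

```python
import math
import random


def weighted_noise_update(theta, cost_fn, n_rollouts=20, noise_std=0.1,
                          temperature=0.1, rng=None):
    """One simplified, PI2-style update of the parameter vector `theta`."""
    rng = rng or random
    # Sample exploration noise and evaluate each perturbed parameter vector.
    noise = [[rng.gauss(0.0, noise_std) for _ in theta]
             for _ in range(n_rollouts)]
    costs = [cost_fn([t + e for t, e in zip(theta, eps)]) for eps in noise]

    # Map costs to [0, 1] so the temperature does not depend on cost scale.
    c_min, c_max = min(costs), max(costs)
    span = (c_max - c_min) or 1.0
    weights = [math.exp(-(c - c_min) / (span * temperature)) for c in costs]
    total = sum(weights)
    probs = [w / total for w in weights]

    # The update is a weighted average of the noise: no gradient step size
    # and no matrix inversion are involved.
    return [t + sum(p * eps[i] for p, eps in zip(probs, noise))
            for i, t in enumerate(theta)]


def quadratic_cost(params, target=(1.0, -2.0)):
    """Toy cost used only for this illustration."""
    return sum((p - g) ** 2 for p, g in zip(params, target))


rng = random.Random(0)
theta = [0.0, 0.0]
initial_cost = quadratic_cost(theta)
for _ in range(200):
    theta = weighted_noise_update(theta, quadratic_cost, rng=rng)
final_cost = quadratic_cost(theta)
```

Low-cost rollouts dominate the average, so the parameters drift toward regions of lower cost; the only quantities to choose are the exploration noise and, in this simplification, the temperature.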

am

link (url) [BibTex]

Inverse dynamics control of floating base systems using orthogonal decomposition

Mistry, M., Buchli, J., Schaal, S.

In Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages: 3406-3412, May 2010, clmc (inproceedings)

Abstract
Model-based control methods can be used to enable fast, dexterous, and compliant motion of robots without sacrificing control accuracy. However, implementing such techniques on floating base robots, e.g., humanoids and legged systems, is non-trivial due to under-actuation, dynamically changing constraints from the environment, and potentially closed loop kinematics. In this paper, we show how to compute the analytically correct inverse dynamics torques for model-based control of sufficiently constrained floating base rigid-body systems, such as humanoid robots with one or two feet in contact with the environment. While our previous inverse dynamics approach relied on an estimation of contact forces to compute an approximate inverse dynamics solution, here we present an analytically correct solution by using an orthogonal decomposition to project the robot dynamics onto a reduced dimensional space, independent of contact forces. We demonstrate the feasibility and robustness of our approach on a simulated floating base bipedal humanoid robot and an actual robot dog locomoting over rough terrain.

am

link (url) [BibTex]

Fast, robust quadruped locomotion over challenging terrain

Kalakrishnan, M., Buchli, J., Pastor, P., Mistry, M., Schaal, S.

In Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages: 2665-2670, May 2010, clmc (inproceedings)

Abstract
We present a control architecture for fast quadruped locomotion over rough terrain. We approach the problem by decomposing it into many sub-systems, in which we apply state-of-the-art learning, planning, optimization and control techniques to achieve robust, fast locomotion. Unique features of our control strategy include: (1) a system that learns optimal foothold choices from expert demonstration using terrain templates, (2) a body trajectory optimizer based on the Zero-Moment Point (ZMP) stability criterion, and (3) a floating-base inverse dynamics controller that, in conjunction with force control, allows for robust, compliant locomotion over unperceived obstacles. We evaluate the performance of our controller by testing it on the LittleDog quadruped robot, over a wide variety of rough terrain of varying difficulty levels. We demonstrate the generalization ability of this controller by presenting test results from an independent external test team on terrains that have never been shown to us.

am

link (url) [BibTex]

Policy learning algorithms for motor learning (Algorithmen zum automatischen Erlernen von Motorfähigkeiten)

Peters, J., Kober, J., Schaal, S.

Automatisierungstechnik, 58(12):688-694, 2010, clmc (article)

Abstract
Robot learning methods which allow autonomous robots to adapt to novel situations have been a long standing vision of robotics, artificial intelligence, and cognitive sciences. However, to date, learning techniques have yet to fulfill this promise as only few methods manage to scale into the high-dimensional domains of manipulator robotics, or even the new upcoming trend of humanoid robotics. If possible, scaling was usually only achieved in precisely pre-structured domains. In this paper, we investigate the ingredients for a general approach to policy learning with the goal of an application to motor skill refinement in order to get one step closer towards human-like performance. For doing so, we study two major components for such an approach, i.e., firstly, we study policy learning algorithms which can be applied in the general setting of motor skill learning, and, secondly, we study a theoretically well-founded general approach to representing the required control structures for task representation and execution.

am

link (url) [BibTex]


A Bayesian approach to nonlinear parameter identification for rigid-body dynamics

Ting, J., D’Souza, A., Schaal, S.

Neural Networks, 2010, clmc (article)

Abstract
For complex robots such as humanoids, model-based control is highly beneficial for accurate tracking while keeping negative feedback gains low for compliance. However, in such multi degree-of-freedom lightweight systems, conventional identification of rigid body dynamics models using CAD data and actuator models is inaccurate due to unknown nonlinear robot dynamic effects. An alternative method is data-driven parameter estimation, but significant noise in measured and inferred variables affects it adversely. Moreover, standard estimation procedures may give physically inconsistent results due to unmodeled nonlinearities or insufficiently rich data. This paper addresses these problems, proposing a Bayesian system identification technique for linear or piecewise linear systems. Inspired by Factor Analysis regression, we develop a computationally efficient variational Bayesian regression algorithm that is robust to ill-conditioned data, automatically detects relevant features, and identifies input and output noise. We evaluate our approach on rigid body parameter estimation for various robotic systems, achieving an error of up to three times lower than other state-of-the-art machine learning methods.

am

link (url) [BibTex]


A first optimal control solution for a complex, nonlinear, tendon driven neuromuscular finger model

Theodorou, E. A., Todorov, E., Valero-Cuevas, F.

Proceedings of the ASME 2010 Summer Bioengineering Conference August 30-September 2, 2010, Naples, Florida, USA, 2010, clmc (article)

Abstract
In this work we present the first constrained stochastic optimal feedback controller applied to a fully nonlinear, tendon driven index finger model. Our model also takes into account an extensor mechanism, and muscle force-length and force-velocity properties. We show this feedback controller is robust to noise and perturbations to the dynamics, while successfully handling the nonlinearities and high dimensionality of the system. By extending prior methods, we are able to approximate physiological realism by ensuring positivity of neural commands and tendon tensions at all times, and thus can, for the first time, use the optimal control framework to predict biologically plausible tendon tensions for a nonlinear neuromuscular finger model. METHODS 1 Muscle Model The rigid-body triple pendulum finger model with slightly viscous joints is actuated by Hill-type muscle models. Joint torques are generated by the seven muscles of the index fin-

am

PDF [BibTex]

Locally weighted regression for control

Ting, J., Vijayakumar, S., Schaal, S.

In Encyclopedia of Machine Learning, pages: 613-624, (Editors: Sammut, C.;Webb, G. I.), Springer, 2010, clmc (inbook)

Abstract
This article addresses two topics: learning control and locally weighted regression.

am

link (url) [BibTex]

Are reaching movements planned in kinematic or dynamic coordinates?

Ellmer, A., Schaal, S.

In Abstracts of Neural Control of Movement Conference (NCM 2010), Naples, Florida, 2010, clmc (inproceedings)

Abstract
Whether human reaching movements are planned and optimized in kinematic (task space) or dynamic (joint or muscle space) coordinates is still an issue of debate. The first hypothesis implies that a planner produces a desired end-effector position at each point in time during the reaching movement, whereas the latter hypothesis includes the dynamics of the muscular-skeletal control system to produce a continuous end-effector trajectory. Previous work by Wolpert et al (1995) showed that when subjects were led to believe that their straight reaching paths corresponded to curved paths as shown on a computer screen, participants adapted the true path of their hand such that they would visually perceive a straight line in visual space, despite that they actually produced a curved path. These results were interpreted as supporting the stance that reaching trajectories are planned in kinematic coordinates. However, this experiment could only demonstrate that adaptation to altered paths, i.e. the position of the end-effector, did occur, but not that the precise timing of end-effector position was equally planned, i.e., the trajectory. Our current experiment aims at filling this gap by explicitly testing whether position over time, i.e. velocity, is a property of reaching movements that is planned in kinematic coordinates. In the current experiment, the velocity profiles of cursor movements corresponding to the participant's hand motions were skewed either to the left or to the right; the path itself was left unaltered. We developed an adaptation paradigm, where the skew of the velocity profile was introduced gradually and participants reported no awareness of any manipulation. Preliminary results indicate that the true hand motion of participants did not alter, i.e. there was no adaptation so as to counterbalance the introduced skew. 
However, for some participants, peak hand velocities were lowered for higher skews, which suggests that participants interpreted the manipulation as mere noise due to variance in their own movement. In summary, for a visuomotor transformation task, the hypothesis of a planned continuous end-effector trajectory predicts adaptation to a modified velocity profile. The current experiment found no systematic adaptation under such transformation, but did demonstrate an effect that is more in accordance with the interpretation that subjects could not perceive the manipulation and instead interpreted it as an increase of noise.

am

[BibTex]

Optimality in Neuromuscular Systems

Theodorou, E. A., Valero-Cuevas, F.

In 32nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2010, clmc (inproceedings)

Abstract
We provide an overview of optimal control methods for nonlinear neuromuscular systems and discuss their limitations. Moreover we extend current optimal control methods to their application to neuromuscular models with realistically numerous musculotendons, as most prior work is limited to torque-driven systems. Recent work on computational motor control has explored the use of control theory and estimation as a conceptual tool to understand the underlying computational principles of neuromuscular systems. After all, successful biological systems regularly meet conditions for stability, robustness and performance for multiple classes of complex tasks. Among a variety of proposed control theory frameworks to explain this, stochastic optimal control has become a dominant framework to the point of being a standard computational technique to reproduce kinematic trajectories of reaching movements (see [12]). In particular, we demonstrate the application of optimal control to a neuromuscular model of the index finger with all seven musculotendons producing a tapping task. Our simulations include 1) a muscle model that includes force-length and force-velocity characteristics; 2) an anatomically plausible biomechanical model of the index finger that includes a tendinous network for the extensor mechanism; and 3) a contact model that is based on a nonlinear spring-damper attached at the end effector of the index finger. We demonstrate that it is feasible to apply optimal control to systems with realistically large state vectors and conclude that, while optimal control is an adequate formalism to create computational models of neuromusculoskeletal systems, there remain important challenges and limitations that need to be considered and overcome such as contact transitions, curse of dimensionality, and constraints on states and controls.

am

PDF [BibTex]

Efficient learning and feature detection in high dimensional regression

Ting, J., D’Souza, A., Vijayakumar, S., Schaal, S.

Neural Computation, 22, pages: 831-886, 2010, clmc (article)

Abstract
We present a novel algorithm for efficient learning and feature selection in high-dimensional regression problems. We arrive at this model through a modification of the standard regression model, enabling us to derive a probabilistic version of the well-known statistical regression technique of backfitting. Using the Expectation-Maximization algorithm, along with variational approximation methods to overcome intractability, we extend our algorithm to include automatic relevance detection of the input features. This Variational Bayesian Least Squares (VBLS) approach retains its simplicity as a linear model, but offers a novel statistically robust "black-box" approach to generalized linear regression with high-dimensional inputs. It can be easily extended to nonlinear regression and classification problems. In particular, we derive the framework of sparse Bayesian learning, e.g., the Relevance Vector Machine, with VBLS at its core, offering significant computational and robustness advantages for this class of methods. We evaluate our algorithm on synthetic and neurophysiological data sets, as well as on standard regression and classification benchmark data sets, comparing it with other competitive statistical approaches and demonstrating its suitability as a drop-in replacement for other generalized linear regression techniques.
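
For readers unfamiliar with backfitting, the classical (non-probabilistic) version that this abstract takes as its starting point can be sketched as follows. This is not VBLS itself: there is no EM, no variational approximation, and no relevance detection. The sketch assumes a linear model without intercept, and the function name `backfit_linear` and the toy data are illustrative only.

```python
def backfit_linear(X, y, n_sweeps=100):
    """Classical backfitting for the linear model y ~ sum_m b[m] * X[i][m].

    Each coefficient is refit in turn against the partial residual that
    remains after removing the contribution of all other features.
    """
    n_features = len(X[0])
    b = [0.0] * n_features
    residual = list(y)  # y minus the current full prediction (all b are 0)
    for _ in range(n_sweeps):
        for m in range(n_features):
            column = [row[m] for row in X]
            # Add this feature's own contribution back to get its partial residual.
            partial = [r + b[m] * x for r, x in zip(residual, column)]
            denom = sum(x * x for x in column)
            b[m] = (sum(x * p for x, p in zip(column, partial)) / denom
                    if denom else 0.0)
            residual = [p - b[m] * x for p, x in zip(partial, column)]
    return b


# Toy data generated exactly by y = 2 * x1 - 3 * x2.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, -3.0, -1.0, 1.0]
coefficients = backfit_linear(X, y)
```

Each step is a one-dimensional regression, so no matrix is inverted; for noise-free data like this toy example, the sweeps converge to the least-squares coefficients.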

am

link (url) [BibTex]

Stochastic Differential Dynamic Programming

Theodorou, E., Tassa, Y., Todorov, E.

In Proceedings of the American Control Conference (ACC 2010), 2010, clmc (article)

Abstract
We present a generalization of the classic Differential Dynamic Programming algorithm. We assume the existence of state- and control-dependent process noise, and proceed to derive the second-order expansion of the cost-to-go. Despite having quartic and cubic terms in the initial expression, we show that these vanish, leaving us with the same quadratic structure as standard DDP.

am

PDF [BibTex]

Learning Policy Improvements with Path Integrals

Theodorou, E. A., Buchli, J., Schaal, S.

In International Conference on Artificial Intelligence and Statistics (AISTATS 2010), 2010, clmc (inproceedings)

Abstract
With the goal to generate more scalable algorithms with higher efficiency and fewer open parameters, reinforcement learning (RL) has recently moved towards combining classical techniques from optimal control and dynamic programming with modern learning techniques from statistical estimation theory. In this vein, this paper suggests the framework of stochastic optimal control with path integrals to derive a novel approach to RL with parametrized policies. While solidly grounded in value function estimation and optimal control based on the stochastic Hamilton-Jacobi-Bellman (HJB) equations, policy improvements can be transformed into an approximation problem of a path integral which has no open parameters other than the exploration noise. The resulting algorithm can be conceived of as model-based, semi-model-based, or even model free, depending on how the learning problem is structured. Our new algorithm demonstrates interesting similarities with previous RL research in the framework of probability matching and provides intuition why the slightly heuristically motivated probability matching approach can actually perform well. Empirical evaluations demonstrate significant performance improvements over gradient-based policy learning and scalability to high-dimensional control problems. We believe that Policy Improvement with Path Integrals (PI2) offers currently one of the most efficient, numerically robust, and easy to implement algorithms for RL based on trajectory roll-outs.

am

PDF [BibTex]

Learning optimal control solutions: a path integral approach

Theodorou, E., Schaal, S.

In Abstracts of Neural Control of Movement Conference (NCM 2010), Naples, Florida, 2010, 2010, clmc (inproceedings)

Abstract
Investigating principles of human motor control in the framework of optimal control has had a long tradition in neural control of movement, and has recently experienced a new surge of investigations. Ideally, optimal control problems are addressed as a reinforcement learning (RL) problem, which would allow investigating both the process of acquiring an optimal control solution and the solution itself. Unfortunately, the applicability of RL to complex neural and biomechanics systems has been largely impossible so far due to the computational difficulties that arise in high dimensional continuous state-action spaces. As a way out, research has focussed on computing optimal control solutions based on iterative optimal control methods that are based on linear and quadratic approximations of dynamical models and cost functions. These methods require perfect knowledge of the dynamics and cost functions while they are based on gradient and Newton optimization schemes. Their applicability is also restricted to low dimensional problems due to problematic convergence in high dimensions. Moreover, the process of computing the optimal solution is removed from the learning process that might be plausible in biology. In this work, we present a new reinforcement learning method for learning optimal control solutions for motor control. This method, based on the framework of stochastic optimal control with path integrals, has a very solid theoretical foundation, while resulting in surprisingly simple learning algorithms. It is also possible to apply this approach without knowledge of the system model, and to use a wide variety of complex nonlinear cost functions for optimization. We illustrate the theoretical properties of this approach and its applicability to learning motor control tasks for reaching movements and locomotion studies. 
We discuss its applicability to learning desired trajectories, variable stiffness control (co-contraction), and parameterized control policies. We also investigate the applicability to signal dependent noise control systems. We believe that the suggested method offers one of the easiest to use approaches to learning optimal control suggested in the literature so far, which makes it ideally suited for computational investigations of biological motor control.

am

[BibTex]

Learning control in robotics – trajectory-based optimal control techniques

Schaal, S., Atkeson, C. G.

Robotics and Automation Magazine, 17(2):20-29, 2010, clmc (article)

Abstract
In a not too distant future, robots will be a natural part of daily life in human society, providing assistance in many areas ranging from clinical applications, education and care giving, to normal household environments [1]. It is hard to imagine that all possible tasks can be preprogrammed in such robots. Robots need to be able to learn, either by themselves or with the help of human supervision. Additionally, wear and tear on robots in daily use needs to be automatically compensated for, which requires a form of continuous self-calibration, another form of learning. Finally, robots need to react to stochastic and dynamic environments, i.e., they need to learn how to optimally adapt to uncertainty and unforeseen changes. Robot learning is going to be a key ingredient for the future of autonomous robots. While robot learning covers a rather large field, from learning to perceive, to plan, to make decisions, etc., we will focus this review on topics of learning control, in particular, as it is concerned with learning control in simulated or actual physical robots. In general, learning control refers to the process of acquiring a control strategy for a particular control system and a particular task by trial and error. Learning control is usually distinguished from adaptive control [2] in that the learning system can have rather general optimization objectives (not just, e.g., minimal tracking error) and is permitted to fail during the process of learning, while adaptive control emphasizes fast convergence without failure. Thus, learning control resembles the way that humans and animals acquire new movement strategies, while adaptive control is a special case of learning control that fulfills stringent performance constraints, e.g., as needed in life-critical systems like airplanes. Learning control has been an active topic of research for at least three decades. 
However, given the lack of working robots that actually use learning components, more work needs to be done before robot learning will make it beyond the laboratory environment. This article will survey some ongoing and past activities in robot learning to assess where the field stands and where it is going. We will largely focus on nonwheeled robots and less on topics of state estimation, as typically explored in wheeled robots [3]-[6], and we emphasize learning in continuous state-action spaces rather than discrete state-action spaces [7], [8]. We will illustrate the different topics of robot learning with examples from our own research with anthropomorphic and humanoid robots.

am

link (url) [BibTex]

Learning, planning, and control for quadruped locomotion over challenging terrain

Kalakrishnan, M., Buchli, J., Pastor, P., Mistry, M., Schaal, S.

International Journal of Robotics Research, 30(2):236-258, 2010, clmc (article)

Abstract
We present a control architecture for fast quadruped locomotion over rough terrain. We approach the problem by decomposing it into many sub-systems, in which we apply state-of-the-art learning, planning, optimization, and control techniques to achieve robust, fast locomotion. Unique features of our control strategy include: (1) a system that learns optimal foothold choices from expert demonstration using terrain templates, (2) a body trajectory optimizer based on the Zero-Moment Point (ZMP) stability criterion, and (3) a floating-base inverse dynamics controller that, in conjunction with force control, allows for robust, compliant locomotion over unperceived obstacles. We evaluate the performance of our controller by testing it on the LittleDog quadruped robot, over a wide variety of rough terrains of varying difficulty levels. The terrain that the robot was tested on includes rocks, logs, steps, barriers, and gaps, with obstacle sizes up to the leg length of the robot. We demonstrate the generalization ability of this controller by presenting results from testing performed by an independent external test team on terrain that has never been shown to us.

am

link (url) Project Page [BibTex]

Constrained Accelerations for Controlled Geometric Reduction: Sagittal-Plane Decoupling for Bipedal Locomotion

Gregg, R., Righetti, L., Buchli, J., Schaal, S.

In 2010 10th IEEE-RAS International Conference on Humanoid Robots, pages: 1-7, IEEE, Nashville, USA, 2010 (inproceedings)

Abstract
Energy-shaping control methods have produced strong theoretical results for asymptotically stable 3D bipedal dynamic walking in the literature. In particular, geometric controlled reduction exploits robot symmetries to control momentum conservation laws that decouple the sagittal-plane dynamics, which are easier to stabilize. However, the associated control laws require high-dimensional matrix inverses multiplied with complicated energy-shaping terms, often making these control theories difficult to apply to highly-redundant humanoid robots. This paper presents a first step towards the application of energy-shaping methods on real robots by casting controlled reduction into a framework of constrained accelerations for inverse dynamics control. By representing momentum conservation laws as constraints in acceleration space, we construct a general expression for desired joint accelerations that render the constraint surface invariant. By appropriately choosing an orthogonal projection, we show that the unconstrained (reduced) dynamics are decoupled from the constrained dynamics. Any acceleration-based controller can then be used to stabilize this planar subsystem, including passivity-based methods. The resulting control law is surprisingly simple and represents a practical way to employ control theoretic stability results in robotic platforms. Simulated walking of a 3D compass-gait biped shows correspondence between the new and original controllers, and simulated motions of a 16-DOF humanoid demonstrate the applicability of this method.

am mg

link (url) DOI [BibTex]

Variable impedance control - a reinforcement learning approach

Buchli, J., Theodorou, E., Stulp, F., Schaal, S.

In Robotics Science and Systems (2010), Zaragoza, Spain, June 27-30, 2010, clmc (inproceedings)

Abstract
One of the hallmarks of the performance, versatility, and robustness of biological motor control is the ability to adapt the impedance of the overall biomechanical system to different task requirements and stochastic disturbances. A transfer of this principle to robotics is desirable, for instance to enable robots to work robustly and safely in everyday human environments. It is, however, not trivial to derive variable impedance controllers for practical high DOF robotic tasks. In this contribution, we accomplish such gain scheduling with a reinforcement learning algorithm, PI2 (Policy Improvement with Path Integrals). PI2 is a model-free, sampling based learning method derived from first principles of optimal control. The PI2 algorithm requires no tuning of algorithmic parameters besides the exploration noise. The designer can thus fully focus on cost function design to specify the task. From the viewpoint of robotics, a particularly useful property of PI2 is that it can scale to problems of many DOFs, so that RL on real robotic systems becomes feasible. We sketch the PI2 algorithm and its theoretical properties, and how it is applied to gain scheduling. We evaluate our approach by presenting results on two different simulated robotic systems, a 3-DOF Phantom Premium Robot and a 6-DOF Kuka Lightweight Robot. We investigate tasks where the optimal strategy requires both tuning of the impedance of the end-effector, and tuning of a reference trajectory. The results show that we can use path integral based RL not only for planning but also to derive variable gain feedback controllers in realistic scenarios. Thus, the power of variable impedance control is made available to a wide variety of robotic systems and practical applications.

am

link (url) [BibTex]

Inverse dynamics with optimal distribution of ground reaction forces for legged robots

Righetti, L., Buchli, J., Mistry, M., Schaal, S.

In Proceedings of the 13th International Conference on Climbing and Walking Robots (CLAWAR), pages: 580-587, Nagoya, Japan, sep 2010 (inproceedings)

Abstract
Contact interaction with the environment is crucial in the design of locomotion controllers for legged robots, to prevent slipping for example. Therefore, it is of great importance to be able to control the effects of the robot's movements on the contact reaction forces. In this contribution, we extend a recent inverse dynamics algorithm for floating base robots to optimize the distribution of contact forces while achieving precise trajectory tracking. The resulting controller is algorithmically simple as compared to other approaches. Numerical simulations show that this result significantly increases the range of possible movements of a humanoid robot as compared to the previous inverse dynamics algorithm. We also present a simplification of the result where no inversion of the inertia matrix is needed, which is particularly relevant for practical use on a real robot. Such an algorithm becomes interesting for agile locomotion of robots on difficult terrains where the contacts with the environment are critical, such as walking over rough or slippery terrain.

am mg

DOI [BibTex]

2009


Path integral-based stochastic optimal control for rigid body dynamics

Theodorou, E. A., Buchli, J., Schaal, S.

In Adaptive Dynamic Programming and Reinforcement Learning, 2009. ADPRL ’09. IEEE Symposium on, pages: 219-225, 2009, clmc (inproceedings)

Abstract
Recent advances on path integral stochastic optimal control [1],[2] provide new insights in the optimal control of nonlinear stochastic systems which are linear in the controls, with state independent and time invariant control transition matrix. Under these assumptions, the Hamilton-Jacobi-Bellman (HJB) equation is formulated and linearized with the use of the logarithmic transformation of the optimal value function. The resulting HJB is a linear second order partial differential equation which is solved by an approximation based on the Feynman-Kac formula [3]. In this work we review the theory of path integral control and derive the linearized HJB equation for systems with state dependent control transition matrix. In addition we derive the path integral formulation for the general class of systems with state dimensionality that is higher than the dimensionality of the controls. Furthermore, by means of a modified inverse dynamics controller, we apply path integral stochastic optimal control over the new control space. Simulations illustrate the theoretical results. Future developments and extensions are discussed.

am

link (url) [BibTex]

Learning locomotion over rough terrain using terrain templates

Kalakrishnan, M., Buchli, J., Pastor, P., Schaal, S.

In Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International Conference on, pages: 167-172, 2009, clmc (inproceedings)

Abstract
We address the problem of foothold selection in robotic legged locomotion over very rough terrain. The difficulty of the problem we address here is comparable to that of human rock-climbing, where foot/hand-hold selection is one of the most critical aspects. Previous work in this domain typically involves defining a reward function over footholds as a weighted linear combination of terrain features. However, a significant amount of effort needs to be spent in designing these features in order to model more complex decision functions, and hand-tuning their weights is not a trivial task. We propose the use of terrain templates, which are discretized height maps of the terrain under a foothold on different length scales, as an alternative to manually designed features. We describe an algorithm that can simultaneously learn a small set of templates and a foothold ranking function using these templates, from expert-demonstrated footholds. Using the LittleDog quadruped robot, we experimentally show that the use of terrain templates can produce complex ranking functions with higher performance than standard terrain features, and improved generalization to unseen terrain.

am

link (url) Project Page [BibTex]

Valero-Cuevas, F., Hoffmann, H., Kurse, M. U., Kutch, J. J., Theodorou, E. A.

IEEE Reviews in Biomedical Engineering – (All authors have equally contributed), (2):110-135, 2009, clmc (article)

Abstract
Computational models of the neuromuscular system hold the potential to allow us to reach a deeper understanding of neuromuscular function and clinical rehabilitation by complementing experimentation. By serving as a means to distill and explore specific hypotheses, computational models emerge from prior experimental data and motivate future experimental work. Here we review computational tools used to understand neuromuscular function including musculoskeletal modeling, machine learning, control theory, and statistical model analysis. We conclude that these tools, when used in combination, have the potential to further our understanding of neuromuscular function by serving as a rigorous means to test scientific hypotheses in ways that complement and leverage experimental data.

am

link (url) [BibTex]

Compact models of motor primitive variations for predictable reaching and obstacle avoidance

Stulp, F., Oztop, E., Pastor, P., Beetz, M., Schaal, S.

In IEEE-RAS International Conference on Humanoid Robots (Humanoids 2009), Paris, Dec.7-10, 2009, clmc (inproceedings)

Abstract
over and over again. This regularity allows humans and robots to reuse existing solutions for known recurring tasks. We expect that reusing a set of standard solutions to solve similar tasks will facilitate the design and on-line adaptation of the control systems of robots operating in human environments. In this paper, we derive a set of standard solutions for reaching behavior from human motion data. We also derive stereotypical reaching trajectories for variations of the task, in which obstacles are present. These stereotypical trajectories are then compactly represented with Dynamic Movement Primitives. On the humanoid robot Sarcos CB, this approach leads to reproducible, predictable, and human-like reaching motions.

am

link (url) [BibTex]

Human optimization strategies under reward feedback

Hoffmann, H., Theodorou, E., Schaal, S.

In Abstracts of Neural Control of Movement Conference (NCM 2009), Waikoloa, Hawaii, 2009, 2009, clmc (inproceedings)

Abstract
Many hypotheses on human movement generation have been cast into an optimization framework, implying that movements are adapted to optimize a single quantity, like, e.g., jerk, end-point variance, or control cost. However, we still do not understand how humans actually learn when given only a cost or reward feedback at the end of a movement. Such a reinforcement learning setting has been extensively explored theoretically in engineering and computer science, but in human movement control, hardly any experiment studied movement learning under reward feedback. We present experiments probing which computational strategies humans use to optimize a movement under a continuous reward function. We present two experimental paradigms. The first paradigm mimics a ball-hitting task. Subjects (n=12) sat in front of a computer screen and moved a stylus on a tablet towards an unknown target. This target was located on a line that the subjects had to cross. During the movement, visual feedback was suppressed. After the movement, a reward was displayed graphically as a colored bar. As reward, we used a Gaussian function of the distance between the target location and the point of line crossing. We chose such a function since in sensorimotor tasks, the cost or loss function that humans seem to represent is close to an inverted Gaussian function (Koerding and Wolpert 2004). The second paradigm mimics pocket billiards. On the same experimental setup as above, the computer screen displayed a pocket (two bars), a white disk, and a green disk. The goal was to hit the green disk with the white disk (as in a billiard collision), such that the green disk moved into the pocket. Subjects (n=8) manipulated the white disk with the stylus to effectively choose start point and movement direction. Reward feedback was implicitly given as hitting or missing the pocket with the green disk. In both paradigms, subjects increased the average reward over trials. 
The surprising result was that in these experiments, humans seem to prefer a strategy that uses a reward-weighted average over previous movements instead of gradient ascent. The literature on reinforcement learning is dominated by gradient-ascent methods. However, our computer simulations and theoretical analysis revealed that reward-weighted averaging is the more robust choice given the amount of movement variance observed in humans. Apparently, humans choose an optimization strategy that is suitable for their own movement variance.
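
The two ingredients named in this abstract, a Gaussian reward of the distance between target and crossing point and a reward-weighted average over previous movements, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' model: the reward width `sigma`, the function names, and the example trial values are all hypothetical.

```python
import math


def gaussian_reward(crossing, target, sigma=1.0):
    """Reward as a Gaussian function of the distance between crossing point and target."""
    return math.exp(-((crossing - target) ** 2) / (2.0 * sigma ** 2))


def reward_weighted_average(movements, rewards):
    """Next movement estimate: previous movements averaged with their rewards as weights."""
    total = sum(rewards)
    if total == 0.0:
        # No informative reward yet: fall back to the plain mean.
        return sum(movements) / len(movements)
    return sum(m * r for m, r in zip(movements, rewards)) / total


if __name__ == "__main__":
    target = 2.0  # hypothetical target location, unknown to the learner
    past = [0.0, 1.0, 3.0, 4.0]  # hypothetical crossing points from earlier trials
    rewards = [gaussian_reward(x, target) for x in past]
    print(reward_weighted_average(past, rewards))
```

Because the example trials are placed symmetrically around the target, the weighted average lands on the target without any gradient estimate, which is the kind of simplicity the abstract contrasts with gradient ascent.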

am

[BibTex]

Bayesian Methods for Autonomous Learning Systems (Phd Thesis)

Ting, J.

Department of Computer Science, University of Southern California, Los Angeles, CA, 2009, clmc (phdthesis)

am

PDF [BibTex]

On-line learning and modulation of periodic movements with nonlinear dynamical systems

Gams, A., Ijspeert, A., Schaal, S., Lenarčič, J.

Autonomous Robots, 27(1):3-23, 2009, clmc (article)

Abstract
The paper presents a two-layered system for (1) learning and encoding a periodic signal without any knowledge of its frequency and waveform, and (2) modulating the learned periodic trajectory in response to external events. The system is used to learn periodic tasks on a humanoid HOAP-2 robot. The first layer of the system is a dynamical system responsible for extracting the fundamental frequency of the input signal, based on adaptive frequency oscillators. The second layer is a dynamical system responsible for learning of the waveform based on a built-in learning algorithm. By combining the two dynamical systems into one system we can rapidly teach new trajectories to robots without any knowledge of the frequency of the demonstration signal. The system extracts and learns only one period of the demonstration signal. Furthermore, the trajectories are robust to perturbations and can be modulated to cope with a dynamic environment. The system is computationally inexpensive, works on-line for any periodic signal, requires no additional signal processing to determine the frequency of the input signal and can be applied in parallel to multiple dimensions. Additionally, it can adapt to changes in frequency and shape, e.g. to non-stationary signals, such as hand-generated signals and human demonstrations.

am

link (url) [BibTex]

The SL simulation and real-time control software package

Schaal, S.

University of Southern California, Los Angeles, CA, 2009, clmc (techreport)

Abstract
SL was originally developed as a Simulation Laboratory software package to allow creating complex rigid-body dynamics simulations with minimal development times. It was meant to complement a real-time robotics setup such that robot programs could first be debugged in simulation before trying them on the actual robot. For this purpose, the motor control setup of SL was copied from our experience with real-time robot setups with vxWorks (Windriver Systems, Inc.); indeed, more than 90% of the code is identical to the actual robot software, as will be explained later in detail. As a result, SL is divided into three software components: 1) the generic code that is shared by the actual robot and the simulation, 2) the robot specific code, and 3) the simulation specific code. The robot specific code is tailored to the robotic environments that we have experienced over the years, in particular towards VME-based multi-processor real-time operating systems. The simulation specific code has all the components for OpenGL graphics simulations and mimics the robot multi-processor environment in simple C-code. Importantly, SL can be used stand-alone for creating graphics animations; the heritage from real-time robotics does not restrict the complexity of possible simulations. This technical report describes SL in detail and can serve as a manual for new users of SL.

am

link (url) [BibTex]

Local dimensionality reduction for non-parametric regression

Hoffman, H., Schaal, S., Vijayakumar, S.

Neural Processing Letters, 2009, clmc (article)

Abstract
Locally-weighted regression is a computationally-efficient technique for non-linear regression. However, for high-dimensional data, this technique becomes numerically brittle and computationally too expensive if many local models need to be maintained simultaneously. Thus, local linear dimensionality reduction combined with locally-weighted regression seems to be a promising solution. In this context, we review linear dimensionality-reduction methods, compare their performance on nonparametric locally-linear regression, and discuss their ability to extend to incremental learning. The considered methods belong to the following three groups: (1) reducing dimensionality only on the input data, (2) modeling the joint input-output data distribution, and (3) optimizing the correlation between projection directions and output data. Group 1 contains principal component regression (PCR); group 2 contains principal component analysis (PCA) in joint input and output space, factor analysis, and probabilistic PCA; and group 3 contains reduced rank regression (RRR) and partial least squares (PLS) regression. Among the tested methods, only group 3 managed to achieve robust performance even for a non-optimal number of components (factors or projection directions). In contrast, group 1 and 2 failed for fewer components since these methods rely on the correct estimate of the true intrinsic dimensionality. In group 3, PLS is the only method for which a computationally-efficient incremental implementation exists. Thus, PLS appears to be ideally suited as a building block for a locally-weighted regressor in which projection directions are incrementally added on the fly.
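
To make the distinction between the groups concrete, the following is a minimal single-component PLS regression sketch in plain Python. It shows how the projection direction is taken proportional to the input-output covariance, so the output data influence which direction is kept. The function names and the data layout (one list per input dimension) are assumptions for this example; it is not the incremental implementation the abstract refers to, and it fits one component only.

```python
import math


def center(values):
    """Subtract the mean from a list of numbers."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def pls1_one_component(x_columns, y):
    """Fit one PLS direction w proportional to X^T y and a scalar weight b on the score."""
    xc = [center(col) for col in x_columns]
    yc = center(y)
    # Covariance between each input dimension and the output picks the direction.
    w = [sum(xi * yi for xi, yi in zip(col, yc)) for col in xc]
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0.0:
        raise ValueError("inputs are uncorrelated with the output")
    w = [wi / norm for wi in w]
    # Scores: projection of every centered sample onto w.
    t = [sum(w[j] * xc[j][i] for j in range(len(w))) for i in range(len(yc))]
    # Least-squares regression of the centered output on the one-dimensional score.
    b = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return w, b


if __name__ == "__main__":
    x1 = [-1.0, 1.0, -1.0, 1.0]  # hypothetical input that drives the output
    x2 = [-1.0, -1.0, 1.0, 1.0]  # hypothetical input unrelated to the output
    y = [2.0 * v + 5.0 for v in x1]
    print(pls1_one_component([x1, x2], y))
```

In the toy data the second input is orthogonal to the output, so the single direction ignores it; a method that reduces dimensionality on the inputs alone has no such signal to tell the two inputs apart here, since both have the same variance.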

am

link (url) [BibTex]

Learning and generalization of motor skills by learning from demonstration

Pastor, P., Hoffmann, H., Asfour, T., Schaal, S.

In International Conference on Robotics and Automation (ICRA2009), Kobe, Japan, May 12-19, 2009, 2009, clmc (inproceedings)

Abstract
We provide a general approach for learning robotic motor skills from human demonstration. To represent an observed movement, a non-linear differential equation is learned such that it reproduces this movement. Based on this representation, we build a library of movements by labeling each recorded movement according to task and context (e.g., grasping, placing, and releasing). Our differential equation is formulated such that generalization can be achieved simply by adapting a start and a goal parameter in the equation to the desired position values of a movement. For object manipulation, we present how our framework extends to the control of gripper orientation and finger position. The feasibility of our approach is demonstrated in simulation as well as on a real robot. The robot learned a pick-and-place operation and a water-serving task and could generalize these tasks to novel situations.

am

link (url) [BibTex]

Compliant quadruped locomotion over rough terrain

Buchli, J., Kalakrishnan, M., Mistry, M., Pastor, P., Schaal, S.

In Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International Conference on, pages: 814-820, 2009, clmc (inproceedings)

Abstract
Many critical elements for statically stable walking for legged robots have been known for a long time, including stability criteria based on support polygons, good foothold selection, recovery strategies to name a few. All these criteria have to be accounted for in the planning as well as the control phase. Most legged robots usually employ high gain position control, which means that it is crucially important that the planned reference trajectories are a good match for the actual terrain, and that tracking is accurate. Such an approach leads to conservative controllers, i.e. relatively low speed, ground speed matching, etc. Not surprisingly such controllers are not very robust - they are not suited for the real world use outside of the laboratory where the knowledge of the world is limited and error prone. Thus, to achieve robust robotic locomotion in the archetypical domain of legged systems, namely complex rough terrain, where the size of the obstacles are in the order of leg length, additional elements are required. A possible solution to improve the robustness of legged locomotion is to maximize the compliance of the controller. While compliance is trivially achieved by reduced feedback gains, for terrain requiring precise foot placement (e.g. climbing rocks, walking over pegs or cracks) compliance cannot be introduced at the cost of inferior tracking. Thus, model-based control and - in contrast to passive dynamic walkers - active balance control is required. To achieve these objectives, in this paper we add two crucial elements to legged locomotion, i.e., floating-base inverse dynamics control and predictive force control, and we show that these elements increase robustness in face of unknown and unanticipated perturbations (e.g. obstacles). 
Furthermore, we introduce a novel line-based COG trajectory planner, which yields a simpler algorithm than traditional polygon based methods and creates the appropriate input to our control system. We show results from both simulation and real-world experiments with a robotic dog walking over non-perceived obstacles and rocky terrain. The results prove the effectiveness of the inverse dynamics/force controller. The presented results show that we have all elements needed for robust all-terrain locomotion, which should also generalize to other legged systems, e.g., humanoid robots.

am

link (url) [BibTex]

Incorporating Muscle Activation-Contraction dynamics to an optimal control framework for finger movements

Theodorou, Evangelos A., Valero-Cuevas, Francisco J.

Abstracts of Neural Control of Movement Conference (NCM 2009), 2009, clmc (article)

Abstract
Recent experimental and theoretical work [1] investigated the neural control of contact transition between motion and force during tapping with the index finger as a nonlinear optimization problem. Such transitions from motion to well-directed contact force are a fundamental part of dexterous manipulation. There are 3 alternative hypotheses of how this transition could be accomplished by the nervous system as a function of changes in direction and magnitude of the torque vector controlling the finger. These hypotheses are 1) an initial change in direction with a subsequent change in magnitude of the torque vector; 2) an initial change in magnitude with a subsequent directional change of the torque vector; and 3) a simultaneous and proportionally equal change of both direction and magnitude of the torque vector. Experimental work in [2] shows that the nervous system selects the first strategy, and in [1] we suggest that this may in fact be the optimal strategy. In [4] the framework of Iterative Linear Quadratic Optimal Regulator (ILQR) was extended to incorporate motion and force control. However, our prior simulation work assumed direct and instantaneous control of joint torques, which ignores the known delays and filtering properties of skeletal muscle. In this study, we implement an ILQR controller for a more biologically plausible biomechanical model of the index finger than [4], and add activation-contraction dynamics to the system to simulate muscle function. The planar biomechanical model includes the kinematics of the 3 joints while the applied torques are driven by activation-contraction dynamics with biologically plausible time constants [3]. In agreement with our experimental work [2], the task is to, within 500 ms, move the finger from a given resting configuration to a target configuration with a desired terminal velocity. 
ILQR not only stabilizes the finger dynamics according to the objective function, but also generates smooth joint space trajectories with minimal tuning and without an a-priori initial control policy (which is difficult to find for high-dimensional biomechanical systems). Furthermore, the use of this optimal control framework together with the addition of activation-contraction dynamics accounts for the full nonlinear dynamics of the index finger and produces a sequence of postures which are compatible with experimental motion data [2]. These simulations combined with prior experimental results suggest that optimal control is a strong candidate for the generation of finger movements prior to abrupt motion-to-force transitions. This work is funded in part by grants NIH R01 0505520 and NSF EFRI-0836042 to Dr. Francisco J. Valero-Cuevas.

[1] Venkadesan M, Valero-Cuevas FJ. Effects of neuromuscular lags on controlling contact transitions. Philosophical Transactions of the Royal Society A, 2008.
[2] Venkadesan M, Valero-Cuevas FJ. Neural Control of Motion-to-Force Transitions with the Fingertip. J. Neurosci., Feb 2008; 28: 1366-1373.
[3] Zajac. Muscle and tendon: properties, models, scaling, and application to biomechanics and motor control. Crit Rev Biomed Eng, 17.
[4] Weiwei Li, Francisco Valero Cuevas. "Linear Quadratic Optimal Control of Contact Transition with Fingertip." ACC 2009.

am

PDF [BibTex]



Inertial parameter estimation of floating-base humanoid systems using partial force sensing

Mistry, M., Schaal, S., Yamane, K.

In IEEE-RAS International Conference on Humanoid Robots (Humanoids 2009), Paris, Dec.7-10, 2009, clmc (inproceedings)

Abstract
Recently, several controllers have been proposed for humanoid robots which rely on full-body dynamic models. The estimation of inertial parameters from data is a critical component for obtaining accurate models for control. However, floating base systems, such as humanoid robots, incur added challenges to this task (e.g. contact forces must be measured, contact states can change, etc.). In this work, we outline a theoretical framework for whole-body inertial parameter estimation, including the unactuated floating base. Using a least squares minimization approach, conducted within the nullspace of unmeasured degrees of freedom, we are able to use a partial force sensor set for full-body estimation, e.g. using only joint torque sensors, allowing for estimation when contact force measurement is unavailable or unreliable (e.g. due to slipping, rolling contacts, etc.). We also propose how to determine the theoretical minimum force sensor set for full-body estimation, and discuss the practical limitations of doing so.
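
The idea of least squares within a nullspace can be illustrated with a small synthetic sketch. It is not the authors' implementation. It assumes a linear-in-parameters model Y phi = f + Jt lam, where Y is a regressor matrix, phi the inertial parameters, f the measured generalized forces, and Jt lam the contribution of unmeasured contact forces. All names (estimate_parameters, Y, Jt, phi) and the random data are hypothetical.

```python
import numpy as np

def estimate_parameters(regressors, measured, contact_maps):
    """Least squares for phi after projecting out unmeasured contact forces.

    Assumed model per sample: Y @ phi = f + Jt @ lam, with lam unknown.
    """
    rows, rhs = [], []
    for Y, f, Jt in zip(regressors, measured, contact_maps):
        # Projector onto the left nullspace of Jt, so that N @ Jt == 0
        N = np.eye(Jt.shape[0]) - Jt @ np.linalg.pinv(Jt)
        rows.append(N @ Y)   # projected regressor
        rhs.append(N @ f)    # projected measurement, free of lam
    A = np.vstack(rows)
    b = np.concatenate(rhs)
    phi_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
    return phi_hat

# Synthetic data (purely illustrative, not from the paper)
rng = np.random.default_rng(0)
n_dof, n_contact, n_param, n_samples = 6, 2, 5, 40
phi_true = rng.normal(size=n_param)
regressors = [rng.normal(size=(n_dof, n_param)) for _ in range(n_samples)]
contact_maps = [rng.normal(size=(n_dof, n_contact)) for _ in range(n_samples)]
measured = [
    Y @ phi_true - Jt @ rng.normal(size=n_contact)  # lam is never observed
    for Y, Jt in zip(regressors, contact_maps)
]

phi_hat = estimate_parameters(regressors, measured, contact_maps)
print(np.max(np.abs(phi_hat - phi_true)))
```

Because the projector removes every component that the unknown contact forces can influence, the stacked system depends only on measured quantities. With noise-free synthetic data the parameters are recovered up to numerical precision.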

am

link (url) [BibTex]



On-line learning and modulation of periodic movements with nonlinear dynamical systems

Gams, A., Ijspeert, A., Schaal, S., Lenarčič, J.

Autonomous Robots, 27(1):3-23, 2009, clmc (article)

Abstract
The paper presents a two-layered system for (1) learning and encoding a periodic signal without any knowledge of its frequency and waveform, and (2) modulating the learned periodic trajectory in response to external events. The system is used to learn periodic tasks on a humanoid HOAP-2 robot. The first layer of the system is a dynamical system responsible for extracting the fundamental frequency of the input signal, based on adaptive frequency oscillators. The second layer is a dynamical system responsible for learning the waveform based on a built-in learning algorithm. By combining the two dynamical systems into one system, we can rapidly teach new trajectories to robots without any knowledge of the frequency of the demonstration signal. The system extracts and learns only one period of the demonstration signal. Furthermore, the trajectories are robust to perturbations and can be modulated to cope with a dynamic environment. The system is computationally inexpensive, works on-line for any periodic signal, requires no additional signal processing to determine the frequency of the input signal, and can be applied in parallel to multiple dimensions. Additionally, it can adapt to changes in frequency and shape, e.g. to non-stationary signals, such as hand-generated signals and human demonstrations.
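
Frequency extraction with an adaptive frequency oscillator can be sketched in a few lines. The sketch below is not the paper's two-layer system. It assumes a single Hopf-type oscillator whose frequency state omega is adapted by the input signal, integrated with a plain Euler step. The function name, the parameters (eps, gamma, mu, dt) and the test signal are all illustrative assumptions.

```python
import math

def adapt_frequency(signal, omega0, eps=1.0, gamma=8.0, mu=1.0,
                    dt=0.001, t_end=60.0):
    """Adapt the frequency of a Hopf oscillator to a periodic input.

    Assumed dynamics (Euler-integrated):
      dx/dt     = gamma * (mu - r^2) * x - omega * y + eps * F(t)
      dy/dt     = gamma * (mu - r^2) * y + omega * x
      domega/dt = -eps * F(t) * y / r
    """
    x, y, omega = 1.0, 0.0, omega0
    for k in range(int(t_end / dt)):
        F = signal(k * dt)
        r = math.hypot(x, y)
        dx = gamma * (mu - r * r) * x - omega * y + eps * F
        dy = gamma * (mu - r * r) * y + omega * x
        domega = -eps * F * y / r
        x += dt * dx
        y += dt * dy
        omega += dt * domega
    return omega

# Input: a 1 Hz cosine (2*pi rad/s); the oscillator starts at 8 rad/s.
target = 2.0 * math.pi
omega_final = adapt_frequency(lambda t: math.cos(target * t), omega0=8.0)
print(omega_final, target)
```

The frequency state drifts toward the input frequency without any explicit spectral analysis, which is the property the first layer relies on. The final value still carries a small ripple around the target.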

am

link (url) [BibTex]
