

2005


Natural Actor-Critic

Peters, J., Vijayakumar, S., Schaal, S.

In Proceedings of the 16th European Conference on Machine Learning, 3720, pages: 280-291, (Editors: Gama, J.;Camacho, R.;Brazdil, P.;Jorge, A.;Torgo, L.), Springer, ECML, 2005, clmc (inproceedings)

Abstract
This paper investigates a novel model-free reinforcement learning architecture, the Natural Actor-Critic. The actor updates are based on stochastic policy gradients employing Amari's natural gradient approach, while the critic obtains both the natural policy gradient and additional parameters of a value function simultaneously by linear regression. We show that actor improvements with natural policy gradients are particularly appealing as these are independent of coordinate frame of the chosen policy representation, and can be estimated more efficiently than regular policy gradients. The critic makes use of a special basis function parameterization motivated by the policy-gradient compatible function approximation. We show that several well-known reinforcement learning methods such as the original Actor-Critic and Bradtke's Linear Quadratic Q-Learning are in fact Natural Actor-Critic algorithms. Empirical evaluations illustrate the effectiveness of our techniques in comparison to previous methods, and also demonstrate their applicability for learning control on an anthropomorphic robot arm.
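The regression idea in this abstract can be illustrated on a toy problem. The following is a minimal sketch, not the paper's algorithm: it assumes a one-step task with a scalar Gaussian policy a ~ N(theta, sigma^2), and the function name, the reward function, and all parameter values are hypothetical. Rewards are regressed on the compatible feature d/dtheta log pi(a) plus a constant; the coefficient of the compatible feature then estimates the natural policy gradient, and the constant plays the role of a value-function parameter.

```python
import numpy as np

def natural_gradient_by_regression(theta, sigma, reward_fn, n_samples=20000, seed=0):
    """Estimate the natural policy gradient of a 1-D Gaussian policy by least squares.

    Toy one-step setting (an assumption of this sketch, not the paper's setup).
    """
    rng = np.random.default_rng(seed)
    # Sample actions from the policy a ~ N(theta, sigma^2).
    actions = theta + sigma * rng.standard_normal(n_samples)
    rewards = reward_fn(actions)
    # Compatible feature: derivative of log pi(a) with respect to theta.
    psi = (actions - theta) / sigma**2
    # Regress rewards on [psi, 1]; the constant column acts as a baseline.
    X = np.column_stack([psi, np.ones(n_samples)])
    coeffs, *_ = np.linalg.lstsq(X, rewards, rcond=None)
    w, baseline = coeffs
    return w, baseline

# Hypothetical reward: negative squared distance of the action to a target of 1.0.
w, baseline = natural_gradient_by_regression(0.0, 0.5, lambda a: -(a - 1.0) ** 2)
```

For this reward the ordinary gradient at theta = 0 is 2 and the Fisher information is 1 / sigma^2 = 4, so the natural gradient is 0.5, which is what the regression coefficient `w` should approximate.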

am ei

link (url) DOI [BibTex]

Comparative experiments on task space control with redundancy resolution

Nakanishi, J., Cory, R., Mistry, M., Peters, J., Schaal, S.

In Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages: 3901-3908, Edmonton, Alberta, Canada, Aug. 2-6, IROS, 2005, clmc (inproceedings)

Abstract
Understanding the principles of motor coordination with redundant degrees of freedom still remains a challenging problem, particularly for new research in highly redundant robots like humanoids. Even after more than a decade of research, task space control with redundancy resolution still remains an incompletely understood theoretical topic, and also lacks a larger body of thorough experimental investigation on complex robotic systems. This paper presents our first steps towards the development of a working redundancy resolution algorithm which is robust against modeling errors and unforeseen disturbances arising from contact forces. To gain a better understanding of the pros and cons of different approaches to redundancy resolution, we focus on a comparative empirical evaluation. First, we review several redundancy resolution schemes at the velocity, acceleration and torque levels presented in the literature in a common notational framework and also introduce some new variants of these previous approaches. Second, we present experimental comparisons of these approaches on a seven-degree-of-freedom anthropomorphic robot arm. Surprisingly, one of our simplest algorithms empirically demonstrates the best performance, even though, from a theoretical point of view, it does not share the same beauty as some of the other methods. Finally, we discuss practical properties of these control algorithms, particularly in light of inevitable modeling errors of the robot dynamics.

am ei

link (url) DOI [BibTex]

Predicting EMG Data from M1 Neurons with Variational Bayesian Least Squares

Ting, J., D’Souza, A., Yamamoto, K., Yoshioka, T., Hoffman, D., Kakei, S., Sergio, L., Kalaska, J., Kawato, M., Strick, P., Schaal, S.

In Advances in Neural Information Processing Systems 18 (NIPS 2005), (Editors: Weiss, Y.;Schölkopf, B.;Platt, J.), Cambridge, MA: MIT Press, Vancouver, BC, Dec. 6-11, 2005, clmc (inproceedings)

Abstract
An increasing number of projects in neuroscience require the statistical analysis of high dimensional data sets, as, for instance, in predicting behavior from neural firing, or in operating artificial devices from brain recordings in brain-machine interfaces. Linear analysis techniques remain prevalent in such cases, but classical linear regression approaches are often numerically too fragile in high dimensions. In this paper, we address the question of whether EMG data collected from arm movements of monkeys can be faithfully reconstructed with linear approaches from neural activity in primary motor cortex (M1). To achieve robust data analysis, we develop a full Bayesian approach to linear regression that automatically detects and excludes irrelevant features in the data, and regularizes against overfitting. In comparison with ordinary least squares, stepwise regression, partial least squares, and a brute force combinatorial search for the most predictive input features in the data, we demonstrate that the new Bayesian method offers a superior mixture of characteristics in terms of regularization against overfitting, computational efficiency, and ease of use, demonstrating its potential as a drop-in replacement for other linear regression techniques. As neuroscientific results, our analyses demonstrate that EMG data can be well predicted from M1 neurons, further opening the path for possible real-time interfaces between brains and machines.

am

link (url) [BibTex]

Rapid synchronization and accurate phase-locking of rhythmic motor primitives

Pongas, D., Billard, A., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2005), pages: 2911-2916, Edmonton, Alberta, Canada, Aug. 2-6, 2005, clmc (inproceedings)

Abstract
Rhythmic movement is ubiquitous in human and animal behavior, e.g., as in locomotion, dancing, swimming, chewing, scratching, music playing, etc. A particular feature of rhythmic movement in biology is the rapid synchronization and phase locking with other rhythmic events in the environment, for instance music or visual stimuli as in ball juggling. In traditional oscillator theories of rhythmic movement generation, synchronization with another signal is relatively slow, and it is not easy to achieve accurate phase locking with a particular feature of the driving stimulus. Using a recently developed framework of dynamic motor primitives, we demonstrate a novel algorithm for very rapid synchronization of a rhythmic movement pattern, which can phase lock any feature of the movement to any particular event in the driving stimulus. As an example application, we demonstrate how an anthropomorphic robot can use imitation learning to acquire a complex drumming pattern and keep it synchronized with an external rhythm generator that changes its frequency over time.

am

link (url) [BibTex]

A new methodology for robot control design

Peters, J., Mistry, M., Udwadia, F. E., Schaal, S.

In The 5th ASME International Conference on Multibody Systems, Nonlinear Dynamics, and Control (MSNDC 2005), Long Beach, CA, Sept. 24-28, 2005, clmc (inproceedings)

Abstract
Gauss' principle of least constraint and its generalizations have provided useful insights for the development of tracking controllers for mechanical systems (Udwadia, 2003). Using this concept, we present a novel methodology for the design of a specific class of robot controllers. With our new framework, we demonstrate that well-known and also several novel nonlinear robot control laws can be derived from this generic framework, and show experimental verifications on a Sarcos Master Arm robot for some of these controllers. We believe that the suggested approach unifies and simplifies the design of optimal nonlinear control laws for robots obeying rigid body dynamics equations, both with or without external constraints, holonomic or nonholonomic constraints, with over-actuation or underactuation, as well as open-chain and closed-chain kinematics.

am

link (url) [BibTex]

Arm movement experiments with joint space force fields using an exoskeleton robot

Mistry, M., Mohajerian, P., Schaal, S.

In IEEE Ninth International Conference on Rehabilitation Robotics, pages: 408-413, Chicago, Illinois, June 28-July 1, 2005, clmc (inproceedings)

Abstract
A new experimental platform permits us to study a novel variety of issues of human motor control, particularly full 3-D movements involving the major seven degrees-of-freedom (DOF) of the human arm. We incorporate a seven DOF robot exoskeleton, and can minimize weight and inertia through gravity, Coriolis, and inertia compensation, such that subjects' arm movements are largely unaffected by the manipulandum. Torque perturbations can be individually applied to any or all seven joints of the human arm, thus creating novel dynamic environments, or force fields, for subjects to respond and adapt to. Our first study investigates a joint space force field where the shoulder velocity drives a disturbing force in the elbow joint. Results demonstrate that subjects learn to compensate for the force field within about 100 trials, and from the strong presence of aftereffects when removing the field in some randomized catch trials, that an inverse dynamics, or internal model, of the force field is formed by the nervous system. Interestingly, while post-learning hand trajectories return to baseline, joint space trajectories remained changed in response to the field, indicating that besides learning a model of the force field, the nervous system also chose to exploit the space to minimize the effects of the force field on the realization of the endpoint trajectory plan. Further applications for our apparatus include studies in motor system redundancy resolution and inverse kinematics, as well as rehabilitation.

am

link (url) [BibTex]

A unifying framework for the control of robotics systems

Peters, J., Mistry, M., Udwadia, F. E., Cory, R., Nakanishi, J., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2005), pages: 1824-1831, Edmonton, Alberta, Canada, Aug. 2-6, 2005, clmc (inproceedings)

Abstract
Recently, [1] suggested deriving tracking controllers for mechanical systems using a generalization of Gauss' principle of least constraint. This method allows us to reformulate control problems as a special class of optimal control. We take this line of reasoning one step further and demonstrate that well-known and also several novel nonlinear robot control laws can be derived from this generic methodology. We show experimental verifications on a Sarcos Master Arm robot for some of the derived controllers. We believe that the suggested approach offers a promising unification and simplification of nonlinear control law design for robots obeying rigid body dynamics equations, both with or without external constraints, with over-actuation or under-actuation, as well as open-chain and closed-chain kinematics.

am

link (url) [BibTex]


2004


Learning Composite Adaptive Control for a Class of Nonlinear Systems

Nakanishi, J., Farrell, J. A., Schaal, S.

In IEEE International Conference on Robotics and Automation, pages: 2647-2652, New Orleans, LA, USA, April 2004, 2004, clmc (inproceedings)

am

link (url) [BibTex]

A framework for learning biped locomotion with dynamic movement primitives

Nakanishi, J., Morimoto, J., Endo, G., Cheng, G., Schaal, S., Kawato, M.

In IEEE-RAS/RSJ International Conference on Humanoid Robots (Humanoids 2004), IEEE, Los Angeles, CA: Nov.10-12, Santa Monica, CA, 2004, clmc (inproceedings)

Abstract
This article summarizes our framework for learning biped locomotion using dynamical movement primitives based on nonlinear oscillators. Our ultimate goal is to establish a design principle of a controller in order to achieve natural human-like locomotion. We suggest dynamical movement primitives as a central pattern generator (CPG) of a biped robot, an approach we have previously proposed for learning and encoding complex human movements. Demonstrated trajectories are learned through movement primitives by locally weighted regression, and the frequency of the learned trajectories is adjusted automatically by a frequency adaptation algorithm based on phase resetting and entrainment of coupled oscillators. Numerical simulations and experimental implementation on a physical robot demonstrate the effectiveness of the proposed locomotion controller. Furthermore, we demonstrate that phase resetting contributes to robustness against external perturbations and environmental changes by numerical simulations and experiments.

am

link (url) [BibTex]

Learning Motor Primitives with Reinforcement Learning

Peters, J., Schaal, S.

In Proceedings of the 11th Joint Symposium on Neural Computation, http://resolver.caltech.edu/CaltechJSNC:2004.poster020, 2004, clmc (inproceedings)

Abstract
One of the major challenges in action generation for robotics and in the understanding of human motor control is to learn the "building blocks of movement generation," or more precisely, motor primitives. Recently, Ijspeert et al. [1, 2] suggested a novel framework for using nonlinear dynamical systems as motor primitives. While a lot of progress has been made in teaching these motor primitives using supervised or imitation learning, the self-improvement by interaction of the system with the environment remains a challenging problem. In this poster, we evaluate how different reinforcement learning approaches can be used to improve the performance of motor primitives. In pursuing this goal, we highlight the difficulties with current reinforcement learning methods, and outline how these lead to a novel algorithm which is based on natural policy gradients [3]. We compare this algorithm to previous reinforcement learning algorithms in the context of dynamic motor primitive learning, and show that it outperforms these by at least an order of magnitude. We demonstrate the efficiency of the resulting reinforcement learning method for creating complex behaviors for autonomous robotics. The studied behaviors will include both discrete, finite tasks such as baseball swings, as well as complex rhythmic patterns as they occur in biped locomotion.

am

[BibTex]


2001


Humanoid oculomotor control based on concepts of computational neuroscience

Shibata, T., Vijayakumar, S., Conradt, J., Schaal, S.

In Humanoids2001, Second IEEE-RAS International Conference on Humanoid Robots, 2001, clmc (inproceedings)

Abstract
Oculomotor control in a humanoid robot faces problems similar to those of biological oculomotor systems, i.e., the stabilization of gaze in the face of unknown perturbations of the body, selective attention, the complexity of stereo vision and dealing with large information processing delays. In this paper, we suggest control circuits to realize three of the most basic oculomotor behaviors - the vestibulo-ocular and optokinetic reflex (VOR-OKR) for gaze stabilization, smooth pursuit for tracking moving objects, and saccades for overt visual attention. Each of these behaviors was derived from inspirations from computational neuroscience, which proves to be a viable strategy to explore novel control mechanisms for humanoid robotics. Our implementations on a humanoid robot demonstrate good performance of the oculomotor behaviors that appears natural and human-like.

am

link (url) [BibTex]

Trajectory formation for imitation with nonlinear dynamical systems

Ijspeert, A., Nakanishi, J., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2001), pages: 752-757, Wailea, Hawaii, Oct.29-Nov.3, 2001, clmc (inproceedings)

Abstract
This article explores a new approach to learning by imitation and trajectory formation by representing movements as mixtures of nonlinear differential equations with well-defined attractor dynamics. An observed movement is approximated by finding a best fit of the mixture model to its data by a recursive least squares regression technique. In contrast to non-autonomous movement representations like splines, the resultant movement plan remains an autonomous set of nonlinear differential equations that forms a control policy which is robust to strong external perturbations and that can be modified by additional perceptual variables. This movement policy remains the same for a given target, regardless of the initial conditions, and can easily be re-used for new targets. We evaluate the trajectory formation system (TFS) in the context of a humanoid robot simulation that is part of the Virtual Trainer (VT) project, which aims at supervising rehabilitation exercises in stroke-patients. A typical rehabilitation exercise was collected with a Sarcos Sensuit, a device to record joint angular movement from human subjects, and approximated and reproduced with our imitation techniques. Our results demonstrate that multi-joint human movements can be encoded successfully, and that this system allows robust modifications of the movement policy through external variables.
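The property that a movement policy converges to the same target from any initial condition can be sketched with a small simulation. This is a simplified illustration, assuming a single second-order point attractor with a phase-gated forcing term; it is not the exact mixture model of the paper, and the function name, gains, and basis-function parameters below are all hypothetical.

```python
import numpy as np

def rollout(y0, goal, weights, centers, widths, tau=1.0, dt=0.001,
            duration=3.0, alpha_z=25.0, beta_z=6.25, alpha_x=3.0):
    """Euler-integrate a point attractor with a decaying nonlinear forcing term."""
    y, z, x = float(y0), 0.0, 1.0  # position, scaled velocity, phase variable
    trajectory = [y]
    for _ in range(int(duration / dt)):
        # Gaussian basis functions of the phase, mixed by the weights.
        psi = np.exp(-widths * (x - centers) ** 2)
        forcing = x * (psi @ weights) / (psi.sum() + 1e-10)
        # Critically damped spring-damper pulling y toward the goal.
        z_dot = (alpha_z * (beta_z * (goal - y) - z) + forcing) / tau
        y_dot = z / tau
        x_dot = -alpha_x * x / tau  # phase decays to zero, switching the forcing off
        z += z_dot * dt
        y += y_dot * dt
        x += x_dot * dt
        trajectory.append(y)
    return np.array(trajectory)

centers = np.linspace(0.0, 1.0, 10)
widths = np.full(10, 50.0)
shaped = rollout(-2.0, 1.0, 50.0 * np.ones(10), centers, widths)
plain = rollout(0.0, 1.0, np.zeros(10), centers, widths)
```

In a fitting step, the weights would be obtained by regression on a demonstrated trajectory; here they are set by hand only to show that both rollouts end near the same goal despite different start positions and forcing terms.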

am

link (url) [BibTex]

Real-time statistical learning for robotics and human augmentation

Schaal, S., Vijayakumar, S., D’Souza, A., Ijspeert, A., Nakanishi, J.

In International Symposium on Robotics Research, (Editors: Jarvis, R. A.;Zelinsky, A.), Lorne, Victoria, Australia, Nov.9-12, 2001, clmc (inproceedings)

Abstract
Real-time modeling of complex nonlinear dynamic processes has become increasingly important in various areas of robotics and human augmentation. To address such problems, we have been developing special statistical learning methods that meet the demands of on-line learning, in particular the need for low computational complexity, rapid learning, and scalability to high-dimensional spaces. In this paper, we introduce a novel algorithm that possesses all the necessary properties by combining methods from probabilistic and nonparametric learning. We demonstrate the applicability of our methods for three different applications in humanoid robotics, i.e., the on-line learning of a full-body inverse dynamics model, an inverse kinematics model, and imitation learning. The latter application will also introduce a novel method to shape attractor landscapes of dynamical systems by means of statistical learning.

am

link (url) [BibTex]

Robust learning of arm trajectories through human demonstration

Billard, A., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2001), Piscataway, NJ: IEEE, Maui, Hawaii, Oct.29-Nov.3, 2001, clmc (inproceedings)

Abstract
We present a model, composed of a hierarchy of artificial neural networks, for robot learning by demonstration. The model is implemented in a dynamic simulation of a 41 degrees of freedom humanoid for reproducing 3D human motion of the arm. Results show that the model requires little information about the desired trajectory and learns on-line the relevant features of movement. It can generalize across a small set of data to produce a qualitatively good reproduction of the demonstrated trajectory. Finally, it is shown that reproduction of the trajectory after learning is robust against perturbations.

am

link (url) [BibTex]

Overt visual attention for a humanoid robot

Vijayakumar, S., Conradt, J., Shibata, T., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2001), 2001, clmc (inproceedings)

Abstract
The goal of our research is to investigate the interplay between oculomotor control, visual processing, and limb control in humans and primates by exploring the computational issues of these processes with a biologically inspired artificial oculomotor system on an anthropomorphic robot. In this paper, we investigate the computational mechanisms for visual attention in such a system. Stimuli in the environment excite a dynamical neural network that implements a saliency map, i.e., a winner-take-all competition between stimuli while simultaneously smoothing out noise and suppressing irrelevant inputs. In real-time, this system computes new targets for the shift of gaze, executed by the head-eye system of the robot. The redundant degrees-of-freedom of the head-eye system are resolved through a learned inverse kinematics with optimization criterion. We also address the important issue of how to ensure that the coordinate system of the saliency map remains correct after movement of the robot. The presented attention system is built on principled modules and generally applicable for any sensory modality.

am

link (url) [BibTex]

Learning inverse kinematics

D’Souza, A., Vijayakumar, S., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2001), Piscataway, NJ: IEEE, Maui, Hawaii, Oct.29-Nov.3, 2001, clmc (inproceedings)

Abstract
Real-time control of the end-effector of a humanoid robot in external coordinates requires computationally efficient solutions of the inverse kinematics problem. In this context, this paper investigates learning of inverse kinematics for resolved motion rate control (RMRC) employing an optimization criterion to resolve kinematic redundancies. Our learning approach is based on the key observations that learning an inverse of a non-uniquely invertible function can be accomplished by augmenting the input representation to the inverse model and by using a spatially localized learning approach. We apply this strategy to inverse kinematics learning and demonstrate how a recently developed statistical learning algorithm, Locally Weighted Projection Regression, allows efficient learning of inverse kinematic mappings in an incremental fashion even when input spaces become rather high dimensional. The resulting performance of the inverse kinematics is comparable to Liegeois' ([1]) analytical pseudo-inverse with optimization. Our results are illustrated with a 30 degree-of-freedom humanoid robot.
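For orientation, the analytical baseline mentioned here, pseudo-inverse RMRC with an optimization criterion projected into the null space, can be sketched as follows. This is an illustrative example, assuming a planar three-link arm and a quadratic rest-posture criterion g(q) = 0.5 * ||q - q_rest||^2; the arm, the criterion, and all function names are hypothetical, and the learned inverse kinematics of the paper is not shown.

```python
import numpy as np

def jacobian(q, lengths):
    """2 x n Jacobian of the end-effector position of a planar serial arm."""
    lengths = np.asarray(lengths, dtype=float)
    absolute_angles = np.cumsum(q)  # orientation of each link in the base frame
    n = len(q)
    J = np.zeros((2, n))
    for i in range(n):
        # Joint i moves every link from i outward.
        J[0, i] = -np.sum(lengths[i:] * np.sin(absolute_angles[i:]))
        J[1, i] = np.sum(lengths[i:] * np.cos(absolute_angles[i:]))
    return J

def rmrc_step(q, x_dot, lengths, q_rest, alpha=1.0):
    """Joint velocities that realize x_dot while descending g(q) in the null space."""
    J = jacobian(q, lengths)
    J_pinv = np.linalg.pinv(J)
    grad_g = q - q_rest  # gradient of g(q) = 0.5 * ||q - q_rest||^2
    null_projector = np.eye(len(q)) - J_pinv @ J
    return J_pinv @ x_dot - alpha * null_projector @ grad_g

q = np.array([0.3, 0.5, -0.2])
lengths = np.ones(3)
x_dot = np.array([0.1, -0.05])
q_dot = rmrc_step(q, x_dot, lengths, q_rest=np.zeros(3))
```

Because the second term lies in the null space of the Jacobian, it changes the joint motion without disturbing the commanded end-effector velocity, so `jacobian(q, lengths) @ q_dot` reproduces `x_dot` away from singular configurations.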

am

link (url) [BibTex]

Biomimetic smooth pursuit based on fast learning of the target dynamics

Shibata, T., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2001), 2001, clmc (inproceedings)

Abstract
Following a moving target with a narrow-view foveal vision system is one of the essential oculomotor behaviors of humans and humanoids. This oculomotor behavior, called "Smooth Pursuit", requires accurate tracking control which cannot be achieved by a simple visual negative feedback controller due to the significant delays in visual information processing. In this paper, we present a biologically inspired and control theoretically sound smooth pursuit controller consisting of two cascaded subsystems. One is an inverse model controller for the oculomotor system, and the other is a learning controller for the dynamics of the visual target. The latter controller learns how to predict the target's motion in head coordinates such that tracking performance can be improved. We investigate our smooth pursuit system in simulations and experiments on a humanoid robot. By using a fast on-line statistical learning network, our humanoid oculomotor system is able to acquire high performance smooth pursuit after about 5 seconds of learning despite significant processing delays in the system.

am

link (url) [BibTex]
