
Solving Deep Memory POMDPs with Recurrent Policy Gradients


Conference Paper


This paper presents Recurrent Policy Gradients, a model-free reinforcement learning (RL) method that creates limited-memory stochastic policies for partially observable Markov decision problems (POMDPs) that require long-term memories of past observations. The approach approximates a policy gradient for a Recurrent Neural Network (RNN) by backpropagating return-weighted characteristic eligibilities through time. Using a “Long Short-Term Memory” architecture, we are able to outperform other RL methods on two important benchmark tasks. Furthermore, we show promising results on a complex car driving simulation task.
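The core idea in the abstract, backpropagating return-weighted eligibilities through time, can be illustrated with a small sketch. It is not the authors' implementation: it assumes a plain tanh RNN rather than an LSTM, a softmax policy over discrete actions, undiscounted or discounted returns-to-go as the weights, and no baseline. All function and parameter names (`init_params`, `policy_gradient`, `W_in`, `W_rec`, `W_out`, and so on) are hypothetical and chosen only for illustration.

```python
import numpy as np

def init_params(n_obs, n_hidden, n_actions, rng):
    # Small random weights for an Elman-style RNN policy (assumed architecture).
    return {
        "W_in": rng.normal(0, 0.1, (n_hidden, n_obs)),
        "W_rec": rng.normal(0, 0.1, (n_hidden, n_hidden)),
        "W_out": rng.normal(0, 0.1, (n_actions, n_hidden)),
    }

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def forward(params, observations):
    # Run the RNN over one episode; return hidden states and action probabilities.
    h = np.zeros(params["W_rec"].shape[0])
    hs, probs = [h], []
    for o in observations:
        h = np.tanh(params["W_in"] @ o + params["W_rec"] @ h)
        hs.append(h)
        probs.append(softmax(params["W_out"] @ h))
    return hs, probs

def returns_to_go(rewards, gamma=1.0):
    # R_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
    out = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out

def surrogate(params, observations, actions, weights):
    # sum_t R_t * log pi(a_t | history); its gradient is the policy gradient estimate.
    _, probs = forward(params, observations)
    return sum(w * np.log(p[a]) for w, p, a in zip(weights, probs, actions))

def policy_gradient(params, observations, actions, weights):
    # Backpropagate the return-weighted eligibilities through time.
    hs, probs = forward(params, observations)
    grads = {k: np.zeros_like(v) for k, v in params.items()}
    dh_next = np.zeros_like(hs[0])
    for t in reversed(range(len(observations))):
        # Eligibility of a softmax policy w.r.t. its logits: onehot(a_t) - pi,
        # scaled by the return that followed the action.
        dlogits = -probs[t].copy()
        dlogits[actions[t]] += 1.0
        dlogits *= weights[t]
        h, h_prev = hs[t + 1], hs[t]
        grads["W_out"] += np.outer(dlogits, h)
        dh = params["W_out"].T @ dlogits + dh_next
        dpre = dh * (1.0 - h ** 2)  # derivative of tanh
        grads["W_in"] += np.outer(dpre, observations[t])
        grads["W_rec"] += np.outer(dpre, h_prev)
        dh_next = params["W_rec"].T @ dpre
    return grads
```

In use, one would sample an episode with the current policy, compute `returns_to_go` from the observed rewards, and take a gradient ascent step along `policy_gradient(...)`. Because the hidden state carries information forward, the gradient at each step also credits the weights that shaped earlier memory contents, which is what lets the policy learn to retain observations over long horizons.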

Author(s): Wierstra, D. and Förster, A. and Peters, J. and Schmidhuber, J.
Book Title: ICANN'07
Journal: Artificial Neural Networks: ICANN 2007
Pages: 697-706
Year: 2007
Month: September
Publisher: Springer

Department(s): Empirical Inference
Bibtex Type: Conference Paper (inproceedings)

DOI: 10.1007/978-3-540-74690-4_71
Event Name: International Conference on Artificial Neural Networks
Event Place: Porto, Portugal

Address: Berlin, Germany
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik



  title = {Solving Deep Memory POMDPs with Recurrent Policy Gradients},
  author = {Wierstra, D. and F{\"o}rster, A. and Peters, J. and Schmidhuber, J.},
  journal = {Artificial Neural Networks: ICANN 2007},
  booktitle = {ICANN'07},
  pages = {697-706},
  publisher = {Springer},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  address = {Berlin, Germany},
  month = sep,
  year = {2007},
  month_numeric = {9}