Header logo is


2017


Thumb xl fig toyex lqr1kernel 1
On the Design of LQR Kernels for Efficient Controller Learning

Marco, A., Hennig, P., Schaal, S., Trimpe, S.

Proceedings of the 56th IEEE Annual Conference on Decision and Control (CDC), pages: 5193-5200, IEEE, IEEE Conference on Decision and Control, December 2017 (conference)

Abstract
Finding optimal feedback controllers for nonlinear dynamic systems from data is hard. Recently, Bayesian optimization (BO) has been proposed as a powerful framework for direct controller tuning from experimental trials. For selecting the next query point and finding the global optimum, BO relies on a probabilistic description of the latent objective function, typically a Gaussian process (GP). As is shown herein, GPs with a common kernel choice can, however, lead to poor learning outcomes on standard quadratic control problems. For a first-order system, we construct two kernels that specifically leverage the structure of the well-known Linear Quadratic Regulator (LQR), yet retain the flexibility of Bayesian nonparametric learning. Simulations of uncertain linear and nonlinear systems demonstrate that the LQR kernels yield superior learning performance.

am ics pn

arXiv PDF On the Design of LQR Kernels for Efficient Controller Learning - CDC presentation DOI Project Page [BibTex]

2017


arXiv PDF On the Design of LQR Kernels for Efficient Controller Learning - CDC presentation DOI Project Page [BibTex]


Thumb xl probls sketch n3 0 ei0
Probabilistic Line Searches for Stochastic Optimization

Mahsereci, M., Hennig, P.

Journal of Machine Learning Research, 18(119):1-59, November 2017 (article)

pn

link (url) Project Page [BibTex]

link (url) Project Page [BibTex]


Thumb xl teaser
Coupling Adaptive Batch Sizes with Learning Rates

Balles, L., Romero, J., Hennig, P.

In Proceedings Conference on Uncertainty in Artificial Intelligence (UAI) 2017, pages: 410-419, (Editors: Gal Elidan and Kristian Kersting), Association for Uncertainty in Artificial Intelligence (AUAI), Conference on Uncertainty in Artificial Intelligence (UAI), August 2017 (inproceedings)

Abstract
Mini-batch stochastic gradient descent and variants thereof have become standard for large-scale empirical risk minimization like the training of neural networks. These methods are usually used with a constant batch size chosen by simple empirical inspection. The batch size significantly influences the behavior of the stochastic optimization algorithm, though, since it determines the variance of the gradient estimates. This variance also changes over the optimization process; when using a constant batch size, stability and convergence is thus often enforced by means of a (manually tuned) decreasing learning rate schedule. We propose a practical method for dynamic batch size adaptation. It estimates the variance of the stochastic gradients and adapts the batch size to decrease the variance proportionally to the value of the objective function, removing the need for the aforementioned learning rate decrease. In contrast to recent related work, our algorithm couples the batch size to the learning rate, directly reflecting the known relationship between the two. On three image classification benchmarks, our batch size adaptation yields faster optimization convergence, while simultaneously simplifying learning rate tuning. A TensorFlow implementation is available.

ps pn

Code link (url) Project Page [BibTex]

Code link (url) Project Page [BibTex]


no image
Dynamic Time-of-Flight

Schober, M., Adam, A., Yair, O., Mazor, S., Nowozin, S.

Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, pages: 170-179, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (conference)

ei pn

DOI [BibTex]

DOI [BibTex]


Thumb xl this one
Virtual vs. Real: Trading Off Simulations and Physical Experiments in Reinforcement Learning with Bayesian Optimization

Marco, A., Berkenkamp, F., Hennig, P., Schoellig, A. P., Krause, A., Schaal, S., Trimpe, S.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages: 1557-1563, IEEE, Piscataway, NJ, USA, IEEE International Conference on Robotics and Automation (ICRA), May 2017 (inproceedings)

am ics pn

PDF arXiv ICRA 2017 Spotlight presentation Virtual vs. Real - Video explanation DOI Project Page [BibTex]

PDF arXiv ICRA 2017 Spotlight presentation Virtual vs. Real - Video explanation DOI Project Page [BibTex]


Thumb xl screen shot 2017 07 20 at 12.31.00 pm
Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets

Klein, A., Falkner, S., Bartels, S., Hennig, P., Hutter, F.

Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS 2017), 54, pages: 528-536, Proceedings of Machine Learning Research, (Editors: Sign, Aarti and Zhu, Jerry), PMLR, April 2017 (conference)

pn

pdf link (url) Project Page [BibTex]

pdf link (url) Project Page [BibTex]


Thumb xl early stopping teaser
Early Stopping Without a Validation Set

Mahsereci, M., Balles, L., Lassner, C., Hennig, P.

arXiv preprint arXiv:1703.09580, 2017 (article)

Abstract
Early stopping is a widely used technique to prevent poor generalization performance when training an over-expressive model by means of gradient-based optimization. To find a good point to halt the optimizer, a common practice is to split the dataset into a training and a smaller validation set to obtain an ongoing estimate of the generalization performance. In this paper we propose a novel early stopping criterion which is based on fast-to-compute, local statistics of the computed gradients and entirely removes the need for a held-out validation set. Our experiments show that this is a viable approach in the setting of least-squares and logistic regression as well as neural networks.

ps pn

link (url) Project Page Project Page [BibTex]


no image
Krylov Subspace Recycling for Fast Iterative Least-Squares in Machine Learning

Roos, F. D., Hennig, P.

arXiv preprint arXiv:1706.00241, 2017 (article)

Abstract
Solving symmetric positive definite linear problems is a fundamental computational task in machine learning. The exact solution, famously, is cubicly expensive in the size of the matrix. To alleviate this problem, several linear-time approximations, such as spectral and inducing-point methods, have been suggested and are now in wide use. These are low-rank approximations that choose the low-rank space a priori and do not refine it over time. While this allows linear cost in the data-set size, it also causes a finite, uncorrected approximation error. Authors from numerical linear algebra have explored ways to iteratively refine such low-rank approximations, at a cost of a small number of matrix-vector multiplications. This idea is particularly interesting in the many situations in machine learning where one has to solve a sequence of related symmetric positive definite linear problems. From the machine learning perspective, such deflation methods can be interpreted as transfer learning of a low-rank approximation across a time-series of numerical tasks. We study the use of such methods for our field. Our empirical results show that, on regression and classification problems of intermediate size, this approach can interpolate between low computational cost and numerical precision.

pn

link (url) Project Page [BibTex]


no image
Convergence Analysis of Deterministic Kernel-Based Quadrature Rules in Misspecified Settings

Kanagawa, M., Sriperumbudur, B. K., Fukumizu, K.

Arxiv e-prints, arXiv:1709.00147v1 [math.NA], 2017 (article)

Abstract
This paper presents convergence analysis of kernel-based quadrature rules in misspecified settings, focusing on deterministic quadrature in Sobolev spaces. In particular, we deal with misspecified settings where a test integrand is less smooth than a Sobolev RKHS based on which a quadrature rule is constructed. We provide convergence guarantees based on two different assumptions on a quadrature rule: one on quadrature weights, and the other on design points. More precisely, we show that convergence rates can be derived (i) if the sum of absolute weights remains constant (or does not increase quickly), or (ii) if the minimum distance between distance design points does not decrease very quickly. As a consequence of the latter result, we derive a rate of convergence for Bayesian quadrature in misspecified settings. We reveal a condition on design points to make Bayesian quadrature robust to misspecification, and show that, under this condition, it may adaptively achieve the optimal rate of convergence in the Sobolev space of a lesser order (i.e., of the unknown smoothness of a test integrand), under a slightly stronger regularity condition on the integrand.

pn

arXiv [BibTex]

arXiv [BibTex]


no image
New Directions for Learning with Kernels and Gaussian Processes (Dagstuhl Seminar 16481)

Gretton, A., Hennig, P., Rasmussen, C., Schölkopf, B.

Dagstuhl Reports, 6(11):142-167, 2017 (book)

ei pn

DOI [BibTex]

DOI [BibTex]


no image
Efficiency of analytical and sampling-based uncertainty propagation in intensity-modulated proton therapy

Wahl, N., Hennig, P., Wieser, H. P., Bangert, M.

Physics in Medicine & Biology, 62(14):5790-5807, 2017 (article)

Abstract
The sensitivity of intensity-modulated proton therapy (IMPT) treatment plans to uncertainties can be quantified and mitigated with robust/min-max and stochastic/probabilistic treatment analysis and optimization techniques. Those methods usually rely on sparse random, importance, or worst-case sampling. Inevitably, this imposes a trade-off between computational speed and accuracy of the uncertainty propagation. Here, we investigate analytical probabilistic modeling (APM) as an alternative for uncertainty propagation and minimization in IMPT that does not rely on scenario sampling. APM propagates probability distributions over range and setup uncertainties via a Gaussian pencil-beam approximation into moments of the probability distributions over the resulting dose in closed form. It supports arbitrary correlation models and allows for efficient incorporation of fractionation effects regarding random and systematic errors. We evaluate the trade-off between run-time and accuracy of APM uncertainty computations on three patient datasets. Results are compared against reference computations facilitating importance and random sampling. Two approximation techniques to accelerate uncertainty propagation and minimization based on probabilistic treatment plan optimization are presented. Runtimes are measured on CPU and GPU platforms, dosimetric accuracy is quantified in comparison to a sampling-based benchmark (5000 random samples). APM accurately propagates range and setup uncertainties into dose uncertainties at competitive run-times (GPU ##IMG## [http://ej.iop.org/images/0031-9155/62/14/5790/pmbaa6ec5ieqn001.gif] {$\leqslant {5}$} min). The resulting standard deviation (expectation value) of dose show average global ##IMG## [http://ej.iop.org/images/0031-9155/62/14/5790/pmbaa6ec5ieqn002.gif] {$\gamma_{{3}\% / {3}~{\rm mm}}$} pass rates between 94.2% and 99.9% (98.4% and 100.0%). All investigated importance sampling strategies provided less accuracy at higher run-times considering only a single fraction. Considering fractionation, APM uncertainty propagation and treatment plan optimization was proven to be possible at constant time complexity, while run-times of sampling-based computations are linear in the number of fractions. Using sum sampling within APM, uncertainty propagation can only be accelerated at the cost of reduced accuracy in variance calculations. For probabilistic plan optimization, we were able to approximate the necessary pre-computations within seconds, yielding treatment plans of similar quality as gained from exact uncertainty propagation. APM is suited to enhance the trade-off between speed and accuracy in uncertainty propagation and probabilistic treatment plan optimization, especially in the context of fractionation. This brings fully-fledged APM computations within reach of clinical application.

pn

link (url) [BibTex]

link (url) [BibTex]


no image
Analytical probabilistic modeling of RBE-weighted dose for ion therapy

Wieser, H., Hennig, P., Wahl, N., Bangert, M.

Physics in Medicine and Biology (PMB), 62(23):8959-8982, 2017 (article)

pn

link (url) [BibTex]

link (url) [BibTex]

2015


Thumb xl posterior
Automatic LQR Tuning Based on Gaussian Process Optimization: Early Experimental Results

Marco, A., Hennig, P., Bohg, J., Schaal, S., Trimpe, S.

Machine Learning in Planning and Control of Robot Motion Workshop at the IEEE/RSJ International Conference on Intelligent Robots and Systems (iROS), pages: , , Machine Learning in Planning and Control of Robot Motion Workshop, October 2015 (conference)

Abstract
This paper proposes an automatic controller tuning framework based on linear optimal control combined with Bayesian optimization. With this framework, an initial set of controller gains is automatically improved according to a pre-defined performance objective evaluated from experimental data. The underlying Bayesian optimization algorithm is Entropy Search, which represents the latent objective as a Gaussian process and constructs an explicit belief over the location of the objective minimum. This is used to maximize the information gain from each experimental evaluation. Thus, this framework shall yield improved controllers with fewer evaluations compared to alternative approaches. A seven-degree-of-freedom robot arm balancing an inverted pole is used as the experimental demonstrator. Preliminary results of a low-dimensional tuning problem highlight the method’s potential for automatic controller tuning on robotic platforms.

am ei ics pn

PDF DOI Project Page [BibTex]

2015


PDF DOI Project Page [BibTex]


no image
Inference of Cause and Effect with Unsupervised Inverse Regression

Sgouritsa, E., Janzing, D., Hennig, P., Schölkopf, B.

In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics, 38, pages: 847-855, JMLR Workshop and Conference Proceedings, (Editors: Lebanon, G. and Vishwanathan, S.V.N.), JMLR.org, AISTATS, 2015 (inproceedings)

ei pn

Web PDF [BibTex]

Web PDF [BibTex]


no image
Probabilistic Interpretation of Linear Solvers

Hennig, P.

SIAM Journal on Optimization, 25(1):234-260, 2015 (article)

ei pn

Web PDF link (url) DOI [BibTex]

Web PDF link (url) DOI [BibTex]


Thumb xl maren ls
Probabilistic Line Searches for Stochastic Optimization

Mahsereci, M., Hennig, P.

In Advances in Neural Information Processing Systems 28, pages: 181-189, (Editors: C. Cortes, N.D. Lawrence, D.D. Lee, M. Sugiyama and R. Garnett), Curran Associates, Inc., 29th Annual Conference on Neural Information Processing Systems (NIPS), 2015 (inproceedings)

Abstract
In deterministic optimization, line searches are a standard tool ensuring stability and efficiency. Where only stochastic gradients are available, no direct equivalent has so far been formulated, because uncertain gradients do not allow for a strict sequence of decisions collapsing the search space. We construct a probabilistic line search by combining the structure of existing deterministic methods with notions from Bayesian optimization. Our method retains a Gaussian process surrogate of the univariate optimization objective, and uses a probabilistic belief over the Wolfe conditions to monitor the descent. The algorithm has very low computational cost, and no user-controlled parameters. Experiments show that it effectively removes the need to define a learning rate for stochastic gradient descent. [You can find the matlab research code under `attachments' below. The zip-file contains a minimal working example. The docstring in probLineSearch.m contains additional information. A more polished implementation in C++ will be published here at a later point. For comments and questions about the code please write to mmahsereci@tue.mpg.de.]

ei pn

Matlab research code link (url) [BibTex]

Matlab research code link (url) [BibTex]


no image
A Random Riemannian Metric for Probabilistic Shortest-Path Tractography

Hauberg, S., Schober, M., Liptrot, M., Hennig, P., Feragen, A.

In 18th International Conference on Medical Image Computing and Computer Assisted Intervention, 9349, pages: 597-604, Lecture Notes in Computer Science, MICCAI, 2015 (inproceedings)

ei pn

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Probabilistic numerics and uncertainty in computations

Hennig, P., Osborne, M. A., Girolami, M.

Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 471(2179), 2015 (article)

Abstract
We deliver a call to arms for probabilistic numerical methods: algorithms for numerical tasks, including linear algebra, integration, optimization and solving differential equations, that return uncertainties in their calculations. Such uncertainties, arising from the loss of precision induced by numerical calculation with limited time or hardware, are important for much contemporary science and industry. Within applications such as climate science and astrophysics, the need to make decisions on the basis of computations with large and complex data have led to a renewed focus on the management of numerical uncertainty. We describe how several seminal classic numerical methods can be interpreted naturally as probabilistic inference. We then show that the probabilistic view suggests new algorithms that can flexibly be adapted to suit application specifics, while delivering improved empirical performance. We provide concrete illustrations of the benefits of probabilistic numeric algorithms on real scientific problems from astrometry and astronomical imaging, while highlighting open problems with these new algorithms. Finally, we describe how probabilistic numerical methods provide a coherent framework for identifying the uncertainty in calculations performed with a combination of numerical algorithms (e.g. both numerical optimizers and differential equation solvers), potentially allowing the diagnosis (and control) of error sources in computations.

ei pn

PDF DOI [BibTex]

PDF DOI [BibTex]

2014


Thumb xl ps page panel
Probabilistic Progress Bars

Kiefel, M., Schuler, C., Hennig, P.

In Conference on Pattern Recognition (GCPR), 8753, pages: 331-341, Lecture Notes in Computer Science, (Editors: Jiang, X., Hornegger, J., and Koch, R.), Springer, GCPR, September 2014 (inproceedings)

Abstract
Predicting the time at which the integral over a stochastic process reaches a target level is a value of interest in many applications. Often, such computations have to be made at low cost, in real time. As an intuitive example that captures many features of this problem class, we choose progress bars, a ubiquitous element of computer user interfaces. These predictors are usually based on simple point estimators, with no error modelling. This leads to fluctuating behaviour confusing to the user. It also does not provide a distribution prediction (risk values), which are crucial for many other application areas. We construct and empirically evaluate a fast, constant cost algorithm using a Gauss-Markov process model which provides more information to the user.

ei ps pn

website+code pdf DOI [BibTex]

2014


website+code pdf DOI [BibTex]


Thumb xl aistats2014
Probabilistic Solutions to Differential Equations and their Application to Riemannian Statistics

Hennig, P., Hauberg, S.

In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, 33, pages: 347-355, JMLR: Workshop and Conference Proceedings, (Editors: S Kaski and J Corander), Microtome Publishing, Brookline, MA, AISTATS, April 2014 (inproceedings)

Abstract
We study a probabilistic numerical method for the solution of both boundary and initial value problems that returns a joint Gaussian process posterior over the solution. Such methods have concrete value in the statistics on Riemannian manifolds, where non-analytic ordinary differential equations are involved in virtually all computations. The probabilistic formulation permits marginalising the uncertainty of the numerical solution such that statistics are less sensitive to inaccuracies. This leads to new Riemannian algorithms for mean value computations and principal geodesic analysis. Marginalisation also means results can be less precise than point estimates, enabling a noticeable speed-up over the state of the art. Our approach is an argument for a wider point that uncertainty caused by numerical calculations should be tracked throughout the pipeline of machine learning algorithms.

ei ps pn

pdf Youtube Supplements Project page link (url) [BibTex]

pdf Youtube Supplements Project page link (url) [BibTex]


no image
Local Gaussian Regression

Meier, F., Hennig, P., Schaal, S.

arXiv preprint, March 2014, clmc (misc)

Abstract
Abstract: Locally weighted regression was created as a nonparametric learning method that is computationally efficient, can learn from very large amounts of data and add data incrementally. An interesting feature of locally weighted regression is that it can work with ...

am pn

Web link (url) [BibTex]

Web link (url) [BibTex]


no image
Probabilistic ODE Solvers with Runge-Kutta Means

Schober, M., Duvenaud, D., Hennig, P.

In Advances in Neural Information Processing Systems 27, pages: 739-747, (Editors: Z. Ghahramani, M. Welling, C. Cortes, N.D. Lawrence and K.Q. Weinberger), Curran Associates, Inc., 28th Annual Conference on Neural Information Processing Systems (NIPS), 2014 (inproceedings)

ei pn

Web link (url) [BibTex]

Web link (url) [BibTex]


no image
Active Learning of Linear Embeddings for Gaussian Processes

Garnett, R., Osborne, M., Hennig, P.

In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence, pages: 230-239, (Editors: NL Zhang and J Tian), AUAI Press , Corvallis, Oregon, UAI2014, 2014, another link: http://arxiv.org/abs/1310.6740 (inproceedings)

ei pn

PDF Web [BibTex]

PDF Web [BibTex]


no image
Probabilistic Shortest Path Tractography in DTI Using Gaussian Process ODE Solvers

Schober, M., Kasenburg, N., Feragen, A., Hennig, P., Hauberg, S.

In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014, Lecture Notes in Computer Science Vol. 8675, pages: 265-272, (Editors: P. Golland, N. Hata, C. Barillot, J. Hornegger and R. Howe), Springer, Heidelberg, MICCAI, 2014 (inproceedings)

ei pn

DOI [BibTex]

DOI [BibTex]


no image
Sampling for Inference in Probabilistic Models with Fast Bayesian Quadrature

Gunter, T., Osborne, M., Garnett, R., Hennig, P., Roberts, S.

In Advances in Neural Information Processing Systems 27, pages: 2789-2797, (Editors: Z. Ghahramani, M. Welling, C. Cortes, N.D. Lawrence and K.Q. Weinberger), Curran Associates, Inc., 28th Annual Conference on Neural Information Processing Systems (NIPS), 2014 (inproceedings)

ei pn

Web link (url) [BibTex]

Web link (url) [BibTex]


no image
Incremental Local Gaussian Regression

Meier, F., Hennig, P., Schaal, S.

In Advances in Neural Information Processing Systems 27, pages: 972-980, (Editors: Z. Ghahramani, M. Welling, C. Cortes, N.D. Lawrence and K.Q. Weinberger), 28th Annual Conference on Neural Information Processing Systems (NIPS), 2014, clmc (inproceedings)

am ei pn

PDF link (url) [BibTex]

PDF link (url) [BibTex]


no image
Efficient Bayesian Local Model Learning for Control

Meier, F., Hennig, P., Schaal, S.

In Proceedings of the IEEE International Conference on Intelligent Robots and Systems, pages: 2244 - 2249, IROS, 2014, clmc (inproceedings)

Abstract
Model-based control is essential for compliant controland force control in many modern complex robots, like humanoidor disaster robots. Due to many unknown and hard tomodel nonlinearities, analytical models of such robots are oftenonly very rough approximations. However, modern optimizationcontrollers frequently depend on reasonably accurate models,and degrade greatly in robustness and performance if modelerrors are too large. For a long time, machine learning hasbeen expected to provide automatic empirical model synthesis,yet so far, research has only generated feasibility studies butno learning algorithms that run reliably on complex robots.In this paper, we combine two promising worlds of regressiontechniques to generate a more powerful regression learningsystem. On the one hand, locally weighted regression techniquesare computationally efficient, but hard to tune due to avariety of data dependent meta-parameters. On the other hand,Bayesian regression has rather automatic and robust methods toset learning parameters, but becomes quickly computationallyinfeasible for big and high-dimensional data sets. By reducingthe complexity of Bayesian regression in the spirit of local modellearning through variational approximations, we arrive at anovel algorithm that is computationally efficient and easy toinitialize for robust learning. Evaluations on several datasetsdemonstrate very good learning performance and the potentialfor a general regression learning tool for robotics.

am ei pn

PDF link (url) DOI [BibTex]

PDF link (url) DOI [BibTex]

2010


no image
Using an Infinite Von Mises-Fisher Mixture Model to Cluster Treatment Beam Directions in External Radiation Therapy

Bangert, M., Hennig, P., Oelfke, U.

In pages: 746-751 , (Editors: Draghici, S. , T.M. Khoshgoftaar, V. Palade, W. Pedrycz, M.A. Wani, X. Zhu), IEEE, Piscataway, NJ, USA, Ninth International Conference on Machine Learning and Applications (ICMLA), December 2010 (inproceedings)

Abstract
We present a method for fully automated selection of treatment beam ensembles for external radiation therapy. We reformulate the beam angle selection problem as a clustering problem of locally ideal beam orientations distributed on the unit sphere. For this purpose we construct an infinite mixture of von Mises-Fisher distributions, which is suited in general for density estimation from data on the D-dimensional sphere. Using a nonparametric Dirichlet process prior, our model infers probability distributions over both the number of clusters and their parameter values. We describe an efficient Markov chain Monte Carlo inference algorithm for posterior inference from experimental data in this model. The performance of the suggested beam angle selection framework is illustrated for one intra-cranial, pancreas, and prostate case each. The infinite von Mises-Fisher mixture model (iMFMM) creates between 18 and 32 clusters, depending on the patient anatomy. This suggests to use the iMFMM directly for beam ensemble selection in robotic radio surgery, or to generate low-dimensional input for both subsequent optimization of trajectories for arc therapy and beam ensemble selection for conventional radiation therapy.

ei pn

Web DOI [BibTex]

2010


Web DOI [BibTex]


no image
Coherent Inference on Optimal Play in Game Trees

Hennig, P., Stern, D., Graepel, T.

In JMLR Workshop and Conference Proceedings Volume 9: AISTATS 2010, pages: 326-333, (Editors: Teh, Y.W. , M. Titterington ), JMLR, Cambridge, MA, USA, Thirteenth International Conference on Artificial Intelligence and Statistics, May 2010 (inproceedings)

Abstract
Round-based games are an instance of discrete planning problems. Some of the best contemporary game tree search algorithms use random roll-outs as data. Relying on a good policy, they learn on-policy values by propagating information upwards in the tree, but not between sibling nodes. Here, we present a generative model and a corresponding approximate message passing scheme for inference on the optimal, off-policy value of nodes in smooth AND/OR trees, given random roll-outs. The crucial insight is that the distribution of values in game trees is not completely arbitrary. We define a generative model of the on-policy values using a latent score for each state, representing the value under the random roll-out policy. Inference on the values under the optimal policy separates into an inductive, pre-data step and a deductive, post-data part. Both can be solved approximately with Expectation Propagation, allowing off-policy value inference for any node in the (exponentially big) tree in linear time.

ei pn

PDF Web [BibTex]

PDF Web [BibTex]