Generalized Thompson sampling for sequential decision-making and causal inference




Purpose: Sampling an action according to the probability that it is believed to be the optimal one is sometimes called Thompson sampling.
Methods: Although mostly applied to bandit problems, Thompson sampling can also be used to solve sequential adaptive control problems when the optimal policy is known for each possible environment. The predictive distribution over actions can then be constructed as a Bayesian superposition of these policies, weighted by their posterior probability of being optimal.
Results: Here we discuss two important features of this approach. First, we show to what extent such generalized Thompson sampling can be regarded as an optimal strategy under limited information-processing capabilities that constrain the sampling complexity of the decision-making process. Second, we show how such Thompson sampling can be extended to solve causal inference problems when interacting with an environment in a sequential fashion.
Conclusion: In summary, our results suggest that Thompson sampling might not merely be a useful heuristic, but a principled method for addressing problems of adaptive sequential decision-making and causal inference.
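
The sampling scheme described in the abstract can be illustrated with a minimal sketch. It is not taken from the article: the two-hypothesis Bernoulli bandit, the payoff probabilities, and the names PAYOFF, OPTIMAL_ACTION, thompson_step, bayes_update and run are all assumptions chosen for illustration. Each candidate environment is assumed to come with a known optimal policy. An environment is drawn from the current posterior, its optimal action is played, and the posterior is updated from the observed reward.

```python
import random

# Hypothetical two-armed Bernoulli bandit (an assumption for illustration):
# under "H0" arm 0 pays with probability 0.8 and arm 1 with 0.2;
# under "H1" the arms are swapped.
PAYOFF = {"H0": (0.8, 0.2), "H1": (0.2, 0.8)}

# The optimal policy is assumed known for each hypothesis: pull the better arm.
OPTIMAL_ACTION = {"H0": 0, "H1": 1}


def thompson_step(posterior, rng):
    """Draw one hypothesis from the posterior and return its optimal action."""
    names = list(posterior)
    weights = [posterior[h] for h in names]
    sampled = rng.choices(names, weights=weights, k=1)[0]
    return OPTIMAL_ACTION[sampled]


def bayes_update(posterior, action, reward):
    """Reweight each hypothesis by the likelihood of the reward given the action."""
    unnorm = {}
    for h, p in posterior.items():
        q = PAYOFF[h][action]  # P(reward = 1 | hypothesis, action)
        unnorm[h] = p * (q if reward == 1 else 1.0 - q)
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}


def run(true_env="H1", steps=200, seed=0):
    """Interact with the true environment and return the final posterior."""
    rng = random.Random(seed)
    posterior = {"H0": 0.5, "H1": 0.5}  # uniform prior over hypotheses
    for _ in range(steps):
        action = thompson_step(posterior, rng)
        reward = 1 if rng.random() < PAYOFF[true_env][action] else 0
        posterior = bayes_update(posterior, action, reward)
    return posterior


if __name__ == "__main__":
    print(run())
```

In this sketch the probability of choosing an action equals the posterior mass of the hypotheses under which that action is optimal. The update uses only the likelihood of the reward given the chosen action. The agent's own action choice is not treated as evidence about the environment.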

Author(s): Ortega, PA and Braun, DA
Journal: Complex Adaptive Systems Modeling
Volume: 2
Number (issue): 2
Pages: 1-23
Year: 2014
Month: March

Department(s): Empirical Inference
Bibtex Type: Article (article)

DOI: 10.1186/2194-3206-2-2


