2009


Incorporating Prior Knowledge on Class Probabilities into Local Similarity Measures for Intermodality Image Registration

Hofmann, M., Schölkopf, B., Bezrukov, I., Cahill, N.

In Proceedings of the MICCAI 2009 Workshop on Probabilistic Models for Medical Image Analysis, pages: 220-231, (Editors: W Wells and S Joshi and K Pohl), PMMIA, September 2009 (inproceedings)

Abstract
We present a methodology for incorporating prior knowledge on class probabilities into the registration process. By using knowledge from the imaging modality, pre-segmentations, and/or probabilistic atlases, we construct vectors of class probabilities for each image voxel. By defining new image similarity measures for distribution-valued images, we show how the class probability images can be nonrigidly registered in a variational framework. An experiment on nonrigid registration of MR and CT full-body scans illustrates that the proposed technique outperforms standard mutual information (MI) and normalized mutual information (NMI) based registration techniques when measured in terms of target registration error (TRE) of manually labeled fiducials.
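The MI and NMI baselines mentioned in the abstract can be sketched for two already-discretized images as below. This is a minimal illustration of the standard measures, not the distribution-valued similarity measures proposed in the paper; the function names and the particular NMI normalization, (H(A) + H(B)) / H(A, B), are our assumptions.

```python
from collections import Counter
from math import log


def entropy(counts, n):
    """Shannon entropy (nats) of a histogram given as a Counter with total n."""
    return -sum((c / n) * log(c / n) for c in counts.values() if c > 0)


def mi_nmi(a, b):
    """Return (MI, NMI) for two equally sized sequences of discrete intensities.

    MI  = H(A) + H(B) - H(A, B)
    NMI = (H(A) + H(B)) / H(A, B)   (one common normalization)
    """
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and of equal size")
    n = len(a)
    h_a = entropy(Counter(a), n)
    h_b = entropy(Counter(b), n)
    h_ab = entropy(Counter(zip(a, b)), n)  # joint histogram of voxel pairs
    mi = h_a + h_b - h_ab
    # NMI is undefined when both images are constant (H(A, B) = 0).
    nmi = (h_a + h_b) / h_ab if h_ab > 0 else float("nan")
    return mi, nmi


# Two flattened "images": b is a relabeling of a, so they are perfectly dependent.
a = [0, 0, 1, 1, 2, 2, 0, 1]
b = [5, 5, 7, 7, 9, 9, 5, 7]
print(mi_nmi(a, b))
```

Because both measures only depend on the joint histogram, they are invariant to relabeling intensities, which is why they are popular for intermodality (e.g. MR to CT) registration.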

PDF Web [BibTex]


Inference algorithms and learning theory for Bayesian sparse factor analysis

Rattray, M., Stegle, O., Sharp, K., Winn, J.

Journal of Physics: Conference Series, 197(1):1-10, (Editors: M. Inoue, S. Ishii, Y. Kabashima, M. Okada), Institute of Physics, Bristol, UK, International Workshop on Statistical-Mechanical Informatics (IW-SMI 2009), September 2009 (article)

Abstract
Bayesian sparse factor analysis has many applications; for example, it has been applied to the problem of inferring a sparse regulatory network from gene expression data. We describe a number of inference algorithms for Bayesian sparse factor analysis using a slab and spike mixture prior. These include well-established Markov chain Monte Carlo (MCMC) and variational Bayes (VB) algorithms as well as a novel hybrid of VB and Expectation Propagation (EP). For the case of a single latent factor we derive a theory for learning performance using the replica method. We compare the MCMC and VB/EP algorithm results with simulated data to the theoretical prediction. The results for MCMC agree closely with the theory as expected. Results for VB/EP are slightly sub-optimal but show that the new algorithm is effective for sparse inference. In large-scale problems MCMC is infeasible due to computational limitations and the VB/EP algorithm then provides a very useful computationally efficient alternative.

PDF Web DOI [BibTex]


Finite-time output stabilization with second order sliding modes

Dinuzzo, F., Ferrara, A.

Automatica, 45(9):2169-2171, September 2009 (article)

Abstract
In this note, a class of discontinuous feedback laws that switch over branches of parabolas in the auxiliary state plane is analyzed. Conditions are provided under which controllers belonging to this class are second order sliding-mode algorithms: they ensure uniform global finite-time output stability for uncertain systems of relative degree two.
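To make "switching over branches of parabolas" concrete, the sketch below simulates a nominal double integrator y'' = u under the textbook time-optimal bang-bang law, which switches on the parabola y + v|v|/(2U) = 0 (with v = y'). It is only an illustration under these assumptions: the paper's class of laws, its auxiliary state plane, and its treatment of uncertain systems are not reproduced, and the step size and horizon are arbitrary choices.

```python
def simulate(y0, v0, u_max=1.0, dt=1e-3, t_end=5.0):
    """Explicit-Euler simulation of y'' = u under the illustrative switching law
    u = -u_max * sign(y + v|v| / (2 u_max)). Returns the final (y, v)."""
    y, v = y0, v0
    for _ in range(int(t_end / dt)):
        # Switching function: its zero set is a pair of parabola branches in the (y, v) plane.
        s = y + v * abs(v) / (2.0 * u_max)
        u = -u_max if s > 0 else (u_max if s < 0 else 0.0)
        y, v = y + v * dt, v + u * dt  # explicit Euler step
    return y, v


y_end, v_end = simulate(1.0, 0.0)
print(f"y = {y_end:.2e}, v = {v_end:.2e}")
```

In continuous time this law reaches the origin in finite time (2 s for the start point above); the discretized version ends in a small chattering neighborhood of the origin whose size shrinks with dt.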

Web DOI [BibTex]


Markerless 3D Face Tracking (DAGM 2009)

Walder, C., Breidt, M., Bülthoff, H., Schölkopf, B., Curio, C.

In Pattern Recognition, Lecture Notes in Computer Science, Vol. 5748, pages: 41-50, (Editors: J Denzler and G Notni and H Süsse), Springer, Berlin, Germany, 31st Symposium of the German Association for Pattern Recognition (DAGM), September 2009 (inproceedings)

Abstract
We present a novel algorithm for the markerless tracking of deforming surfaces such as faces. We acquire a sequence of 3D scans along with color images at 40 Hz. The data is then represented by implicit surface and color functions, using a novel partition-of-unity type method of efficiently combining local regressors using nearest neighbor searches. Both these functions act on the 4D space of 3D plus time, and use temporal information to handle the noise in individual scans. After interactive registration of a template mesh to the first frame, it is then automatically deformed to track the scanned surface, using the variation of both shape and color as features in a dynamic energy minimization problem. Our prototype system yields high-quality animated 3D models in correspondence, at a rate of approximately twenty seconds per timestep. Tracking results for faces and other objects are presented.

PDF Web DOI [BibTex]


Robot Learning

Peters, J., Morimoto, J., Tedrake, R., Roy, N.

IEEE Robotics and Automation Magazine, 16(3):19-20, September 2009 (article)

Abstract
Creating autonomous robots that can learn to act in unpredictable environments has been a long-standing goal of robotics, artificial intelligence, and the cognitive sciences. In contrast, current commercially available industrial and service robots mostly execute fixed tasks and exhibit little adaptability. To bridge this gap, machine learning offers a myriad of methods, some of which have already been applied with great success to robotics problems. As a result, there is an increasing interest in machine learning and statistics within the robotics community. At the same time, the learning community has increasingly used robots as motivating applications for new algorithms and formalisms. Considerable evidence of this exists in the use of learning in high-profile competitions such as RoboCup and the Defense Advanced Research Projects Agency (DARPA) challenges, and in the growing number of research programs funded by governments around the world.

PDF Web DOI [BibTex]


Object Localization with Global and Local Context Kernels

Blaschko, M., Lampert, C.

In British Machine Vision Conference 2009, pages: 1-11, BMVC, September 2009 (inproceedings)

Abstract
Recent research has shown that the use of contextual cues significantly improves performance in sliding window type localization systems. In this work, we propose a method that incorporates both global and local context information through appropriately defined kernel functions. In particular, we make use of a weighted combination of kernels defined over local spatial regions, as well as a global context kernel. The relative importance of the context contributions is learned automatically, and the resulting discriminant function is of a form such that localization at test time can be solved efficiently using a branch and bound optimization scheme. By specifying context directly with a kernel learning approach, we achieve high localization accuracy with a simple and efficient representation. This is in contrast to other systems that incorporate context for which expensive inference needs to be done at test time. We show experimentally on the PASCAL VOC datasets that the inclusion of context can significantly improve localization performance, provided the relative contributions of context cues are learned appropriately.

PDF Web [BibTex]


Efficient Sample Reuse in EM-Based Policy Search

Hachiya, H., Peters, J., Sugiyama, M.

In 16th European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pages: 469-484, (Editors: W. Buntine, M. Grobelnik, D. Mladenic, J. Shawe-Taylor), Springer, Berlin, Germany, ECML PKDD, September 2009 (inproceedings)

Abstract
Direct policy search is a promising reinforcement learning framework, particularly for controlling continuous, high-dimensional systems such as anthropomorphic robots. Owing to its high flexibility, policy search often requires a large number of samples to obtain a stable policy update estimator. However, this is prohibitive when sampling is expensive. In this paper, we extend an EM-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, called Reward-weighted Regression with sample Reuse, is demonstrated through a robot learning experiment.
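A minimal sketch of sample reuse in reward-weighted regression, assuming a scalar linear-Gaussian policy a ~ N(theta * s, std^2), non-negative rewards, and plain per-sample importance weights. The estimator in the paper may differ (for example, in how importance weights are computed or stabilized), and all names here are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def rwr_update(samples, theta_current, std):
    """One reward-weighted regression step for the scalar linear-Gaussian
    policy a ~ N(theta * s, std^2), reusing samples drawn under older policies.

    samples: list of (s, a, reward, theta_behavior) tuples, where theta_behavior
    is the parameter of the policy that generated the sample. Each sample is
    weighted by reward * pi_current(a | s) / pi_behavior(a | s).
    """
    num = den = 0.0
    for s, a, reward, theta_b in samples:
        iw = gaussian_pdf(a, theta_current * s, std) / gaussian_pdf(a, theta_b * s, std)
        w = reward * iw  # rewards are assumed non-negative
        num += w * s * a
        den += w * s * s
    if den == 0.0:
        raise ValueError("all weights vanished")
    return num / den  # weighted least-squares solution for theta


# Samples collected under three different behavior policies, all lying on a = 2 s.
demo = [(1.0, 2.0, 1.0, 0.5), (2.0, 4.0, 0.5, 1.5), (-1.0, -2.0, 2.0, 1.0)]
print(rwr_update(demo, theta_current=1.0, std=1.0))
```

The update is a weighted least-squares fit of actions on states; the importance ratio is what lets samples from earlier policies contribute without biasing the fit toward those policies.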

PDF Web DOI [BibTex]


Active Structured Learning for High-Speed Object Detection

Lampert, C., Peters, J.

In DAGM 2009, pages: 221-231, (Editors: J. Denzler, G. Notni, H. Süsse), Springer, Berlin, Germany, 31st Annual Symposium of the German Association for Pattern Recognition, September 2009 (inproceedings)

Abstract
High-speed, smooth and accurate visual tracking of objects in arbitrary, unstructured environments is essential for robotics and human motion analysis. However, building a system that can adapt to arbitrary objects and a wide range of lighting conditions is a challenging problem, especially if hard real-time constraints apply, as in robotics scenarios. In this work, we introduce a method for learning a discriminative object tracking system based on the recent structured regression framework for object localization. Using a kernel function that allows fast evaluation on the GPU, the resulting system can process video streams at 100 frames per second or more. Consecutive frames in high-speed video sequences are typically highly redundant, and for training an object detection system it is sufficient to have training labels for only a subset of all images. We propose an active learning method that selects training examples in a data-driven way, thereby minimizing the number of training labels required. Experiments on realistic data show that active learning is superior to previously used methods for dataset subsampling in this task.

PDF Web DOI [BibTex]


Higher order sliding mode controllers with optimal reaching

Dinuzzo, F., Ferrara, A.

IEEE Transactions on Automatic Control, 54(9):2126-2136, September 2009 (article)

Abstract
Higher order sliding mode (HOSM) control design is considered for systems with a known permanent relative degree. In this paper, we introduce the robust Fuller's problem, a robust generalization of Fuller's problem, which is a standard optimal control problem for a chain of integrators with bounded control. By solving the robust Fuller's problem, it is possible to obtain feedback laws that are HOSM algorithms of generic order and, in addition, provide optimal finite-time reaching of the sliding manifold. A common difficulty in the use of existing HOSM algorithms is the tuning of design parameters: our methodology proves useful for tuning HOSM controller parameters so as to ensure the desired performance and prevent instabilities. The convergence and stability properties of the proposed family of controllers are analyzed theoretically. Simulation evidence demonstrates their effectiveness.

DOI [BibTex]


Kernel Methods in Computer Vision

Lampert, CH.

Foundations and Trends in Computer Graphics and Vision, 4(3):193-285, September 2009 (article)

Abstract
In recent years, kernel methods have established themselves as powerful tools for computer vision researchers as well as for practitioners. In this tutorial, we give an introduction to kernel methods in computer vision from a geometric perspective, introducing not only the ubiquitous support vector machines, but also less well-known techniques for regression, dimensionality reduction, outlier detection and clustering. Additionally, we give an outlook on very recent, non-classical techniques for the prediction of structured data, for the estimation of statistical dependency and for learning the kernel function itself. All methods are illustrated with examples of successful applications from the recent computer vision research literature.

Web DOI [BibTex]


Generalized Clustering via Kernel Embeddings

Jegelka, S., Gretton, A., Schölkopf, B., Sriperumbudur, B., von Luxburg, U.

In KI 2009: AI and Automation, Lecture Notes in Computer Science, Vol. 5803, pages: 144-152, (Editors: B Mertsching and M Hund and Z Aziz), Springer, Berlin, Germany, 32nd Annual Conference on Artificial Intelligence (KI), September 2009 (inproceedings)

Abstract
We generalize traditional goals of clustering towards distinguishing components in a non-parametric mixture model. The clusters are not necessarily based on point locations, but on higher order criteria. This framework can be implemented by embedding probability distributions in a Hilbert space. The corresponding clustering objective is very general and relates to a range of common clustering concepts.
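One way to picture "embedding probability distributions in a Hilbert space" is the distance between the kernel mean embeddings of two samples, the (squared) maximum mean discrepancy. The sketch assumes 1-D data, a Gaussian kernel with a hand-picked bandwidth, and the simple biased estimator; it does not implement the paper's clustering objective.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd2(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared distance between the kernel mean
    embeddings of two samples (the squared MMD)."""
    def mean_k(us, vs):
        return sum(gaussian_kernel(u, v, bandwidth) for u in us for v in vs) / (len(us) * len(vs))

    # ||mu_X - mu_Y||^2 = <mu_X, mu_X> + <mu_Y, mu_Y> - 2 <mu_X, mu_Y>
    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


same = mmd2([0.0, 0.5, 1.0], [0.0, 0.5, 1.0])
shifted = mmd2([0.0, 0.5, 1.0], [3.0, 3.5, 4.0])
print(same, shifted)
```

Since this distance compares whole distributions rather than point locations, clusters defined through it can differ in shape or spread rather than only in position.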

PDF PDF Web DOI [BibTex]


Discovering Temporal Patterns of Differential Gene Expression in Microarray Time Series

Stegle, O., Denby, KJ., Wild, DL., McHattie, S., Mead, A., Ghahramani, Z., Borgwardt, KM.

In Proceedings of the German Conference on Bioinformatics 2009 (GCB 2009), pages: 133-142, (Editors: I. Grosse, S. Neumann, S. Posch, F. Schreiber, P. F. Stadler), Gesellschaft für Informatik, Bonn, Germany, German Conference on Bioinformatics (GCB '09), September 2009 (inproceedings)

Abstract
A wealth of time series of microarray measurements has become available in recent years. Several two-sample tests for detecting differential gene expression in these time series have been defined, but they can only answer the question of whether a gene is differentially expressed across the whole time series, not in which intervals it is differentially expressed. In this article, we propose a Gaussian process based approach for studying these dynamics of differential gene expression. In experiments on Arabidopsis thaliana gene expression levels, our novel technique helps us to uncover that the family of WRKY transcription factors appears to be involved in the early response to infection by a fungal pathogen.

PDF Web [BibTex]


A Novel Approach to the Selection of Spatially Invariant Features for the Classification of Hyperspectral Images with Improved Generalization Capability

Bruzzone, L., Persello, C.

IEEE Transactions on Geoscience and Remote Sensing, 47(9):3180-3191, September 2009 (article)

Abstract
This paper presents a novel approach to feature selection for the classification of hyperspectral images. The proposed approach aims at selecting a subset of the original set of features that simultaneously exhibits a high capability to discriminate among the considered classes and high invariance in the spatial domain of the investigated scene. This approach results in a more robust classification system with improved generalization properties with respect to standard feature-selection methods. The feature selection is accomplished by defining a multiobjective criterion function made up of two terms: (1) a term that measures the class separability and (2) a term that evaluates the spatial invariance of the selected features. In order to assess the spatial invariance of the feature subset, we propose both a supervised method (which assumes that training samples acquired in two or more spatially disjoint areas are available) and a semisupervised method (which requires only a standard training set acquired in a single area of the scene and takes advantage of unlabeled samples selected in portions of the scene spatially disjoint from the training set). The choice between the supervised and the semisupervised method depends on the available reference data. The multiobjective problem is solved by an evolutionary algorithm that estimates the set of Pareto-optimal solutions. Experiments carried out on a hyperspectral image acquired by the Hyperion sensor on a complex area confirmed the effectiveness of the proposed approach.
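The multiobjective criterion is solved for its Pareto-optimal set. For a small, fully enumerated set of candidate feature subsets, that set can be found by brute force as below; the paper instead estimates it with an evolutionary algorithm, and the subset labels and scores here are made-up values used only for illustration.

```python
def pareto_front(candidates):
    """Return the labels of non-dominated candidates when every objective is maximized.

    candidates: dict mapping a feature-subset label to a tuple of scores,
    e.g. (class_separability, spatial_invariance).
    """
    def dominates(p, q):
        # p dominates q if it is at least as good everywhere and strictly better somewhere.
        return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

    return sorted(
        name for name, score in candidates.items()
        if not any(dominates(other, score) for other in candidates.values())
    )


# Hypothetical scores for four feature subsets (separability, invariance).
scores = {
    "A": (0.90, 0.40),
    "B": (0.70, 0.70),
    "C": (0.60, 0.65),  # dominated by B
    "D": (0.50, 0.95),
}
print(pareto_front(scores))
```

Every subset on the front is a different trade-off between discrimination and spatial invariance; picking one of them is a separate, application-dependent decision.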


Web DOI [BibTex]


Fast Kernel-Based Independent Component Analysis

Shen, H., Jegelka, S., Gretton, A.

IEEE Transactions on Signal Processing, 57(9):3498-3511, September 2009 (article)

Abstract
Recent approaches to independent component analysis (ICA) have used kernel independence measures to obtain highly accurate solutions, particularly where classical methods experience difficulty (for instance, sources with near-zero kurtosis). FastKICA (fast HSIC-based kernel ICA) is a new optimization method for one such kernel independence measure, the Hilbert-Schmidt Independence Criterion (HSIC). The high computational efficiency of this approach is achieved by combining geometric optimization techniques, specifically an approximate Newton-like method on the orthogonal group, with accurate estimates of the gradient and Hessian based on an incomplete Cholesky decomposition. In contrast to other efficient kernel-based ICA algorithms, FastKICA is applicable to any twice differentiable kernel function. Experimental results for problems with large numbers of sources and observations indicate that FastKICA provides more accurate solutions at a given cost than gradient descent on HSIC. Compared with other recently published ICA methods, FastKICA is competitive in terms of accuracy, relatively insensitive to local minima when initialized far from independence, and more robust towards outliers. An analysis of the local convergence properties of FastKICA is provided.
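For reference, the HSIC contrast that FastKICA optimizes can be estimated from two samples with centered Gram matrices. The sketch below is the simple O(n^2) biased estimator with Gaussian kernels and 1/n^2 normalization (our choices; other normalizations are also used); it omits the Newton-like optimization on the orthogonal group and the incomplete Cholesky approximation that give FastKICA its speed.

```python
import math


def gaussian_gram(xs, bandwidth=1.0):
    """Gram matrix K[i][j] = exp(-(x_i - x_j)^2 / (2 bandwidth^2))."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in xs] for a in xs]


def center(k):
    """Return H K H with H = I - (1/n) 1 1^T (double centering)."""
    n = len(k)
    row_means = [sum(row) / n for row in k]
    col_means = [sum(k[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_means) / n
    return [[k[i][j] - row_means[i] - col_means[j] + grand for j in range(n)] for i in range(n)]


def hsic(xs, ys, bandwidth=1.0):
    """Biased empirical HSIC, trace(K H L H) / n^2, with Gaussian kernels."""
    n = len(xs)
    kc = center(gaussian_gram(xs, bandwidth))
    l_gram = gaussian_gram(ys, bandwidth)
    # trace(HKH L) = sum_ij (HKH)[i][j] * L[j][i], and L is symmetric.
    return sum(kc[i][j] * l_gram[i][j] for i in range(n) for j in range(n)) / n ** 2


xs = [0.0, 1.0, 2.0, 3.0]
print(hsic(xs, xs), hsic(xs, [5.0] * 4))
```

A value near zero indicates (empirical) independence; in kernel ICA the demixing matrix is chosen to drive the pairwise HSIC of the recovered sources toward zero.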

PDF Web DOI [BibTex]


Parametric Modeling of the Beating Heart with Respiratory Motion Extracted from Magnetic Resonance Images

Pons-Moll, G., Crosas, C., Tadmor, G., MacLeod, R., Rosenhahn, B., Brooks, D.

In IEEE Computers in Cardiology (CINC), September 2009 (inproceedings)

[BibTex]


Qualia: The Geometry of Integrated Information

Balduzzi, D., Tononi, G.

PLoS Computational Biology, 5(8):1-24, August 2009 (article)

Abstract
According to the integrated information theory, the quantity of consciousness is the amount of integrated information generated by a complex of elements, and the quality of experience is specified by the informational relationships it generates. This paper outlines a framework for characterizing the informational relationships generated by such systems. Qualia space (Q) is a space having an axis for each possible state (activity pattern) of a complex. Within Q, each submechanism specifies a point corresponding to a repertoire of system states. Arrows between repertoires in Q define informational relationships. Together, these arrows specify a quale: a shape that completely and univocally characterizes the quality of a conscious experience. Φ, the height of this shape, is the quantity of consciousness associated with the experience. Entanglement measures how irreducible informational relationships are to their component relationships, specifying concepts and modes. Several corollaries follow from these premises. The quale is determined by both the mechanism and state of the system. Thus, two different systems having identical activity patterns may generate different qualia. Conversely, the same quale may be generated by two systems that differ in both activity and connectivity. Both active and inactive elements specify a quale, but elements that are inactivated do not. Also, the activation of an element affects experience by changing the shape of the quale. The subdivision of experience into modalities and submodalities corresponds to subshapes in Q. In principle, different aspects of experience may be classified as different shapes in Q, and the similarity between experiences reduces to similarities between shapes. Finally, specific qualities, such as the “redness” of red, while generated by a local mechanism, cannot be reduced to it, but require considering the entire quale.
Ultimately, the present framework may offer a principled way for translating qualitative properties of experience into mathematics.

Web DOI [BibTex]


Guest editorial: Special issue on robot learning, Part B

Peters, J., Ng, A.

Autonomous Robots, 27(2):91-92, August 2009 (article)

PDF PDF DOI [BibTex]


Policy Search for Motor Primitives

Peters, J., Kober, J.

KI - Zeitschrift Künstliche Intelligenz, 23(3):38-40, August 2009 (article)

Abstract
Many motor skills in humanoid robotics can be learned using parametrized motor primitives from demonstrations. However, most interesting motor learning problems require self-improvement often beyond the reach of current reinforcement learning methods due to the high dimensionality of the state-space. We develop an EM-inspired algorithm applicable to complex motor learning tasks. We compare this algorithm to several well-known parametrized policy search methods and show that it outperforms them. We apply it to motor learning problems and show that it can learn the complex Ball-in-a-Cup task using a real Barrett WAM robot arm.

Web [BibTex]


Control Design Based on Analytical Stability Criteria for Optimized Kinesthetic Perception in Scaled Teleoperation

Son, HI., Bhattacharjee, T., Lee, DY.

In ICCAS-SICE International Joint Conference, pages: 3365-3370, IEEE, Piscataway, NJ, USA, ICCAS-SICE International Joint Conference, August 2009 (inproceedings)

Abstract
This paper considers kinesthetic perception as the main performance objective for a scaled teleoperation system, and devises a scheme to optimize it under constraints of position tracking and absolute stability. Analytical criteria for monitoring stability have been derived for position-position, force-position, and four-channel control architectures using Llewellyn's absolute stability criteria. This helps to reduce the optimization complexity and provides an easy and effective design guideline for selecting control gains from the allowable range. Optimization results indicate that trade-offs exist among different control architectures. This paper provides guidelines for the application-dependent selection of a control scheme.
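Llewellyn's criterion states that a linear two-port with immittance parameters p11, p12, p21, p22 is absolutely stable if, at every frequency, Re(p11) >= 0, Re(p22) >= 0, and 2 Re(p11) Re(p22) - Re(p12 p21) - |p12 p21| >= 0. The sketch checks these inequalities on a sampled frequency grid, which only approximates "every frequency"; the example two-port (damping b, mass m, coupling k) is hypothetical and is not one of the teleoperation architectures analyzed in the paper.

```python
def llewellyn_absolutely_stable(p11, p12, p21, p22, tol=0.0):
    """Llewellyn's conditions at one frequency for a two-port given by complex
    immittance parameters (e.g. impedance or hybrid form). tol allows round-off."""
    prod = p12 * p21
    return (
        p11.real >= -tol
        and p22.real >= -tol
        and 2.0 * p11.real * p22.real - prod.real - abs(prod) >= -tol
    )


def stable_over_band(params_at, omegas):
    """params_at(omega) -> (p11, p12, p21, p22); checks every sampled frequency."""
    return all(llewellyn_absolutely_stable(*params_at(w)) for w in omegas)


def example(b, k, m=1.0):
    """Hypothetical symmetric two-port: p11 = p22 = b + j*omega*m, p12 = p21 = k."""
    return lambda w: (complex(b, w * m), complex(k, 0.0), complex(k, 0.0), complex(b, w * m))


omegas = [0.1 * i for i in range(1, 101)]
print(stable_over_band(example(b=2.0, k=1.0), omegas))  # damping dominates coupling
print(stable_over_band(example(b=0.5, k=1.0), omegas))  # coupling too strong
```

For this toy two-port the third condition reduces to 2(b^2 - k^2) >= 0, i.e. b >= |k|, which is the kind of closed-form gain constraint that analytical stability criteria make available to an optimizer.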

Web [BibTex]


Computer cursor control by motor cortical signals in humans with tetraplegia

Kim, S., Simeral, J. D., Hochberg, L. R., Donoghue, J. P., Black, M. J.

In 7th Asian Control Conference, ASCC09, pages: 988-993, Hong Kong, China, August 2009 (inproceedings)

pdf [BibTex]


A neurophysiologically plausible population code model for human contrast discrimination

Goris, R., Wichmann, F., Henning, G.

Journal of Vision, 9(7):1-22, July 2009 (article)

Abstract
The pedestal effect is the improvement in the detectability of a sinusoidal grating in the presence of another grating of the same orientation, spatial frequency, and phase—usually called the pedestal. Recent evidence has demonstrated that the pedestal effect is differently modified by spectrally flat and notch-filtered noise: The pedestal effect is reduced in flat noise but virtually disappears in the presence of notched noise (G. B. Henning & F. A. Wichmann, 2007). Here we consider a network consisting of units whose contrast response functions resemble those of the cortical cells believed to underlie human pattern vision and demonstrate that, when the outputs of multiple units are combined by simple weighted summation—a heuristic decision rule that resembles optimal information combination and produces a contrast-dependent weighting profile—the network produces contrast-discrimination data consistent with psychophysical observations: The pedestal effect is present without noise, reduced in broadband noise, but almost disappears in notched noise. These findings follow naturally from the normalization model of simple cells in primary visual cortex, followed by response-based pooling, and suggest that in processing even low-contrast sinusoidal gratings, the visual system may combine information across neurons tuned to different spatial frequencies and orientations.
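The pedestal effect itself (a lower increment threshold on a low-contrast pedestal than on a blank) can be reproduced with a single unit whose response accelerates at low contrast and becomes compressive at high contrast. The sketch below assumes a hypothetical response r_max c^p / (c^q + s^q) with p > q and a fixed response-difference criterion; it is not the paper's population model, which pools many units by weighted summation and is tested with noise masking.

```python
def response(c, r_max=100.0, p=2.4, q=2.0, s=0.1):
    """Hypothetical contrast response of a single unit (increasing in c for p > q)."""
    return r_max * c ** p / (c ** q + s ** q)


def threshold(pedestal, criterion=1.0, hi=1.0, iters=60):
    """Contrast increment dc at which response(pedestal + dc) - response(pedestal)
    reaches the criterion, found by bisection on [0, hi]."""
    base = response(pedestal)
    lo_dc, hi_dc = 0.0, hi
    for _ in range(iters):
        mid = 0.5 * (lo_dc + hi_dc)
        if response(pedestal + mid) - base < criterion:
            lo_dc = mid
        else:
            hi_dc = mid
    return hi_dc


for ped in (0.0, 0.05, 1.0):
    print(ped, threshold(ped))
```

Thresholds first fall as the pedestal rises (the accelerating part of the response) and then rise again (the compressive part), giving the familiar "dipper" shape.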

Web DOI [BibTex]


A Novel Context-Sensitive Semisupervised SVM Classifier Robust to Mislabeled Training Samples

Bruzzone, L., Persello, C.

IEEE Transactions on Geoscience and Remote Sensing, 47(7):2142-2154, July 2009 (article)

Abstract
This paper presents a novel context-sensitive semisupervised support vector machine (CS4VM) classifier, which is aimed at addressing classification problems where the available training set is not fully reliable, i.e., some labeled samples may be associated with the wrong information class (mislabeled patterns). Unlike standard context-sensitive methods, the proposed CS4VM classifier exploits the contextual information of the pixels belonging to the neighborhood system of each training sample in the learning phase to improve the robustness to possible mislabeled training patterns. This is achieved through both the design of a semisupervised procedure and the definition of a novel contextual term in the cost function associated with the learning of the classifier. In order to assess the effectiveness of the proposed CS4VM and to understand the impact of the addressed problem in real applications, we also present an extensive experimental analysis carried out on training sets that include different percentages of mislabeled patterns having different distributions on the classes. In the analysis, we also study the robustness to mislabeled training patterns of some widely used supervised and semisupervised classification algorithms (i.e., conventional support vector machine (SVM), progressive semisupervised SVM, maximum likelihood, and k-nearest neighbor). Results obtained on a very high resolution image and on a medium resolution image confirm both the robustness and the effectiveness of the proposed CS4VM with respect to standard classification algorithms and allow us to derive interesting conclusions on the effects of mislabeled patterns on different classifiers.

Web DOI [BibTex]


Falsificationism and Statistical Learning Theory: Comparing the Popper and Vapnik-Chervonenkis Dimensions

Corfield, D., Schölkopf, B., Vapnik, V.

Journal for General Philosophy of Science, 40(1):51-58, July 2009 (article)

Abstract
We compare Karl Popper’s ideas concerning the falsifiability of a theory with similar notions from the part of statistical learning theory known as VC-theory. Popper’s notion of the dimension of a theory is contrasted with the apparently very similar VC-dimension. Having located some divergences, we discuss how best to view Popper’s work from the perspective of statistical learning theory, either as a precursor or as aiming to capture a different learning activity.
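The VC dimension that the paper compares with Popper's dimension is the size of the largest point set a hypothesis class can shatter. As a small illustration (not taken from the paper), the sketch checks shattering by closed intervals on the real line, a class whose VC dimension is 2; it assumes the points are distinct.

```python
def interval_labels(points):
    """All labelings of `points` realizable by indicator functions of closed
    intervals [a, b] on the real line, including the empty interval."""
    pts = sorted(points)
    realized = {tuple(0 for _ in points)}  # the empty interval labels every point 0
    # Any non-empty labeling an interval induces is a contiguous run in sorted
    # order, so intervals with endpoints at data points cover all cases.
    for i in range(len(pts)):
        for j in range(i, len(pts)):
            a, b = pts[i], pts[j]
            realized.add(tuple(1 if a <= x <= b else 0 for x in points))
    return realized


def shattered(points):
    """True if intervals realize all 2^n labelings of the (distinct) points."""
    return len(interval_labels(points)) == 2 ** len(points)


print(shattered([0.0, 1.0]), shattered([0.0, 1.0, 2.0]))
```

Three points cannot be shattered because the labeling (1, 0, 1) would require an interval that skips its middle point.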

PDF DOI [BibTex]


Active learning for classification of remote sensing images

Bruzzone, L., Persello, C.

In IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pages: III-693-III-696, IEEE, Piscataway, NJ, USA, July 2009 (inproceedings)

Abstract
This paper presents an analysis of active learning techniques for the classification of remote sensing images and proposes a novel active learning method based on support vector machines (SVMs). The proposed method exploits a query function for the inclusion of batches of unlabeled samples in the training set, which is based on the evaluation of two criteria: uncertainty and diversity. This query function adopts a stochastic approach to the selection of unlabeled samples, which is based on a function of uncertainty estimated from the distribution of errors on the validation set (which is assumed to be available for the model selection of the SVM classifier). Experiments carried out on a very high resolution image confirm the effectiveness of the proposed active learning technique, which proves more accurate than standard methods.
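The uncertainty and diversity criteria can be combined in a simple greedy way, as sketched below: rank unlabeled samples by the magnitude of an SVM-style decision value (small means close to the margin) and skip any candidate that lies too close to one already chosen. This deterministic filter is a simplified stand-in, not the paper's stochastic query function, and the decision function, distance threshold and data are hypothetical.

```python
def select_batch(pool, decision, batch_size, min_distance):
    """Greedy batch selection mixing uncertainty and diversity.

    pool: list of feature tuples for unlabeled samples.
    decision: function giving an SVM-like decision value f(x); |f(x)| near 0
    means the sample lies close to the margin, i.e. it is uncertain.
    min_distance: diversity threshold in feature space.
    """
    def dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

    ranked = sorted(pool, key=lambda x: abs(decision(x)))  # most uncertain first
    batch = []
    for x in ranked:
        # Diversity: reject candidates too close to an already selected sample.
        if all(dist(x, chosen) >= min_distance for chosen in batch):
            batch.append(x)
        if len(batch) == batch_size:
            break
    return batch


pool = [(0.5, 0.0), (0.51, 0.0), (0.45, 1.0), (0.9, 0.0), (0.1, 0.0)]
print(select_batch(pool, lambda x: x[0] - 0.5, batch_size=2, min_distance=0.2))
```

Without the diversity check the batch would contain two nearly identical samples near the margin, which would cost two labels for roughly one sample's worth of information.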

Web DOI [BibTex]


A novel approach to the selection of spatially invariant features for classification of hyperspectral images

Persello, C., Bruzzone, L.

In IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pages: II-61-II-64, IEEE, Piscataway, NJ, USA, July 2009 (inproceedings)

Abstract
This paper presents a novel approach to feature selection for the classification of hyperspectral images. The proposed approach aims at selecting a subset of the original set of features that exhibits two main properties: i) high capability to discriminate among the considered classes, ii) high invariance in the spatial domain of the investigated scene. This approach results in a more robust classification system with improved generalization properties with respect to standard feature-selection methods. The feature selection is accomplished by defining a multi-objective criterion function made up of two terms: i) a term that measures the class separability, ii) a term that evaluates the spatial invariance of the selected features. In order to assess the spatial invariance of the feature subset, we propose both a supervised method and a semisupervised method (the choice between them depends on the available reference data). The multi-objective problem is solved by an evolutionary algorithm that estimates the set of Pareto-optimal solutions. Experiments carried out on a hyperspectral image acquired by the Hyperion sensor on a complex area confirmed the effectiveness of the proposed approach.

Web DOI [BibTex]


Guest editorial: Special issue on robot learning, Part A

Peters, J., Ng, A.

Autonomous Robots, 27(1):1-2, July 2009 (article)

PDF PDF DOI [BibTex]


A Geometric Approach to Confidence Sets for Ratios: Fieller’s Theorem, Generalizations, and Bootstrap

von Luxburg, U., Franz, V.

Statistica Sinica, 19(3):1095-1117, July 2009 (article)

Abstract
We present a geometric method to determine confidence sets for the ratio E(Y)/E(X) of the means of random variables X and Y. This method reduces the problem of constructing confidence sets for the ratio of two random variables to the problem of constructing confidence sets for the means of one-dimensional random variables. It is valid in a large variety of circumstances. In the case of normally distributed random variables, the confidence sets so constructed coincide with the standard Fieller confidence sets. Generalizations of our construction lead to definitions of exact and conservative confidence sets for very general classes of distributions, provided the joint expectation of (X,Y) exists and the linear combinations of the form aX + bY are well-behaved. Finally, our geometric method allows us to derive a very simple bootstrap approach for constructing conservative confidence sets for ratios that perform favorably in certain situations, in particular in the asymmetric heavy-tailed regime.
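In the normally distributed case the construction coincides with Fieller's: a ratio r belongs to the confidence set when the mean of Y - rX is within t standard errors of zero, which is a quadratic inequality in r. A minimal sketch of that classical case is below; the critical value t = 1.96 (a large-sample normal approximation), the data, and the decision to return None for unbounded sets are our assumptions, and the paper's generalizations and bootstrap procedure are not shown.

```python
import math


def fieller_interval(xs, ys, t=1.96):
    """Fieller-type confidence set for E(Y)/E(X) from paired samples.

    r is kept when (mean(Y) - r mean(X))^2 <= t^2 Var(Y - r X) / n, i.e.
    a r^2 + b r + c <= 0. Returns (low, high) if the set is a bounded
    interval, else None. t = 1.96 is a normal approximation; a Student-t
    quantile would be more appropriate for small n.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    q = t * t / n
    a = mx * mx - q * sxx
    b = -2.0 * (mx * my - q * sxy)
    c = my * my - q * syy
    disc = b * b - 4.0 * a * c
    if a <= 0 or disc < 0:
        return None  # unbounded set: two rays or the whole real line
    root = math.sqrt(disc)
    return (-b - root) / (2.0 * a), (-b + root) / (2.0 * a)


xs = [10.0, 11.0, 9.0, 10.0, 12.0, 8.0]
ys = [20.5, 21.8, 18.1, 19.7, 24.3, 16.2]
print(fieller_interval(xs, ys))
```

When mean(X) is not clearly separated from zero (a <= 0), no bounded interval exists, which is the well-known behavior of Fieller sets that a naive "ratio plus or minus error" interval hides.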


PDF PDF Web [BibTex]


Varieties of Justification in Machine Learning

Corfield, D.

In Proceedings of Multiplicity and Unification in Statistics and Probability, pages: 1-10, Multiplicity and Unification in Statistics and Probability, June 2009 (inproceedings)

Abstract
The field of machine learning has flourished over the past couple of decades. With huge amounts of data available, efficient algorithms can learn to extrapolate from their training sets to become very accurate classifiers. For example, it is now straightforward to develop classifiers that achieve accuracies of around 99% on databases of handwritten digits. These algorithms have been devised by theorists who arrive at the problem of machine learning with a range of different philosophical outlooks on the subject of inductive reasoning. This has led to a wide range of theoretical rationales for their work. In this talk I shall classify the different forms of justification for inductive machine learning into four kinds, and make some comparisons between them. Since these learning tasks come with little by way of theoretical knowledge to aid them, the relevance of these justificatory approaches to the inductive reasoning of the natural sciences is questionable; nevertheless, certain issues surrounding the presuppositions of inductive reasoning are brought sharply into focus. In particular, Frequentist, Bayesian and MDL outlooks can be compared.

ei

PDF Web [BibTex]

Effects of Stimulus Type and of Error-Correcting Code Design on BCI Speller Performance

Hill, J., Farquhar, J., Martens, S., Biessmann, F., Schölkopf, B.

In Advances in neural information processing systems 21, pages: 665-672, (Editors: D Koller and D Schuurmans and Y Bengio and L Bottou), Curran, Red Hook, NY, USA, 22nd Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
From an information-theoretic perspective, a noisy transmission system such as a visual Brain-Computer Interface (BCI) speller could benefit from the use of error-correcting codes. However, optimizing the code solely according to the maximal minimum-Hamming-distance criterion tends to increase the overall frequency of target stimuli, and hence to reduce the average target-to-target interval (TTI) significantly, which makes it difficult to classify the individual event-related potentials (ERPs) because of overlap and refractory effects. Clearly, any change to the stimulus setup must also respect its possible psychophysiological consequences. Here we report new EEG data from experiments in which we explore stimulus types and codebooks in a within-subject design, finding an interaction between the two factors. Our data demonstrate that the traditional row-column code has particular spatial properties that lead to better performance than one would expect from its TTIs and Hamming distances alone, but that error-correcting codes can nonetheless improve performance provided the right stimulus type is used.

ei

PDF PDF Web [BibTex]

Influence of graph construction on graph-based clustering measures

Maier, M., von Luxburg, U., Hein, M.

In Advances in neural information processing systems 21, pages: 1025-1032, (Editors: Koller, D., D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
Graph clustering methods such as spectral clustering are defined for general weighted graphs. In machine learning, however, data are often given not in the form of a graph but in terms of similarity (or distance) values between points. In this case, a neighborhood graph is first constructed from the similarities between the points, and then a graph clustering algorithm is applied to this graph. In this paper we investigate the influence of the construction of the similarity graph on the clustering results. We first study the convergence of graph clustering criteria such as the normalized cut (Ncut) as the sample size tends to infinity. We find that the limit expressions are different for different types of graph, for example the r-neighborhood graph or the k-nearest neighbor graph. In plain words: Ncut on a kNN graph does something systematically different from Ncut on an r-neighborhood graph! This finding shows that graph clustering criteria cannot be studied independently of the kind of graph they are applied to. We also provide examples showing that these differences can be observed on toy and real data even for rather small sample sizes.
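The dependence of Ncut on the graph type can be seen even on a tiny example. The toy sketch below, with hypothetical helper names, builds an unweighted r-neighborhood graph and a symmetric kNN graph on the same one-dimensional points and evaluates Ncut(A, B) = cut(A, B)/vol(A) + cut(A, B)/vol(B) for one fixed partition. It only illustrates that the two constructions can score the same partition differently; it does not reproduce the paper's large-sample limit analysis.

```python
def r_graph(points, r):
    """Unweighted r-neighborhood graph: connect i, j when |p_i - p_j| <= r."""
    n = len(points)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(points[i] - points[j]) <= r}


def knn_graph(points, k):
    """Symmetric unweighted kNN graph: connect i, j if either point is
    among the other's k nearest neighbours."""
    n = len(points)
    edges = set()
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: abs(points[i] - points[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def ncut(edges, n, part_a):
    """Normalized cut of the partition (A, complement of A).

    Assumes both sides have positive volume (at least one incident edge).
    """
    degree = [0] * n
    cut = 0
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
        if (i in part_a) != (j in part_a):
            cut += 1
    vol_a = sum(degree[i] for i in range(n) if i in part_a)
    vol_b = sum(degree) - vol_a
    return cut / vol_a + cut / vol_b
```

On the points `[0.0, 1.0, 2.5, 3.0]` with A = {0, 1}, the r-graph with r = 1.6 links the middle points and gives Ncut = 2/3, whereas the 1-NN graph splits into two components and gives Ncut = 0.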

ei

PDF Web [BibTex]

Local Gaussian Process Regression for Real Time Online Model Learning and Control

Nguyen-Tuong, D., Seeger, M., Peters, J.

In Advances in neural information processing systems 21, pages: 1193-1200, (Editors: Koller, D., D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
Learning in real-time applications, e.g., online approximation of the inverse dynamics model for model-based robot control, requires fast online regression techniques. Inspired by local learning, we propose a method to speed up standard Gaussian Process regression (GPR) with local GP models (LGP). The training data are partitioned into local regions, and an individual GP model is trained for each region. The prediction for a query point is a weighted estimate computed from nearby local models. Unlike other GP approximations, such as mixtures of experts, our method uses a distance-based measure both for partitioning the data and for weighted prediction. The proposed method achieves online learning and prediction in real time. Comparisons with other nonparametric regression methods show that LGP is more accurate than LWPR and performs close to standard GPR and nu-SVR.
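The partition-then-weight scheme described above can be sketched for scalar inputs. This is a hypothetical toy, not the authors' implementation: it assumes a squared-exponential kernel for both the GPs and the distance weights, keeps each local centre fixed at the first point that created it, averages over all local models rather than only the nearest ones, and refits each local GP at prediction time instead of updating it incrementally. The names `LocalGP`, `w_gen` and `solve` are illustrative.

```python
import math


def se_kernel(a, b, length=1.0):
    """Squared-exponential kernel on scalars (unit signal variance)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(mat, vec):
    """Solve mat @ x = vec by Gaussian elimination with partial pivoting."""
    n = len(vec)
    aug = [row[:] + [v] for row, v in zip(mat, vec)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


class LocalGP:
    """Toy local-GP regressor on scalar inputs (hypothetical sketch).

    A training point joins the local model whose centre gives the largest
    weight w = k(x, centre) if w exceeds w_gen; otherwise it starts a new
    local model centred on itself. Predictions are a w-weighted average of
    the local GP posterior means.
    """

    def __init__(self, length=1.0, noise=1e-4, w_gen=0.5):
        self.length, self.noise, self.w_gen = length, noise, w_gen
        self.models = []  # each entry: {"centre": c, "x": [...], "y": [...]}

    def add(self, x, y):
        weights = [se_kernel(x, m["centre"], self.length) for m in self.models]
        if weights and max(weights) > self.w_gen:
            best = self.models[weights.index(max(weights))]
            best["x"].append(x)
            best["y"].append(y)
        else:
            self.models.append({"centre": x, "x": [x], "y": [y]})

    def _local_mean(self, m, xq):
        # GP posterior mean: k(xq, X) (K + noise*I)^{-1} y for one local model.
        xs, ys = m["x"], m["y"]
        gram = [[se_kernel(a, b, self.length) + (self.noise if i == j else 0.0)
                 for j, b in enumerate(xs)] for i, a in enumerate(xs)]
        alpha = solve(gram, ys)
        return sum(al * se_kernel(xq, a, self.length) for al, a in zip(alpha, xs))

    def predict(self, xq):
        weights = [se_kernel(xq, m["centre"], self.length) for m in self.models]
        means = [self._local_mean(m, xq) for m in self.models]
        return sum(w * mu for w, mu in zip(weights, means)) / sum(weights)
```

With `length=1.0` and `w_gen=0.5`, streaming the inputs 0, 0.25, ..., 6.0 creates five local models, each covering roughly one unit of input space; smaller local models keep every matrix solve cheap, which is the source of the speed-up.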

ei

PDF Web [BibTex]

Detecting the Direction of Causal Time Series

Peters, J., Janzing, D., Gretton, A., Schölkopf, B.

In Proceedings of the 26th International Conference on Machine Learning, pages: 801-808, (Editors: A Danyluk and L Bottou and ML Littman), ACM Press, New York, NY, USA, ICML, June 2009 (inproceedings)

Abstract
We propose a method that detects the true direction of a time series by fitting an autoregressive moving average model to the data. Whenever the noise is independent of the previous samples for one ordering of the observations but dependent for the opposite ordering, we infer the former direction to be the true one. We prove that our method works in the population case as long as the noise of the process is not normally distributed (for Gaussian noise, the direction is not identifiable). A new and important implication of our result is that it confirms, for time series, a fundamental conjecture in causal reasoning: if after regression the noise is independent of the signal for one direction and dependent for the other, then the former represents the true causal direction. We test our approach on two types of data: simulated data sets conforming to our modeling assumptions, and real-world EEG time series. Our method makes a decision for a significant fraction of both data sets, and these decisions are mostly correct. For real-world data, our approach outperforms alternative solutions to the problem of time direction recovery.

ei

PDF Web DOI [BibTex]

Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer

Lampert, C., Nickisch, H., Harmeling, S.

In CVPR 2009, pages: 951-958, IEEE Service Center, Piscataway, NJ, USA, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 2009 (inproceedings)

Abstract
We study the problem of object classification when training and test classes are disjoint, i.e. no training examples of the target classes are available. This setup has hardly been studied in computer vision research, but it is the rule rather than the exception, because the world contains tens of thousands of different object classes, and image collections annotated with suitable class labels have been formed for only very few of them. In this paper, we tackle the problem by introducing attribute-based classification. It performs object detection based on a human-specified high-level description of the target objects instead of training images. The description consists of arbitrary semantic attributes, like shape, color or even geographic information. Because such properties transcend the specific learning task at hand, they can be pre-learned, e.g. from image datasets unrelated to the current task. Afterwards, new classes can be detected based on their attribute representation, without the need for a new training phase. In order to evaluate our method and to facilitate research in this area, we have assembled a new large-scale dataset, "Animals with Attributes", of over 30,000 animal images that match the 50 classes in Osherson's classic table of how strongly humans associate 85 semantic attributes with animal classes. Our experiments show that by using an attribute layer it is indeed possible to build a learning object detection system that does not require any training images of the target classes.
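Classification through an attribute layer can be sketched with a simple scoring rule. The snippet below assumes that each unseen class is described by a binary attribute signature (like a row of a class-attribute table) and that the pre-learned attribute classifiers output probabilities that are treated as independent; each class is scored by the log-likelihood of its signature. It is a simplified illustration, not necessarily the exact model used in the paper, and the function name, class names and attributes in the usage note are made up.

```python
import math


def attribute_classify(attr_probs, class_signatures):
    """Pick the unseen class whose binary attribute signature is most
    likely under independent per-attribute predictions.

    attr_probs: list of p(a_m = 1 | x) from pre-learned attribute classifiers.
    class_signatures: dict mapping class name -> list of 0/1 attributes.
    """
    def log_score(signature):
        total = 0.0
        for p, a in zip(attr_probs, signature):
            q = p if a == 1 else 1.0 - p
            total += math.log(max(q, 1e-12))  # guard against log(0)
        return total

    return max(class_signatures, key=lambda name: log_score(class_signatures[name]))
```

For instance, with hypothetical attributes (striped, has hooves, lives in water) and signatures `{"zebra": [1, 1, 0], "horse": [0, 1, 0], "dolphin": [0, 0, 1]}`, attribute probabilities `[0.9, 0.8, 0.1]` select "zebra" even though no zebra image was used to train any attribute classifier.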

ei

PDF Web DOI [BibTex]

Learning Taxonomies by Dependence Maximization

Blaschko, M., Gretton, A.

In Advances in neural information processing systems 21, pages: 153-160, (Editors: Koller, D., D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
We introduce a family of unsupervised algorithms, numerical taxonomy clustering, to simultaneously cluster data and learn a taxonomy that encodes the relationship between the clusters. The algorithms work by maximizing the dependence between the taxonomy and the original data. The resulting taxonomy is a more informative visualization of complex data than simple clustering; in addition, taking into account the relations between different clusters is shown to substantially improve the quality of the clustering, when compared with state-of-the-art algorithms in the literature (both spectral clustering and a previous dependence maximization approach). We demonstrate our algorithm on image and text data.

ei

PDF Web [BibTex]

Learning object-specific grasp affordance densities

Detry, R., Baseski, E., Popovic, M., Touati, Y., Krüger, N., Kroemer, O., Peters, J., Piater, J.

In 8th IEEE International Conference on Development and Learning, pages: 1-7, IEEE Service Center, Piscataway, NJ, USA, ICDL, June 2009 (inproceedings)

Abstract
This paper addresses the issue of learning and representing object grasp affordances, i.e. object-gripper relative configurations that lead to successful grasps. The purpose of grasp affordances is to organize and store all the knowledge that an agent has about the grasping of an object, in order to facilitate reasoning on grasping solutions and their achievability. The affordance representation consists of a continuous probability density function defined on the 6D gripper pose space (3D position and orientation), within an object-relative reference frame. Grasp affordances are initially learned from various sources, e.g. from imitation or from visual cues, leading to grasp hypothesis densities. Grasp densities are attached to a learned 3D visual object model, and pose estimation of the visual model allows a robotic agent to execute samples from a grasp hypothesis density under various object poses. Grasp outcomes are used to learn grasp empirical densities, i.e. grasps that have been confirmed through experience. We show the result of learning grasp hypothesis densities from both imitation and visual cues, and present grasp empirical densities learned from physical experience by a robot.

ei

PDF Web DOI [BibTex]

Understanding Brain Connectivity Patterns during Motor Imagery for Brain-Computer Interfacing

Grosse-Wentrup, M.

In Advances in neural information processing systems 21, pages: 561-568, (Editors: Koller, D., D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
EEG connectivity measures could provide a new type of feature space for inferring a subject's intention in Brain-Computer Interfaces (BCIs). However, very little is known about EEG connectivity patterns for BCIs. In this study, EEG connectivity during motor imagery (MI) of the left and right hand is investigated in a broad frequency range across the whole scalp by combining Beamforming with Transfer Entropy and taking into account possible volume conduction effects. Observed connectivity patterns indicate that modulation intentionally induced by MI is strongest in the gamma band, i.e., above 35 Hz. Furthermore, modulation between MI and rest is found to be more pronounced than between MI of different hands. This is in contrast to results on MI obtained with bandpower features, and might explain the so far only moderate success of connectivity features in BCIs. It is concluded that future studies on connectivity-based BCIs should focus on high frequency bands and consider experimental paradigms that maximally vary cognitive demands between conditions.

ei

PDF Web [BibTex]

Nonlinear causal discovery with additive noise models

Hoyer, P., Janzing, D., Mooij, J., Peters, J., Schölkopf, B.

In Advances in neural information processing systems 21, pages: 689-696, (Editors: D Koller and D Schuurmans and Y Bengio and L Bottou), Curran, Red Hook, NY, USA, 22nd Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
The discovery of causal relationships between a set of observed variables is a fundamental problem in science. For continuous-valued data, linear acyclic causal models are often used because these models are well understood and there are well-known methods to fit them to data. In reality, of course, many causal relationships are more or less nonlinear, raising some doubts as to the applicability and usefulness of purely linear methods. In this contribution we show that in fact the basic linear framework can be generalized to nonlinear models with additive noise. In this extended framework, nonlinearities in the data-generating process are in fact a blessing rather than a curse, as they typically provide information on the underlying causal system and allow more aspects of the true data-generating mechanisms to be identified. In addition to theoretical results we show simulations and some simple real data experiments illustrating the identification power provided by nonlinearities.

ei

PDF Web [BibTex]

Bounds on marginal probability distributions

Mooij, JM., Kappen, B.

In Advances in neural information processing systems 21, pages: 1105-1112, (Editors: Koller, D., D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
We propose a novel bound on single-variable marginal probability distributions in factor graphs with discrete variables. The bound is obtained by propagating local bounds (convex sets of probability distributions) over a subtree of the factor graph, rooted in the variable of interest. By construction, the method not only bounds the exact marginal probability distribution of a variable, but also its approximate Belief Propagation marginal ("belief"). Thus, apart from providing a practical means to calculate bounds on marginals, our contribution also lies in providing a better understanding of the error made by Belief Propagation. We show that our bound outperforms the state-of-the-art on some inference problems arising in medical diagnosis.

ei

PDF Web [BibTex]

Convex variational Bayesian inference for large scale generalized linear models

Nickisch, H., Seeger, M.

In ICML 2009, pages: 761-768, (Editors: Danyluk, A., L. Bottou, M. Littman), ACM Press, New York, NY, USA, 26th International Conference on Machine Learning, June 2009 (inproceedings)

Abstract
We show how variational Bayesian inference can be implemented for very large generalized linear models. Our relaxation is proven to be a convex problem for any log-concave model. We provide a generic double loop algorithm for solving this relaxation on models with arbitrary super-Gaussian potentials. By iteratively decoupling the criterion, most of the work can be done by solving large linear systems, rendering our algorithm orders of magnitude faster than previously proposed solvers for the same problem. We evaluate our method on problems of Bayesian active learning for large binary classification models, and show how to address settings with many candidates and sequential inclusion steps.

ei

PDF Web DOI [BibTex]

An Empirical Analysis of Domain Adaptation Algorithms for Genomic Sequence Analysis

Schweikert, G., Widmer, C., Schölkopf, B., Rätsch, G.

In Advances in neural information processing systems 21, pages: 1433-1440, (Editors: D Koller and D Schuurmans and Y Bengio and L Bottou), Curran, Red Hook, NY, USA, 22nd Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
We study the problem of domain transfer for a supervised classification task in mRNA splicing. We consider a number of recent domain transfer methods from machine learning, including some that are novel, and evaluate them on genomic sequence data from model organisms of varying evolutionary distance. We find that in cases where the organisms are not closely related, the use of domain adaptation methods can help improve classification performance.

ei

PDF Web [BibTex]

Diffeomorphic Dimensionality Reduction

Walder, C., Schölkopf, B.

In Advances in neural information processing systems 21, pages: 1713-1720, (Editors: D Koller and D Schuurmans and Y Bengio and L Bottou), Curran, Red Hook, NY, USA, 22nd Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
This paper introduces a new approach to constructing meaningful lower dimensional representations of sets of data points. We argue that constraining the mapping between the high and low dimensional spaces to be a diffeomorphism is a natural way of ensuring that pairwise distances are approximately preserved. Accordingly we develop an algorithm which diffeomorphically maps the data near to a lower dimensional subspace and then projects onto that subspace. The problem of solving for the mapping is transformed into one of solving for an Eulerian flow field which we compute using ideas from kernel methods. We demonstrate the efficacy of our approach on various real world data sets.

ei

PDF Web [BibTex]

On the Identifiability of the Post-Nonlinear Causal Model

Zhang, K., Hyvärinen, A.

In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI 2009), pages: 647-655, (Editors: Bilmes, J., A. Y. Ng, D. A. McAllester), AUAI Press, Corvallis, OR, USA, 25th Conference on Uncertainty in Artificial Intelligence (UAI), June 2009 (inproceedings)

Abstract
By taking into account the nonlinear effect of the cause, the inner noise effect, and the measurement distortion effect in the observed variables, the post-nonlinear (PNL) causal model has demonstrated excellent performance in distinguishing cause from effect. However, its identifiability has not been properly addressed, and how to apply it when there are more than two variables also remains open. In this paper, we conduct a systematic investigation of its identifiability in the two-variable case. We show that this model is identifiable in most cases; by enumerating all possible situations in which the model is not identifiable, we provide sufficient conditions for its identifiability. Simulations are given to support the theoretical results. Moreover, in the case of more than two variables, we show that the whole causal structure can be found by applying the PNL causal model to each structure in the Markov equivalence class and testing whether the disturbance is independent of the direct causes for each variable. In this way, an exhaustive search over all possible causal structures is avoided.

ei

PDF Web [BibTex]

Combining appearance and motion for human action classification in videos

Dhillon, P., Nowozin, S., Lampert, C.

In 1st International Workshop on Visual Scene Understanding, pages: 22-29, IEEE Service Center, Piscataway, NJ, USA, ViSU, June 2009 (inproceedings)

ei

PDF Web DOI [BibTex]

Let the Kernel Figure it Out: Principled Learning of Pre-processing for Kernel Classifiers

Gehler, P., Nowozin, S.

In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages: 2836-2843, IEEE Service Center, Piscataway, NJ, USA, CVPR, June 2009 (inproceedings)

Abstract
Most modern computer vision systems for high-level tasks, such as image classification, object recognition and segmentation, are based on learning algorithms that are able to separate discriminative information from noise. In practice, however, the typical system consists of a long pipeline of pre-processing steps, such as extraction of different kinds of features, various kinds of normalizations, feature selection, and quantization into aggregated representations such as histograms. Along this pipeline, there are many parameters to set and choices to make, and their effect on the overall system performance is a priori unclear. In this work, we shorten the pipeline in a principled way. We move pre-processing steps into the learning system by means of kernel parameters, letting the learning algorithm decide upon suitable parameter values. Learning to optimize the pre-processing choices becomes learning the kernel parameters. We realize this paradigm by extending the recent Multiple Kernel Learning formulation from the finite case of having a fixed number of kernels which can be combined to the general infinite case where each possible parameter setting induces an associated kernel. We evaluate the new paradigm extensively on image classification and object classification tasks. We show that it is possible to learn optimal discriminative codebooks and optimal spatial pyramid schemes, consistently outperforming all previous state-of-the-art approaches.

ei

PDF Web DOI [BibTex]

Identifying confounders using additive noise models

Janzing, D., Peters, J., Mooij, J., Schölkopf, B.

In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, pages: 249-257, (Editors: J Bilmes and AY Ng), AUAI Press, Corvallis, OR, USA, UAI, June 2009 (inproceedings)

Abstract
We propose a method for inferring the existence of a latent common cause ("confounder") of two observed random variables. The method assumes that the two effects of the confounder are (possibly nonlinear) functions of the confounder plus independent, additive noise. We discuss under which conditions the model is identifiable (up to an arbitrary reparameterization of the confounder) from the joint distribution of the effects. We state and prove a theoretical result that provides evidence for the conjecture that the model is generically identifiable under suitable technical conditions. In addition, we propose a practical method to estimate the confounder from a finite i.i.d. sample of the effects and illustrate that the method works well on both simulated and real-world data.

ei

PDF Web [BibTex]

Regression by dependence minimization and its application to causal inference in additive noise models

Mooij, J., Janzing, D., Peters, J., Schölkopf, B.

In Proceedings of the 26th International Conference on Machine Learning, pages: 745-752, (Editors: A Danyluk and L Bottou and M Littman), ACM Press, New York, NY, USA, ICML, June 2009 (inproceedings)

Abstract
Motivated by causal inference problems, we propose a novel method for regression that minimizes the statistical dependence between regressors and residuals. The key advantage of this approach to regression is that it does not assume a particular distribution of the noise, i.e., it is non-parametric with respect to the noise distribution. We argue that the proposed regression method is well suited to the task of causal inference in additive noise models. A practical disadvantage is that the resulting optimization problem is generally non-convex and can be difficult to solve. Nevertheless, we report good results on one of the tasks of the NIPS 2008 Causality Challenge, where the goal is to distinguish causes from effects in pairs of statistically dependent variables. In addition, we propose an algorithm for efficiently inferring causal models from observational data for more than two variables. The required number of regressions and independence tests is quadratic in the number of variables, which is a significant improvement over the simple method that tests all possible DAGs.
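A dependence measure between regressors and residuals is the core ingredient of this kind of regression. The sketch below assumes the Hilbert-Schmidt Independence Criterion (HSIC) as that measure, using the biased estimator tr(KHLH)/n² with Gaussian kernels of fixed bandwidth; treat both choices as assumptions rather than the paper's exact setup. The slope fit is a hypothetical grid-search stand-in for the (generally non-convex) optimization mentioned above, restricted to a linear model y = w·x + residual.

```python
import math


def _gram(values, sigma):
    """Gaussian-kernel Gram matrix on scalars."""
    return [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in values]
            for a in values]


def _center(mat):
    """Return H K H with H = I - (1/n) 11^T (double centring)."""
    n = len(mat)
    row_means = [sum(r) / n for r in mat]
    col_means = [sum(mat[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_means) / n
    return [[mat[i][j] - row_means[i] - col_means[j] + grand for j in range(n)]
            for i in range(n)]


def hsic(xs, ys, sigma_x=1.0, sigma_y=1.0):
    """Biased HSIC estimate tr(K H L H) / n^2 with Gaussian kernels."""
    n = len(xs)
    kc = _center(_gram(xs, sigma_x))
    l = _gram(ys, sigma_y)
    # tr(HKH L) = sum_ij (HKH)_ij L_ji
    return sum(kc[i][j] * l[j][i] for i in range(n) for j in range(n)) / n ** 2


def fit_slope_by_hsic(xs, ys, candidates):
    """Pick the slope w whose residuals y - w*x are least dependent on x."""
    return min(candidates,
               key=lambda w: hsic(xs, [y - w * x for x, y in zip(xs, ys)]))
```

Note that no noise distribution is assumed anywhere: the criterion only asks that the residuals look independent of the input, which is what makes the approach non-parametric with respect to the noise.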

ei

PDF Web DOI [BibTex]

Fitted Q-iteration by Advantage Weighted Regression

Neumann, G., Peters, J.

In Advances in neural information processing systems 21, pages: 1177-1184, (Editors: Koller, D., D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
Recently, fitted Q-iteration (FQI) based methods have become more popular due to their increased sample efficiency, a more stable learning process and the higher quality of the resulting policy. However, these methods remain hard to use for continuous action spaces which frequently occur in real-world tasks, e.g., in robotics and other technical applications. The greedy action selection commonly used for the policy improvement step is particularly problematic as it is expensive for continuous actions, can cause an unstable learning process, introduces an optimization bias and results in highly non-smooth policies unsuitable for real-world systems. In this paper, we show that by using a soft-greedy action selection the policy improvement step used in FQI can be simplified to an inexpensive advantage-weighted regression. With this result, we are able to derive a new, computationally efficient FQI algorithm which can even deal with high dimensional action spaces.
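The contrast between greedy and soft-greedy action selection can be illustrated for a single state. The sketch below computes a mean of sampled actions weighted by exp(A/τ), where A are advantage estimates and τ is a temperature; in the full method such weights enter a regression over many states, which for one state collapses to a weighted mean. The exponential weighting and the name `soft_greedy_action` are assumptions for illustration, not the paper's exact update.

```python
import math


def soft_greedy_action(actions, advantages, tau):
    """Advantage-weighted mean of sampled actions for a single state.

    Weights are exp(A / tau), i.e. a softmax over advantages; small tau
    approaches the greedy (argmax) action, large tau the plain mean.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    top = max(advantages)  # subtract the max for numerical stability
    weights = [math.exp((a - top) / tau) for a in advantages]
    total = sum(weights)
    return sum(w * u for w, u in zip(weights, actions)) / total
```

Unlike an argmax over a continuous action space, this requires no inner optimization, and the result changes smoothly with the advantage estimates, which matches the motivation given in the abstract.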

ei

PDF Web [BibTex]

Global Connectivity Potentials for Random Field Models

Nowozin, S., Lampert, C.

In CVPR 2009, pages: 818-825, IEEE Service Center, Piscataway, NJ, USA, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 2009 (inproceedings)

Abstract
Markov random field (MRF, CRF) models are popular in computer vision. However, in order to be computationally tractable they are limited to incorporating only local interactions and cannot model global properties, such as connectedness, which is a potentially useful high-level prior for object segmentation. In this work, we overcome this limitation by deriving a potential function that enforces the output labeling to be connected and that can naturally be used in the framework of recent MAP-MRF LP relaxations. Using techniques from polyhedral combinatorics, we show that a provably tight approximation to the MAP solution of the resulting MRF can still be found efficiently by solving a sequence of max-flow problems. The efficiency of the inference procedure also allows us to learn the parameters of an MRF with global connectivity potentials by means of a cutting plane algorithm. We experimentally evaluate our algorithm on both synthetic data and on the challenging segmentation task of the PASCAL VOC 2008 data set. We show that in both cases the addition of a connectedness prior significantly reduces the segmentation error.

ei

PDF PDF Web DOI [BibTex]

Bayesian Experimental Design of Magnetic Resonance Imaging Sequences

Seeger, M., Nickisch, H., Pohmann, R., Schölkopf, B.

In Advances in neural information processing systems 21, pages: 1441-1448, (Editors: D Koller and D Schuurmans and Y Bengio and L Bottou), Curran, Red Hook, NY, USA, 22nd Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
We show how improved sequences for magnetic resonance imaging can be found through automated optimization of Bayesian design scores. Combining recent advances in approximate Bayesian inference and natural image statistics with high-performance numerical computation, we propose the first scalable Bayesian experimental design framework for this problem of high relevance to clinical and brain research. Our solution requires approximate inference for dense, non-Gaussian models on a scale seldom addressed before. We propose a novel scalable variational inference algorithm, and show how powerful methods of numerical mathematics can be modified to compute primitives in our framework. Our approach is evaluated on a realistic setup with raw data from a 3T MR scanner.

ei

PDF Web [BibTex]

Characteristic Kernels on Groups and Semigroups

Fukumizu, K., Sriperumbudur, B., Gretton, A., Schölkopf, B.

In Advances in neural information processing systems 21, pages: 473-480, (Editors: D Koller and D Schuurmans and Y Bengio and L Bottou), Curran, Red Hook, NY, USA, 22nd Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
Embeddings of random variables in reproducing kernel Hilbert spaces (RKHSs) may be used to conduct statistical inference based on higher order moments. For sufficiently rich (characteristic) RKHSs, each probability distribution has a unique embedding, allowing all statistical properties of the distribution to be taken into consideration. Necessary and sufficient conditions for an RKHS to be characteristic exist for ℝⁿ. In the present work, conditions are established for an RKHS to be characteristic on groups and semigroups. Illustrative examples are provided, including characteristic kernels on periodic domains, rotation matrices, and ℝⁿ₊.

ei

PDF Web [BibTex]
