Header logo is


2008


no image
Manifold-valued Thin-plate Splines with Applications in Computer Graphics

Steinke, F., Hein, M., Peters, J., Schölkopf, B.

Computer Graphics Forum, 27(2):437-448, April 2008 (article)

Abstract
We present a generalization of thin-plate splines for interpolation and approximation of manifold-valued data, and demonstrate its usefulness in computer graphics with several applications from different fields. The cornerstone of our theoretical framework is an energy functional for mappings between two Riemannian manifolds which is independent of parametrization and respects the geometry of both manifolds. If the manifolds are Euclidean, the energy functional reduces to the classical thin-plate spline energy. We show how the resulting optimization problems can be solved efficiently in many cases. Our example applications range from orientation interpolation and motion planning in animation over geometric modelling tasks to color interpolation.

ei

PDF AVI Web DOI [BibTex]


no image
Model-Based Reinforcement Learning with Continuous States and Actions

Deisenroth, M., Rasmussen, C., Peters, J.

In ESANN 2008, pages: 19-24, (Editors: Verleysen, M. ), d-side, Evere, Belgium, European Symposium on Artificial Neural Networks, April 2008 (inproceedings)

Abstract
Finding an optimal policy in a reinforcement learning (RL) framework with continuous state and action spaces is challenging. Approximate solutions are often inevitable. GPDP is an approximate dynamic programming algorithm based on Gaussian process (GP) models for the value functions. In this paper, we extend GPDP to the case of unknown transition dynamics. After building a GP model for the transition dynamics, we apply GPDP to this model and determine a continuous-valued policy in the entire state space. We apply the resulting controller to the underpowered pendulum swing up. Moreover, we compare our results on this RL task to a nearly optimal discrete DP solution in a fully known environment.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Learning Inverse Dynamics: A Comparison

Nguyen-Tuong, D., Peters, J., Seeger, M., Schölkopf, B.

In Advances in Computational Intelligence and Learning: Proceedings of the European Symposium on Artificial Neural Networks, pages: 13-18, (Editors: M Verleysen), d-side, Evere, Belgium, 16th European Symposium on Artificial Neural Networks (ESANN), April 2008 (inproceedings)

Abstract
While it is well-known that model can enhance the control performance in terms of precision or energy efficiency, the practical application has often been limited by the complexities of manually obtaining sufficiently accurate models. In the past, learning has proven a viable alternative to using a combination of rigid-body dynamics and handcrafted approximations of nonlinearities. However, a major open question is what nonparametric learning method is suited best for learning dynamics? Traditionally, locally weighted projection regression (LWPR), has been the standard method as it is capable of online, real-time learning for very complex robots. However, while LWPR has had significant impact on learning in robotics, alternative nonparametric regression methods such as support vector regression (SVR) and Gaussian processes regression (GPR) offer interesting alternatives with fewer open parameters and potentially higher accuracy. In this paper, we evaluate these three alternatives for model learning. Our comparison consists out of the evaluation of learning quality for each regression method using original data from SARCOS robot arm, as well as the robot tracking performance employing learned models. The results show that GPR and SVR achieve a superior learning precision and can be applied for real-time control obtaining higher accuracy. However, for the online learning LWPR presents the better method due to its lower computational requirements.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
The Metric Nearness Problem

Brickell, J., Dhillon, I., Sra, S., Tropp, J.

SIAM Journal on Matrix Analysis and Applications, 30(1):375-396, April 2008 (article)

Abstract
Metric nearness refers to the problem of optimally restoring metric properties to distance measurements that happen to be nonmetric due to measurement errors or otherwise. Metric data can be important in various settings, for example, in clustering, classification, metric-based indexing, query processing, and graph theoretic approximation algorithms. This paper formulates and solves the metric nearness problem: Given a set of pairwise dissimilarities, find a “nearest” set of distances that satisfy the properties of a metric—principally the triangle inequality. For solving this problem, the paper develops efficient triangle fixing algorithms that are based on an iterative projection method. An intriguing aspect of the metric nearness problem is that a special case turns out to be equivalent to the all pairs shortest paths problem. The paper exploits this equivalence and develops a new algorithm for the latter problem using a primal-dual method. Applications to graph clustering are provided as an illustratio n. We include experiments that demonstrate the computational superiority of triangle fixing over general purpose convex programming software. Finally, we conclude by suggesting various useful extensions and generalizations to metric nearness.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
Bayesian Inference and Optimal Design for the Sparse Linear Model

Seeger, MW.

Journal of Machine Learning Research, 9, pages: 759-813, April 2008 (article)

Abstract
The linear model with sparsity-favouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of Bayesian optimal design (or experiment planning), for which accurate estimates of uncertainty are essential. To this end, we employ expectation propagation approximate inference for the linear model with Laplace prior, giving new insight into numerical stability properties and proposing a robust algorithm. We also show how to estimate model hyperparameters by empirical Bayesian maximisation of the marginal likelihood, and propose ideas in order to scale up the method to very large underdetermined problems. We demonstrate the versatility of our framework on the application of gene regulatory network identification from micro-array expression data, where both the Laplace prior and the active experimental design approach are shown to result in significant improvements. We also address the problem of sparse coding of natural images, and show how our framework can be used for compressive sensing tasks.

ei

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
Consistency of Spectral Clustering

von Luxburg, U., Belkin, M., Bousquet, O.

Annals of Statistics, 36(2):555-586, April 2008 (article)

Abstract
Consistency is a key property of statistical algorithms when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of the popular family of spectral clustering algorithms, which clusters the data with the help of eigenvectors of graph Laplacian matrices. We develop new methods to establish that for increasing sample size, those eigenvectors converge to the eigenvectors of certain limit operators. As a result we can prove that one of the two major classes of spectral clustering (normalized clustering) converges under very general conditions, while the other (unnormalized clustering) is only consistent under strong additional assumptions, which are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering.

ei

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Plant Classification from Bat-Like Echolocation Signals

Yovel, Y., Franz, MO., Stilz, P., Schnitzler, H-U.

PLoS Computational Biology, 4(3, e1000032):1-13, March 2008 (article)

Abstract
Classification of plants according to their echoes is an elementary component of bat behavior that plays an important role in spatial orientation and food acquisition. Vegetation echoes are, however, highly complex stochastic signals: from an acoustical point of view, a plant can be thought of as a three-dimensional array of leaves reflecting the emitted bat call. The received echo is therefore a superposition of many reflections. In this work we suggest that the classification of these echoes might not be such a troublesome routine for bats as formerly thought. We present a rather simple approach to classifying signals from a large database of plant echoes that were created by ensonifying plants with a frequency-modulated bat-like ultrasonic pulse. Our algorithm uses the spectrogram of a single echo from which it only uses features that are undoubtedly accessible to bats. We used a standard machine learning algorithm (SVM) to automatically extract suitable linear combinations of time and frequency cues from the spectrograms such that classification with high accuracy is enabled. This demonstrates that ultrasonic echoes are highly informative about the species membership of an ensonified plant, and that this information can be extracted with rather simple, biologically plausible analysis. Thus, our findings provide a new explanatory basis for the poorly understood observed abilities of bats in classifying vegetation and other complex objects.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
Causal Reasoning by Evaluating the Complexity of Conditional Densities with Kernel Methods

Sun, X., Janzing, D., Schölkopf, B.

Neurocomputing, 71(7-9):1248-1256, March 2008 (article)

Abstract
We propose a method to quantify the complexity of conditional probability measures by a Hilbert space seminorm of the logarithm of its density. The concept of reproducing kernel Hilbert spaces (RKHSs) is a flexible tool to define such a seminorm by choosing an appropriate kernel. We present several examples with artificial data sets where our kernel-based complexity measure is consistent with our intuitive understanding of complexity of densities. The intention behind the complexity measure is to provide a new approach to inferring causal directions. The idea is that the factorization of the joint probability measure P(effect, cause) into P(effect|cause)P(cause) leads typically to "simpler" and "smoother" terms than the factorization into P(cause|effect)P(effect). Since the conventional constraint-based approach of causal discovery is not able to determine the causal direction between only two variables, our inference principle can in particular be useful when combined with other existing methods. We provide several simple examples with real-world data where the true causal directions indeed lead to simpler (conditional) densities.

ei

Web DOI [BibTex]


no image
Natural Actor-Critic

Peters, J., Schaal, S.

Neurocomputing, 71(7-9):1180-1190, March 2008 (article)

Abstract
In this paper, we suggest a novel reinforcement learning architecture, the Natural Actor-Critic. The actor updates are achieved using stochastic policy gradients em- ploying Amari’s natural gradient approach, while the critic obtains both the natural policy gradient and additional parameters of a value function simultaneously by lin- ear regression. We show that actor improvements with natural policy gradients are particularly appealing as these are independent of coordinate frame of the chosen policy representation, and can be estimated more efficiently than regular policy gra- dients. The critic makes use of a special basis function parameterization motivated by the policy-gradient compatible function approximation. We show that several well-known reinforcement learning methods such as the original Actor-Critic and Bradtke’s Linear Quadratic Q-Learning are in fact Natural Actor-Critic algorithms. Empirical evaluations illustrate the effectiveness of our techniques in comparison to previous methods, and also demonstrate their applicability for learning control on an anthropomorphic robot arm.

ei

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
Poisson Geometry of Parabolic Bundles on Elliptic Curves

Balduzzi, D.

International Journal of Mathematics , 19(3):339-367, March 2008 (article)

Abstract
The moduli space of G-bundles on an elliptic curve with additional flag structure admits a Poisson structure. The bivector can be defined using double loop group, loop group and sheaf cohomology constructions. We investigate the links between these methods and for the case SL2 perform explicit computations, describing the bracket and its leaves in detail.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
Inferring Spike Trains From Local Field Potentials

Rasch, M., Gretton, A., Murayama, Y., Maass, W., Logothetis, N.

Journal of Neurophysiology, 99(3):1461-1476, March 2008 (article)

Abstract
We investigated whether it is possible to infer spike trains solely on the basis of the underlying local field potentials (LFPs). Using support vector machines and linear regression models, we found that in the primary visual cortex (V1) of monkeys, spikes can indeed be inferred from LFPs, at least with moderate success. Although there is a considerable degree of variation across electrodes, the low-frequency structure in spike trains (in the 100-ms range) can be inferred with reasonable accuracy, whereas exact spike positions are not reliably predicted. Two kinds of features of the LFP are exploited for prediction: the frequency power of bands in the high gamma-range (40–90 Hz) and information contained in lowfrequency oscillations ( 10 Hz), where both phase and power modulations are informative. Information analysis revealed that both features code (mainly) independent aspects of the spike-to-LFP relationship, with the low-frequency LFP phase coding for temporally clustered spiking activity. Although both features and prediction quality are similar during seminatural movie stimuli and spontaneous activity, prediction performance during spontaneous activity degrades much more slowly with increasing electrode distance. The general trend of data obtained with anesthetized animals is qualitatively mirrored in that of a more limited data set recorded in V1 of non-anesthetized monkeys. In contrast to the cortical field potentials, thalamic LFPs (e.g., LFPs derived from recordings in the dorsal lateral geniculate nucleus) hold no useful information for predicting spiking activity.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
ISD: A Software Package for Bayesian NMR Structure Calculation

Rieping, W., Nilges, M., Habeck, M.

Bioinformatics, 24(8):1104-1105, February 2008 (article)

Abstract
SUMMARY: The conventional approach to calculating biomolecular structures from nuclear magnetic resonance (NMR) data is often viewed as subjective due to its dependence on rules of thumb for deriving geometric constraints and suitable values for theory parameters from noisy experimental data. As a result, it can be difficult to judge the precision of an NMR structure in an objective manner. The Inferential Structure Determination (ISD) framework, which has been introduced recently, addresses this problem by using Bayesian inference to derive a probability distribution that represents both the unknown structure and its uncertainty. It also determines additional unknowns, such as theory parameters, that normally need be chosen empirically. Here we give an overview of the ISD software package, which implements this methodology. AVAILABILITY: The program is available at http://www.bioc.cam.ac.uk/isd

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
Probabilistic Structure Calculation

Nilges, M., Habeck, M., Rieping, W.

Comptes Rendus Chimie, 11(4-5):356-369, February 2008 (article)

Abstract
Molecular structures are usually calculated from experimental data with some method of energy minimisation or non-linear optimisation. Key aims of a structure calculation are to estimate the coordinate uncertainty, and to provide a meaningful measure of the quality of the fit to the data. We discuss approaches to optimally combine prior information and experimental data and the connection to probability theory. We analyse the appropriate statistics for NOEs and NOE-derived distances, and the related question of restraint potentials. Finally, we will discuss approaches to determine the appropriate weight on the experimental evidence and to obtain in this way an estimate of the data quality from the structure calculation. Whereas objective estimates of coordinates and their uncertainties can only be obtained by a full Bayesian treatment of the problem, standard structure calculation methods continue to play an important role. To obtain the full benefit of these methods, they should be founded on a rigorous Baye sian analysis.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
Optimization Techniques for Semi-Supervised Support Vector Machines

Chapelle, O., Sindhwani, V., Keerthi, S.

Journal of Machine Learning Research, 9, pages: 203-233, February 2008 (article)

Abstract
Due to its wide applicability, the problem of semi-supervised classification is attracting increasing attention in machine learning. Semi-Supervised Support Vector Machines (S3VMs) are based on applying the margin maximization principle to both labeled and unlabeled examples. Unlike SVMs, their formulation leads to a non-convex optimization problem. A suite of algorithms have recently been proposed for solving S3VMs. This paper reviews key ideas in this literature. The performance and behavior of various S3VMs algorithms is studied together, under a common experimental setting.

ei

PDF [BibTex]

PDF [BibTex]


no image
Fast Projection-based Methods for the Least Squares Nonnegative Matrix Approximation Problem

Kim, D., Sra, S., Dhillon, I.

Statistical Analysis and Data Mining, 1(1):38-51, February 2008 (article)

Abstract
Nonnegative matrix approximation (NNMA) is a popular matrix decomposition technique that has proven to be useful across a diverse variety of fields with applications ranging from document analysis and image processing to bioinformatics and signal processing. Over the years, several algorithms for NNMA have been proposed, e.g. Lee and Seung‘s multiplicative updates, alternating least squares (ALS), and gradient descent-based procedures. However, most of these procedures suffer from either slow convergence, numerical instability, or at worst, serious theoretical drawbacks. In this paper, we develop a new and improved algorithmic framework for the least-squares NNMA problem, which is not only theoretically well-founded, but also overcomes many deficiencies of other methods. Our framework readily admits powerful optimization techniques and as concrete realizations we present implementations based on the Newton, BFGS and conjugate gradient methods. Our algorithms provide numerical resu lts supe rior to both Lee and Seung‘s method as well as to the alternating least squares heuristic, which was reported to work well in some situations but has no theoretical guarantees[1]. Our approach extends naturally to include regularization and box-constraints without sacrificing convergence guarantees. We present experimental results on both synthetic and real-world datasets that demonstrate the superiority of our methods, both in terms of better approximations as well as computational efficiency.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
A Unifying Probabilistic Framework for Analyzing Residual Dipolar Couplings

Habeck, M., Nilges, M., Rieping, W.

Journal of Biomolecular NMR, 40(2):135-144, February 2008 (article)

Abstract
Residual dipolar couplings provide complementary information to the nuclear Overhauser effect measurements that are traditionally used in biomolecular structure determination by NMR. In a de novo structure determination, however, lack of knowledge about the degree and orientation of molecular alignment complicates the analysis of dipolar coupling data. We present a probabilistic framework for analyzing residual dipolar couplings and demonstrate that it is possible to estimate the atomic coordinates, the complete molecular alignment tensor, and the error of the couplings simultaneously. As a by-product, we also obtain estimates of the uncertainty in the coordinates and the alignment tensor. We show that our approach encompasses existing methods for determining the alignment tensor as special cases, including least squares estimation, histogram fitting, and elimination of an explicit alignment tensor in the restraint energy.

ei

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Contour-propagation Algorithms for Semi-automated Reconstruction of Neural Processes

Macke, J., Maack, N., Gupta, R., Denk, W., Schölkopf, B., Borst, A.

Journal of Neuroscience Methods, 167(2):349-357, January 2008 (article)

Abstract
A new technique, ”Serial Block Face Scanning Electron Microscopy” (SBFSEM), allows for automatic sectioning and imaging of biological tissue with a scanning electron microscope. Image stacks generated with this technology have a resolution sufficient to distinguish different cellular compartments, including synaptic structures, which should make it possible to obtain detailed anatomical knowledge of complete neuronal circuits. Such an image stack contains several thousands of images and is recorded with a minimal voxel size of 10-20nm in the x and y- and 30nm in z-direction. Consequently, a tissue block of 1mm3 (the approximate volume of the Calliphora vicina brain) will produce several hundred terabytes of data. Therefore, highly automated 3D reconstruction algorithms are needed. As a first step in this direction we have developed semiautomated segmentation algorithms for a precise contour tracing of cell membranes. These algorithms were embedded into an easy-to-operate user interface, which allows direct 3D observation of the extracted objects during the segmentation of image stacks. Compared to purely manual tracing, processing time is greatly accelerated.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
A Quantum-Statistical-Mechanical Extension of Gaussian Mixture Model

Tanaka, K., Tsuda, K.

Journal of Physics: Conference Series, 95(012023):1-9, January 2008 (article)

Abstract
We propose an extension of Gaussian mixture models in the statistical-mechanical point of view. The conventional Gaussian mixture models are formulated to divide all points in given data to some kinds of classes. We introduce some quantum states constructed by superposing conventional classes in linear combinations. Our extension can provide a new algorithm in classifications of data by means of linear response formulas in the statistical mechanics.

ei

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
Minimal Nonlinear Distortion Principle for Nonlinear Independent Component Analysis

Zhang, K., Chan, L.

Journal of Machine Learning Research, 9, pages: 2455-2487, 2008 (article)

Abstract
It is well known that solutions to the nonlinear independent component analysis (ICA) problem are highly non-unique. In this paper we propose the "minimal nonlinear distortion" (MND) principle for tackling the ill-posedness of nonlinear ICA problems. MND prefers the nonlinear ICA solution with the estimated mixing procedure as close as possible to linear, among all possible solutions. It also helps to avoid local optima in the solutions. To achieve MND, we exploit a regularization term to minimize the mean square error between the nonlinear mixing mapping and the best-fitting linear one. The effect of MND on the inherent trivial and non-trivial indeterminacies in nonlinear ICA solutions is investigated. Moreover, we show that local MND is closely related to the smoothness regularizer penalizing large curvature, which provides another useful regularization condition for nonlinear ICA. Experiments on synthetic data show the usefulness of the MND principle for separating various nonlinear mixtures. Finally, as an application, we use nonlinear ICA with MND to separate daily returns of a set of stocks in Hong Kong, and the linear causal relations among them are successfully discovered. The resulting causal relations give some interesting insights into the stock market. Such a result can not be achieved by linear ICA. Simulation studies also verify that when doing causality discovery, sometimes one should not ignore the nonlinear distortion in the data generation procedure, even if it is weak.

ei

Web [BibTex]

Web [BibTex]


no image
Transport processes in networks with scattering ramification nodes

Radl, A.

Journal of Applied Functional Analysis, 3, pages: 461-483, 2008 (article)

ei

Web [BibTex]

Web [BibTex]


no image
Learning to control in operational space

Peters, J., Schaal, S.

International Journal of Robotics Research, 27, pages: 197-212, 2008, clmc (article)

Abstract
One of the most general frameworks for phrasing control problems for complex, redundant robots is operational space control. However, while this framework is of essential importance for robotics and well-understood from an analytical point of view, it can be prohibitively hard to achieve accurate control in face of modeling errors, which are inevitable in com- plex robots, e.g., humanoid robots. In this paper, we suggest a learning approach for opertional space control as a direct inverse model learning problem. A first important insight for this paper is that a physically cor- rect solution to the inverse problem with redundant degrees-of-freedom does exist when learning of the inverse map is performed in a suitable piecewise linear way. The second crucial component for our work is based on the insight that many operational space controllers can be understood in terms of a constrained optimal control problem. The cost function as- sociated with this optimal control problem allows us to formulate a learn- ing algorithm that automatically synthesizes a globally consistent desired resolution of redundancy while learning the operational space controller. From the machine learning point of view, this learning problem corre- sponds to a reinforcement learning problem that maximizes an immediate reward. We employ an expectation-maximization policy search algorithm in order to solve this problem. Evaluations on a three degrees of freedom robot arm are used to illustrate the suggested approach. The applica- tion to a physically realistic simulator of the anthropomorphic SARCOS Master arm demonstrates feasibility for complex high degree-of-freedom robots. We also show that the proposed method works in the setting of learning resolved motion rate control on real, physical Mitsubishi PA-10 medical robotics arm.

am ei

link (url) DOI [BibTex]

link (url) DOI [BibTex]

2004


no image
Attentional Modulation of Auditory Event-Related Potentials in a Brain-Computer Interface

Hill, J., Lal, T., Bierig, K., Birbaumer, N., Schölkopf, B.

In BioCAS04, (S3/5/INV- S3/17-20):4, IEEE Computer Society, Los Alamitos, CA, USA, 2004 IEEE International Workshop on Biomedical Circuits and Systems, December 2004 (inproceedings)

Abstract
Motivated by the particular problems involved in communicating with "locked-in" paralysed patients, we aim to develop a brain-computer interface that uses auditory stimuli. We describe a paradigm that allows a user to make a binary decision by focusing attention on one of two concurrent auditory stimulus sequences. Using Support Vector Machine classification and Recursive Channel Elimination on the independent components of averaged event-related potentials, we show that an untrained user‘s EEG data can be classified with an encouragingly high level of accuracy. This suggests that it is possible for users to modulate EEG signals in a single trial by the conscious direction of attention, well enough to be useful in BCI.

ei

PDF Web DOI [BibTex]

2004


PDF Web DOI [BibTex]


no image
On the representation, learning and transfer of spatio-temporal movement characteristics

Ilg, W., Bakir, GH., Mezger, J., Giese, M.

International Journal of Humanoid Robotics, 1(4):613-636, December 2004 (article)

ei

[BibTex]

[BibTex]


no image
Insect-inspired estimation of egomotion

Franz, MO., Chahl, JS., Krapp, HG.

Neural Computation, 16(11):2245-2260, November 2004 (article)

Abstract
Tangential neurons in the fly brain are sensitive to the typical optic flow patterns generated during egomotion. In this study, we examine whether a simplified linear model based on the organization principles in tangential neurons can be used to estimate egomotion from the optic flow. We present a theory for the construction of an estimator consisting of a linear combination of optic flow vectors that incorporates prior knowledge both about the distance distribution of the environment, and about the noise and egomotion statistics of the sensor. The estimator is tested on a gantry carrying an omnidirectional vision sensor. The experiments show that the proposed approach leads to accurate and robust estimates of rotation rates, whereas translation estimates are of reasonable quality, albeit less reliable.

ei

PDF PostScript Web DOI [BibTex]

PDF PostScript Web DOI [BibTex]


no image
Efficient face detection by a cascaded support-vector machine expansion

Romdhani, S., Torr, P., Schölkopf, B., Blake, A.

Proceedings of The Royal Society of London A, 460(2501):3283-3297, A, November 2004 (article)

Abstract
We describe a fast system for the detection and localization of human faces in images using a nonlinear ‘support-vector machine‘. We approximate the decision surface in terms of a reduced set of expansion vectors and propose a cascaded evaluation which has the property that the full support-vector expansion is only evaluated on the face-like parts of the image, while the largest part of typical images is classified using a single expansion vector (a simpler and more efficient classifier). As a result, only three reduced-set vectors are used, on average, to classify an image patch. Hence, the cascaded evaluation, presented in this paper, offers a thirtyfold speed-up over an evaluation using the full set of reduced-set vectors, which is itself already thirty times faster than classification using all the support vectors.

ei

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Using kernel PCA for Initialisation of Variational Bayesian Nonlinear Blind Source Separation Method

Honkela, A., Harmeling, S., Lundqvist, L., Valpola, H.

In ICA 2004, pages: 790-797, (Editors: Puntonet, C. G., A. Prieto), Springer, Berlin, Germany, Fifth International Conference on Independent Component Analysis and Blind Signal Separation, October 2004 (inproceedings)

Abstract
The variational Bayesian nonlinear blind source separation method introduced by Lappalainen and Honkela in 2000 is initialised with linear principal component analysis (PCA). Because of the multilayer perceptron (MLP) network used to model the nonlinearity, the method is susceptible to local minima and therefore sensitive to the initialisation used. As the method is used for nonlinear separation, the linear initialisation may in some cases lead it astray. In this paper we study the use of kernel PCA (KPCA) in the initialisation. KPCA is a rather straightforward generalisation of linear PCA and it is much faster to compute than the variational Bayesian method. The experiments show that it can produce significantly better initialisations than linear PCA. Additionally, the model comparison methods provided by the variational Bayesian framework can be easily applied to compare different kernels.

ei

DOI [BibTex]

DOI [BibTex]


no image
Robust ICA for Super-Gaussian Sources

Meinecke, F., Harmeling, S., Müller, K.

In ICA 2004, pages: 217-224, (Editors: Puntonet, C. G., A. Prieto), Springer, Berlin, Germany, Fifth International Conference on Independent Component Analysis and Blind Signal Separation, October 2004 (inproceedings)

Abstract
Most ICA algorithms are sensitive to outliers. Instead of robustifying existing algorithms by outlier rejection techniques, we show how a simple outlier index can be used directly to solve the ICA problem for super-Gaussian source signals. This ICA method is outlier-robust by construction and can be used for standard ICA as well as for over-complete ICA (i.e. more source signals than observed signals (mixtures)).

ei

DOI [BibTex]

DOI [BibTex]


no image
Modelling Spikes with Mixtures of Factor Analysers

Görür, D., Rasmussen, C., Tolias, A., Sinz, F., Logothetis, N.

In Pattern Recognition, pages: 391-398, LNCS 3175, (Editors: Rasmussen, C. E. , H.H. Bülthoff, B. Schölkopf, M.A. Giese), Springer, Berlin, Germany, 26th DAGM Symposium, September 2004 (inproceedings)

Abstract
Identifying the action potentials of individual neurons from extracellular recordings, known as spike sorting, is a challenging problem. We consider the spike sorting problem using a generative model,mixtures of factor analysers, which concurrently performs clustering and feature extraction. The most important advantage of this method is that it quantifies the certainty with which the spikes are classified. This can be used as a means for evaluating the quality of clustering and therefore spike isolation. Using this method, nearly simultaneously occurring spikes can also be modelled which is a hard task for many of the spike sorting methods. Furthermore, modelling the data with a generative model allows us to generate simulated data.

ei

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
Learning Depth From Stereo

Sinz, F., Candela, J., BakIr, G., Rasmussen, C., Franz, M.

In 26th DAGM Symposium, pages: 245-252, LNCS 3175, (Editors: Rasmussen, C. E., H. H. Bülthoff, B. Schölkopf, M. A. Giese), Springer, Berlin, Germany, 26th DAGM Symposium, September 2004 (inproceedings)

Abstract
We compare two approaches to the problem of estimating the depth of a point in space from observing its image position in two different cameras: 1.~The classical photogrammetric approach explicitly models the two cameras and estimates their intrinsic and extrinsic parameters using a tedious calibration procedure; 2.~A generic machine learning approach where the mapping from image to spatial coordinates is directly approximated by a Gaussian Process regression. Our results show that the generic learning approach, in addition to simplifying the procedure of calibration, can lead to higher depth accuracies than classical calibration although no specific domain knowledge is used.

ei

PDF PostScript Web [BibTex]

PDF PostScript Web [BibTex]


no image
Stability of Hausdorff-based Distance Measures

Shapiro, MD., Blaschko, MB.

In VIIP, pages: 1-6, VIIP, September 2004 (inproceedings)

ei

[BibTex]

[BibTex]


no image
Learning kernels from biological networks by maximizing entropy

Tsuda, K., Noble, W.

Bioinformatics, 20(Suppl. 1):i326-i333, August 2004 (article)

Abstract
Motivation: The diffusion kernel is a general method for computing pairwise distances among all nodes in a graph, based on the sum of weighted paths between each pair of nodes. This technique has been used successfully, in conjunction with kernel-based learning methods, to draw inferences from several types of biological networks. Results: We show that computing the diffusion kernel is equivalent to maximizing the von Neumann entropy, subject to a global constraint on the sum of the Euclidean distances between nodes. This global constraint allows for high variance in the pairwise distances. Accordingly, we propose an alternative, locally constrained diffusion kernel, and we demonstrate that the resulting kernel allows for more accurate support vector machine prediction of protein functional classifications from metabolic and protein–protein interaction networks.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Learning to Find Graph Pre-Images

BakIr, G., Zien, A., Tsuda, K.

In Pattern Recognition, pages: 253-261, (Editors: Rasmussen, C. E., H. H. Bülthoff, B. Schölkopf, M. A. Giese), Springer, Berlin, Germany, 26th DAGM Symposium, August 2004 (inproceedings)

Abstract
The recent development of graph kernel functions has made it possible to apply well-established machine learning methods to graphs. However, to allow for analyses that yield a graph as a result, it is necessary to solve the so-called pre-image problem: to reconstruct a graph from its feature space representation induced by the kernel. Here, we suggest a practical solution to this problem.

ei

PostScript PDF DOI [BibTex]

PostScript PDF DOI [BibTex]


no image
Masking effect produced by Mach bands on the detection of narrow bars of random polarity

Henning, GB., Hoddinott, KT., Wilson-Smith, ZJ., Hill, NJ.

Journal of the Optical Society of America, 21(8):1379-1387, A, August 2004 (article)

ei

[BibTex]

[BibTex]


no image
Gaussian Process Classification for Segmenting and Annotating Sequences

Altun, Y., Hofmann, T., Smola, A.

In Proceedings of the 21st International Conference on Machine Learning (ICML 2004), pages: 25-32, (Editors: Greiner, R. , D. Schuurmans), ACM Press, New York, USA, 21st International Conference on Machine Learning (ICML), July 2004 (inproceedings)

Abstract
Many real-world classification tasks involve the prediction of multiple, inter-dependent class labels. A prototypical case of this sort deals with prediction of a sequence of labels for a sequence of observations. Such problems arise naturally in the context of annotating and segmenting observation sequences. This paper generalizes Gaussian Process classification to predict multiple labels by taking dependencies between neighboring labels into account. Our approach is motivated by the desire to retain rigorous probabilistic semantics, while overcoming limitations of parametric methods like Conditional Random Fields, which exhibit conceptual and computational difficulties in high-dimensional input spaces. Experiments on named entity recognition and pitch accent prediction tasks demonstrate the competitiveness of our approach.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Learning with Non-Positive Kernels

Ong, CS., Mary, X., Canu, S., Smola, AJ.

In ICML 2004, pages: 81-81, ACM Press, New York, NY, USA, Twenty-First International Conference on Machine Learning, July 2004 (inproceedings)

Abstract
n this paper we show that many kernel methods can be adapted to deal with indefinite kernels, that is, kernels which are not positive semidefinite. They do not satisfy Mercer‘s condition and they induce associated functional spaces called Reproducing Kernel Kre&icaron;n Spaces (RKKS), a generalization of Reproducing Kernel Hilbert Spaces (RKHS).Machine learning in RKKS shares many "nice" properties of learning in RKHS, such as orthogonality and projection. However, since the kernels are indefinite, we can no longer minimize the loss, instead we stabilize it. We show a general representer theorem for constrained stabilization and prove generalization bounds by computing the Rademacher averages of the kernel class. We list several examples of indefinite kernels and investigate regularization methods to solve spline interpolation. Some preliminary experiments with indefinite kernels for spline smoothing are reported for truncated spectral factorization, Landweber-Fridman iterations, and MR-II.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Exponential Families for Conditional Random Fields

Altun, Y., Smola, A., Hofmann, T.

In Proceedings of the 20th Annual Conference on Uncertainty in Artificial Intelligence (UAI 2004), pages: 2-9, (Editors: Chickering, D.M. , J.Y. Halpern), Morgan Kaufmann, San Francisco, CA, USA, 20th Annual Conference on Uncertainty in Artificial Intelligence (UAI), July 2004 (inproceedings)

Abstract
In this paper we define conditional random fields in reproducing kernel Hilbert spaces and show connections to Gaussian Process classification. More specifically, we prove decomposition results for undirected graphical models and we give constructions for kernels. Finally we present efficient means of solving the optimization problem using reduced rank decompositions and we show how stationarity can be exploited efficiently in the optimization process.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Using Conditional Random Fields to Predict Pitch Accent in Conversational Speech

Gregory, M., Altun, Y.

In pages: 677-684, (Editors: Scott, D. , W. Daelemans, M. Walker), ACL, East Stroudsburg, PA, USA, 42nd Annual Meeting of the Association for Computational Linguistics (ACL), July 2004 (inproceedings)

Abstract
The detection of prosodic characteristics is an important aspect of both speech synthesis and speech recognition. Correct placement of pitch accents aids in more natural sounding speech, while automatic detection of accents can contribute to better wordlevel recognition and better textual understanding. In this paper we investigate probabilistic, contextual, and phonological factors that influence pitch accent placement in natural, conversational speech in a sequence labeling setting. We introduce Conditional Random Fields (CRFs) to pitch accent prediction task in order to incorporate these factors efficiently in a sequence model. We demonstrate the usefulness and the incremental effect of these factors in a sequence model by performing experiments on hand labeled data from the Switchboard Corpus. Our model outperforms the baseline and previous models of pitch accent prediction on the Switchboard Corpus.

ei

Web [BibTex]

Web [BibTex]


no image
Support vector machine learning for interdependent and structured output spaces

Tsochantaridis, I., Hofmann, T., Joachims, T., Altun, Y.

In pages: 1-8, (Editors: Greiner, R. , D. Schuurmans), AAAI Press, Menlo Park, CA, USA, Twenty-first International Conference on Machine Learning (ICML), July 2004 (inproceedings)

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
PAC-Bayesian Generic Chaining

Audibert, J., Bousquet, O.

In Advances in Neural Information Processing Systems 16, pages: 1125-1132 , (Editors: Thrun, S., L.K. Saul, B. Schölkopf), MIT Press, Cambridge, MA, USA, Seventeenth Annual Conference on Neural Information Processing Systems (NIPS), June 2004 (inproceedings)

Abstract
There exist many different generalization error bounds for classification. Each of these bounds contains an improvement over the others for certain situations. Our goal is to combine these different improvements into a single bound. In particular we combine the PAC-Bayes approach introduced by McAllester, which is interesting for averaging classifiers, with the optimal union bound provided by the generic chaining technique developed by Fernique and Talagrand. This combination is quite natural since the generic chaining is based on the notion of majorizing measures, which can be considered as priors on the set of classifiers, and such priors also arise in the PAC-bayesian setting.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Prediction on Spike Data Using Kernel Algorithms

Eichhorn, J., Tolias, A., Zien, A., Kuss, M., Rasmussen, C., Weston, J., Logothetis, N., Schölkopf, B.

In Advances in Neural Information Processing Systems 16, pages: 1367-1374, (Editors: S Thrun and LK Saul and B Schölkopf), MIT Press, Cambridge, MA, USA, 17th Annual Conference on Neural Information Processing Systems (NIPS), June 2004 (inproceedings)

Abstract
We report and compare the performance of different learning algorithms based on data from cortical recordings. The task is to predict the orientation of visual stimuli from the activity of a population of simultaneously recorded neurons. We compare several ways of improving the coding of the input (i.e., the spike data) as well as of the output (i.e., the orientation), and report the results obtained using different kernel algorithms.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Warped Gaussian Processes

Snelson, E., Rasmussen, CE., Ghahramani, Z.

In Advances in Neural Information Processing Systems 16, pages: 337-344, (Editors: Thrun, S., L.K. Saul, B. Schölkopf), MIT Press, Cambridge, MA, USA, Seventeenth Annual Conference on Neural Information Processing Systems (NIPS), June 2004 (inproceedings)

Abstract
We generalise the Gaussian process (GP) framework for regression by learning a nonlinear transformation of the GP outputs. This allows for non-Gaussian processes and non-Gaussian noise. The learning algorithm chooses a nonlinear transformation such that transformed data is well-modelled by a GP. This can be seen as including a preprocessing transformation as an integral part of the probabilistic modelling problem, rather than as an ad-hoc step. We demonstrate on several real regression problems that learning the transformation can lead to significantly better performance than using a regular GP, or a GP with a fixed transformation.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Ranking on Data Manifolds

Zhou, D., Weston, J., Gretton, A., Bousquet, O., Schölkopf, B.

In Advances in neural information processing systems 16, pages: 169-176, (Editors: S Thrun and L Saul and B Schölkopf), MIT Press, Cambridge, MA, USA, 17th Annual Conference on Neural Information Processing Systems (NIPS), June 2004 (inproceedings)

Abstract
The Google search engine has enjoyed a huge success with its web page ranking algorithm, which exploits global, rather than local, hyperlink structure of the web using random walks. Here we propose a simple universal ranking algorithm for data lying in the Euclidean space, such as text or image data. The core idea of our method is to rank the data with respect to the intrinsic manifold structure collectively revealed by a great amount of data. Encouraging experimental results from synthetic, image, and text data illustrate the validity of our method.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Support Vector Channel Selection in BCI

Lal, T., Schröder, M., Hinterberger, T., Weston, J., Bogdan, M., Birbaumer, N., Schölkopf, B.

IEEE Transactions on Biomedical Engineering, 51(6):1003-1010, June 2004 (article)

Abstract
Designing a Brain Computer Interface (BCI) system one can choose from a variety of features that may be useful for classifying brain activity during a mental task. For the special case of classifying EEG signals we propose the usage of the state of the art feature selection algorithms Recursive Feature Elimination and Zero-Norm Optimization which are based on the training of Support Vector Machines (SVM). These algorithms can provide more accurate solutions than standard filter methods for feature selection. We adapt the methods for the purpose of selecting EEG channels. For a motor imagery paradigm we show that the number of used channels can be reduced significantly without increasing the classification error. The resulting best channels agree well with the expected underlying cortical activity patterns during the mental tasks. Furthermore we show how time dependent task specific information can be visualized.

ei

DOI [BibTex]

DOI [BibTex]


no image
Distance-Based Classification with Lipschitz Functions

von Luxburg, U., Bousquet, O.

Journal of Machine Learning Research, 5, pages: 669-695, June 2004 (article)

Abstract
The goal of this article is to develop a framework for large margin classification in metric spaces. We want to find a generalization of linear decision functions for metric spaces and define a corresponding notion of margin such that the decision function separates the training points with a large margin. It will turn out that using Lipschitz functions as decision functions, the inverse of the Lipschitz constant can be interpreted as the size of a margin. In order to construct a clean mathematical setup we isometrically embed the given metric space into a Banach space and the space of Lipschitz functions into its dual space. To analyze the resulting algorithm, we prove several representer theorems. They state that there always exist solutions of the Lipschitz classifier which can be expressed in terms of distance functions to training points. We provide generalization bounds for Lipschitz classifiers in terms of the Rademacher complexities of some Lipschitz function classes. The generality of our approach can be seen from the fact that several well-known algorithms are special cases of the Lipschitz classifier, among them the support vector machine, the linear programming machine, and the 1-nearest neighbor classifier.

ei

PDF PostScript PDF [BibTex]

PDF PostScript PDF [BibTex]


no image
Gaussian Processes in Reinforcement Learning

Rasmussen, C., Kuss, M.

In Advances in Neural Information Processing Systems 16, pages: 751-759, (Editors: Thrun, S., L. K. Saul, B. Schölkopf), MIT Press, Cambridge, MA, USA, Seventeenth Annual Conference on Neural Information Processing Systems (NIPS), June 2004 (inproceedings)

Abstract
We exploit some useful properties of Gaussian process (GP) regression models for reinforcement learning in continuous state spaces and discrete time. We demonstrate how the GP model allows evaluation of the value function in closed form. The resulting policy iteration algorithm is demonstrated on a simple problem with a two dimensional state space. Further, we speculate that the intrinsic ability of GP models to characterise distributions of functions would allow the method to capture entire distributions over future values instead of merely their expectation, which has traditionally been the focus of much of reinforcement learning.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Learning with Local and Global Consistency

Zhou, D., Bousquet, O., Lal, T., Weston, J., Schölkopf, B.

In Advances in Neural Information Processing Systems 16, pages: 321-328, (Editors: S Thrun and LK Saul and B Schölkopf), MIT Press, Cambridge, MA, USA, 17th Annual Conference on Neural Information Processing Systems (NIPS), June 2004 (inproceedings)

Abstract
We consider the general problem of learning from labeled and unlabeled data, which is often called semi-supervised learning or transductive inference. A principled approach to semi-supervised learning is to design a classifying function which is sufficiently smooth with respect to the intrinsic structure collectively revealed by known labeled and unlabeled points. We present a simple algorithm to obtain such a smooth solution. Our method yields encouraging experimental results on a number of classification problems and demonstrates effective use of unlabeled data.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Learning to Find Pre-Images

Bakir, G., Weston, J., Schölkopf, B.

In Advances in Neural Information Processing Systems 16, pages: 449-456, (Editors: S Thrun and LK Saul and B Schölkopf), MIT Press, Cambridge, MA, USA, 17th Annual Conference on Neural Information Processing Systems (NIPS), June 2004 (inproceedings)

Abstract
We consider the problem of reconstructing patterns from a feature map. Learning algorithms using kernels to operate in a reproducing kernel Hilbert space (RKHS) express their solutions in terms of input points mapped into the RKHS. We introduce a technique based on kernel principal component analysis and regression to reconstruct corresponding patterns in the input space (aka pre-images) and review its performance in several applications requiring the construction of pre-images. The introduced technique avoids difficult and/or unstable numerical optimization, is easy to implement and, unlike previous methods, permits the computation of pre-images in discrete input spaces.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Measure Based Regularization

Bousquet, O., Chapelle, O., Hein, M.

In Advances in Neural Information Processing Systems 16, pages: 1221-1228, (Editors: Thrun, S., L. Saul, B. Schölkopf), MIT Press, Cambridge, MA, USA, Seventeenth Annual Conference on Neural Information Processing Systems (NIPS), June 2004 (inproceedings)

Abstract
We address in this paper the question of how the knowledge of the marginal distribution $P(x)$ can be incorporated in a learning algorithm. We suggest three theoretical methods for taking into account this distribution for regularization and provide links to existing graph-based semi-supervised learning algorithms. We also propose practical implementations.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Insights from Machine Learning Applied to Human Visual Classification

Graf, A., Wichmann, F.

In Advances in Neural Information Processing Systems 16, pages: 905-912, (Editors: Thrun, S., L. Saul, B. Schölkopf), MIT Press, Cambridge, MA, USA, Seventeenth Annual Conference on Neural Information Processing Systems (NIPS), June 2004 (inproceedings)

Abstract
We attempt to understand visual classification in humans using both psychophysical and machine learning techniques. Frontal views of human faces were used for a gender classification task. Human subjects classified the faces and their gender judgment, reaction time and confidence rating were recorded. Several hyperplane learning algorithms were used on the same classification task using the Principal Components of the texture and flowfield representation of the faces. The classification performance of the learning algorithms was estimated using the face database with the true gender of the faces as labels, and also with the gender estimated by the subjects. We then correlated the human responses to the distance of the stimuli to the separating hyperplane of the learning algorithms. Our results suggest that human classification can be modeled by some hyperplane algorithms in the feature space we used. For classification, the brain needs more processing for stimuli close to that hyperplane than for those further away.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Image Construction by Linear Programming

Tsuda, K., Rätsch, G.

In Advances in Neural Information Processing Systems 16, pages: 57-64, (Editors: Thrun, S., L.K. Saul, B. Schölkopf), MIT Press, Cambridge, MA, USA, Seventeenth Annual Conference on Neural Information Processing Systems (NIPS), June 2004 (inproceedings)

Abstract
A common way of image denoising is to project a noisy image to the subspace of admissible images made for instance by PCA. However, a major drawback of this method is that all pixels are updated by the projection, even when only a few pixels are corrupted by noise or occlusion. We propose a new method to identify the noisy pixels by 1-norm penalization and update the identified pixels only. The identification and updating of noisy pixels are formulated as one linear program which can be solved efficiently. Especially, one can apply the ν-trick to directly specify the fraction of pixels to be reconstructed. Moreover, we extend the linear program to be able to exploit prior knowledge that occlusions often appear in contiguous blocks (e.g. sunglasses on faces). The basic idea is to penalize boundary points and interior points of the occluded area differently. We are able to show the ν-property also for this extended LP leading a method which is easy to use. Experimental results impressively demonstrate the power of our approach.

ei

PDF Web [BibTex]

PDF Web [BibTex]