2008

Kernel Methods in Machine Learning

Hofmann, T., Schölkopf, B., Smola, A.

Annals of Statistics, 36(3):1171-1220, June 2008 (article)

Abstract
We review machine learning methods employing positive definite kernels. These methods formulate learning and estimation problems in a reproducing kernel Hilbert space (RKHS) of functions defined on the data domain, expanded in terms of a kernel. Working in linear spaces of functions has the benefit of facilitating the construction and analysis of learning algorithms while at the same time allowing large classes of functions. The latter include nonlinear functions as well as functions defined on nonvectorial data.
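The kernel expansion described in the abstract can be made concrete with a small kernel ridge regression sketch. This is illustrative Python/NumPy, not code from the article; the RBF kernel, the toy data, and all parameter values are assumptions chosen for the demo:

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Positive definite Gaussian (RBF) kernel matrix."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_kernel_ridge(X, y, lam=1e-2, gamma=1.0):
    """Solve (K + lam*I) alpha = y; the fitted f lives in the RKHS."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict(X_train, alpha, X_new, gamma=1.0):
    """Evaluate the kernel expansion f(x) = sum_i alpha_i k(x_i, x)."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=60)
alpha = fit_kernel_ridge(X, y)
print(predict(X, alpha, np.array([[0.0]])))
```

The learned function is exactly of the form f(x) = sum_i alpha_i k(x_i, x), i.e. it lies in the span of the kernel sections at the training points, which is the sense in which these linear function spaces accommodate nonlinear functions of the input.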

PDF PDF DOI [BibTex]

Cross-validation Optimization for Large Scale Structured Classification Kernel Methods

Seeger, M.

Journal of Machine Learning Research, 9, pages: 1147-1178, June 2008 (article)

Abstract
We propose a highly efficient framework for penalized likelihood kernel methods applied to multi-class models with a large, structured set of classes. As opposed to many previous approaches which try to decompose the fitting problem into many smaller ones, we focus on a Newton optimization of the complete model, making use of model structure and linear conjugate gradients in order to approximate Newton search directions. Crucially, our learning method is based entirely on matrix-vector multiplication primitives with the kernel matrices and their derivatives, allowing straightforward specialization to new kernels and focusing code optimization efforts on these primitives only. Kernel parameters are learned automatically by maximizing the cross-validation log likelihood in a gradient-based way, and predictive probabilities are estimated. We demonstrate our approach on large-scale text classification tasks with hierarchical structure on thousands of classes, achieving state-of-the-art results in an order of magnitude less time than previous work.
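The idea of building everything on matrix-vector multiplication primitives is the standard linear conjugate gradients pattern: the solver never needs the matrix itself, only a function that applies it. A generic sketch (the synthetic SPD "kernel" matrix and sizes are assumptions, not from the paper):

```python
import numpy as np

def conjugate_gradients(matvec, b, tol=1e-8, max_iter=200):
    """Solve A x = b for symmetric positive definite A using only the
    matrix-vector product `matvec`, as when approximating Newton
    search directions for large kernel models."""
    x = np.zeros_like(b)
    r = b - matvec(x)          # residual
    p = r.copy()               # search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)
        step = rs / (p @ Ap)
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(1)
M = rng.normal(size=(50, 50))
K = M @ M.T + 50 * np.eye(50)        # synthetic SPD "kernel" matrix
b = rng.normal(size=50)
x = conjugate_gradients(lambda v: K @ v, b)
print(np.linalg.norm(K @ x - b))     # small residual
```

Because only `matvec` touches the matrix, specializing to a new kernel (or to a structured, never-materialized matrix) means swapping that one function.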

PDF PDF [BibTex]

Reinforcement Learning of Motor Skills with Policy Gradients

Peters, J., Schaal, S.

Neural Networks, 21(4):682-697, May 2008 (article)

PDF Web DOI [BibTex]

Information Consistency of Nonparametric Gaussian Process Methods

Seeger, MW., Kakade, SM., Foster, DP.

IEEE Transactions on Information Theory, 54(5):2376-2382, May 2008 (article)

Abstract
Bayesian nonparametric models are widely and successfully used for statistical prediction. While posterior consistency properties are well studied in quite general settings, results have been proved using abstract concepts such as metric entropy, and they come with subtle conditions which are hard to validate and not intuitive when applied to concrete models. Furthermore, convergence rates are difficult to obtain. By focusing on the concept of information consistency for Bayesian Gaussian process (GP) models, consistency results and convergence rates are obtained via a regret bound on cumulative log loss. These results depend strongly on the covariance function of the prior process, thereby giving a novel interpretation to penalization with reproducing kernel Hilbert space norms and to commonly used covariance function classes and their parameters. The proof of the main result employs elementary convexity arguments only. A theorem of Widom is used in order to obtain precise convergence rates for several covariance functions widely used in practice.

Web DOI [BibTex]

Relating the Thermodynamic Arrow of Time to the Causal Arrow

Allahverdyan, A., Janzing, D.

Journal of Statistical Mechanics, 2008(P04001):1-21, April 2008 (article)

Abstract
Consider a Hamiltonian system that consists of a slow subsystem S and a fast subsystem F. The autonomous dynamics of S is driven by an effective Hamiltonian, but its thermodynamics is unexpected. We show that a well-defined thermodynamic arrow of time (second law) emerges for S whenever there is a well-defined causal arrow from S to F and the back-action is negligible. This is because the back-action of F on S is described by a non-globally Hamiltonian Born–Oppenheimer term that violates the Liouville theorem, and makes the second law inapplicable to S. If S and F are mixing, under the causal arrow condition they are described by microcanonical distributions P(S) and P(S|F). Their structure supports a causal inference principle proposed recently in machine learning.

Web DOI [BibTex]

Generalization and Similarity in Exemplar Models of Categorization: Insights from Machine Learning

Jäkel, F., Schölkopf, B., Wichmann, F.

Psychonomic Bulletin and Review, 15(2):256-271, April 2008 (article)

Abstract
Exemplar theories of categorization depend on similarity for explaining subjects' ability to generalize to new stimuli. A major criticism of exemplar theories concerns their lack of abstraction mechanisms and thus, seemingly, generalization ability. Here, we use insights from machine learning to demonstrate that exemplar models can actually generalize very well. Kernel methods in machine learning are akin to exemplar models and very successful in real-world applications. Their generalization performance depends crucially on the chosen similarity measure. While similarity plays an important role in describing generalization behavior, it is not the only factor that controls generalization performance. In machine learning, kernel methods are often combined with regularization techniques to ensure good generalization. These same techniques are easily incorporated in exemplar models. We show that the Generalized Context Model (Nosofsky, 1986) and ALCOVE (Kruschke, 1992) are closely related to a statistical model called kernel logistic regression. We argue that generalization is central to the enterprise of understanding categorization behavior and suggest how insights from machine learning can offer some guidance.

Keywords: kernel, similarity, regularization, generalization, categorization
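The structural kinship between exemplar models and kernel methods is easy to see in code. The sketch below is a Generalized-Context-Model-style classifier: class evidence is the summed exponential-decay similarity to stored exemplars, which is exactly a kernel (Nadaraya-Watson-style) rule. The toy exemplars and the sensitivity parameter `c` are assumptions for illustration:

```python
import numpy as np

def gcm_predict(exemplars, labels, probe, c=2.0):
    """GCM-style rule: the score for each class is the summed similarity
    of the probe to that class's exemplars; similarity is an exponential
    decay of city-block distance, i.e. a kernel evaluation."""
    dists = np.abs(exemplars - probe).sum(axis=1)   # city-block distance
    sims = np.exp(-c * dists)                       # similarity kernel
    classes = np.unique(labels)
    scores = np.array([sims[labels == k].sum() for k in classes])
    return classes[np.argmax(scores)], scores / scores.sum()

ex = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.2]])
lab = np.array([0, 0, 1, 1])
print(gcm_predict(ex, lab, np.array([0.1, 0.0])))
```

Replacing the per-class sums with a weighted sum whose weights are fit by penalized likelihood turns this into kernel logistic regression, the statistical model the abstract relates to GCM and ALCOVE.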

PDF Web DOI [BibTex]

Manifold-valued Thin-plate Splines with Applications in Computer Graphics

Steinke, F., Hein, M., Peters, J., Schölkopf, B.

Computer Graphics Forum, 27(2):437-448, April 2008 (article)

Abstract
We present a generalization of thin-plate splines for interpolation and approximation of manifold-valued data, and demonstrate its usefulness in computer graphics with several applications from different fields. The cornerstone of our theoretical framework is an energy functional for mappings between two Riemannian manifolds which is independent of parametrization and respects the geometry of both manifolds. If the manifolds are Euclidean, the energy functional reduces to the classical thin-plate spline energy. We show how the resulting optimization problems can be solved efficiently in many cases. Our example applications range from orientation interpolation and motion planning in animation over geometric modelling tasks to color interpolation.

PDF AVI Web DOI [BibTex]

The Metric Nearness Problem

Brickell, J., Dhillon, I., Sra, S., Tropp, J.

SIAM Journal on Matrix Analysis and Applications, 30(1):375-396, April 2008 (article)

Abstract
Metric nearness refers to the problem of optimally restoring metric properties to distance measurements that happen to be nonmetric due to measurement errors or otherwise. Metric data can be important in various settings, for example, in clustering, classification, metric-based indexing, query processing, and graph theoretic approximation algorithms. This paper formulates and solves the metric nearness problem: Given a set of pairwise dissimilarities, find a “nearest” set of distances that satisfy the properties of a metric—principally the triangle inequality. For solving this problem, the paper develops efficient triangle fixing algorithms that are based on an iterative projection method. An intriguing aspect of the metric nearness problem is that a special case turns out to be equivalent to the all pairs shortest paths problem. The paper exploits this equivalence and develops a new algorithm for the latter problem using a primal-dual method. Applications to graph clustering are provided as an illustration. We include experiments that demonstrate the computational superiority of triangle fixing over general purpose convex programming software. Finally, we conclude by suggesting various useful extensions and generalizations to metric nearness.
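The special case the abstract mentions is concrete: when distances may only be decreased, the nearest metric is obtained by running all-pairs shortest paths on the dissimilarity graph. A minimal sketch (vectorized Floyd-Warshall; the example matrix is an assumption for the demo, not data from the paper):

```python
import numpy as np

def decrease_only_metric_repair(D):
    """Decrease-only special case of metric nearness: replace each entry
    by the shortest-path distance, which restores the triangle inequality
    while never increasing any dissimilarity (Floyd-Warshall)."""
    M = D.astype(float).copy()
    n = len(M)
    for k in range(n):
        # relax all pairs through intermediate node k at once
        M = np.minimum(M, M[:, k:k + 1] + M[k:k + 1, :])
    return M

D = np.array([[0., 1., 5.],
              [1., 0., 1.],
              [5., 1., 0.]])   # 5 > 1 + 1 violates the triangle inequality
print(decrease_only_metric_repair(D))
```

The general problem, where entries may move in either direction, needs the paper's triangle-fixing projections; this sketch only covers the one-sided case.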

Web DOI [BibTex]

Bayesian Inference and Optimal Design for the Sparse Linear Model

Seeger, MW.

Journal of Machine Learning Research, 9, pages: 759-813, April 2008 (article)

Abstract
The linear model with sparsity-favouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of Bayesian optimal design (or experiment planning), for which accurate estimates of uncertainty are essential. To this end, we employ expectation propagation approximate inference for the linear model with Laplace prior, giving new insight into numerical stability properties and proposing a robust algorithm. We also show how to estimate model hyperparameters by empirical Bayesian maximisation of the marginal likelihood, and propose ideas in order to scale up the method to very large underdetermined problems. We demonstrate the versatility of our framework on the application of gene regulatory network identification from micro-array expression data, where both the Laplace prior and the active experimental design approach are shown to result in significant improvements. We also address the problem of sparse coding of natural images, and show how our framework can be used for compressive sensing tasks.

PDF PDF [BibTex]

Consistency of Spectral Clustering

von Luxburg, U., Belkin, M., Bousquet, O.

Annals of Statistics, 36(2):555-586, April 2008 (article)

Abstract
Consistency is a key property of statistical algorithms when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of the popular family of spectral clustering algorithms, which clusters the data with the help of eigenvectors of graph Laplacian matrices. We develop new methods to establish that for increasing sample size, those eigenvectors converge to the eigenvectors of certain limit operators. As a result we can prove that one of the two major classes of spectral clustering (normalized clustering) converges under very general conditions, while the other (unnormalized clustering) is only consistent under strong additional assumptions, which are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering.
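The normalized variant the paper proves consistent can be sketched in a few lines for a two-way split: take the eigenvector for the second-smallest eigenvalue of the random-walk Laplacian and threshold its sign. The toy similarity graph below is an assumption for illustration:

```python
import numpy as np

def normalized_spectral_bipartition(W):
    """Two-way normalized spectral clustering: split by the sign of the
    second eigenvector of the random-walk Laplacian L_rw = I - D^{-1} W."""
    d = W.sum(axis=1)
    L_rw = np.eye(len(W)) - W / d[:, None]
    vals, vecs = np.linalg.eig(L_rw)
    order = np.argsort(vals.real)
    fiedler = vecs[:, order[1]].real   # eigenvector of 2nd-smallest eigenvalue
    return (fiedler > 0).astype(int)

# two dense blocks of three nodes, weakly linked across
W = np.ones((6, 6)) * 0.01
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
labels = normalized_spectral_bipartition(W)
print(labels)   # first three nodes in one cluster, last three in the other
```

The paper's consistency result concerns exactly these eigenvectors: as the sample grows, they converge to eigenvectors of a limit operator for the normalized Laplacian, but not in general for the unnormalized one.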

PDF DOI [BibTex]

Plant Classification from Bat-Like Echolocation Signals

Yovel, Y., Franz, MO., Stilz, P., Schnitzler, H-U.

PLoS Computational Biology, 4(3, e1000032):1-13, March 2008 (article)

Abstract
Classification of plants according to their echoes is an elementary component of bat behavior that plays an important role in spatial orientation and food acquisition. Vegetation echoes are, however, highly complex stochastic signals: from an acoustical point of view, a plant can be thought of as a three-dimensional array of leaves reflecting the emitted bat call. The received echo is therefore a superposition of many reflections. In this work we suggest that the classification of these echoes might not be such a troublesome routine for bats as formerly thought. We present a rather simple approach to classifying signals from a large database of plant echoes that were created by ensonifying plants with a frequency-modulated bat-like ultrasonic pulse. Our algorithm uses the spectrogram of a single echo from which it only uses features that are undoubtedly accessible to bats. We used a standard machine learning algorithm (SVM) to automatically extract suitable linear combinations of time and frequency cues from the spectrograms such that classification with high accuracy is enabled. This demonstrates that ultrasonic echoes are highly informative about the species membership of an ensonified plant, and that this information can be extracted with rather simple, biologically plausible analysis. Thus, our findings provide a new explanatory basis for the poorly understood observed abilities of bats in classifying vegetation and other complex objects.

Web DOI [BibTex]

Causal Reasoning by Evaluating the Complexity of Conditional Densities with Kernel Methods

Sun, X., Janzing, D., Schölkopf, B.

Neurocomputing, 71(7-9):1248-1256, March 2008 (article)

Abstract
We propose a method to quantify the complexity of conditional probability measures by a Hilbert space seminorm of the logarithm of its density. The concept of reproducing kernel Hilbert spaces (RKHSs) is a flexible tool to define such a seminorm by choosing an appropriate kernel. We present several examples with artificial data sets where our kernel-based complexity measure is consistent with our intuitive understanding of complexity of densities. The intention behind the complexity measure is to provide a new approach to inferring causal directions. The idea is that the factorization of the joint probability measure P(effect, cause) into P(effect|cause)P(cause) leads typically to "simpler" and "smoother" terms than the factorization into P(cause|effect)P(effect). Since the conventional constraint-based approach of causal discovery is not able to determine the causal direction between only two variables, our inference principle can in particular be useful when combined with other existing methods. We provide several simple examples with real-world data where the true causal directions indeed lead to simpler (conditional) densities.

Web DOI [BibTex]

Natural Actor-Critic

Peters, J., Schaal, S.

Neurocomputing, 71(7-9):1180-1190, March 2008 (article)

Abstract
In this paper, we suggest a novel reinforcement learning architecture, the Natural Actor-Critic. The actor updates are achieved using stochastic policy gradients employing Amari's natural gradient approach, while the critic obtains both the natural policy gradient and additional parameters of a value function simultaneously by linear regression. We show that actor improvements with natural policy gradients are particularly appealing as these are independent of the coordinate frame of the chosen policy representation, and can be estimated more efficiently than regular policy gradients. The critic makes use of a special basis function parameterization motivated by the policy-gradient compatible function approximation. We show that several well-known reinforcement learning methods such as the original Actor-Critic and Bradtke's Linear Quadratic Q-Learning are in fact Natural Actor-Critic algorithms. Empirical evaluations illustrate the effectiveness of our techniques in comparison to previous methods, and also demonstrate their applicability for learning control on an anthropomorphic robot arm.
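The core operation behind a natural policy gradient is preconditioning the vanilla gradient with the inverse Fisher information matrix, F = E[score score^T]. A minimal sketch, with synthetic score samples standing in for a real policy (the data, damping, and scales are assumptions, not from the paper):

```python
import numpy as np

def natural_gradient(score_samples, vanilla_grad, damping=1e-3):
    """Estimate F = E[score score^T] from sampled score functions and
    return F^{-1} g, the natural-gradient direction."""
    F = score_samples.T @ score_samples / len(score_samples)
    F += damping * np.eye(F.shape[0])   # damping keeps the solve stable
    return np.linalg.solve(F, vanilla_grad)

rng = np.random.default_rng(2)
# three policy parameters with very different score variances
scores = rng.normal(size=(1000, 3)) * np.array([1.0, 5.0, 0.5])
g = np.array([1.0, 1.0, 1.0])
print(natural_gradient(scores, g))   # step shrinks along sensitive directions
```

The resulting direction is invariant to how the policy is parameterized, which is the coordinate-frame independence the abstract highlights: high-sensitivity parameters (large Fisher entries) receive proportionally smaller steps.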

PDF PDF DOI [BibTex]

Inferring Spike Trains From Local Field Potentials

Rasch, M., Gretton, A., Murayama, Y., Maass, W., Logothetis, N.

Journal of Neurophysiology, 99(3):1461-1476, March 2008 (article)

Abstract
We investigated whether it is possible to infer spike trains solely on the basis of the underlying local field potentials (LFPs). Using support vector machines and linear regression models, we found that in the primary visual cortex (V1) of monkeys, spikes can indeed be inferred from LFPs, at least with moderate success. Although there is a considerable degree of variation across electrodes, the low-frequency structure in spike trains (in the 100-ms range) can be inferred with reasonable accuracy, whereas exact spike positions are not reliably predicted. Two kinds of features of the LFP are exploited for prediction: the frequency power of bands in the high gamma-range (40–90 Hz) and information contained in low-frequency oscillations (< 10 Hz), where both phase and power modulations are informative. Information analysis revealed that both features code (mainly) independent aspects of the spike-to-LFP relationship, with the low-frequency LFP phase coding for temporally clustered spiking activity. Although both features and prediction quality are similar during seminatural movie stimuli and spontaneous activity, prediction performance during spontaneous activity degrades much more slowly with increasing electrode distance. The general trend of data obtained with anesthetized animals is qualitatively mirrored in that of a more limited data set recorded in V1 of non-anesthetized monkeys. In contrast to the cortical field potentials, thalamic LFPs (e.g., LFPs derived from recordings in the dorsal lateral geniculate nucleus) hold no useful information for predicting spiking activity.

Web DOI [BibTex]

ISD: A Software Package for Bayesian NMR Structure Calculation

Rieping, W., Nilges, M., Habeck, M.

Bioinformatics, 24(8):1104-1105, February 2008 (article)

Abstract
SUMMARY: The conventional approach to calculating biomolecular structures from nuclear magnetic resonance (NMR) data is often viewed as subjective due to its dependence on rules of thumb for deriving geometric constraints and suitable values for theory parameters from noisy experimental data. As a result, it can be difficult to judge the precision of an NMR structure in an objective manner. The Inferential Structure Determination (ISD) framework, which has been introduced recently, addresses this problem by using Bayesian inference to derive a probability distribution that represents both the unknown structure and its uncertainty. It also determines additional unknowns, such as theory parameters, that normally need be chosen empirically. Here we give an overview of the ISD software package, which implements this methodology. AVAILABILITY: The program is available at http://www.bioc.cam.ac.uk/isd

Web DOI [BibTex]

Probabilistic Structure Calculation

Nilges, M., Habeck, M., Rieping, W.

Comptes Rendus Chimie, 11(4-5):356-369, February 2008 (article)

Abstract
Molecular structures are usually calculated from experimental data with some method of energy minimisation or non-linear optimisation. Key aims of a structure calculation are to estimate the coordinate uncertainty, and to provide a meaningful measure of the quality of the fit to the data. We discuss approaches to optimally combine prior information and experimental data and the connection to probability theory. We analyse the appropriate statistics for NOEs and NOE-derived distances, and the related question of restraint potentials. Finally, we will discuss approaches to determine the appropriate weight on the experimental evidence and to obtain in this way an estimate of the data quality from the structure calculation. Whereas objective estimates of coordinates and their uncertainties can only be obtained by a full Bayesian treatment of the problem, standard structure calculation methods continue to play an important role. To obtain the full benefit of these methods, they should be founded on a rigorous Bayesian analysis.

Web DOI [BibTex]

Fast Projection-based Methods for the Least Squares Nonnegative Matrix Approximation Problem

Kim, D., Sra, S., Dhillon, I.

Statistical Analysis and Data Mining, 1(1):38-51, February 2008 (article)

Abstract
Nonnegative matrix approximation (NNMA) is a popular matrix decomposition technique that has proven to be useful across a diverse variety of fields with applications ranging from document analysis and image processing to bioinformatics and signal processing. Over the years, several algorithms for NNMA have been proposed, e.g. Lee and Seung's multiplicative updates, alternating least squares (ALS), and gradient descent-based procedures. However, most of these procedures suffer from either slow convergence, numerical instability, or at worst, serious theoretical drawbacks. In this paper, we develop a new and improved algorithmic framework for the least-squares NNMA problem, which is not only theoretically well-founded, but also overcomes many deficiencies of other methods. Our framework readily admits powerful optimization techniques and as concrete realizations we present implementations based on the Newton, BFGS and conjugate gradient methods. Our algorithms provide numerical results superior to both Lee and Seung's method as well as to the alternating least squares heuristic, which was reported to work well in some situations but has no theoretical guarantees [1]. Our approach extends naturally to include regularization and box-constraints without sacrificing convergence guarantees. We present experimental results on both synthetic and real-world datasets that demonstrate the superiority of our methods, both in terms of better approximations as well as computational efficiency.
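For reference, the Lee-Seung multiplicative updates that this paper improves upon fit in a few lines. This is the baseline method, not the paper's projection-based framework; the synthetic rank-5 data and iteration count are assumptions for the demo:

```python
import numpy as np

def nmf_multiplicative(V, r, iters=500, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for least-squares NNMA:
    each step keeps W, H nonnegative and does not increase ||V - WH||_F^2."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.uniform(0.1, 1.0, (m, r))
    H = rng.uniform(0.1, 1.0, (r, n))
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # eps guards against 0-division
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(3)
V = rng.uniform(size=(20, 5)) @ rng.uniform(size=(5, 30))  # nonnegative, rank 5
W, H = nmf_multiplicative(V, r=5)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))       # small relative error
```

The slow, sublinear convergence of these updates on some problems is precisely the deficiency that motivates the Newton, BFGS, and conjugate gradient realizations in the paper.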

Web DOI [BibTex]

A Unifying Probabilistic Framework for Analyzing Residual Dipolar Couplings

Habeck, M., Nilges, M., Rieping, W.

Journal of Biomolecular NMR, 40(2):135-144, February 2008 (article)

Abstract
Residual dipolar couplings provide complementary information to the nuclear Overhauser effect measurements that are traditionally used in biomolecular structure determination by NMR. In a de novo structure determination, however, lack of knowledge about the degree and orientation of molecular alignment complicates the analysis of dipolar coupling data. We present a probabilistic framework for analyzing residual dipolar couplings and demonstrate that it is possible to estimate the atomic coordinates, the complete molecular alignment tensor, and the error of the couplings simultaneously. As a by-product, we also obtain estimates of the uncertainty in the coordinates and the alignment tensor. We show that our approach encompasses existing methods for determining the alignment tensor as special cases, including least squares estimation, histogram fitting, and elimination of an explicit alignment tensor in the restraint energy.

PDF DOI [BibTex]

Contour-propagation Algorithms for Semi-automated Reconstruction of Neural Processes

Macke, J., Maack, N., Gupta, R., Denk, W., Schölkopf, B., Borst, A.

Journal of Neuroscience Methods, 167(2):349-357, January 2008 (article)

Abstract
A new technique, "Serial Block Face Scanning Electron Microscopy" (SBFSEM), allows for automatic sectioning and imaging of biological tissue with a scanning electron microscope. Image stacks generated with this technology have a resolution sufficient to distinguish different cellular compartments, including synaptic structures, which should make it possible to obtain detailed anatomical knowledge of complete neuronal circuits. Such an image stack contains several thousands of images and is recorded with a minimal voxel size of 10-20 nm in the x- and y-direction and 30 nm in the z-direction. Consequently, a tissue block of 1 mm³ (the approximate volume of the Calliphora vicina brain) will produce several hundred terabytes of data. Therefore, highly automated 3D reconstruction algorithms are needed. As a first step in this direction we have developed semi-automated segmentation algorithms for precise contour tracing of cell membranes. These algorithms were embedded into an easy-to-operate user interface, which allows direct 3D observation of the extracted objects during the segmentation of image stacks. Compared to purely manual tracing, processing time is greatly accelerated.

PDF Web DOI [BibTex]

A Quantum-Statistical-Mechanical Extension of Gaussian Mixture Model

Tanaka, K., Tsuda, K.

Journal of Physics: Conference Series, 95(012023):1-9, January 2008 (article)

Abstract
We propose an extension of Gaussian mixture models from the statistical-mechanical point of view. Conventional Gaussian mixture models are formulated to divide all points in the given data into classes. We introduce quantum states constructed by superposing conventional classes in linear combinations. Our extension provides a new algorithm for the classification of data by means of linear response formulas in statistical mechanics.

PDF PDF DOI [BibTex]

Learning to control in operational space

Peters, J., Schaal, S.

International Journal of Robotics Research, 27, pages: 197-212, 2008, clmc (article)

Abstract
One of the most general frameworks for phrasing control problems for complex, redundant robots is operational space control. However, while this framework is of essential importance for robotics and well understood from an analytical point of view, it can be prohibitively hard to achieve accurate control in the face of modeling errors, which are inevitable in complex robots, e.g., humanoid robots. In this paper, we suggest a learning approach for operational space control as a direct inverse model learning problem. A first important insight for this paper is that a physically correct solution to the inverse problem with redundant degrees-of-freedom does exist when learning of the inverse map is performed in a suitable piecewise linear way. The second crucial component of our work is based on the insight that many operational space controllers can be understood in terms of a constrained optimal control problem. The cost function associated with this optimal control problem allows us to formulate a learning algorithm that automatically synthesizes a globally consistent desired resolution of redundancy while learning the operational space controller. From the machine learning point of view, this learning problem corresponds to a reinforcement learning problem that maximizes an immediate reward. We employ an expectation-maximization policy search algorithm in order to solve this problem. Evaluations on a three degrees of freedom robot arm are used to illustrate the suggested approach. The application to a physically realistic simulator of the anthropomorphic SARCOS Master arm demonstrates feasibility for complex high degree-of-freedom robots. We also show that the proposed method works in the setting of learning resolved motion rate control on a real, physical Mitsubishi PA-10 medical robotics arm.

link (url) DOI [BibTex]

2003

Concentration Inequalities for Sub-Additive Functions Using the Entropy Method

Bousquet, O.

Stochastic Inequalities and Applications, 56, pages: 213-247, Progress in Probability, (Editors: Giné, E., C. Houdré and D. Nualart), November 2003 (article)

Abstract
We obtain exponential concentration inequalities for sub-additive functions of independent random variables under weak conditions on the increments of those functions, like the existence of exponential moments for these increments. As a consequence of these general inequalities, we obtain refinements of Talagrand's inequality for empirical processes and new bounds for randomized empirical processes. These results are obtained by further developing the entropy method introduced by Ledoux.

PostScript [BibTex]

Statistical Learning Theory, Capacity and Complexity

Schölkopf, B.

Complexity, 8(4):87-94, July 2003 (article)

Abstract
We give an exposition of the ideas of statistical learning theory, followed by a discussion of how a reinterpretation of the insights of learning theory could potentially also benefit our understanding of a certain notion of complexity.

Web DOI [BibTex]

Dealing with large Diagonals in Kernel Matrices

Weston, J., Schölkopf, B., Eskin, E., Leslie, C., Noble, W.

Annals of the Institute of Statistical Mathematics, 55(2):391-408, June 2003 (article)

Abstract
In kernel methods, all the information about the training data is contained in the Gram matrix. If this matrix has large diagonal values, which arises for many types of kernels, then kernel methods do not perform well. We propose and test several methods for dealing with this problem by reducing the dynamic range of the matrix while preserving the positive definiteness of the Hessian of the quadratic programming problem that one has to solve when training a Support Vector Machine, which is a common kernel approach for pattern recognition.

PDF DOI [BibTex]

The em Algorithm for Kernel Matrix Completion with Auxiliary Data

Tsuda, K., Akaho, S., Asai, K.

Journal of Machine Learning Research, 4, pages: 67-81, May 2003 (article)

PDF [BibTex]

Constructing Descriptive and Discriminative Non-linear Features: Rayleigh Coefficients in Kernel Feature Spaces

Mika, S., Rätsch, G., Weston, J., Schölkopf, B., Smola, A., Müller, K.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(5):623-628, May 2003 (article)

Abstract
We incorporate prior knowledge to construct nonlinear algorithms for invariant feature extraction and discrimination. Employing a unified framework in terms of a nonlinearized variant of the Rayleigh coefficient, we propose nonlinear generalizations of Fisher's discriminant and oriented PCA using support vector kernel functions. Extensive simulations show the utility of our approach.

DOI [BibTex]

Tractable Inference for Probabilistic Data Models

Csato, L., Opper, M., Winther, O.

Complexity, 8(4):64-68, April 2003 (article)

Abstract
We present an approximation technique for probabilistic data models with a large number of hidden variables, based on ideas from statistical physics. We give examples for two nontrivial applications. © 2003 Wiley Periodicals, Inc.

PDF GZIP Web [BibTex]

Feature selection and transduction for prediction of molecular bioactivity for drug design

Weston, J., Perez-Cruz, F., Bousquet, O., Chapelle, O., Elisseeff, A., Schölkopf, B.

Bioinformatics, 19(6):764-771, April 2003 (article)

Abstract
Motivation: In drug discovery a key task is to identify characteristics that separate active (binding) compounds from inactive (non-binding) ones. An automated prediction system can help reduce resources necessary to carry out this task. Results: Two methods for prediction of molecular bioactivity for drug design are introduced and shown to perform well in a data set previously studied as part of the KDD (Knowledge Discovery and Data Mining) Cup 2001. The data is characterized by very few positive examples, a very large number of features (describing three-dimensional properties of the molecules) and rather different distributions between training and test data. Two techniques are introduced specifically to tackle these problems: a feature selection method for unbalanced data and a classifier which adapts to the distribution of the unlabeled test data (a so-called transductive method). We show both techniques improve identification performance and in conjunction provide an improvement over using only one of the techniques. Our results suggest the importance of taking into account the characteristics in this data which may also be relevant in other problems of a similar type.

Web [BibTex]

Use of the Zero-Norm with Linear Models and Kernel Methods

Weston, J., Elisseeff, A., Schölkopf, B., Tipping, M.

Journal of Machine Learning Research, 3, pages: 1439-1461, March 2003 (article)

Abstract
We explore the use of the so-called zero-norm of the parameters of linear models in learning. Minimizing such a quantity has many uses in a machine learning context: for variable or feature selection, for minimizing training error, and for ensuring sparsity in solutions. We derive a simple but practical method for achieving these goals and discuss its relationship to existing techniques of minimizing the zero-norm. The method boils down to a simple modification of vanilla SVM: an iterative multiplicative rescaling of the training data. Applications we investigate, which aid our discussion, include variable and feature selection on biological microarray data, and multicategory classification.
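The iterative multiplicative rescaling can be sketched as follows. In this sketch a ridge-penalized least-squares fit stands in for the SVM solver used in the paper, and the function name, iteration count, and constants are illustrative assumptions.

```python
import numpy as np

def zero_norm_features(X, y, n_iter=10, ridge=1e-3, tol=1e-6):
    """Approximate zero-norm minimization by iteratively rescaling the
    training data with the current weight magnitudes and refitting.

    A ridge-penalized least-squares fit stands in for the SVM solver
    of the paper; features whose scaling factor collapses below tol
    are discarded.
    """
    d = X.shape[1]
    z = np.ones(d)                      # per-feature scaling factors
    for _ in range(n_iter):
        Xz = X * z                      # multiplicatively rescaled data
        w = np.linalg.solve(Xz.T @ Xz + ridge * np.eye(d), Xz.T @ y)
        z = z * np.abs(w)               # shrink uninformative features
        z[z < tol] = 0.0
    return z                            # nonzero entries = kept features

# Toy target depending only on feature 0.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = X[:, 0]
z = zero_norm_features(X, y)
kept = np.flatnonzero(z)
```

The multiplicative update drives the scaling factors of irrelevant features toward zero geometrically, which is what makes the scheme a practical surrogate for direct zero-norm minimization.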

PDF PostScript PDF [BibTex]


An Introduction to Variable and Feature Selection

Guyon, I., Elisseeff, A.

Journal of Machine Learning Research, 3, pages: 1157-1182, 2003 (article)

[BibTex]


New Approaches to Statistical Learning Theory

Bousquet, O.

Annals of the Institute of Statistical Mathematics, 55(2):371-389, 2003 (article)

Abstract
We present new tools from probability theory that can be applied to the analysis of learning algorithms. These tools allow us to derive new bounds on the generalization performance of learning algorithms and to propose alternative measures of the complexity of the learning task, which can in turn be used to derive new learning algorithms.
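As a representative example of the kind of bound such tools yield (the standard Rademacher-complexity bound, stated here for illustration and not as the paper's own result): for a class $\mathcal{F}$ of $[0,1]$-valued functions, with probability at least $1-\delta$ over an i.i.d. sample of size $n$,

```latex
R(f) \;\le\; \widehat{R}_n(f) \;+\; 2\,\mathfrak{R}_n(\mathcal{F})
\;+\; \sqrt{\frac{\ln(1/\delta)}{2n}}
\qquad \text{for all } f \in \mathcal{F},
```

where $R(f)$ is the true risk, $\widehat{R}_n(f)$ the empirical risk, and $\mathfrak{R}_n(\mathcal{F})$ the Rademacher complexity of the class, one of the alternative complexity measures referred to in the abstract.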

PostScript [BibTex]

2001


Regularized principal manifolds

Smola, A., Mika, S., Schölkopf, B., Williamson, R.

Journal of Machine Learning Research, 1, pages: 179-209, June 2001 (article)

Abstract
Many settings of unsupervised learning can be viewed as quantization problems: the minimization of the expected quantization error subject to some restrictions. This allows the use of tools such as regularization from the theory of (supervised) risk minimization for unsupervised learning. This setting turns out to be closely related to principal curves, the generative topographic map, and robust coding. We explore this connection in two ways: (1) we propose an algorithm for finding principal manifolds that can be regularized in a variety of ways; and (2) we derive uniform convergence bounds and hence bounds on the learning rates of the algorithm. In particular, we give bounds on the covering numbers, which allow us to obtain nearly optimal learning rates for certain types of regularization operators. Experimental results demonstrate the feasibility of the approach.
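The quantization view can be made concrete with its simplest special case, plain k-means (Lloyd's algorithm), which minimizes the empirical quantization error with no regularization term. The deterministic farthest-point seeding below is a convenience for the sketch, not part of the paper's algorithm.

```python
import numpy as np

def farthest_point_init(X, k):
    """Deterministic seeding: start at X[0], then repeatedly add the
    point farthest from the current centers (a convenience for this
    sketch, not part of the paper's method)."""
    centers = [X[0]]
    for _ in range(k - 1):
        dists = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(dists.argmax())])
    return np.array(centers)

def kmeans_quantize(X, k, n_iter=20):
    """Minimize the empirical quantization error
    sum_i min_j ||x_i - c_j||^2 by Lloyd's alternating updates.
    Principal-manifold methods add a regularizer on top of this
    objective; none is included here."""
    centers = farthest_point_init(X, k)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    err = ((X - centers[labels]) ** 2).sum()
    return centers, labels, err

# Two well-separated blobs; k-means should recover them exactly.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
               rng.normal(5.0, 0.1, (20, 2))])
centers, labels, err = kmeans_quantize(X, 2)
```

Replacing the finite set of centers with a smooth parameterized manifold, and penalizing its roughness, yields the regularized principal manifolds of the paper.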

PDF [BibTex]


The psychometric function: II. Bootstrap-based confidence intervals and sampling

Wichmann, F., Hill, N.

Perception and Psychophysics, 63(8):1314-1329, 2001 (article)

PDF [BibTex]


The psychometric function: I. Fitting, sampling and goodness-of-fit

Wichmann, F., Hill, N.

Perception and Psychophysics, 63(8):1293-1313, 2001 (article)

Abstract
The psychometric function relates an observer's performance to an independent variable, usually some physical quantity of a stimulus in a psychophysical task. This paper, together with its companion paper (Wichmann & Hill, 2001), describes an integrated approach to (1) fitting psychometric functions, (2) assessing the goodness of fit, and (3) providing confidence intervals for the function's parameters and other estimates derived from them, for the purposes of hypothesis testing. The present paper deals with the first two topics, describing a constrained maximum-likelihood method of parameter estimation and developing several goodness-of-fit tests. Using Monte Carlo simulations, we deal with two specific difficulties that arise when fitting functions to psychophysical data. First, we note that human observers are prone to stimulus-independent errors (or lapses). We show that failure to account for this can lead to serious biases in estimates of the psychometric function's parameters and illustrate how the problem may be overcome. Second, we note that psychophysical data sets are usually rather small by the standards required by most of the commonly applied statistical tests. We demonstrate the potential errors of applying traditional χ² methods to psychophysical data and advocate use of Monte Carlo resampling techniques that do not rely on asymptotic theory. We have made available the software to implement our methods.
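A constrained maximum-likelihood fit with a lapse parameter can be sketched as follows. The logistic shape, the 2AFC guess rate of 0.5, the [0, 0.06] lapse range, and the crude grid search are all illustrative choices for this sketch, not the paper's exact implementation.

```python
import numpy as np

def neg_log_lik(params, x, n, r):
    """Binomial negative log-likelihood of a logistic psychometric
    function psi(x) = gamma + (1 - gamma - lam) * F(x; alpha, beta),
    with 2AFC guess rate gamma = 0.5 and lapse rate lam."""
    alpha, beta, lam = params
    gamma = 0.5
    f = 1.0 / (1.0 + np.exp(-beta * (x - alpha)))
    p = np.clip(gamma + (1.0 - gamma - lam) * f, 1e-9, 1 - 1e-9)
    return -(r * np.log(p) + (n - r) * np.log(1.0 - p)).sum()

def fit_psychometric(x, n, r):
    """Crude constrained ML fit by grid search, restricting the lapse
    rate to a small interval so that stimulus-independent errors
    cannot bias the threshold and slope estimates."""
    best, best_nll = None, np.inf
    for alpha in np.linspace(x.min(), x.max(), 41):
        for beta in np.linspace(0.5, 8.0, 31):
            for lam in np.linspace(0.0, 0.06, 7):
                nll = neg_log_lik((alpha, beta, lam), x, n, r)
                if nll < best_nll:
                    best, best_nll = (alpha, beta, lam), nll
    return best, best_nll

# Synthetic 2AFC data from alpha=0, beta=2, lam=0 (100 trials per level).
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
n = np.full(5, 100)
p_true = 0.5 + 0.5 / (1.0 + np.exp(-2.0 * x))
r = np.round(n * p_true)
(alpha_hat, beta_hat, lam_hat), _ = fit_psychometric(x, n, r)
```

In practice one would replace the grid search with a proper constrained optimizer; the point of the sketch is the role of the bounded lapse parameter in the likelihood.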

PDF [BibTex]


Markovian domain fingerprinting: statistical segmentation of protein sequences

Bejerano, G., Seldin, Y., Margalit, H., Tishby, N.

Bioinformatics, 17(10):927-934, 2001 (article)

PDF Web [BibTex]