Header logo is


2004


no image
A kernel view of the dimensionality reduction of manifolds

Ham, J., Lee, D., Mika, S., Schölkopf, B.

In Proceedings of the Twenty-First International Conference on Machine Learning, pages: 369-376, (Editors: CE Brodley), ACM, New York, NY, USA, ICML, 2004, also appeared as MPI-TR 110 (inproceedings)

Abstract
We interpret several well-known algorithms for dimensionality reduction of manifolds as kernel methods. Isomap, graph Laplacian eigenmap, and locally linear embedding (LLE) all utilize local neighborhood information to construct a global embedding of the manifold. We show how all three algorithms can be described as kernel PCA on specially constructed Gram matrices, and illustrate the similarities and differences between the algorithms with representative examples.

ei

PDF [BibTex]

2004


PDF [BibTex]


no image
Protein Functional Class Prediction with a Combined Graph

Shin, H., Tsuda, K., Schölkopf, B.

In Proceedings of the Korean Data Mining Conference, pages: 200-219, Proceedings of the Korean Data Mining Conference, 2004 (inproceedings)

Abstract
In bioinformatics, there exist multiple descriptions of graphs for the same set of genes or proteins. For instance, in yeast systems, graph edges can represent different relationships such as protein-protein interactions, genetic interactions, or co-participation in a protein complex, etc. Relying on similarities between nodes, each graph can be used independently for prediction of protein function. However, since different graphs contain partly independent and partly complementary information about the problem at hand, one can enhance the total information extracted by combining all graphs. In this paper, we propose a method for integrating multiple graphs within a framework of semi-supervised learning. The method alternates between minimizing the objective function with respect to network output and with respect to combining weights. We apply the method to the task of protein functional class prediction in yeast. The proposed method performs significantly better than the same algorithm trained on any single graph.

ei

PDF [BibTex]

PDF [BibTex]


no image
Learning from Labeled and Unlabeled Data Using Random Walks

Zhou, D., Schölkopf, B.

In Pattern Recognition, Proceedings of the 26th DAGM Symposium, pages: 237-244, (Editors: Rasmussen, C.E., H.H. Bülthoff, M.A. Giese and B. Schölkopf), Pattern Recognition, Proceedings of the 26th DAGM Symposium, 2004 (inproceedings)

Abstract
We consider the general problem of learning from labeled and unlabeled data. Given a set of points, some of them are labeled, and the remaining points are unlabeled. The goal is to predict the labels of the unlabeled points. Any supervised learning algorithm can be applied to this problem, for instance, Support Vector Machines (SVMs). The problem of our interest is if we can implement a classifier which uses the unlabeled data information in some way and has higher accuracy than the classifiers which use the labeled data only. Recently we proposed a simple algorithm, which can substantially benefit from large amounts of unlabeled data and demonstrates clear superiority to supervised learning methods. In this paper we further investigate the algorithm using random walks and spectral graph theory, which shed light on the key steps in this algorithm.

ei

PDF PostScript [BibTex]

PDF PostScript [BibTex]


no image
Minimizing the Cross Validation Error to Mix Kernel Matrices of Heterogeneous Biological Data

Tsuda, K., Uda, S., Kin, T., Asai, K.

Neural Processing Letters, 19, pages: 63-72, 2004 (article)

ei

PDF [BibTex]

PDF [BibTex]


no image
A Tutorial on Support Vector Regression

Smola, A., Schölkopf, B.

Statistics and Computing, 14(3):199-222, 2004 (article)

ei

Web [BibTex]

Web [BibTex]


no image
Multivariate Regression via Stiefel Manifold Constraints

BakIr, G., Gretton, A., Franz, M., Schölkopf, B.

In Lecture Notes in Computer Science, Vol. 3175, pages: 262-269, (Editors: CE Rasmussen and HH Bülthoff and B Schölkopf and MA Giese), Springer, Berlin, Germany, Pattern Recognition, Proceedings of the 26th DAGM Symposium, 2004 (inproceedings)

Abstract
We introduce a learning technique for regression between high-dimensional spaces. Standard methods typically reduce this task to many one-dimensional problems, with each output dimension considered independently. By contrast, in our approach the feature construction and the regression estimation are performed jointly, directly minimizing a loss function that we specify, subject to a rank constraint. A major advantage of this approach is that the loss is no longer chosen according to the algorithmic requirements, but can be tailored to the characteristics of the task at hand; the features will then be optimal with respect to this objective, and dependence between the outputs can be exploited.

ei

PostScript [BibTex]

PostScript [BibTex]


no image
Implicit estimation of Wiener series

Franz, M., Schölkopf, B.

In Machine Learning for Signal Processing XIV, Proc. 2004 IEEE Signal Processing Society Workshop, pages: 735-744, (Editors: A Barros and J Principe and J Larsen and T Adali and S Douglas), IEEE, New York, Machine Learning for Signal Processing XIV, Proc. IEEE Signal Processing Society Workshop, 2004 (inproceedings)

Abstract
The Wiener series is one of the standard methods to systematically characterize the nonlinearity of a system. The classical estimation method of the expansion coefficients via cross-correlation suffers from severe problems that prevent its application to high-dimensional and strongly nonlinear systems. We propose an implicit estimation method based on regression in a reproducing kernel Hilbert space that alleviates these problems. Experiments show performance advantages in terms of convergence, interpretability, and system sizes that can be handled.

ei

PDF PostScript [BibTex]

PDF PostScript [BibTex]


no image
Hilbertian Metrics on Probability Measures and their Application in SVM’s

Hein, H., Lal, T., Bousquet, O.

In Pattern Recognition, Proceedings of th 26th DAGM Symposium, 3175, pages: 270-277, Lecture Notes in Computer Science, (Editors: Rasmussen, C. E., H. H. Bülthoff, M. Giese and B. Schölkopf), Pattern Recognition, Proceedings of th 26th DAGM Symposium, 2004 (inproceedings)

Abstract
The goal of this article is to investigate the field of Hilbertian metrics on probability measures. Since they are very versatile and can therefore be applied in various problems they are of great interest in kernel methods. Quit recently Tops{o}e and Fuglede introduced a family of Hilbertian metrics on probability measures. We give basic properties of the Hilbertian metrics of this family and other used metrics in the literature. Then we propose an extension of the considered metrics which incorporates structural information of the probability space into the Hilbertian metric. Finally we compare all proposed metrics in an image and text classification problem using histogram data.

ei

PDF PostScript [BibTex]

PDF PostScript [BibTex]


no image
Gasussian process model based predictive control

Kocijan, J., Murray-Smith, R., Rasmussen, CE., Girard, A.

In Proceedings of the ACC 2004, pages: 2214-2219, Proceedings of the ACC, 2004 (inproceedings)

Abstract
Gaussian process models provide a probabilistic non-parametric modelling approach for black-box identi cation of non-linear dynamic systems. The Gaussian processes can highlight areas of the input space where prediction quality is poor, due to the lack of data or its complexity, by indicating the higher variance around the predicted mean. Gaussian process models contain noticeably less coef cients to be optimised. This paper illustrates possible application of Gaussian process models within model-based predictive control. The extra information provided within Gaussian process model is used in predictive control, where optimisation of control signal takes the variance information into account. The predictive control principle is demonstrated on control of pH process benchmark.

ei

PDF PostScript [BibTex]

PDF PostScript [BibTex]


no image
A New Variational Framework for Rigid-Body Alignment

Kato, T., Tsuda, K., Tomii, K., Asai, K.

In Joint IAPR International Workshops on Syntactical and Structural Pattern Recognition (SSPR 2004) and Statistical Pattern Recognition (SPR 2004), pages: 171-179, (Editors: Fred, A.,T. Caelli, R.P.W. Duin, A. Campilho and D. de Ridder), Joint IAPR International Workshops on Syntactical and Structural Pattern Recognition (SSPR) and Statistical Pattern Recognition (SPR), 2004 (inproceedings)

ei

PDF [BibTex]

PDF [BibTex]


no image
Bayesian analysis of the Scatterometer Wind Retrieval Inverse Problem: Some New Approaches

Cornford, D., Csato, L., Evans, D., Opper, M.

Journal of the Royal Statistical Society B, 66, pages: 1-17, 3, 2004 (article)

Abstract
The retrieval of wind vectors from satellite scatterometer observations is a non-linear inverse problem.A common approach to solving inverse problems is to adopt a Bayesian framework and to infer the posterior distribution of the parameters of interest given the observations by using a likelihood model relating the observations to the parameters, and a prior distribution over the parameters.We show how Gaussian process priors can be used efficiently with a variety of likelihood models, using local forward (observation) models and direct inverse models for the scatterometer.We present an enhanced Markov chain Monte Carlo method to sample from the resulting multimodal posterior distribution.We go on to show how the computational complexity of the inference can be controlled by using a sparse, sequential Bayes algorithm for estimation with Gaussian processes.This helps to overcome the most serious barrier to the use of probabilistic, Gaussian process methods in remote sensing inverse problems, which is the prohibitively large size of the data sets.We contrast the sampling results with the approximations that are found by using the sparse, sequential Bayes algorithm.

ei

PDF [BibTex]

PDF [BibTex]


no image
Feature Selection for Support Vector Machines Using Genetic Algorithms

Fröhlich, H., Chapelle, O., Schölkopf, B.

International Journal on Artificial Intelligence Tools (Special Issue on Selected Papers from the 15th IEEE International Conference on Tools with Artificial Intelligence 2003), 13(4):791-800, 2004 (article)

ei

Web [BibTex]

Web [BibTex]


no image
Semi-supervised kernel regression using whitened function classes

Franz, M., Kwon, Y., Rasmussen, C., Schölkopf, B.

In Pattern Recognition, Proceedings of the 26th DAGM Symposium, Lecture Notes in Computer Science, Vol. 3175, LNCS 3175, pages: 18-26, (Editors: CE Rasmussen and HH Bülthoff and MA Giese and B Schölkopf), Springer, Berlin, Gerrmany, 26th DAGM Symposium, 2004 (inproceedings)

Abstract
The use of non-orthonormal basis functions in ridge regression leads to an often undesired non-isotropic prior in function space. In this study, we investigate an alternative regularization technique that results in an implicit whitening of the basis functions by penalizing directions in function space with a large prior variance. The regularization term is computed from unlabelled input data that characterizes the input distribution. Tests on two datasets using polynomial basis functions showed an improved average performance compared to standard ridge regression.

ei

PDF PostScript [BibTex]

PDF PostScript [BibTex]


no image
Maximal Margin Classification for Metric Spaces

Hein, M., Bousquet, O.

In Learning Theory and Kernel Machines, pages: 72-86, (Editors: Schölkopf, B. and Warmuth, M. K.), Springer, Heidelberg, Germany, 16. Annual Conference on Computational Learning Theory / COLT Kernel, 2004 (inproceedings)

Abstract
In this article we construct a maximal margin classification algorithm for arbitrary metric spaces. At first we show that the Support Vector Machine (SVM) is a maximal margin algorithm for the class of metric spaces where the negative squared distance is conditionally positive definite (CPD). This means that the metric space can be isometrically embedded into a Hilbert space, where one performs linear maximal margin separation. We will show that the solution only depends on the metric, but not on the kernel. Following the framework we develop for the SVM, we construct an algorithm for maximal margin classification in arbitrary metric spaces. The main difference compared with SVM is that we no longer embed isometrically into a Hilbert space, but a Banach space. We further give an estimate of the capacity of the function class involved in this algorithm via Rademacher averages. We recover an algorithm of Graepel et al. [6].

ei

PDF PostScript PDF DOI [BibTex]

PDF PostScript PDF DOI [BibTex]


no image
On the Convergence of Spectral Clustering on Random Samples: The Normalized Case

von Luxburg, U., Bousquet, O., Belkin, M.

In Proceedings of the 17th Annual Conference on Learning Theory, pages: 457-471, Proceedings of the 17th Annual Conference on Learning Theory, 2004 (inproceedings)

ei

PDF PostScript [BibTex]

PDF PostScript [BibTex]


no image
Phenotypic Characterization of Human Chondrocyte Cell Line C-20/A4: A Comparison between Monolayer and Alginate Suspension Culture

Finger, F., Schorle, C., Söder, S., Zien, A., Goldring, M., Aigner, T.

Cells Tissues Organs, 178(2):65-77, 2004 (article)

Abstract
DNA microarray analysis was used to investigate the molecular phenotype of one of the first human chondrocyte cell lines, C-20/A4, derived from juvenile costal chondrocytes by immortalization with origin-defective simian virus 40 large T antigen. Clontech Human Cancer Arrays 1.2 and quantitative PCR were used to examine gene expression profiles of C-20/A4 cells cultured in the presence of serum in monolayer and alginate beads. In monolayer cultures, genes involved in cell proliferation were strongly upregulated compared to those expressed by human adult articular chondrocytes in primary culture. Of the cell cycle-regulated genes, only two, the CDK regulatory subunit and histone H4, were downregulated after culture in alginate beads, consistent with the ability of these cells to proliferate in suspension culture. In contrast, the expression of several genes that are involved in pericellular matrix formation, including MMP-14, COL6A1, fibronectin, biglycan and decorin, was upregulated when the C-20/A4 cells were transferred to suspension culture in alginate. Also, nexin-1, vimentin, and IGFBP-3, which are known to be expressed by primary chondrocytes, were differentially expressed in our study. Consistent with the proliferative phenotype of this cell line, few genes involved in matrix synthesis and turnover were highly expressed in the presence of serum. These results indicate that immortalized chondrocyte cell lines, rather than substituting for primary chondrocytes, may serve as models for extending findings on chondrocyte function not achievable by the use of primary chondrocytes.

ei

[BibTex]

[BibTex]


no image
Kernel Methods and their Potential Use in Signal Processing

Perez-Cruz, F., Bousquet, O.

IEEE Signal Processing Magazine, (Special issue on Signal Processing for Mining), 2004 (article) Accepted

ei

PostScript [BibTex]

PostScript [BibTex]

2003


no image
Concentration Inequalities for Sub-Additive Functions Using the Entropy Method

Bousquet, O.

Stochastic Inequalities and Applications, 56, pages: 213-247, Progress in Probability, (Editors: Giné, E., C. Houdré and D. Nualart), November 2003 (article)

Abstract
We obtain exponential concentration inequalities for sub-additive functions of independent random variables under weak conditions on the increments of those functions, like the existence of exponential moments for these increments. As a consequence of these general inequalities, we obtain refinements of Talagrand's inequality for empirical processes and new bounds for randomized empirical processes. These results are obtained by further developing the entropy method introduced by Ledoux.

ei

PostScript [BibTex]

2003


PostScript [BibTex]


no image
On the Complexity of Learning the Kernel Matrix

Bousquet, O., Herrmann, D.

In Advances in Neural Information Processing Systems 15, pages: 399-406, (Editors: Becker, S. , S. Thrun, K. Obermayer), The MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
We investigate data based procedures for selecting the kernel when learning with Support Vector Machines. We provide generalization error bounds by estimating the Rademacher complexities of the corresponding function classes. In particular we obtain a complexity bound for function classes induced by kernels with given eigenvectors, i.e., we allow to vary the spectrum and keep the eigenvectors fix. This bound is only a logarithmic factor bigger than the complexity of the function class induced by a single kernel. However, optimizing the margin over such classes leads to overfitting. We thus propose a suitable way of constraining the class. We use an efficient algorithm to solve the resulting optimization problem, present preliminary experimental results, and compare them to an alignment-based approach.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Cluster Kernels for Semi-Supervised Learning

Chapelle, O., Weston, J., Schölkopf, B.

In Advances in Neural Information Processing Systems 15, pages: 585-592, (Editors: S Becker and S Thrun and K Obermayer), MIT Press, Cambridge, MA, USA, 16th Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
We propose a framework to incorporate unlabeled data in kernel classifier, based on the idea that two points in the same cluster are more likely to have the same label. This is achieved by modifying the eigenspectrum of the kernel matrix. Experimental results assess the validity of this approach.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Mismatch String Kernels for SVM Protein Classification

Leslie, C., Eskin, E., Weston, J., Noble, W.

In Advances in Neural Information Processing Systems 15, pages: 1417-1424, (Editors: Becker, S. , S. Thrun, K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem. These kernels measure sequence similarity based on shared occurrences of k-length subsequences, counted with up to m mismatches, and do not rely on any generative model for the positive training sequences. We compute the kernels efficiently using a mismatch tree data structure and report experiments on a benchmark SCOP dataset, where we show that the mismatch kernel used with an SVM classifier performs as well as the Fisher kernel, the most successful method for remote homology detection, while achieving considerable computational savings.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Kernel Dependency Estimation

Weston, J., Chapelle, O., Elisseeff, A., Schölkopf, B., Vapnik, V.

In Advances in Neural Information Processing Systems 15, pages: 873-880, (Editors: S Becker and S Thrun and K Obermayer), MIT Press, Cambridge, MA, USA, 16th Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Linear Combinations of Optic Flow Vectors for Estimating Self-Motion: a Real-World Test of a Neural Model

Franz, MO., Chahl, JS.

In Advances in Neural Information Processing Systems 15, pages: 1319-1326, (Editors: Becker, S., S. Thrun and K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
The tangential neurons in the fly brain are sensitive to the typical optic flow patterns generated during self-motion. In this study, we examine whether a simplified linear model of these neurons can be used to estimate self-motion from the optic flow. We present a theory for the construction of an estimator consisting of a linear combination of optic flow vectors that incorporates prior knowledge both about the distance distribution of the environment, and about the noise and self-motion statistics of the sensor. The estimator is tested on a gantry carrying an omnidirectional vision sensor. The experiments show that the proposed approach leads to accurate and robust estimates of rotation rates, whereas translation estimates turn out to be less reliable.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Clustering with the Fisher score

Tsuda, K., Kawanabe, M., Müller, K.

In Advances in Neural Information Processing Systems 15, pages: 729-736, (Editors: Becker, S. , S. Thrun, K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
Recently the Fisher score (or the Fisher kernel) is increasingly used as a feature extractor for classification problems. The Fisher score is a vector of parameter derivatives of loglikelihood of a probabilistic model. This paper gives a theoretical analysis about how class information is preserved in the space of the Fisher score, which turns out that the Fisher score consists of a few important dimensions with class information and many nuisance dimensions. When we perform clustering with the Fisher score, K-Means type methods are obviously inappropriate because they make use of all dimensions. So we will develop a novel but simple clustering algorithm specialized for the Fisher score, which can exploit important dimensions. This algorithm is successfully tested in experiments with artificial data and real data (amino acid sequences).

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Marginalized Kernels between Labeled Graphs

Kashima, H., Tsuda, K., Inokuchi, A.

In 20th International Conference on Machine Learning, pages: 321-328, (Editors: Faucett, T. and N. Mishra), 20th International Conference on Machine Learning, August 2003 (inproceedings)

ei

PDF [BibTex]

PDF [BibTex]


no image
Sparse Gaussian Processes: inference, subspace identification and model selection

Csato, L., Opper, M.

In Proceedings, pages: 1-6, (Editors: Van der Hof, , Wahlberg), The Netherlands, 13th IFAC Symposium on System Identifiaction, August 2003, electronical version; Index ThA02-2 (inproceedings)

Abstract
Gaussian Process (GP) inference is a probabilistic kernel method where the GP is treated as a latent function. The inference is carried out using the Bayesian online learning and its extension to the more general iterative approach which we call TAP/EP learning. Sparsity is introduced in this context to make the TAP/EP method applicable to large datasets. We address the prohibitive scaling of the number of parameters by defining a subset of the training data that is used as the support the GP, thus the number of required parameters is independent of the training set, similar to the case of ``Support--‘‘ or ``Relevance--Vectors‘‘. An advantage of the full probabilistic treatment is that allows the computation of the marginal data likelihood or evidence, leading to hyper-parameter estimation within the GP inference. An EM algorithm to choose the hyper-parameters is proposed. The TAP/EP learning is the E-step and the M-step then updates the hyper-parameters. Due to the sparse E-step the resulting algorithm does not involve manipulation of large matrices. The presented algorithm is applicable to a wide variety of likelihood functions. We present results of applying the algorithm on classification and nonstandard regression problems for artificial and real datasets.

ei

PDF GZIP [BibTex]

PDF GZIP [BibTex]


no image
Adaptive, Cautious, Predictive control with Gaussian Process Priors

Murray-Smith, R., Sbarbaro, D., Rasmussen, CE., Girard, A.

In Proceedings of the 13th IFAC Symposium on System Identification, pages: 1195-1200, (Editors: Van den Hof, P., B. Wahlberg and S. Weiland), Proceedings of the 13th IFAC Symposium on System Identification, August 2003 (inproceedings)

Abstract
Nonparametric Gaussian Process models, a Bayesian statistics approach, are used to implement a nonlinear adaptive control law. Predictions, including propagation of the state uncertainty are made over a k-step horizon. The expected value of a quadratic cost function is minimised, over this prediction horizon, without ignoring the variance of the model predictions. The general method and its main features are illustrated on a simulation example.

ei

PDF [BibTex]

PDF [BibTex]


no image
On the Representation, Learning and Transfer of Spatio-Temporal Movement Characteristics

Ilg, W., Bakir, GH., Mezger, J., Giese, MA.

In Humanoids Proceedings, pages: 0-0, Humanoids Proceedings, July 2003, electronical version (inproceedings)

Abstract
In this paper we present a learning-based approach for the modelling of complex movement sequences. Based on the method of Spatio-Temporal Morphable Models (STMMS. We derive a hierarchical algorithm that, in a first step, identifies automatically movement elements in movement sequences based on a coarse spatio-temporal description, and in a second step models these movement primitives by approximation through linear combinations of learned example movement trajectories. We describe the different steps of the algorithm and show how it can be applied for modelling and synthesis of complex sequences of human movements that contain movement elements with variable style. The proposed method is demonstrated on different applications of movement representation relevant for imitation learning of movement styles in humanoid robotics.

ei

PDF [BibTex]

PDF [BibTex]


no image
Statistical Learning Theory, Capacity and Complexity

Schölkopf, B.

Complexity, 8(4):87-94, July 2003 (article)

Abstract
We give an exposition of the ideas of statistical learning theory, followed by a discussion of how a reinterpretation of the insights of learning theory could potentially also benefit our understanding of a certain notion of complexity.

ei

Web DOI [BibTex]


no image
Dealing with large Diagonals in Kernel Matrices

Weston, J., Schölkopf, B., Eskin, E., Leslie, C., Noble, W.

Annals of the Institute of Statistical Mathematics, 55(2):391-408, June 2003 (article)

Abstract
In kernel methods, all the information about the training data is contained in the Gram matrix. If this matrix has large diagonal values, which arises for many types of kernels, then kernel methods do not perform well: We propose and test several methods for dealing with this problem by reducing the dynamic range of the matrix while preserving the positive definiteness of the Hessian of the quadratic programming problem that one has to solve when training a Support Vector Machine, which is a common kernel approach for pattern recognition.

ei

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
The em Algorithm for Kernel Matrix Completion with Auxiliary Data

Tsuda, K., Akaho, S., Asai, K.

Journal of Machine Learning Research, 4, pages: 67-81, May 2003 (article)

ei

PDF [BibTex]

PDF [BibTex]


no image
Constructing Descriptive and Discriminative Non-linear Features: Rayleigh Coefficients in Kernel Feature Spaces

Mika, S., Rätsch, G., Weston, J., Schölkopf, B., Smola, A., Müller, K.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(5):623-628, May 2003 (article)

Abstract
We incorporate prior knowledge to construct nonlinear algorithms for invariant feature extraction and discrimination. Employing a unified framework in terms of a nonlinearized variant of the Rayleigh coefficient, we propose nonlinear generalizations of Fisher‘s discriminant and oriented PCA using support vector kernel functions. Extensive simulations show the utility of our approach.

ei

DOI [BibTex]

DOI [BibTex]


no image
A case based comparison of identification with neural network and Gaussian process models.

Kocijan, J., Banko, B., Likar, B., Girard, A., Murray-Smith, R., Rasmussen, CE.

In Proceedings of the International Conference on Intelligent Control Systems and Signal Processing ICONS 2003, 1, pages: 137-142, (Editors: Ruano, E.A.), Proceedings of the International Conference on Intelligent Control Systems and Signal Processing ICONS, April 2003 (inproceedings)

Abstract
In this paper an alternative approach to black-box identification of non-linear dynamic systems is compared with the more established approach of using artificial neural networks. The Gaussian process prior approach is a representative of non-parametric modelling approaches. It was compared on a pH process modelling case study. The purpose of modelling was to use the model for control design. The comparison revealed that even though Gaussian process models can be effectively used for modelling dynamic systems caution has to be axercised when signals are selected.

ei

PDF [BibTex]

PDF [BibTex]


no image
On-Line One-Class Support Vector Machines. An Application to Signal Segmentation

Gretton, A., Desobry, ..

In IEEE ICASSP Vol. 2, pages: 709-712, IEEE ICASSP, April 2003 (inproceedings)

Abstract
In this paper, we describe an efficient algorithm to sequentially update a density support estimate obtained using one-class support vector machines. The solution provided is an exact solution, which proves to be far more computationally attractive than a batch approach. This deterministic technique is applied to the problem of audio signal segmentation, with simulations demonstrating the computational performance gain on toy data sets, and the accuracy of the segmentation on audio signals.

ei

PostScript [BibTex]

PostScript [BibTex]


no image
The Kernel Mutual Information

Gretton, A., Herbrich, R., Smola, A.

In IEEE ICASSP Vol. 4, pages: 880-883, IEEE ICASSP, April 2003 (inproceedings)

Abstract
We introduce a new contrast function, the kernel mutual information (KMI), to measure the degree of independence of continuous random variables. This contrast function provides an approximate upper bound on the mutual information, as measured near independence, and is based on a kernel density estimate of the mutual information between a discretised approximation of the continuous random variables. We show that Bach and Jordan‘s kernel generalised variance (KGV) is also an upper bound on the same kernel density estimate, but is looser. Finally, we suggest that the addition of a regularising term in the KGV causes it to approach the KMI, which motivates the introduction of this regularisation.

ei

PostScript [BibTex]

PostScript [BibTex]


no image
Tractable Inference for Probabilistic Data Models

Csato, L., Opper, M., Winther, O.

Complexity, 8(4):64-68, April 2003 (article)

Abstract
We present an approximation technique for probabilistic data models with a large number of hidden variables, based on ideas from statistical physics. We give examples for two nontrivial applications. © 2003 Wiley Periodicals, Inc.

ei

PDF GZIP Web [BibTex]

PDF GZIP Web [BibTex]


no image
Feature selection and transduction for prediction of molecular bioactivity for drug design

Weston, J., Perez-Cruz, F., Bousquet, O., Chapelle, O., Elisseeff, A., Schölkopf, B.

Bioinformatics, 19(6):764-771, April 2003 (article)

Abstract
Motivation: In drug discovery a key task is to identify characteristics that separate active (binding) compounds from inactive (non-binding) ones. An automated prediction system can help reduce resources necessary to carry out this task. Results: Two methods for prediction of molecular bioactivity for drug design are introduced and shown to perform well in a data set previously studied as part of the KDD (Knowledge Discovery and Data Mining) Cup 2001. The data is characterized by very few positive examples, a very large number of features (describing three-dimensional properties of the molecules) and rather different distributions between training and test data. Two techniques are introduced specifically to tackle these problems: a feature selection method for unbalanced data and a classifier which adapts to the distribution of the the unlabeled test data (a so-called transductive method). We show both techniques improve identification performance and in conjunction provide an improvement over using only one of the techniques. Our results suggest the importance of taking into account the characteristics in this data which may also be relevant in other problems of a similar type.

ei

Web [BibTex]


no image
Use of the Zero-Norm with Linear Models and Kernel Methods

Weston, J., Elisseeff, A., Schölkopf, B., Tipping, M.

Journal of Machine Learning Research, 3, pages: 1439-1461, March 2003 (article)

Abstract
We explore the use of the so-called zero-norm of the parameters of linear models in learning. Minimization of such a quantity has many uses in a machine learning context: for variable or feature selection, minimizing training error and ensuring sparsity in solutions. We derive a simple but practical method for achieving these goals and discuss its relationship to existing techniques of minimizing the zero-norm. The method boils down to implementing a simple modification of vanilla SVM, namely via an iterative multiplicative rescaling of the training data. Applications we investigate which aid our discussion include variable and feature selection on biological microarray data, and multicategory classification.

ei

PDF PostScript PDF [BibTex]

PDF PostScript PDF [BibTex]


no image
Hierarchical Spatio-Temporal Morphable Models for Representation of complex movements for Imitation Learning

Ilg, W., Bakir, GH., Franz, MO., Giese, M.

In 11th International Conference on Advanced Robotics, (2):453-458, (Editors: Nunes, U., A. de Almeida, A. Bejczy, K. Kosuge and J.A.T. Machado), 11th International Conference on Advanced Robotics, January 2003 (inproceedings)

ei

PDF [BibTex]

PDF [BibTex]


no image
An Introduction to Variable and Feature Selection.

Guyon, I., Elisseeff, A.

Journal of Machine Learning, 3, pages: 1157-1182, 2003 (article)

ei

[BibTex]

[BibTex]


no image
Feature Selection for Support Vector Machines by Means of Genetic Algorithms

Fröhlich, H., Chapelle, O., Schölkopf, B.

In 15th IEEE International Conference on Tools with AI, pages: 142-148, 15th IEEE International Conference on Tools with AI, 2003 (inproceedings)

ei

[BibTex]

[BibTex]


no image
Propagation of Uncertainty in Bayesian Kernel Models - Application to Multiple-Step Ahead Forecasting

Quiñonero-Candela, J., Girard, A., Larsen, J., Rasmussen, CE.

In IEEE International Conference on Acoustics, Speech and Signal Processing, 2, pages: 701-704, IEEE International Conference on Acoustics, Speech and Signal Processing, 2003 (inproceedings)

Abstract
The object of Bayesian modelling is the predictive distribution, which in a forecasting scenario enables improved estimates of forecasted values and their uncertainties. In this paper we focus on reliably estimating the predictive mean and variance of forecasted values using Bayesian kernel based models such as the Gaussian Process and the Relevance Vector Machine. We derive novel analytic expressions for the predictive mean and variance for Gaussian kernel shapes under the assumption of a Gaussian input distribution in the static case, and of a recursive Gaussian predictive density in iterative forecasting. The capability of the method is demonstrated for forecasting of time-series and compared to approximate methods.

ei

PDF PostScript [BibTex]

PDF PostScript [BibTex]


no image
Unsupervised Clustering of Images using their Joint Segmentation

Seldin, Y., Starik, S., Werman, M.

In The 3rd International Workshop on Statistical and Computational Theories of Vision (SCTV 2003), pages: 1-24, 3rd International Workshop on Statistical and Computational Theories of Vision (SCTV), 2003 (inproceedings)

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Kernel Methods and Their Applications to Signal Processing

Bousquet, O., Perez-Cruz, F.

In Proceedings. (ICASSP ‘03), Special Session on Kernel Methods, pages: 860 , ICASSP, 2003 (inproceedings)

Abstract
Recently introduced in Machine Learning, the notion of kernels has drawn a lot of interest as it allows to obtain non-linear algorithms from linear ones in a simple and elegant manner. This, in conjunction with the introduction of new linear classification methods such as the Support Vector Machines has produced significant progress. The successes of such algorithms is now spreading as they are applied to more and more domains. Many Signal Processing problems, by their non-linear and high-dimensional nature may benefit from such techniques. We give an overview of kernel methods and their recent applications.

ei

PDF PostScript [BibTex]

PDF PostScript [BibTex]


no image
Predictive control with Gaussian process models

Kocijan, J., Murray-Smith, R., Rasmussen, CE., Likar, B.

In Proceedings of IEEE Region 8 Eurocon 2003: Computer as a Tool, pages: 352-356, (Editors: Zajc, B. and M. Tkal), Proceedings of IEEE Region 8 Eurocon: Computer as a Tool, 2003 (inproceedings)

Abstract
This paper describes model-based predictive control based on Gaussian processes.Gaussian process models provide a probabilistic non-parametric modelling approach for black-box identification of non-linear dynamic systems. It offers more insight in variance of obtained model response, as well as fewer parameters to determine than other models. The Gaussian processes can highlight areas of the input space where prediction quality is poor, due to the lack of data or its complexity, by indicating the higher variance around the predicted mean. This property is used in predictive control, where optimisation of control signal takes the variance information into account. The predictive control principle is demonstrated on a simulated example of nonlinear system.

ei

PDF PostScript [BibTex]

PDF PostScript [BibTex]


no image
New Approaches to Statistical Learning Theory

Bousquet, O.

Annals of the Institute of Statistical Mathematics, 55(2):371-389, 2003 (article)

Abstract
We present new tools from probability theory that can be applied to the analysis of learning algorithms. These tools allow to derive new bounds on the generalization performance of learning algorithms and to propose alternative measures of the complexity of the learning task, which in turn can be used to derive new learning algorithms.

ei

PostScript [BibTex]

PostScript [BibTex]


no image
Distance-based classification with Lipschitz functions

von Luxburg, U., Bousquet, O.

In Learning Theory and Kernel Machines, Proceedings of the 16th Annual Conference on Computational Learning Theory, pages: 314-328, (Editors: Schölkopf, B. and M.K. Warmuth), Learning Theory and Kernel Machines, Proceedings of the 16th Annual Conference on Computational Learning Theory, 2003 (inproceedings)

Abstract
The goal of this article is to develop a framework for large margin classification in metric spaces. We want to find a generalization of linear decision functions for metric spaces and define a corresponding notion of margin such that the decision function separates the training points with a large margin. It will turn out that using Lipschitz functions as decision functions, the inverse of the Lipschitz constant can be interpreted as the size of a margin. In order to construct a clean mathematical setup we isometrically embed the given metric space into a Banach space and the space of Lipschitz functions into its dual space. Our approach leads to a general large margin algorithm for classification in metric spaces. To analyze this algorithm, we first prove a representer theorem. It states that there exists a solution which can be expressed as linear combination of distances to sets of training points. Then we analyze the Rademacher complexity of some Lipschitz function classes. The generality of the Lipschitz approach can be seen from the fact that several well-known algorithms are special cases of the Lipschitz algorithm, among them the support vector machine, the linear programming machine, and the 1-nearest neighbor classifier.

ei

PDF PostScript [BibTex]

PDF PostScript [BibTex]