Header logo is


2005


no image
Experimentally optimal v in support vector regression for different noise models and parameter settings

Chalimourda, A., Schölkopf, B., Smola, A.

Neural Networks, 18(2):205-205, March 2005 (article)

ei

PDF DOI [BibTex]

2005


PDF DOI [BibTex]


no image
Active Learning for Parzen Window Classifier

Chapelle, O.

In AISTATS 2005, pages: 49-56, (Editors: Cowell, R. , Z. Ghahramani), Tenth International Workshop on Artificial Intelligence and Statistics (AI & Statistics), January 2005 (inproceedings)

Abstract
The problem of active learning is approached in this paper by minimizing directly an estimate of the expected test error. The main difficulty in this ``optimal'' strategy is that output probabilities need to be estimated accurately. We suggest here different methods for estimating those efficiently. In this context, the Parzen window classifier is considered because it is both simple and probabilistic. The analysis of experimental results highlights that regularization is a key ingredient for this strategy.

ei

Web [BibTex]

Web [BibTex]


no image
Semi-Supervised Classification by Low Density Separation

Chapelle, O., Zien, A.

In AISTATS 2005, pages: 57-64, (Editors: Cowell, R. , Z. Ghahramani), Tenth International Workshop on Artificial Intelligence and Statistics (AI & Statistics), January 2005 (inproceedings)

Abstract
We believe that the cluster assumption is key to successful semi-supervised learning. Based on this, we propose three semi-supervised algorithms: 1. deriving graph-based distances that emphazise low density regions between clusters, followed by training a standard SVM; 2. optimizing the Transductive SVM objective function, which places the decision boundary in low density regions, by gradient descent; 3. combining the first two to make maximum use of the cluster assumption. We compare with state of the art algorithms and demonstrate superior accuracy for the latter two methods.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Kernel Constrained Covariance for Dependence Measurement

Gretton, A., Smola, A., Bousquet, O., Herbrich, R., Belitski, A., Augath, M., Murayama, Y., Pauls, J., Schölkopf, B., Logothetis, N.

In Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics, pages: 112-119, (Editors: R Cowell, R and Z Ghahramani), AISTATS, January 2005 (inproceedings)

Abstract
We discuss reproducing kernel Hilbert space (RKHS)-based measures of statistical dependence, with emphasis on constrained covariance (COCO), a novel criterion to test dependence of random variables. We show that COCO is a test for independence if and only if the associated RKHSs are universal. That said, no independence test exists that can distinguish dependent and independent random variables in all circumstances. Dependent random variables can result in a COCO which is arbitrarily close to zero when the source densities are highly non-smooth. All current kernel-based independence tests share this behaviour. We demonstrate exponential convergence between the population and empirical COCO. Finally, we use COCO as a measure of joint neural activity between voxels in MRI recordings of the macaque monkey, and compare the results to the mutual information and the correlation. We also show the effect of removing breathing artefacts from the MRI recording.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Hilbertian Metrics and Positive Definite Kernels on Probability Measures

Hein, M., Bousquet, O.

In AISTATS 2005, pages: 136-143, (Editors: Cowell, R. , Z. Ghahramani), Tenth International Workshop on Artificial Intelligence and Statistics (AI & Statistics), January 2005 (inproceedings)

Abstract
We investigate the problem of defining Hilbertian metrics resp. positive definite kernels on probability measures, continuing previous work. This type of kernels has shown very good results in text classification and has a wide range of possible applications. In this paper we extend the two-parameter family of Hilbertian metrics of Topsoe such that it now includes all commonly used Hilbertian metrics on probability measures. This allows us to do model selection among these metrics in an elegant and unified way. Second we investigate further our approach to incorporate similarity information of the probability space into the kernel. The analysis provides a better understanding of these kernels and gives in some cases a more efficient way to compute them. Finally we compare all proposed kernels in two text and two image classification problems.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Invariance of Neighborhood Relation under Input Space to Feature Space Mapping

Shin, H., Cho, S.

Pattern Recognition Letters, 26(6):707-718, 2005 (article)

Abstract
If the training pattern set is large, it takes a large memory and a long time to train support vector machine (SVM). Recently, we proposed neighborhood property based pattern selection algorithm (NPPS) which selects only the patterns that are likely to be near the decision boundary ahead of SVM training [Proc. of the 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Lecture Notes in Artificial Intelligence (LNAI 2637), Seoul, Korea, pp. 376–387]. NPPS tries to identify those patterns that are likely to become support vectors in feature space. Preliminary reports show its effectiveness: SVM training time was reduced by two orders of magnitude with almost no loss in accuracy for various datasets. It has to be noted, however, that decision boundary of SVM and support vectors are all defined in feature space while NPPS described above operates in input space. If neighborhood relation in input space is not preserved in feature space, NPPS may not always be effective. In this paper, we sh ow that the neighborhood relation is invariant under input to feature space mapping. The result assures that the patterns selected by NPPS in input space are likely to be located near decision boundary in feature space.

ei

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
Intrinsic Dimensionality Estimation of Submanifolds in Euclidean space

Hein, M., Audibert, Y.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 289 , (Editors: De Raedt, L. , S. Wrobel), ICML Bonn, 2005 (inproceedings)

Abstract
We present a new method to estimate the intrinsic dimensionality of a submanifold M in Euclidean space from random samples. The method is based on the convergence rates of a certain U-statistic on the manifold. We solve at least partially the question of the choice of the scale of the data. Moreover the proposed method is easy to implement, can handle large data sets and performs very well even for small sample sizes. We compare the proposed method to two standard estimators on several artificial as well as real data sets.

ei

PDF [BibTex]

PDF [BibTex]


no image
Large Scale Genomic Sequence SVM Classifiers

Sonnenburg, S., Rätsch, G., Schölkopf, B.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 849-856, (Editors: L De Raedt and S Wrobel), ACM, New York, NY, USA, ICML, 2005 (inproceedings)

Abstract
In genomic sequence analysis tasks like splice site recognition or promoter identification, large amounts of training sequences are available, and indeed needed to achieve sufficiently high classification performances. In this work we study two recently proposed and successfully used kernels, namely the Spectrum kernel and the Weighted Degree kernel (WD). In particular, we suggest several extensions using Suffix Trees and modi cations of an SMO-like SVM training algorithm in order to accelerate the training of the SVMs and their evaluation on test sequences. Our simulations show that for the spectrum kernel and WD kernel, large scale SVM training can be accelerated by factors of 20 and 4 times, respectively, while using much less memory (e.g. no kernel caching). The evaluation on new sequences is often several thousand times faster using the new techniques (depending on the number of Support Vectors). Our method allows us to train on sets as large as one million sequences.

ei

PDF [BibTex]

PDF [BibTex]


no image
Joint Kernel Maps

Weston, J., Schölkopf, B., Bousquet, O.

In Proceedings of the 8th InternationalWork-Conference on Artificial Neural Networks, LNCS 3512, pages: 176-191, (Editors: J Cabestany and A Prieto and F Sandoval), Springer, Berlin Heidelberg, Germany, IWANN, 2005 (inproceedings)

Abstract
We develop a methodology for solving high dimensional dependency estimation problems between pairs of data types, which is viable in the case where the output of interest has very high dimension, e.g., thousands of dimensions. This is achieved by mapping the objects into continuous or discrete spaces, using joint kernels. Known correlations between input and output can be defined by such kernels, some of which can maintain linearity in the outputs to provide simple (closed form) pre-images. We provide examples of such kernels and empirical results.

ei

PostScript DOI [BibTex]

PostScript DOI [BibTex]


no image
Analysis of Some Methods for Reduced Rank Gaussian Process Regression

Quinonero Candela, J., Rasmussen, C.

In Switching and Learning in Feedback Systems, pages: 98-127, (Editors: Murray Smith, R. , R. Shorten), Springer, Berlin, Germany, European Summer School on Multi-Agent Control, 2005 (inproceedings)

Abstract
While there is strong motivation for using Gaussian Processes (GPs) due to their excellent performance in regression and classification problems, their computational complexity makes them impractical when the size of the training set exceeds a few thousand cases. This has motivated the recent proliferation of a number of cost-effective approximations to GPs, both for classification and for regression. In this paper we analyze one popular approximation to GPs for regression: the reduced rank approximation. While generally GPs are equivalent to infinite linear models, we show that Reduced Rank Gaussian Processes (RRGPs) are equivalent to finite sparse linear models. We also introduce the concept of degenerate GPs and show that they correspond to inappropriate priors. We show how to modify the RRGP to prevent it from being degenerate at test time. Training RRGPs consists both in learning the covariance function hyperparameters and the support set. We propose a method for learning hyperparameters for a given support set. We also review the Sparse Greedy GP (SGGP) approximation (Smola and Bartlett, 2001), which is a way of learning the support set for given hyperparameters based on approximating the posterior. We propose an alternative method to the SGGP that has better generalization capabilities. Finally we make experiments to compare the different ways of training a RRGP. We provide some Matlab code for learning RRGPs.

ei

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
Theory of Classification: A Survey of Some Recent Advances

Boucheron, S., Bousquet, O., Lugosi, G.

ESAIM: Probability and Statistics, 9, pages: 323 , 2005 (article)

Abstract
The last few years have witnessed important new developments in the theory and practice of pattern classification. We intend to survey some of the main new ideas that have lead to these important recent developments.

ei

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
From Graphs to Manifolds - Weak and Strong Pointwise Consistency of Graph Laplacians

Hein, M., Audibert, J., von Luxburg, U.

In Proceedings of the 18th Conference on Learning Theory (COLT), pages: 470-485, Conference on Learning Theory, 2005, Student Paper Award (inproceedings)

Abstract
In the machine learning community it is generally believed that graph Laplacians corresponding to a finite sample of data points converge to a continuous Laplace operator if the sample size increases. Even though this assertion serves as a justification for many Laplacian-based algorithms, so far only some aspects of this claim have been rigorously proved. In this paper we close this gap by establishing the strong pointwise consistency of a family of graph Laplacians with data-dependent weights to some weighted Laplace operator. Our investigation also includes the important case where the data lies on a submanifold of $R^d$.

ei

PDF [BibTex]

PDF [BibTex]


no image
Propagating Distributions on a Hypergraph by Dual Information Regularization

Tsuda, K.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 921 , (Editors: De Raedt, L. , S. Wrobel), ICML Bonn, 2005 (inproceedings)

Abstract
In the information regularization framework by Corduneanu and Jaakkola (2005), the distributions of labels are propagated on a hypergraph for semi-supervised learning. The learning is efficiently done by a Blahut-Arimoto-like two step algorithm, but, unfortunately, one of the steps cannot be solved in a closed form. In this paper, we propose a dual version of information regularization, which is considered as more natural in terms of information geometry. Our learning algorithm has two steps, each of which can be solved in a closed form. Also it can be naturally applied to exponential family distributions such as Gaussians. In experiments, our algorithm is applied to protein classification based on a metabolic network and known functional categories.

ei

[BibTex]

[BibTex]


no image
Moment Inequalities for Functions of Independent Random Variables

Boucheron, S., Bousquet, O., Lugosi, G., Massart, P.

To appear in Annals of Probability, 33, pages: 514-560, 2005 (article)

Abstract
A general method for obtaining moment inequalities for functions of independent random variables is presented. It is a generalization of the entropy method which has been used to derive concentration inequalities for such functions cite{BoLuMa01}, and is based on a generalized tensorization inequality due to Lata{l}a and Oleszkiewicz cite{LaOl00}. The new inequalities prove to be a versatile tool in a wide range of applications. We illustrate the power of the method by showing how it can be used to effortlessly re-derive classical inequalities including Rosenthal and Kahane-Khinchine-type inequalities for sums of independent random variables, moment inequalities for suprema of empirical processes, and moment inequalities for Rademacher chaos and $U$-statistics. Some of these corollaries are apparently new. In particular, we generalize Talagrands exponential inequality for Rademacher chaos of order two to any order. We also discuss applications for other complex functions of independent random variables, such as suprema of boolean polynomials which include, as special cases, subgraph counting problems in random graphs.

ei

PDF [BibTex]

PDF [BibTex]


no image
A Brain Computer Interface with Online Feedback based on Magnetoencephalography

Lal, T., Schröder, M., Hill, J., Preissl, H., Hinterberger, T., Mellinger, J., Bogdan, M., Rosenstiel, W., Hofmann, T., Birbaumer, N., Schölkopf, B.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 465-472, (Editors: L De Raedt and S Wrobel), ACM, New York, NY, USA, ICML, 2005 (inproceedings)

Abstract
The aim of this paper is to show that machine learning techniques can be used to derive a classifying function for human brain signal data measured by magnetoencephalography (MEG), for the use in a brain computer interface (BCI). This is especially helpful for evaluating quickly whether a BCI approach based on electroencephalography, on which training may be slower due to lower signalto- noise ratio, is likely to succeed. We apply recursive channel elimination and regularized SVMs to the experimental data of ten healthy subjects performing a motor imagery task. Four subjects were able to use a trained classifier together with a decision tree interface to write a short name. Further analysis gives evidence that the proposed imagination task is suboptimal for the possible extension to a multiclass interface. To the best of our knowledge this paper is the first working online BCI based on MEG recordings and is therefore a “proof of concept”.

ei

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
Healing the Relevance Vector Machine through Augmentation

Rasmussen, CE., Candela, JQ.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 689 , (Editors: De Raedt, L. , S. Wrobel), ICML, 2005 (inproceedings)

Abstract
The Relevance Vector Machine (RVM) is a sparse approximate Bayesian kernel method. It provides full predictive distributions for test cases. However, the predictive uncertainties have the unintuitive property, that emph{they get smaller the further you move away from the training cases}. We give a thorough analysis. Inspired by the analogy to non-degenerate Gaussian Processes, we suggest augmentation to solve the problem. The purpose of the resulting model, RVM*, is primarily to corroborate the theoretical and experimental analysis. Although RVM* could be used in practical applications, it is no longer a truly sparse model. Experiments show that sparsity comes at the expense of worse predictive distributions.

ei

PDF PostScript [BibTex]

PDF PostScript [BibTex]


no image
Long Term Prediction of Product Quality in a Glass Manufacturing Process Using a Kernel Based Approach

Jung, T., Herrera, L., Schölkopf, B.

In Proceedings of the 8th International Work-Conferenceon Artificial Neural Networks (Computational Intelligence and Bioinspired Systems), Lecture Notes in Computer Science, Vol. 3512, LNCS 3512, pages: 960-967, (Editors: J Cabestany and A Prieto and F Sandoval), Springer, Berlin Heidelberg, Germany, IWANN, 2005 (inproceedings)

Abstract
In this paper we report the results obtained using a kernel-based approach to predict the temporal development of four response signals in the process control of a glass melting tank with 16 input parameters. The data set is a revised version1 from the modelling challenge in EUNITE-2003. The central difficulties are: large time-delays between changes in the inputs and the outputs, large number of data, and a general lack of knowledge about the relevant variables that intervene in the process. The methodology proposed here comprises Support Vector Machines (SVM) and Regularization Networks (RN). We use the idea of sparse approximation both as a means of regularization and as a means of reducing the computational complexity. Furthermore, we will use an incremental approach to add new training examples to the kernel-based method and efficiently update the current solution. This allows us to use a sophisticated learning scheme, where we iterate between prediction and training, with good computational efficiency and satisfactory results.

ei

DOI [BibTex]

DOI [BibTex]


no image
Object correspondence as a machine learning problem

Schölkopf, B., Steinke, F., Blanz, V.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 777-784, (Editors: L De Raedt and S Wrobel), ACM, New York, NY, USA, ICML, 2005 (inproceedings)

Abstract
We propose machine learning methods for the estimation of deformation fields that transform two given objects into each other, thereby establishing a dense point to point correspondence. The fields are computed using a modified support vector machine containing a penalty enforcing that points of one object will be mapped to ``similar‘‘ points on the other one. Our system, which contains little engineering or domain knowledge, delivers state of the art performance. We present application results including close to photorealistic morphs of 3D head models.

ei

PDF [BibTex]

PDF [BibTex]


no image
A tutorial on v-support vector machines

Chen, P., Lin, C., Schölkopf, B.

Applied Stochastic Models in Business and Industry, 21(2):111-136, 2005 (article)

Abstract
We briefly describe the main ideas of statistical learning theory, support vector machines (SVMs), and kernel feature spaces. We place particular emphasis on a description of the so-called -SVM, including details of the algorithm and its implementation, theoretical results, and practical applications. Copyright © 2005 John Wiley & Sons, Ltd.

ei

PDF [BibTex]

PDF [BibTex]


no image
Robust EEG Channel Selection Across Subjects for Brain Computer Interfaces

Schröder, M., Lal, T., Hinterberger, T., Bogdan, M., Hill, J., Birbaumer, N., Rosenstiel, W., Schölkopf, B.

EURASIP Journal on Applied Signal Processing, 2005(19, Special Issue: Trends in Brain Computer Interfaces):3103-3112, (Editors: Vesin, J. M., T. Ebrahimi), 2005 (article)

Abstract
Most EEG-based Brain Computer Interface (BCI) paradigms come along with specific electrode positions, e.g.~for a visual based BCI electrode positions close to the primary visual cortex are used. For new BCI paradigms it is usually not known where task relevant activity can be measured from the scalp. For individual subjects Lal et.~al showed that recording positions can be found without the use of prior knowledge about the paradigm used. However it remains unclear to what extend their method of Recursive Channel Elimination (RCE) can be generalized across subjects. In this paper we transfer channel rankings from a group of subjects to a new subject. For motor imagery tasks the results are promising, although cross-subject channel selection does not quite achieve the performance of channel selection on data of single subjects. Although the RCE method was not provided with prior knowledge about the mental task, channels that are well known to be important (from a physiological point of view) were consistently selected whereas task-irrelevant channels were reliably disregarded.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
Implicit Surface Modelling as an Eigenvalue Problem

Walder, C., Chapelle, O., Schölkopf, B.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 937-944, (Editors: L De Raedt and S Wrobel), ACM, New York, NY, USA, ICML, 2005 (inproceedings)

Abstract
We discuss the problem of fitting an implicit shape model to a set of points sampled from a co-dimension one manifold of arbitrary topology. The method solves a non-convex optimisation problem in the embedding function that defines the implicit by way of its zero level set. By assuming that the solution is a mixture of radial basis functions of varying widths we attain the globally optimal solution by way of an equivalent eigenvalue problem, without using or constructing as an intermediate step the normal vectors of the manifold at each data point. We demonstrate the system on two and three dimensional data, with examples of missing data interpolation and set operations on the resultant shapes.

ei

PDF [BibTex]

PDF [BibTex]


no image
Natural Actor-Critic

Peters, J., Vijayakumar, S., Schaal, S.

In Proceedings of the 16th European Conference on Machine Learning, 3720, pages: 280-291, (Editors: Gama, J.;Camacho, R.;Brazdil, P.;Jorge, A.;Torgo, L.), Springer, ECML, 2005, clmc (inproceedings)

Abstract
This paper investigates a novel model-free reinforcement learning architecture, the Natural Actor-Critic. The actor updates are based on stochastic policy gradients employing AmariÕs natural gradient approach, while the critic obtains both the natural policy gradient and additional parameters of a value function simultaneously by linear regres- sion. We show that actor improvements with natural policy gradients are particularly appealing as these are independent of coordinate frame of the chosen policy representation, and can be estimated more efficiently than regular policy gradients. The critic makes use of a special basis function parameterization motivated by the policy-gradient compatible function approximation. We show that several well-known reinforcement learning methods such as the original Actor-Critic and BradtkeÕs Linear Quadratic Q-Learning are in fact Natural Actor-Critic algorithms. Em- pirical evaluations illustrate the effectiveness of our techniques in com- parison to previous methods, and also demonstrate their applicability for learning control on an anthropomorphic robot arm.

am ei

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Comparative experiments on task space control with redundancy resolution

Nakanishi, J., Cory, R., Mistry, M., Peters, J., Schaal, S.

In Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages: 3901-3908, Edmonton, Alberta, Canada, Aug. 2-6, IROS, 2005, clmc (inproceedings)

Abstract
Understanding the principles of motor coordination with redundant degrees of freedom still remains a challenging problem, particularly for new research in highly redundant robots like humanoids. Even after more than a decade of research, task space control with redundacy resolution still remains an incompletely understood theoretical topic, and also lacks a larger body of thorough experimental investigation on complex robotic systems. This paper presents our first steps towards the development of a working redundancy resolution algorithm which is robust against modeling errors and unforeseen disturbances arising from contact forces. To gain a better understanding of the pros and cons of different approaches to redundancy resolution, we focus on a comparative empirical evaluation. First, we review several redundancy resolution schemes at the velocity, acceleration and torque levels presented in the literature in a common notational framework and also introduce some new variants of these previous approaches. Second, we present experimental comparisons of these approaches on a seven-degree-of-freedom anthropomorphic robot arm. Surprisingly, one of our simplest algorithms empirically demonstrates the best performance, despite, from a theoretical point, the algorithm does not share the same beauty as some of the other methods. Finally, we discuss practical properties of these control algorithms, particularly in light of inevitable modeling errors of the robot dynamics.

am ei

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Modeling and testing of a biomimetic flagellar propulsion method for microscale biomedical swimming robots

Behkam, B., Sitti, M.

In Proceedings of Advanced Intelligent Mechatronics Conference, pages: 37-42, 2005 (inproceedings)

pi

Project Page [BibTex]

Project Page [BibTex]


no image
Biologically inspired adhesion based surface climbing robots

Menon, C., Sitti, M.

In Robotics and Automation, 2005. ICRA 2005. Proceedings of the 2005 IEEE International Conference on, pages: 2715-2720, 2005 (inproceedings)

pi

[BibTex]

[BibTex]


no image
Claytronics: highly scalable communications, sensing, and actuation networks

Aksak, Burak, Bhat, Preethi Srinivas, Campbell, Jason, DeRosa, Michael, Funiak, Stanislav, Gibbons, Phillip B, Goldstein, Seth Copen, Guestrin, Carlos, Gupta, Ashish, Helfrich, Casey, others

In Proceedings of the 3rd international conference on Embedded networked sensor systems, pages: 299-299, 2005 (inproceedings)

pi

[BibTex]

[BibTex]


no image
Biologically Inspired Miniature Water Strider Robot.

Suhr, S. H., Song, Y. S., Lee, S. J., Sitti, M.

In Robotics: Science and Systems, pages: 319-326, 2005 (inproceedings)

pi

[BibTex]

[BibTex]


no image
Polymer micro/nanofiber fabrication using micro/nanopipettes

Nain, A. S., Amon, C., Sitti, M.

In Nanotechnology, 2005. 5th IEEE Conference on, pages: 366-369, 2005 (inproceedings)

pi

[BibTex]

[BibTex]


no image
Fusion of biomedical microcapsule endoscope and microsystem technology

Kim, Tae Song, Kim, Byungkyu, Cho, Dongil Dan, Song, Si Young, Dario, P, Sitti, M

In Solid-State Sensors, Actuators and Microsystems, 2005. Digest of Technical Papers. TRANSDUCERS’05. The 13th International Conference on, 1, pages: 9-14, 2005 (inproceedings)

pi

[BibTex]

[BibTex]


no image
Atomic force microscope based two-dimensional assembly of micro/nanoparticles

Tafazzoli, A., Pawashe, C., Sitti, M.

In Assembly and Task Planning: From Nano to Macro Assembly and Manufacturing, 2005.(ISATP 2005). The 6th IEEE International Symposium on, pages: 230-235, 2005 (inproceedings)

pi

[BibTex]

[BibTex]


no image
A new endoscopic microcapsule robot using beetle inspired microfibrillar adhesives

Cheung, E., Karagozler, M. E., Park, S., Kim, B., Sitti, M.

In Advanced Intelligent Mechatronics. Proceedings, 2005 IEEE/ASME International Conference on, pages: 551-557, 2005 (inproceedings)

pi

Project Page [BibTex]

Project Page [BibTex]

2003


no image
Concentration Inequalities for Sub-Additive Functions Using the Entropy Method

Bousquet, O.

Stochastic Inequalities and Applications, 56, pages: 213-247, Progress in Probability, (Editors: Giné, E., C. Houdré and D. Nualart), November 2003 (article)

Abstract
We obtain exponential concentration inequalities for sub-additive functions of independent random variables under weak conditions on the increments of those functions, like the existence of exponential moments for these increments. As a consequence of these general inequalities, we obtain refinements of Talagrand's inequality for empirical processes and new bounds for randomized empirical processes. These results are obtained by further developing the entropy method introduced by Ledoux.

ei

PostScript [BibTex]

2003


PostScript [BibTex]


no image
On the Complexity of Learning the Kernel Matrix

Bousquet, O., Herrmann, D.

In Advances in Neural Information Processing Systems 15, pages: 399-406, (Editors: Becker, S. , S. Thrun, K. Obermayer), The MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
We investigate data based procedures for selecting the kernel when learning with Support Vector Machines. We provide generalization error bounds by estimating the Rademacher complexities of the corresponding function classes. In particular we obtain a complexity bound for function classes induced by kernels with given eigenvectors, i.e., we allow to vary the spectrum and keep the eigenvectors fix. This bound is only a logarithmic factor bigger than the complexity of the function class induced by a single kernel. However, optimizing the margin over such classes leads to overfitting. We thus propose a suitable way of constraining the class. We use an efficient algorithm to solve the resulting optimization problem, present preliminary experimental results, and compare them to an alignment-based approach.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Cluster Kernels for Semi-Supervised Learning

Chapelle, O., Weston, J., Schölkopf, B.

In Advances in Neural Information Processing Systems 15, pages: 585-592, (Editors: S Becker and S Thrun and K Obermayer), MIT Press, Cambridge, MA, USA, 16th Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
We propose a framework to incorporate unlabeled data in kernel classifier, based on the idea that two points in the same cluster are more likely to have the same label. This is achieved by modifying the eigenspectrum of the kernel matrix. Experimental results assess the validity of this approach.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Mismatch String Kernels for SVM Protein Classification

Leslie, C., Eskin, E., Weston, J., Noble, W.

In Advances in Neural Information Processing Systems 15, pages: 1417-1424, (Editors: Becker, S. , S. Thrun, K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem. These kernels measure sequence similarity based on shared occurrences of k-length subsequences, counted with up to m mismatches, and do not rely on any generative model for the positive training sequences. We compute the kernels efficiently using a mismatch tree data structure and report experiments on a benchmark SCOP dataset, where we show that the mismatch kernel used with an SVM classifier performs as well as the Fisher kernel, the most successful method for remote homology detection, while achieving considerable computational savings.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Real-Time Face Detection

Kienzle, W.

Biologische Kybernetik, Eberhard-Karls-Universitaet Tuebingen, Tuebingen, Germany, October 2003 (diplomathesis)

ei

[BibTex]

[BibTex]


no image
Kernel Dependency Estimation

Weston, J., Chapelle, O., Elisseeff, A., Schölkopf, B., Vapnik, V.

In Advances in Neural Information Processing Systems 15, pages: 873-880, (Editors: S Becker and S Thrun and K Obermayer), MIT Press, Cambridge, MA, USA, 16th Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Linear Combinations of Optic Flow Vectors for Estimating Self-Motion: a Real-World Test of a Neural Model

Franz, MO., Chahl, JS.

In Advances in Neural Information Processing Systems 15, pages: 1319-1326, (Editors: Becker, S., S. Thrun and K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
The tangential neurons in the fly brain are sensitive to the typical optic flow patterns generated during self-motion. In this study, we examine whether a simplified linear model of these neurons can be used to estimate self-motion from the optic flow. We present a theory for the construction of an estimator consisting of a linear combination of optic flow vectors that incorporates prior knowledge both about the distance distribution of the environment, and about the noise and self-motion statistics of the sensor. The estimator is tested on a gantry carrying an omnidirectional vision sensor. The experiments show that the proposed approach leads to accurate and robust estimates of rotation rates, whereas translation estimates turn out to be less reliable.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Clustering with the Fisher score

Tsuda, K., Kawanabe, M., Müller, K.

In Advances in Neural Information Processing Systems 15, pages: 729-736, (Editors: Becker, S. , S. Thrun, K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
Recently the Fisher score (or the Fisher kernel) is increasingly used as a feature extractor for classification problems. The Fisher score is a vector of parameter derivatives of loglikelihood of a probabilistic model. This paper gives a theoretical analysis about how class information is preserved in the space of the Fisher score, which turns out that the Fisher score consists of a few important dimensions with class information and many nuisance dimensions. When we perform clustering with the Fisher score, K-Means type methods are obviously inappropriate because they make use of all dimensions. So we will develop a novel but simple clustering algorithm specialized for the Fisher score, which can exploit important dimensions. This algorithm is successfully tested in experiments with artificial data and real data (amino acid sequences).

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Marginalized Kernels between Labeled Graphs

Kashima, H., Tsuda, K., Inokuchi, A.

In 20th International Conference on Machine Learning, pages: 321-328, (Editors: Faucett, T. and N. Mishra), 20th International Conference on Machine Learning, August 2003 (inproceedings)

ei

PDF [BibTex]

PDF [BibTex]


no image
Sparse Gaussian Processes: inference, subspace identification and model selection

Csato, L., Opper, M.

In Proceedings, pages: 1-6, (Editors: Van der Hof, , Wahlberg), The Netherlands, 13th IFAC Symposium on System Identifiaction, August 2003, electronical version; Index ThA02-2 (inproceedings)

Abstract
Gaussian Process (GP) inference is a probabilistic kernel method where the GP is treated as a latent function. The inference is carried out using the Bayesian online learning and its extension to the more general iterative approach which we call TAP/EP learning. Sparsity is introduced in this context to make the TAP/EP method applicable to large datasets. We address the prohibitive scaling of the number of parameters by defining a subset of the training data that is used as the support the GP, thus the number of required parameters is independent of the training set, similar to the case of ``Support--‘‘ or ``Relevance--Vectors‘‘. An advantage of the full probabilistic treatment is that allows the computation of the marginal data likelihood or evidence, leading to hyper-parameter estimation within the GP inference. An EM algorithm to choose the hyper-parameters is proposed. The TAP/EP learning is the E-step and the M-step then updates the hyper-parameters. Due to the sparse E-step the resulting algorithm does not involve manipulation of large matrices. The presented algorithm is applicable to a wide variety of likelihood functions. We present results of applying the algorithm on classification and nonstandard regression problems for artificial and real datasets.

ei

PDF GZIP [BibTex]

PDF GZIP [BibTex]


no image
Adaptive, Cautious, Predictive control with Gaussian Process Priors

Murray-Smith, R., Sbarbaro, D., Rasmussen, CE., Girard, A.

In Proceedings of the 13th IFAC Symposium on System Identification, pages: 1195-1200, (Editors: Van den Hof, P., B. Wahlberg and S. Weiland), Proceedings of the 13th IFAC Symposium on System Identification, August 2003 (inproceedings)

Abstract
Nonparametric Gaussian Process models, a Bayesian statistics approach, are used to implement a nonlinear adaptive control law. Predictions, including propagation of the state uncertainty are made over a k-step horizon. The expected value of a quadratic cost function is minimised, over this prediction horizon, without ignoring the variance of the model predictions. The general method and its main features are illustrated on a simulation example.

ei

PDF [BibTex]

PDF [BibTex]


no image
On the Representation, Learning and Transfer of Spatio-Temporal Movement Characteristics

Ilg, W., Bakir, GH., Mezger, J., Giese, MA.

In Humanoids Proceedings, pages: 0-0, Humanoids Proceedings, July 2003, electronical version (inproceedings)

Abstract
In this paper we present a learning-based approach for the modelling of complex movement sequences. Based on the method of Spatio-Temporal Morphable Models (STMMS. We derive a hierarchical algorithm that, in a first step, identifies automatically movement elements in movement sequences based on a coarse spatio-temporal description, and in a second step models these movement primitives by approximation through linear combinations of learned example movement trajectories. We describe the different steps of the algorithm and show how it can be applied for modelling and synthesis of complex sequences of human movements that contain movement elements with variable style. The proposed method is demonstrated on different applications of movement representation relevant for imitation learning of movement styles in humanoid robotics.

ei

PDF [BibTex]

PDF [BibTex]


no image
Statistical Learning Theory, Capacity and Complexity

Schölkopf, B.

Complexity, 8(4):87-94, July 2003 (article)

Abstract
We give an exposition of the ideas of statistical learning theory, followed by a discussion of how a reinterpretation of the insights of learning theory could potentially also benefit our understanding of a certain notion of complexity.

ei

Web DOI [BibTex]


no image
Dealing with large Diagonals in Kernel Matrices

Weston, J., Schölkopf, B., Eskin, E., Leslie, C., Noble, W.

Annals of the Institute of Statistical Mathematics, 55(2):391-408, June 2003 (article)

Abstract
In kernel methods, all the information about the training data is contained in the Gram matrix. If this matrix has large diagonal values, which arises for many types of kernels, then kernel methods do not perform well: We propose and test several methods for dealing with this problem by reducing the dynamic range of the matrix while preserving the positive definiteness of the Hessian of the quadratic programming problem that one has to solve when training a Support Vector Machine, which is a common kernel approach for pattern recognition.

ei

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
The em Algorithm for Kernel Matrix Completion with Auxiliary Data

Tsuda, K., Akaho, S., Asai, K.

Journal of Machine Learning Research, 4, pages: 67-81, May 2003 (article)

ei

PDF [BibTex]

PDF [BibTex]


no image
Constructing Descriptive and Discriminative Non-linear Features: Rayleigh Coefficients in Kernel Feature Spaces

Mika, S., Rätsch, G., Weston, J., Schölkopf, B., Smola, A., Müller, K.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(5):623-628, May 2003 (article)

Abstract
We incorporate prior knowledge to construct nonlinear algorithms for invariant feature extraction and discrimination. Employing a unified framework in terms of a nonlinearized variant of the Rayleigh coefficient, we propose nonlinear generalizations of Fisher‘s discriminant and oriented PCA using support vector kernel functions. Extensive simulations show the utility of our approach.

ei

DOI [BibTex]

DOI [BibTex]


no image
A case based comparison of identification with neural network and Gaussian process models.

Kocijan, J., Banko, B., Likar, B., Girard, A., Murray-Smith, R., Rasmussen, CE.

In Proceedings of the International Conference on Intelligent Control Systems and Signal Processing ICONS 2003, 1, pages: 137-142, (Editors: Ruano, E.A.), Proceedings of the International Conference on Intelligent Control Systems and Signal Processing ICONS, April 2003 (inproceedings)

Abstract
In this paper an alternative approach to black-box identification of non-linear dynamic systems is compared with the more established approach of using artificial neural networks. The Gaussian process prior approach is a representative of non-parametric modelling approaches. It was compared on a pH process modelling case study. The purpose of modelling was to use the model for control design. The comparison revealed that even though Gaussian process models can be effectively used for modelling dynamic systems caution has to be axercised when signals are selected.

ei

PDF [BibTex]

PDF [BibTex]


no image
On-Line One-Class Support Vector Machines. An Application to Signal Segmentation

Gretton, A., Desobry, ..

In IEEE ICASSP Vol. 2, pages: 709-712, IEEE ICASSP, April 2003 (inproceedings)

Abstract
In this paper, we describe an efficient algorithm to sequentially update a density support estimate obtained using one-class support vector machines. The solution provided is an exact solution, which proves to be far more computationally attractive than a batch approach. This deterministic technique is applied to the problem of audio signal segmentation, with simulations demonstrating the computational performance gain on toy data sets, and the accuracy of the segmentation on audio signals.

ei

PostScript [BibTex]

PostScript [BibTex]


no image
The Kernel Mutual Information

Gretton, A., Herbrich, R., Smola, A.

In IEEE ICASSP Vol. 4, pages: 880-883, IEEE ICASSP, April 2003 (inproceedings)

Abstract
We introduce a new contrast function, the kernel mutual information (KMI), to measure the degree of independence of continuous random variables. This contrast function provides an approximate upper bound on the mutual information, as measured near independence, and is based on a kernel density estimate of the mutual information between a discretised approximation of the continuous random variables. We show that Bach and Jordan‘s kernel generalised variance (KGV) is also an upper bound on the same kernel density estimate, but is looser. Finally, we suggest that the addition of a regularising term in the KGV causes it to approach the KMI, which motivates the introduction of this regularisation.

ei

PostScript [BibTex]

PostScript [BibTex]