Header logo is


2005


no image
Kernel Methods for Measuring Independence

Gretton, A., Herbrich, R., Smola, A., Bousquet, O., Schölkopf, B.

Journal of Machine Learning Research, 6, pages: 2075-2129, December 2005 (article)

Abstract
We introduce two new functionals, the constrained covariance and the kernel mutual information, to measure the degree of independence of random variables. These quantities are both based on the covariance between functions of the random variables in reproducing kernel Hilbert spaces (RKHSs). We prove that when the RKHSs are universal, both functionals are zero if and only if the random variables are pairwise independent. We also show that the kernel mutual information is an upper bound near independence on the Parzen window estimate of the mutual information. Analogous results apply for two correlation-based dependence functionals introduced earlier: we show the kernel canonical correlation and the kernel generalised variance to be independence measures for universal kernels, and prove the latter to be an upper bound on the mutual information near independence. The performance of the kernel dependence functionals in measuring independence is verified in the context of independent component analysis.

ei

PDF PostScript PDF [BibTex]

2005


PDF PostScript PDF [BibTex]


no image
A Unifying View of Sparse Approximate Gaussian Process Regression

Quinonero Candela, J., Rasmussen, C.

Journal of Machine Learning Research, 6, pages: 1935-1959, December 2005 (article)

Abstract
We provide a new unifying view, including all existing proper probabilistic sparse approximations for Gaussian process regression. Our approach relies on expressing the effective prior which the methods are using. This allows new insights to be gained, and highlights the relationship between existing methods. It also allows for a clear theoretically justified ranking of the closeness of the known approximations to the corresponding full GPs. Finally we point directly to designs of new better sparse approximations, combining the best of the existing strategies, within attractive computational constraints.

ei

PDF [BibTex]

PDF [BibTex]


no image
Method and device for detection of splice form and alternative splice forms in DNA or RNA sequences

Rätsch, G., Sonnenburg, S., Müller, K., Schölkopf, B.

European Patent Application, International No PCT/EP2005/005783, December 2005 (patent)

ei

[BibTex]

[BibTex]


no image
Maximal Margin Classification for Metric Spaces

Hein, M., Bousquet, O., Schölkopf, B.

Journal of Computer and System Sciences, 71(3):333-359, October 2005 (article)

Abstract
In order to apply the maximum margin method in arbitrary metric spaces, we suggest to embed the metric space into a Banach or Hilbert space and to perform linear classification in this space. We propose several embeddings and recall that an isometric embedding in a Banach space is always possible while an isometric embedding in a Hilbert space is only possible for certain metric spaces. As a result, we obtain a general maximum margin classification algorithm for arbitrary metric spaces (whose solution is approximated by an algorithm of Graepel. Interestingly enough, the embedding approach, when applied to a metric which can be embedded into a Hilbert space, yields the SVM algorithm, which emphasizes the fact that its solution depends on the metric and not on the kernel. Furthermore we give upper bounds of the capacity of the function classes corresponding to both embeddings in terms of Rademacher averages. Finally we compare the capacities of these function classes directly.

ei

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
Selective integration of multiple biological data for supervised network inference

Kato, T., Tsuda, K., Asai, K.

Bioinformatics, 21(10):2488 , October 2005 (article)

ei

PDF [BibTex]

PDF [BibTex]


no image
Assessing Approximate Inference for Binary Gaussian Process Classification

Kuss, M., Rasmussen, C.

Journal of Machine Learning Research, 6, pages: 1679 , October 2005 (article)

Abstract
Gaussian process priors can be used to define flexible, probabilistic classification models. Unfortunately exact Bayesian inference is analytically intractable and various approximation techniques have been proposed. In this work we review and compare Laplace‘s method and Expectation Propagation for approximate Bayesian inference in the binary Gaussian process classification model. We present a comprehensive comparison of the approximations, their predictive performance and marginal likelihood estimates to results obtained by MCMC sampling. We explain theoretically and corroborate empirically the advantages of Expectation Propagation compared to Laplace‘s method.

ei

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
Clustering on the Unit Hypersphere using von Mises-Fisher Distributions

Banerjee, A., Dhillon, I., Ghosh, J., Sra, S.

Journal of Machine Learning Research, 6, pages: 1345-1382, September 2005 (article)

Abstract
Several large scale data mining applications, such as text categorization and gene expression analysis, involve high-dimensional data that is also inherently directional in nature. Often such data is L2 normalized so that it lies on the surface of a unit hypersphere. Popular models such as (mixtures of) multi-variate Gaussians are inadequate for characterizing such data. This paper proposes a generative mixture-model approach to clustering directional data based on the von Mises-Fisher (vMF) distribution, which arises naturally for data distributed on the unit hypersphere. In particular, we derive and analyze two variants of the Expectation Maximization (EM) framework for estimating the mean and concentration parameters of this mixture. Numerical estimation of the concentration parameters is non-trivial in high dimensions since it involves functional inversion of ratios of Bessel functions. We also formulate two clustering algorithms corresponding to the variants of EM that we derive. Our approach provides a theoretical basis for the use of cosine similarity that has been widely employed by the information retrieval community, and obtains the spherical kmeans algorithm (kmeans with cosine similarity) as a special case of both variants. Empirical results on clustering of high-dimensional text and gene-expression data based on a mixture of vMF distributions show that the ability to estimate the concentration parameter for each vMF component, which is not present in existing approaches, yields superior results, especially for difficult clustering tasks in high-dimensional spaces.

ei

PDF [BibTex]

PDF [BibTex]


no image
Support Vector Machines for 3D Shape Processing

Steinke, F., Schölkopf, B., Blanz, V.

Computer Graphics Forum, 24(3, EUROGRAPHICS 2005):285-294, September 2005 (article)

Abstract
We propose statistical learning methods for approximating implicit surfaces and computing dense 3D deformation fields. Our approach is based on Support Vector (SV) Machines, which are state of the art in machine learning. It is straightforward to implement and computationally competitive; its parameters can be automatically set using standard machine learning methods. The surface approximation is based on a modified Support Vector regression. We present applications to 3D head reconstruction, including automatic removal of outliers and hole filling. In a second step, we build on our SV representation to compute dense 3D deformation fields between two objects. The fields are computed using a generalized SVMachine enforcing correspondence between the previously learned implicit SV object representations, as well as correspondences between feature points if such points are available. We apply the method to the morphing of 3D heads and other objects.

ei

PDF [BibTex]

PDF [BibTex]


no image
Correlation of EEG spectral entropy with regional cerebral blood flow during sevoflurane and propofol anaesthesia

Maksimow, A., Kaisti, K., Aalto, S., Mäenpää, M., Jääskeläinen, S., Hinkka, S., Martens, SMM., Särkelä, M., Viertiö-Oja, H., Scheinin, H.

Anaesthesia, 60(9):862-869, September 2005 (article)

Abstract
ENTROPY index monitoring, based on spectral entropy of the electroencephalogram, is a promising new method to measure the depth of anaesthesia. We examined the association between spectral entropy and regional cerebral blood flow in healthy subjects anaesthetised with 2%, 3% and 4% end-expiratory concentrations of sevoflurane and 7.6, 12.5 and 19.0 microg.ml(-1) plasma drug concentrations of propofol. Spectral entropy from the frequency band 0.8-32 Hz was calculated and cerebral blood flow assessed using positron emission tomography and [(15)O]-labelled water at baseline and at each anaesthesia level. Both drugs induced significant reductions in spectral entropy and cortical and global cerebral blood flow. Midfrontal-central spectral entropy was associated with individual frontal and whole brain blood flow values across all conditions, suggesting that this novel measure of anaesthetic depth can depict global changes in neuronal activity induced by the drugs. The cortical areas of the most significant associations were remarkably similar for both drugs.

ei

DOI [BibTex]

DOI [BibTex]


no image
Fast Protein Classification with Multiple Networks

Tsuda, K., Shin, H., Schölkopf, B.

Bioinformatics, 21(Suppl. 2):59-65, September 2005 (article)

Abstract
Support vector machines (SVM) have been successfully used to classify proteins into functional categories. Recently, to integrate multiple data sources, a semidefinite programming (SDP) based SVM method was introduced Lanckriet et al (2004). In SDP/SVM, multiple kernel matrices corresponding to each of data sources are combined with weights obtained by solving an SDP. However, when trying to apply SDP/SVM to large problems, the computational cost can become prohibitive, since both converting the data to a kernel matrix for the SVM and solving the SDP are time and memory demanding. Another application-specific drawback arises when some of the data sources are protein networks. A common method of converting the network to a kernel matrix is the diffusion kernel method, which has time complexity of O(n^3), and produces a dense matrix of size n x n. We propose an efficient method of protein classification using multiple protein networks. Available protein networks, such as a physical interaction network or a metabolic network, can be directly incorporated. Vectorial data can also be incorporated after conversion into a network by means of neighbor point connection. Similarly to the SDP/SVM method, the combination weights are obtained by convex optimization. Due to the sparsity of network edges, the computation time is nearly linear in the number of edges of the combined network. Additionally, the combination weights provide information useful for discarding noisy or irrelevant networks. Experiments on function prediction of 3588 yeast proteins show promising results: the computation time is enormously reduced, while the accuracy is still comparable to the SDP/SVM method.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Analyzing microarray data using quantitative association rules

Georgii, E., Richter, L., Rückert, U., Kramer, S.

Bioinformatics, 21(Suppl. 2):123-129, September 2005 (article)

Abstract
Motivation: We tackle the problem of finding regularities in microarray data. Various data mining tools, such as clustering, classification, Bayesian networks and association rules, have been applied so far to gain insight into gene-expression data. Association rule mining techniques used so far work on discretizations of the data and cannot account for cumulative effects. In this paper, we investigate the use of quantitative association rules that can operate directly on numeric data and represent cumulative effects of variables. Technically speaking, this type of quantitative association rules based on half-spaces can find non-axis-parallel regularities. Results: We performed a variety of experiments testing the utility of quantitative association rules for microarray data. First of all, the results should be statistically significant and robust against fluctuations in the data. Next, the approach should be scalable in the number of variables, which is important for such high-dimensional data. Finally, the rules should make sense biologically and be sufficiently different from rules found in regular association rule mining working with discretizations. In all of these dimensions, the proposed approach performed satisfactorily. Therefore, quantitative association rules based on half-spaces should be considered as a tool for the analysis of microarray gene-expression data.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
Iterative Kernel Principal Component Analysis for Image Modeling

Kim, K., Franz, M., Schölkopf, B.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(9):1351-1366, September 2005 (article)

Abstract
In recent years, Kernel Principal Component Analysis (KPCA) has been suggested for various image processing tasks requiring an image model such as, e.g., denoising or compression. The original form of KPCA, however, can be only applied to strongly restricted image classes due to the limited number of training examples that can be processed. We therefore propose a new iterative method for performing KPCA, the Kernel Hebbian Algorithm which iteratively estimates the Kernel Principal Components with only linear order memory complexity. In our experiments, we compute models for complex image classes such as faces and natural images which require a large number of training examples. The resulting image models are tested in single-frame super-resolution and denoising applications. The KPCA model is not specifically tailored to these tasks; in fact, the same model can be used in super-resolution with variable input resolution, or denoising with unknown noise characteristics. In spite of this, both super-resolution a nd denoising performance are comparable to existing methods.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
Large Margin Methods for Structured and Interdependent Output Variables

Tsochantaridis, I., Joachims, T., Hofmann, T., Altun, Y.

Journal of Machine Learning Research, 6, pages: 1453-1484, September 2005 (article)

Abstract
Learning general functional dependencies between arbitrary input and output spaces is one of the key challenges in computational intelligence. While recent progress in machine learning has mainly focused on designing flexible and powerful input representations, this paper addresses the complementary issue of designing classification algorithms that can deal with more complex outputs, such as trees, sequences, or sets. More generally, we consider problems involving multiple dependent output variables, structured output spaces, and classification problems with class attributes. In order to accomplish this, we propose to appropriately generalize the well-known notion of a separation margin and derive a corresponding maximum-margin formulation. While this leads to a quadratic program with a potentially prohibitive, i.e. exponential, number of constraints, we present a cutting plane algorithm that solves the optimization problem in polynomial time for a large class of problems. The proposed method has important applications in areas such as computational biology, natural language processing, information retrieval/extraction, and optical character recognition. Experiments from various domains involving different types of output spaces emphasize the breadth and generality of our approach.

ei

PDF [BibTex]

PDF [BibTex]


no image
Gene Expression Profiling of Serum- and Interleukin-1beta-Stimulated Primary Human Adult Articular Chondrocytes - A Molecular Analysis Based on Chondrocytes Isolated from One Donor

Aigner, T., McKenna, L., Zien, A., Fan, Z., Gebhard, P., Zimmer, R.

Cytokine, 31(3):227-240, August 2005 (article)

Abstract
In order to understand the cellular disease mechanisms of osteoarthritic cartilage degeneration it is of primary importance to understand both the anabolic and the catabolic processes going on in parallel in the diseased tissue. In this study, we have applied cDNA-array technology (Clontech) to study gene expression patterns of primary human normal adult articular chondrocytes isolated from one donor cultured under anabolic (serum) and catabolic (IL-1beta) conditions. Significant differences between the different in vitro cultures tested were detected. Overall, serum and IL-1beta significantly altered gene expression levels of 102 and 79 genes, respectively. IL-1beta stimulated the matrix metalloproteinases-1, -3, and -13 as well as members of its intracellular signaling cascade, whereas serum increased the expression of many cartilage matrix genes. Comparative gene expression analysis with previously published in vivo data (normal and osteoarthritic cartilage) showed significant differences of all in vitro s timulations compared to the changes detected in osteoarthritic cartilage in vivo. This investigation allowed us to characterize gene expression profiles of two classical anabolic and catabolic stimuli of human adult articular chondrocytes in vitro. No in vitro model appeared to be adequate to study overall gene expression alterations in osteoarthritic cartilage. Serum stimulated in vitro cultures largely reflected the results that were only consistent with the anabolic activation seen in osteoarthritic chondrocytes. In contrast, IL-1beta did not appear to be a good model for mimicking catabolic gene alterations in degenerating chondrocytes.

ei

Web [BibTex]


no image
Phenotypic characterization of chondrosarcoma-derived cell lines

Schorle, C., Finger, F., Zien, A., Block, J., Gebhard, P., Aigner, T.

Cancer Letters, 226(2):143-154, August 2005 (article)

Abstract
Gene expression profiling of three chondrosarcoma derived cell lines (AD, SM, 105KC) showed an increased proliferative activity and a reduced expression of chondrocytic-typical matrix products compared to primary chondrocytes. The incapability to maintain an adequate matrix synthesis as well as a notable proliferative activity at the same time is comparable to neoplastic chondrosarcoma cells in vivo which cease largely cartilage matrix formation as soon as their proliferative activity increases. Thus, the investigated cell lines are of limited value as substitute of primary chondrocytes but might have a much higher potential to investigate the behavior of neoplastic chondrocytes, i.e. chondrosarcoma biology.

ei

Web [BibTex]

Web [BibTex]


no image
Local Rademacher Complexities

Bartlett, P., Bousquet, O., Mendelson, S.

The Annals of Statistics, 33(4):1497-1537, August 2005 (article)

Abstract
We propose new bounds on the error of learning algorithms in terms of a data-dependent notion of complexity. The estimates we establish give optimal rates and are based on a local and empirical version of Rademacher averages, in the sense that the Rademacher averages are computed from the data, on a subset of functions with small empirical error. We present some applications to classification and prediction with convex function classes, and with kernel classes in particular.

ei

PDF PostScript Web [BibTex]

PDF PostScript Web [BibTex]


no image
Inlier-based ICA with an application to superimposed images

Meinecke, F., Harmeling, S., Müller, K.

International Journal of Imaging Systems and Technology, 15(1):48-55, July 2005 (article)

Abstract
This paper proposes a new independent component analysis (ICA) method which is able to unmix overcomplete mixtures of sparce or structured signals like speech, music or images. Furthermore, the method is designed to be robust against outliers, which is a favorable feature for ICA algorithms since most of them are extremely sensitive to outliers. Our approach is based on a simple outlier index. However, instead of robustifying an existing algorithm by some outlier rejection technique we show how this index can be used directly to solve the ICA problem for super-Gaussian sources. The resulting inlier-based ICA (IBICA) is outlier-robust by construction and can be used for standard ICA as well as for overcomplete ICA (i.e. more source signals than observed signals).

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
Learning the Kernel with Hyperkernels

Ong, CS., Smola, A., Williamson, R.

Journal of Machine Learning Research, 6, pages: 1043-1071, July 2005 (article)

Abstract
This paper addresses the problem of choosing a kernel suitable for estimation with a Support Vector Machine, hence further automating machine learning. This goal is achieved by defining a Reproducing Kernel Hilbert Space on the space of kernels itself. Such a formulation leads to a statistical estimation problem similar to the problem of minimizing a regularized risk functional. We state the equivalent representer theorem for the choice of kernels and present a semidefinite programming formulation of the resulting optimization problem. Several recipes for constructing hyperkernels are provided, as well as the details of common machine learning problems. Experimental results for classification, regression and novelty detection on UCI data show the feasibility of our approach.

ei

PDF [BibTex]

PDF [BibTex]


no image
Image Reconstruction by Linear Programming

Tsuda, K., Rätsch, G.

IEEE Transactions on Image Processing, 14(6):737-744, June 2005 (article)

Abstract
One way of image denoising is to project a noisy image to the subspace of admissible images derived, for instance, by PCA. However, a major drawback of this method is that all pixels are updated by the projection, even when only a few pixels are corrupted by noise or occlusion. We propose a new method to identify the noisy pixels by l1-norm penalization and to update the identified pixels only. The identification and updating of noisy pixels are formulated as one linear program which can be efficiently solved. In particular, one can apply the upsilon trick to directly specify the fraction of pixels to be reconstructed. Moreover, we extend the linear program to be able to exploit prior knowledge that occlusions often appear in contiguous blocks (e.g., sunglasses on faces). The basic idea is to penalize boundary points and interior points of the occluded area differently. We are also able to show the upsilon property for this extended LP leading to a method which is easy to use. Experimental results demonstrate the power of our approach.

ei

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
RASE: recognition of alternatively spliced exons in C.elegans

Rätsch, G., Sonnenburg, S., Schölkopf, B.

Bioinformatics, 21(Suppl. 1):i369-i377, June 2005 (article)

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Matrix Exponentiated Gradient Updates for On-line Learning and Bregman Projection

Tsuda, K., Rätsch, G., Warmuth, M.

Journal of Machine Learning Research, 6, pages: 995-1018, June 2005 (article)

Abstract
We address the problem of learning a symmetric positive definite matrix. The central issue is to design parameter updates that preserve positive definiteness. Our updates are motivated with the von Neumann divergence. Rather than treating the most general case, we focus on two key applications that exemplify our methods: on-line learning with a simple square loss, and finding a symmetric positive definite matrix subject to linear constraints. The updates generalize the exponentiated gradient (EG) update and AdaBoost, respectively: the parameter is now a symmetric positive definite matrix of trace one instead of a probability vector (which in this context is a diagonal positive definite matrix with trace one). The generalized updates use matrix logarithms and exponentials to preserve positive definiteness. Most importantly, we show how the derivation and the analyses of the original EG update and AdaBoost generalize to the non-diagonal case. We apply the resulting matrix exponentiated gradient (MEG) update and DefiniteBoost to the problem of learning a kernel matrix from distance measurements.

ei

PDF [BibTex]

PDF [BibTex]


no image
Protein function prediction via graph kernels

Borgwardt, KM., Ong, CS., Schönauer, S., Vishwanathan, ., Smola, AJ., Kriegel, H-P.

Bioinformatics, 21(Suppl. 1: ISMB 2005 Proceedings):i47-i56, June 2005 (article)

Abstract
Motivation: Computational approaches to protein function prediction infer protein function by finding proteins with similar sequence, structure, surface clefts, chemical properties, amino acid motifs, interaction partners or phylogenetic profiles. We present a new approach that combines sequential, structural and chemical information into one graph model of proteins. We predict functional class membership of enzymes and non-enzymes using graph kernels and support vector machine classification on these protein graphs. Results: Our graph model, derivable from protein sequence and structure only, is competitive with vector models that require additional protein information, such as the size of surface pockets. If we include this extra information into our graph model, our classifier yields significantly higher accuracy levels than the vector models. Hyperkernels allow us to select and to optimally combine the most relevant node attributes in our protein graphs. We have laid the foundation for a protein function prediction system that integrates protein information from various sources efficiently and effectively.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Texture and haptic cues in slant discrimination: Reliability-based cue weighting without statistically optimal cue combination

Rosas, P., Wagemans, J., Ernst, M., Wichmann, F.

Journal of the Optical Society of America A, 22(5):801-809, May 2005 (article)

Abstract
A number of models of depth cue combination suggest that the final depth percept results from a weighted average of independent depth estimates based on the different cues available. The weight of each cue in such an average is thought to depend on the reliability of each cue. In principle, such a depth estimation could be statistically optimal in the sense of producing the minimum variance unbiased estimator that can be constructed from the available information. Here we test such models using visual and haptic depth information. Different texture types produce differences in slant discrimination performance, providing a means for testing a reliability-sensitive cue combination model using texture as one of the cues to slant. Our results show that the weights for the cues were generally sensitive to their reliability, but fell short of statistically optimal combination—we find reliability-based re-weighting, but not statistically optimal cue combination.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Bayesian inference for psychometric functions

Kuss, M., Jäkel, F., Wichmann, F.

Journal of Vision, 5(5):478-492, May 2005 (article)

Abstract
In psychophysical studies, the psychometric function is used to model the relation between physical stimulus intensity and the observer’s ability to detect or discriminate between stimuli of different intensities. In this study, we propose the use of Bayesian inference to extract the information contained in experimental data to estimate the parameters of psychometric functions. Because Bayesian inference cannot be performed analytically, we describe how a Markov chain Monte Carlo method can be used to generate samples from the posterior distribution over parameters. These samples are used to estimate Bayesian confidence intervals and other characteristics of the posterior distribution. In addition, we discuss the parameterization of psychometric functions and the role of prior distributions in the analysis. The proposed approach is exemplified using artificially generated data and in a case study for real experimental data. Furthermore, we compare our approach with traditional methods based on maximum likelihood parameter estimation combined with bootstrap techniques for confidence interval estimation and find the Bayesian approach to be superior.

ei

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
A gene expression map of Arabidopsis thaliana development

Schmid, M., Davison, T., Henz, S., Pape, U., Demar, M., Vingron, M., Schölkopf, B., Weigel, D., Lohmann, J.

Nature Genetics, 37(5):501-506, April 2005 (article)

Abstract
Regulatory regions of plant genes tend to be more compact than those of animal genes, but the complement of transcription factors encoded in plant genomes is as large or larger than that found in those of animals. Plants therefore provide an opportunity to study how transcriptional programs control multicellular development. We analyzed global gene expression during development of the reference plant Arabidopsis thaliana in samples covering many stages, from embryogenesis to senescence, and diverse organs. Here, we provide a first analysis of this data set, which is part of the AtGenExpress expression atlas. We observed that the expression levels of transcription factor genes and signal transduction components are similar to those of metabolic genes. Examining the expression patterns of large gene families, we found that they are often more similar than would be expected by chance, indicating that many gene families have been co-opted for specific developmental processes.

ei

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Morphological characterization of molecular complexes present in the synaptic cleft

Lucic, V., Yang, T., Schweikert, G., Förster, F., Baumeister, W.

Structure, 13(3):423-434, March 2005 (article)

Abstract
We obtained tomograms of isolated mammalian excitatory synapses by cryo-electron tomography. This method allows the investigation of biological material in the frozen-hydrated state, without staining, and can therefore provide reliable structural information at the molecular level. We developed an automated procedure for the segmentation of molecular complexes present in the synaptic cleft based on thresholding and connectivity, and calculated several morphological characteristics of these complexes. Extensive lateral connections along the synaptic cleft are shown to form a highly connected structure with a complex topology. Our results are essentially parameter-free, i.e., they do not depend on the choice of certain parameter values (such as threshold). In addition, the results are not sensitive to noise; the same conclusions can be drawn from the analysis of both nondenoised and denoised tomograms.

ei

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Experimentally optimal v in support vector regression for different noise models and parameter settings

Chalimourda, A., Schölkopf, B., Smola, A.

Neural Networks, 18(2):205-205, March 2005 (article)

ei

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Semi-supervised protein classification using cluster kernels

Weston, J., Leslie, C., Ie, E., Zhou, D., Elisseeff, A., Noble, W.

Bioinformatics, 21(15):3241-3247, 2005 (article)

ei

[BibTex]

[BibTex]


no image
Invariance of Neighborhood Relation under Input Space to Feature Space Mapping

Shin, H., Cho, S.

Pattern Recognition Letters, 26(6):707-718, 2005 (article)

Abstract
If the training pattern set is large, it takes a large memory and a long time to train support vector machine (SVM). Recently, we proposed neighborhood property based pattern selection algorithm (NPPS) which selects only the patterns that are likely to be near the decision boundary ahead of SVM training [Proc. of the 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Lecture Notes in Artificial Intelligence (LNAI 2637), Seoul, Korea, pp. 376–387]. NPPS tries to identify those patterns that are likely to become support vectors in feature space. Preliminary reports show its effectiveness: SVM training time was reduced by two orders of magnitude with almost no loss in accuracy for various datasets. It has to be noted, however, that decision boundary of SVM and support vectors are all defined in feature space while NPPS described above operates in input space. If neighborhood relation in input space is not preserved in feature space, NPPS may not always be effective. In this paper, we sh ow that the neighborhood relation is invariant under input to feature space mapping. The result assures that the patterns selected by NPPS in input space are likely to be located near decision boundary in feature space.

ei

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
Graph Kernels for Chemical Informatics

Ralaivola, L., Swamidass, J., Saigo, H., Baldi, P.

Neural Networks, 18(8):1093-1110, 2005 (article)

Abstract
Increased availability of large repositories of chemical compounds is creating new challenges and opportunities for the application of machine learning methods to problems in computational chemistry and chemical informatics. Because chemical compounds are often represented by the graph of their covalent bonds, machine learning methods in this domain must be capable of processing graphical structures with variable size. Here we first briefly review the literature on graph kernels and then introduce three new kernels (Tanimoto, MinMax, Hybrid) based on the idea of molecular fingerprints and counting labeled paths of depth up to d using depthfirst search from each possible vertex. The kernels are applied to three classification problems to predict mutagenicity, toxicity, and anti-cancer activity on three publicly available data sets. The kernels achieve performances at least comparable, and most often superior, to those previously reported in the literature reaching accuracies of 91.5% on the Mutag dataset, 65-67% on the PTC (Predictive Toxicology Challenge) dataset, and 72% on the NCI (National Cancer Institute) dataset. Properties and tradeoffs of these kernels, as well as other proposed kernels that leverage 1D or 3D representations of molecules, are briefly discussed.

ei

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Extended Gaussianization Method for Blind Separation of Post-Nonlinear Mixtures

Zhang, K., Chan, L.

Neural Computation, 17(2):425-452, 2005 (article)

Abstract
The linear mixture model has been investigated in most articles tackling the problem of blind source separation. Recently, several articles have addressed a more complex model: blind source separation (BSS) of post-nonlinear (PNL) mixtures. These mixtures are assumed to be generated by applying an unknown invertible nonlinear distortion to linear instantaneous mixtures of some independent sources. The gaussianization technique for BSS of PNL mixtures emerged based on the assumption that the distribution of the linear mixture of independent sources is gaussian. In this letter, we review the gaussianization method and then extend it to apply to PNL mixture in which the linear mixture is close to gaussian. Our proposed method approximates the linear mixture using the Cornish-Fisher expansion. We choose the mutual information as the independence measurement to develop a learning algorithm to separate PNL mixtures. This method provides better applicability and accuracy. We then discuss the sufficient condition for the method to be valid. The characteristics of the nonlinearity do not affect the performance of this method. With only a few parameters to tune, our algorithm has a comparatively low computation. Finally, we present experiments to illustrate the efficiency of our method.

ei

Web DOI [BibTex]


no image
Theory of Classification: A Survey of Some Recent Advances

Boucheron, S., Bousquet, O., Lugosi, G.

ESAIM: Probability and Statistics, 9, pages: 323 , 2005 (article)

Abstract
The last few years have witnessed important new developments in the theory and practice of pattern classification. We intend to survey some of the main new ideas that have lead to these important recent developments.

ei

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Moment Inequalities for Functions of Independent Random Variables

Boucheron, S., Bousquet, O., Lugosi, G., Massart, P.

To appear in Annals of Probability, 33, pages: 514-560, 2005 (article)

Abstract
A general method for obtaining moment inequalities for functions of independent random variables is presented. It is a generalization of the entropy method which has been used to derive concentration inequalities for such functions cite{BoLuMa01}, and is based on a generalized tensorization inequality due to Lata{l}a and Oleszkiewicz cite{LaOl00}. The new inequalities prove to be a versatile tool in a wide range of applications. We illustrate the power of the method by showing how it can be used to effortlessly re-derive classical inequalities including Rosenthal and Kahane-Khinchine-type inequalities for sums of independent random variables, moment inequalities for suprema of empirical processes, and moment inequalities for Rademacher chaos and $U$-statistics. Some of these corollaries are apparently new. In particular, we generalize Talagrands exponential inequality for Rademacher chaos of order two to any order. We also discuss applications for other complex functions of independent random variables, such as suprema of boolean polynomials which include, as special cases, subgraph counting problems in random graphs.

ei

PDF [BibTex]

PDF [BibTex]


no image
A novel representation of protein sequences for prediction of subcellular location using support vector machines

Matsuda, S., Vert, J., Saigo, H., Ueda, N., Toh, H., Akutsu, T.

Protein Science, 14, pages: 2804-2813, 2005 (article)

Abstract
As the number of complete genomes rapidly increases, accurate methods to automatically predict the subcellular location of proteins are increasingly useful to help their functional annotation. In order to improve the predictive accuracy of the many prediction methods developed to date, a novel representation of protein sequences is proposed. This representation involves local compositions of amino acids and twin amino acids, and local frequencies of distance between successive (basic, hydrophobic, and other) amino acids. For calculating the local features, each sequence is split into three parts: N-terminal, middle, and C-terminal. The N-terminal part is further divided into four regions to consider ambiguity in the length and position of signal sequences. We tested this representation with support vector machines on two data sets extracted from the SWISS-PROT database. Through fivefold cross-validation tests, overall accuracies of more than 87% and 91% were obtained for eukaryotic and prokaryotic proteins, respectively. It is concluded that considering the respective features in the N-terminal, middle, and C-terminal parts is helpful to predict the subcellular location. Keywords: subcellular location; signal sequence; amino acid composition; distance frequency; support vector machine; predictive accuracy

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
A tutorial on v-support vector machines

Chen, P., Lin, C., Schölkopf, B.

Applied Stochastic Models in Business and Industry, 21(2):111-136, 2005 (article)

Abstract
We briefly describe the main ideas of statistical learning theory, support vector machines (SVMs), and kernel feature spaces. We place particular emphasis on a description of the so-called -SVM, including details of the algorithm and its implementation, theoretical results, and practical applications. Copyright © 2005 John Wiley & Sons, Ltd.

ei

PDF [BibTex]

PDF [BibTex]


no image
Robust EEG Channel Selection Across Subjects for Brain Computer Interfaces

Schröder, M., Lal, T., Hinterberger, T., Bogdan, M., Hill, J., Birbaumer, N., Rosenstiel, W., Schölkopf, B.

EURASIP Journal on Applied Signal Processing, 2005(19, Special Issue: Trends in Brain Computer Interfaces):3103-3112, (Editors: Vesin, J. M., T. Ebrahimi), 2005 (article)

Abstract
Most EEG-based Brain Computer Interface (BCI) paradigms come along with specific electrode positions, e.g.~for a visual based BCI electrode positions close to the primary visual cortex are used. For new BCI paradigms it is usually not known where task relevant activity can be measured from the scalp. For individual subjects Lal et.~al showed that recording positions can be found without the use of prior knowledge about the paradigm used. However it remains unclear to what extend their method of Recursive Channel Elimination (RCE) can be generalized across subjects. In this paper we transfer channel rankings from a group of subjects to a new subject. For motor imagery tasks the results are promising, although cross-subject channel selection does not quite achieve the performance of channel selection on data of single subjects. Although the RCE method was not provided with prior knowledge about the mental task, channels that are well known to be important (from a physiological point of view) were consistently selected whereas task-irrelevant channels were reliably disregarded.

ei

Web DOI [BibTex]

Web DOI [BibTex]

2004


no image
On the representation, learning and transfer of spatio-temporal movement characteristics

Ilg, W., Bakir, GH., Mezger, J., Giese, M.

International Journal of Humanoid Robotics, 1(4):613-636, December 2004 (article)

ei

[BibTex]

2004


[BibTex]


no image
Insect-inspired estimation of egomotion

Franz, MO., Chahl, JS., Krapp, HG.

Neural Computation, 16(11):2245-2260, November 2004 (article)

Abstract
Tangential neurons in the fly brain are sensitive to the typical optic flow patterns generated during egomotion. In this study, we examine whether a simplified linear model based on the organization principles in tangential neurons can be used to estimate egomotion from the optic flow. We present a theory for the construction of an estimator consisting of a linear combination of optic flow vectors that incorporates prior knowledge both about the distance distribution of the environment, and about the noise and egomotion statistics of the sensor. The estimator is tested on a gantry carrying an omnidirectional vision sensor. The experiments show that the proposed approach leads to accurate and robust estimates of rotation rates, whereas translation estimates are of reasonable quality, albeit less reliable.

ei

PDF PostScript Web DOI [BibTex]

PDF PostScript Web DOI [BibTex]


no image
Efficient face detection by a cascaded support-vector machine expansion

Romdhani, S., Torr, P., Schölkopf, B., Blake, A.

Proceedings of The Royal Society of London A, 460(2501):3283-3297, A, November 2004 (article)

Abstract
We describe a fast system for the detection and localization of human faces in images using a nonlinear ‘support-vector machine‘. We approximate the decision surface in terms of a reduced set of expansion vectors and propose a cascaded evaluation which has the property that the full support-vector expansion is only evaluated on the face-like parts of the image, while the largest part of typical images is classified using a single expansion vector (a simpler and more efficient classifier). As a result, only three reduced-set vectors are used, on average, to classify an image patch. Hence, the cascaded evaluation, presented in this paper, offers a thirtyfold speed-up over an evaluation using the full set of reduced-set vectors, which is itself already thirty times faster than classification using all the support vectors.

ei

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Pattern detection methods and systems and face detection methods and systems

Blake, A., Romdhani, S., Schölkopf, B., Torr, P. H. S.

United States Patent, No 6804391, October 2004 (patent)

ei

[BibTex]

[BibTex]


no image
Advanced Lectures on Machine Learning

Bousquet, O., von Luxburg, U., Rätsch, G.

ML Summer Schools 2003, LNAI 3176, pages: 240, Springer, Berlin, Germany, ML Summer Schools, September 2004 (proceedings)

Abstract
Machine Learning has become a key enabling technology for many engineering applications, investigating scientific questions and theoretical problems alike. To stimulate discussions and to disseminate new results, a summer school series was started in February 2002, the documentation of which is published as LNAI 2600. This book presents revised lectures of two subsequent summer schools held in 2003 in Canberra, Australia, and in T{\"u}bingen, Germany. The tutorial lectures included are devoted to statistical learning theory, unsupervised learning, Bayesian inference, and applications in pattern recognition; they provide in-depth overviews of exciting new developments and contain a large number of references. Graduate students, lecturers, researchers and professionals alike will find this book a useful resource in learning and teaching machine learning.

ei

Web [BibTex]

Web [BibTex]


no image
Pattern Recognition: 26th DAGM Symposium, LNCS, Vol. 3175

Rasmussen, C., Bülthoff, H., Giese, M., Schölkopf, B.

Proceedings of the 26th Pattern Recognition Symposium (DAGM‘04), pages: 581, Springer, Berlin, Germany, 26th Pattern Recognition Symposium, August 2004 (proceedings)

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
Learning kernels from biological networks by maximizing entropy

Tsuda, K., Noble, W.

Bioinformatics, 20(Suppl. 1):i326-i333, August 2004 (article)

Abstract
Motivation: The diffusion kernel is a general method for computing pairwise distances among all nodes in a graph, based on the sum of weighted paths between each pair of nodes. This technique has been used successfully, in conjunction with kernel-based learning methods, to draw inferences from several types of biological networks. Results: We show that computing the diffusion kernel is equivalent to maximizing the von Neumann entropy, subject to a global constraint on the sum of the Euclidean distances between nodes. This global constraint allows for high variance in the pairwise distances. Accordingly, we propose an alternative, locally constrained diffusion kernel, and we demonstrate that the resulting kernel allows for more accurate support vector machine prediction of protein functional classifications from metabolic and protein–protein interaction networks.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Masking effect produced by Mach bands on the detection of narrow bars of random polarity

Henning, GB., Hoddinott, KT., Wilson-Smith, ZJ., Hill, NJ.

Journal of the Optical Society of America, 21(8):1379-1387, A, August 2004 (article)

ei

[BibTex]

[BibTex]


no image
Support Vector Channel Selection in BCI

Lal, T., Schröder, M., Hinterberger, T., Weston, J., Bogdan, M., Birbaumer, N., Schölkopf, B.

IEEE Transactions on Biomedical Engineering, 51(6):1003-1010, June 2004 (article)

Abstract
Designing a Brain Computer Interface (BCI) system one can choose from a variety of features that may be useful for classifying brain activity during a mental task. For the special case of classifying EEG signals we propose the usage of the state of the art feature selection algorithms Recursive Feature Elimination and Zero-Norm Optimization which are based on the training of Support Vector Machines (SVM). These algorithms can provide more accurate solutions than standard filter methods for feature selection. We adapt the methods for the purpose of selecting EEG channels. For a motor imagery paradigm we show that the number of used channels can be reduced significantly without increasing the classification error. The resulting best channels agree well with the expected underlying cortical activity patterns during the mental tasks. Furthermore we show how time dependent task specific information can be visualized.

ei

DOI [BibTex]

DOI [BibTex]


no image
Distance-Based Classification with Lipschitz Functions

von Luxburg, U., Bousquet, O.

Journal of Machine Learning Research, 5, pages: 669-695, June 2004 (article)

Abstract
The goal of this article is to develop a framework for large margin classification in metric spaces. We want to find a generalization of linear decision functions for metric spaces and define a corresponding notion of margin such that the decision function separates the training points with a large margin. It will turn out that using Lipschitz functions as decision functions, the inverse of the Lipschitz constant can be interpreted as the size of a margin. In order to construct a clean mathematical setup we isometrically embed the given metric space into a Banach space and the space of Lipschitz functions into its dual space. To analyze the resulting algorithm, we prove several representer theorems. They state that there always exist solutions of the Lipschitz classifier which can be expressed in terms of distance functions to training points. We provide generalization bounds for Lipschitz classifiers in terms of the Rademacher complexities of some Lipschitz function classes. The generality of our approach can be seen from the fact that several well-known algorithms are special cases of the Lipschitz classifier, among them the support vector machine, the linear programming machine, and the 1-nearest neighbor classifier.

ei

PDF PostScript PDF [BibTex]

PDF PostScript PDF [BibTex]


no image
Advances in Neural Information Processing Systems 16: Proceedings of the 2003 Conference

Thrun, S., Saul, L., Schölkopf, B.

Proceedings of the Seventeenth Annual Conference on Neural Information Processing Systems (NIPS 2003), pages: 1621, MIT Press, Cambridge, MA, USA, 17th Annual Conference on Neural Information Processing Systems (NIPS), June 2004 (proceedings)

Abstract
The annual Neural Information Processing (NIPS) conference is the flagship meeting on neural computation. It draws a diverse group of attendees—physicists, neuroscientists, mathematicians, statisticians, and computer scientists. The presentations are interdisciplinary, with contributions in algorithms, learning theory, cognitive science, neuroscience, brain imaging, vision, speech and signal processing, reinforcement learning and control, emerging technologies, and applications. Only thirty percent of the papers submitted are accepted for presentation at NIPS, so the quality is exceptionally high. This volume contains all the papers presented at the 2003 conference.

ei

Web [BibTex]

Web [BibTex]


no image
cDNA-Microarray Technology in Cartilage Research - Functional Genomics of Osteoarthritis [in German]

Aigner, T., Finger, F., Zien, A., Bartnik, E.

Zeitschrift f{\"u}r Orthop{\"a}die und ihre Grenzgebiete, 142(2):241-247, April 2004 (article)

Abstract
Functional genomics represents a new challenging approach in order to analyze complex diseases such as osteoarthritis on a molecular level. The characterization of the molecular changes of the cartilage cells, the chondrocytes, enables a better understanding of the pathomechanisms of the disease. In particular, the identification and characterization of new target molecules for therapeutic intervention is of interest. Also, potential molecular markers for diagnosis and monitoring of osteoarthritis contribute to a more appropriate patient management. The DNA-microarray technology complements (but does not replace) biochemical and biological research in new disease-relevant genes. Large-scale functional genomics will identify molecular networks such as yet identified players in the anabolic-catabolic balance of articular cartilage as well as disease-relevant intracellular signaling cascades so far rather unknown in articular chondrocytes. However, at the moment it is also important to recognize the limitations of the microarray technology in order to avoid over-interpretation of the results. This might lead to misleading results and prevent to a significant extent a proper use of the potential of this technology in the field of osteoarthritis.

ei

[BibTex]

[BibTex]


no image
A Compression Approach to Support Vector Model Selection

von Luxburg, U., Bousquet, O., Schölkopf, B.

Journal of Machine Learning Research, 5, pages: 293-323, April 2004 (article)

Abstract
In this paper we investigate connections between statistical learning theory and data compression on the basis of support vector machine (SVM) model selection. Inspired by several generalization bounds we construct "compression coefficients" for SVMs which measure the amount by which the training labels can be compressed by a code built from the separating hyperplane. The main idea is to relate the coding precision to geometrical concepts such as the width of the margin or the shape of the data in the feature space. The so derived compression coefficients combine well known quantities such as the radius-margin term R^2/rho^2, the eigenvalues of the kernel matrix, and the number of support vectors. To test whether they are useful in practice we ran model selection experiments on benchmark data sets. As a result we found that compression coefficients can fairly accurately predict the parameters for which the test error is minimized.

ei

PDF [BibTex]

PDF [BibTex]


no image
Injecting noise for analysing the stability of ICA components

Harmeling, S., Meinecke, F., Müller, K.

Signal Processing, 84(2):255-266, February 2004 (article)

Abstract
Usually, noise is considered to be destructive. We present a new method that constructively injects noise to assess the reliability and the grouping structure of empirical ICA component estimates. Our method can be viewed as a Monte-Carlo-style approximation of the curvature of some performance measure at the solution. Simulations show that the true root-mean-squared angle distances between the real sources and the source estimates can be approximated well by our method. In a toy experiment, we see that we are also able to reveal the underlying grouping structure of the extracted ICA components. Furthermore, an experiment with fetal ECG data demonstrates that our approach is useful for exploratory data analysis of real-world data.

ei

PDF DOI [BibTex]

PDF DOI [BibTex]