

2007


Unsupervised Classification for non-invasive Brain-Computer-Interfaces

Eren, S., Grosse-Wentrup, M., Buss, M.

In Automed 2007, pages: 65-66, VDI Verlag, Düsseldorf, Germany, Automed Workshop, October 2007 (inproceedings)

Abstract
Non-invasive Brain-Computer-Interfaces (BCIs) are devices that infer the intention of human subjects from signals generated by the central nervous system and recorded outside the skull, e.g., by electroencephalography (EEG). They can be used to enable basic communication for patients who are not able to communicate by normal means, e.g., due to neuro-degenerative diseases such as amyotrophic lateral sclerosis (ALS) (see [Vaughan2003] for a review). One challenge in research on BCIs is minimizing the training time prior to usage of the BCI. Since EEG patterns vary across subjects, it is usually necessary to record a number of trials in which the intention of the user is known to train a classifier. This classifier is subsequently used to infer the intention of the BCI-user. In this paper, we present the application of an unsupervised classification method to a binary noninvasive BCI based on motor imagery. The result is a BCI that does not require any training, since the mapping from EEG pattern changes to the intention of the user is learned online by the BCI without any feedback. We present experimental results from six healthy subjects, three of which display classification errors below 15%. We conclude that unsupervised BCIs are a viable option, but not yet as reliable as supervised BCIs. The rest of this paper is organized as follows. In the Methods section, we first introduce the experimental paradigm. This is followed by a description of the methods used for spatial filtering, feature extraction, and unsupervised classification. We then present the experimental results, and conclude the paper with a brief discussion.

PDF Web [BibTex]

Some observations on the masking effects of Mach bands

Curnow, T., Cowie, DA., Henning, GB., Hill, NJ.

Journal of the Optical Society of America A, 24(10):3233-3241, October 2007 (article)

Abstract
There are 8 cycles/deg ripples or oscillations in performance as a function of location near Mach bands in experiments measuring Mach bands’ masking effects on random-polarity signal bars. The oscillations with increments are 180 degrees out of phase with those for decrements. The oscillations, much larger than the measurement error, appear to relate to the weighting function of the spatial-frequency-tuned channel detecting the broadband signals. The ripples disappear with step maskers and become much smaller at durations below 25 ms, implying either that the site of masking has changed or that the weighting function, and hence the spatial-frequency tuning, is slow to develop.

PDF Web DOI [BibTex]

A Hilbert Space Embedding for Distributions

Smola, A., Gretton, A., Song, L., Schölkopf, B.

In Algorithmic Learning Theory, Lecture Notes in Computer Science 4754, pages: 13-31, (Editors: M Hutter and RA Servedio and E Takimoto), Springer, Berlin, Germany, 18th International Conference on Algorithmic Learning Theory (ALT), October 2007 (inproceedings)

Abstract
We describe a technique for comparing distributions without the need for density estimation as an intermediate step. Our approach relies on mapping the distributions into a reproducing kernel Hilbert space. Applications of this technique can be found in two-sample tests, which are used for determining whether two sets of observations arise from the same distribution, covariate shift correction, local learning, measures of independence, and density estimation.

PDF PDF DOI [BibTex]

Cluster Identification in Nearest-Neighbor Graphs

Maier, M., Hein, M., von Luxburg, U.

In ALT 2007, pages: 196-210, (Editors: Hutter, M., R. A. Servedio, E. Takimoto), Springer, Berlin, Germany, 18th International Conference on Algorithmic Learning Theory, October 2007 (inproceedings)

Abstract
Assume we are given a sample of points from some underlying distribution which contains several distinct clusters. Our goal is to construct a neighborhood graph on the sample points such that clusters are "identified": that is, the subgraph induced by points from the same cluster is connected, while subgraphs corresponding to different clusters are not connected to each other. We derive bounds on the probability that cluster identification is successful, and use them to predict "optimal" values of k for the mutual and symmetric k-nearest-neighbor graphs. We point out different properties of the mutual and symmetric nearest-neighbor graphs related to the cluster identification problem.
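The mutual k-nearest-neighbor construction the abstract refers to can be sketched in a few lines. The 1-D sample, the choice k=2, and the union-find component count below are illustrative assumptions, not the paper's experimental setup:

```python
def mutual_knn_graph(points, k):
    # Edge {i, j} exists iff i is among j's k nearest neighbours AND
    # j is among i's k nearest neighbours (the "mutual" kNN graph).
    def knn(i):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: abs(points[i] - points[j]))
        return set(others[:k])
    neigh = [knn(i) for i in range(len(points))]
    return {(i, j) for i in range(len(points)) for j in neigh[i]
            if i < j and i in neigh[j]}

def components(n, edges):
    # Count connected components with a simple union-find.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})

# Two well-separated 1-D clusters: the mutual 2-NN graph keeps them apart,
# so each cluster induces its own connected subgraph.
pts = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
edges = mutual_knn_graph(pts, k=2)
assert components(len(pts), edges) == 2
```

With k too large, cross-cluster edges appear and the two components merge, which is why the paper's "optimal" k matters.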

PDF PDF DOI [BibTex]

Inducing Metric Violations in Human Similarity Judgements

Laub, J., Macke, J., Müller, K., Wichmann, F.

In Advances in Neural Information Processing Systems 19, pages: 777-784, (Editors: Schölkopf, B., J. Platt, T. Hofmann), MIT Press, Cambridge, MA, USA, Twentieth Annual Conference on Neural Information Processing Systems (NIPS), September 2007 (inproceedings)

Abstract
Attempting to model human categorization and similarity judgements is both a very interesting and an exceedingly difficult challenge. Some of the difficulty arises from conflicting evidence on whether human categorization and similarity judgements should be modelled as operating on a mental representation that is essentially metric. Intuitively, this has a strong appeal, as it would allow (dis)similarity to be represented geometrically as distance in some internal space. Here we show how a single stimulus, carefully constructed in a psychophysical experiment, introduces l2 violations in what used to be an internal similarity space that could be adequately modelled as Euclidean. We term this one influential data point a conflictual judgement. We present an algorithm for analysing such data and identifying the crucial point. Thus there may not be a strict dichotomy between a metric and a non-metric internal space, but rather degrees to which potentially large subsets of stimuli are represented metrically, with a small subset causing a global violation of metricity.

PDF Web [BibTex]

Cross-Validation Optimization for Large Scale Hierarchical Classification Kernel Methods

Seeger, M.

In Advances in Neural Information Processing Systems 19, pages: 1233-1240, (Editors: Schölkopf, B., J. Platt, T. Hofmann), MIT Press, Cambridge, MA, USA, Twentieth Annual Conference on Neural Information Processing Systems (NIPS), September 2007 (inproceedings)

Abstract
We propose a highly efficient framework for kernel multi-class models with a large and structured set of classes. Kernel parameters are learned automatically by maximizing the cross-validation log likelihood, and predictive probabilities are estimated. We demonstrate our approach on large scale text classification tasks with hierarchical class structure, achieving state-of-the-art results in an order of magnitude less time than previous work.

PDF Web [BibTex]

A Local Learning Approach for Clustering

Wu, M., Schölkopf, B.

In Advances in Neural Information Processing Systems 19, pages: 1529-1536, (Editors: B Schölkopf and J Platt and T Hofmann), MIT Press, Cambridge, MA, USA, 20th Annual Conference on Neural Information Processing Systems (NIPS), September 2007 (inproceedings)

Abstract
We present a local learning approach for clustering. The basic idea is that a good clustering result should have the property that the cluster label of each data point can be well predicted based on its neighboring data and their cluster labels, using current supervised learning methods. An optimization problem is formulated such that its solution has the above property. Relaxation and eigen-decomposition are applied to solve this optimization problem. We also briefly investigate the parameter selection issue and provide a simple parameter selection method for the proposed algorithm. Experimental results are provided to validate the effectiveness of the proposed approach.

PDF Web [BibTex]

Adaptive Spatial Filters with predefined Region of Interest for EEG based Brain-Computer-Interfaces

Grosse-Wentrup, M., Gramann, K., Buss, M.

In Advances in Neural Information Processing Systems 19, pages: 537-544, (Editors: Schölkopf, B., J. Platt, T. Hofmann), MIT Press, Cambridge, MA, USA, Twentieth Annual Conference on Neural Information Processing Systems (NIPS), September 2007 (inproceedings)

Abstract
The performance of EEG-based Brain-Computer-Interfaces (BCIs) critically depends on the extraction of features from the EEG carrying information relevant for the classification of different mental states. For BCIs employing imaginary movements of different limbs, the method of Common Spatial Patterns (CSP) has been shown to achieve excellent classification results. The CSP-algorithm however suffers from a lack of robustness, requiring training data without artifacts for good performance. To overcome this lack of robustness, we propose an adaptive spatial filter that replaces the training data in the CSP approach by a-priori information. More specifically, we design an adaptive spatial filter that maximizes the ratio of the variance of the electric field originating in a predefined region of interest (ROI) and the overall variance of the measured EEG. Since it is known that the component of the EEG used for discriminating imaginary movements originates in the motor cortex, we design two adaptive spatial filters with the ROIs centered in the hand areas of the left and right motor cortex. We then use these to classify EEG data recorded during imaginary movements of the right and left hand of three subjects, and show that the adaptive spatial filters outperform the CSP-algorithm, enabling classification rates of up to 94.7 % without artifact rejection.

PDF Web [BibTex]

Mining complex genotypic features for predicting HIV-1 drug resistance

Saigo, H., Uno, T., Tsuda, K.

Bioinformatics, 23(18):2455-2462, September 2007 (article)

Abstract
Human immunodeficiency virus type 1 (HIV-1) evolves in the human body, and its exposure to a drug often causes mutations that enhance the resistance against the drug. To design an effective pharmacotherapy for an individual patient, it is important to accurately predict the drug resistance based on genotype data. Notably, the resistance is not just the simple sum of the effects of all mutations. Structural biological studies suggest that the association of mutations is crucial: even if mutations A or B alone do not affect the resistance, a significant change might happen when the two mutations occur together. Linear regression methods cannot take the associations into account, while decision tree methods can reveal only limited associations. Kernel methods and neural networks implicitly use all possible associations for prediction, but cannot select salient associations explicitly. Our method, itemset boosting, performs linear regression in the complete space of power sets of mutations. It implements a forward feature selection procedure where, in each iteration, one mutation combination is found by an efficient branch-and-bound search. This method uses all possible combinations, and salient associations are explicitly shown. In experiments, our method worked particularly well for predicting the resistance of nucleotide reverse transcriptase inhibitors (NRTIs). Furthermore, it successfully recovered many mutation associations known in the biological literature.

Web DOI [BibTex]

Branch and Bound for Semi-Supervised Support Vector Machines

Chapelle, O., Sindhwani, V., Keerthi, S.

In Advances in Neural Information Processing Systems 19, pages: 217-224, (Editors: Schölkopf, B., J. Platt, T. Hofmann), MIT Press, Cambridge, MA, USA, Twentieth Annual Conference on Neural Information Processing Systems (NIPS), September 2007 (inproceedings)

Abstract
Semi-supervised SVMs (S3VMs) attempt to learn low-density separators by maximizing the margin over labeled and unlabeled examples. The associated optimization problem is non-convex. To examine the full potential of S3VMs modulo local minima problems in current implementations, we apply branch and bound techniques for obtaining exact, globally optimal solutions. Empirical evidence suggests that the globally optimal solution can return excellent generalization performance in situations where other implementations fail completely. While our current implementation is only applicable to small datasets, we discuss variants that can potentially lead to practically useful algorithms.

PDF Web [BibTex]

A Kernel Method for the Two-Sample-Problem

Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B., Smola, A.

In Advances in Neural Information Processing Systems 19, pages: 513-520, (Editors: B Schölkopf and J Platt and T Hofmann), MIT Press, Cambridge, MA, USA, 20th Annual Conference on Neural Information Processing Systems (NIPS), September 2007 (inproceedings)

Abstract
We propose two statistical tests to determine if two samples are from different distributions. Our test statistic is in both cases the distance between the means of the two samples mapped into a reproducing kernel Hilbert space (RKHS). The first test is based on a large deviation bound for the test statistic, while the second is based on the asymptotic distribution of this statistic. The test statistic can be computed in $O(m^2)$ time. We apply our approach to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where our test performs strongly. We also demonstrate excellent performance when comparing distributions over graphs, for which no alternative tests currently exist.
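A minimal sketch of the statistic described above: the biased estimate of the squared RKHS distance between the two sample means, computed from $O(m^2)$ kernel evaluations. The scalar inputs and the RBF bandwidth `gamma` are illustrative assumptions, not the paper's settings:

```python
import math

def rbf(x, y, gamma=0.5):
    # Gaussian RBF kernel k(x, y) = exp(-gamma * |x - y|^2) on scalars.
    return math.exp(-gamma * (x - y) ** 2)

def mmd2_biased(xs, ys, k=rbf):
    # Biased estimate of the squared Maximum Mean Discrepancy:
    # ||mean_k(xs) - mean_k(ys)||^2 in the RKHS, from kernel values only.
    m, n = len(xs), len(ys)
    kxx = sum(k(a, b) for a in xs for b in xs) / (m * m)
    kyy = sum(k(a, b) for a in ys for b in ys) / (n * n)
    kxy = sum(k(a, b) for a in xs for b in ys) / (m * n)
    return kxx + kyy - 2.0 * kxy

# Two samples from "the same" region give a near-zero statistic;
# a shifted sample gives a clearly larger one.
same = mmd2_biased([0.0, 0.1, -0.1], [0.05, -0.05, 0.0])
diff = mmd2_biased([0.0, 0.1, -0.1], [3.0, 3.1, 2.9])
assert same < diff
```

The paper's two tests then compare this statistic against a large-deviation bound or its asymptotic null distribution; the sketch only shows the statistic itself.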

PDF Web [BibTex]

An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models

Keerthi, S., Sindhwani, V., Chapelle, O.

In Advances in Neural Information Processing Systems 19, pages: 673-680, (Editors: Schölkopf, B., J. Platt, T. Hofmann), MIT Press, Cambridge, MA, USA, Twentieth Annual Conference on Neural Information Processing Systems (NIPS), September 2007 (inproceedings)

Abstract
We consider the task of tuning hyperparameters in SVM models based on minimizing a smooth performance validation function, e.g., smoothed k-fold cross-validation error, using non-linear optimization techniques. The key computation in this approach is that of the gradient of the validation function with respect to hyperparameters. We show that for large-scale problems involving a wide choice of kernel-based models and validation functions, this computation can be very efficiently done; often within just a fraction of the training time. Empirical results show that a near-optimal set of hyperparameters can be identified by our approach with very few training rounds and gradient computations.

PDF Web [BibTex]

Learning Dense 3D Correspondence

Steinke, F., Schölkopf, B., Blanz, V.

In Advances in Neural Information Processing Systems 19, pages: 1313-1320, (Editors: B Schölkopf and J Platt and T Hofmann), MIT Press, Cambridge, MA, USA, 20th Annual Conference on Neural Information Processing Systems (NIPS), September 2007 (inproceedings)

Abstract
Establishing correspondence between distinct objects is an important and nontrivial task: correctness of the correspondence hinges on properties which are difficult to capture in an a priori criterion. While previous work has used a priori criteria which in some cases led to very good results, the present paper explores whether it is possible to learn a combination of features that, for a given training set of aligned human heads, characterizes the notion of correct correspondence. By optimizing this criterion, we are then able to compute correspondence and morphs for novel heads.

PDF Web [BibTex]

Optimal Dominant Motion Estimation using Adaptive Search of Transformation Space

Ulges, A., Lampert, CH., Keysers, D., Breuel, TM.

In DAGM 2007, pages: 204-215, (Editors: Hamprecht, F. A., C. Schnörr, B. Jähne), Springer, Berlin, Germany, 29th Annual Symposium of the German Association for Pattern Recognition, September 2007 (inproceedings)

Abstract
The extraction of a parametric global motion from a motion field is a task with several applications in video processing. We present two probabilistic formulations of the problem and carry out optimization using the RAST algorithm, a geometric matching method novel to motion estimation in video. RAST uses an exhaustive and adaptive search of transformation space and thus gives -- in contrast to local sampling optimization techniques used in the past -- a globally optimal solution. Among other applications, our framework can thus be used as a source of ground truth for benchmarking motion estimation algorithms. Our main contributions are: first, the novel combination of a state-of-the-art MAP criterion for dominant motion estimation with a search procedure that guarantees global optimality. Second, experimental results that illustrate the superior performance of our approach on synthetic flow fields as well as real-world video streams. Third, a significant speedup of the search achieved by extending the model with an additional smoothness prior.

PDF Web DOI [BibTex]

Solving Deep Memory POMDPs with Recurrent Policy Gradients

Wierstra, D., Förster, A., Peters, J., Schmidhuber, J.

In ICANN'07, pages: 697-706, Springer, Berlin, Germany, International Conference on Artificial Neural Networks, September 2007 (inproceedings)

Abstract
This paper presents Recurrent Policy Gradients, a model-free reinforcement learning (RL) method creating limited-memory stochastic policies for partially observable Markov decision problems (POMDPs) that require long-term memories of past observations. The approach involves approximating a policy gradient for a Recurrent Neural Network (RNN) by backpropagating return-weighted characteristic eligibilities through time. Using a “Long Short-Term Memory” architecture, we are able to outperform other RL methods on two important benchmark tasks. Furthermore, we show promising results on a complex car driving simulation task.

PDF PDF DOI [BibTex]

Output Grouping using Dirichlet Mixtures of Linear Gaussian State-Space Models

Chiappa, S., Barber, D.

In ISPA 2007, pages: 446-451, IEEE Computer Society, Los Alamitos, CA, USA, 5th International Symposium on Image and Signal Processing and Analysis, September 2007 (inproceedings)

Abstract
We consider a model to cluster the components of a vector time-series. The task is to assign each component of the vector time-series to a single cluster, basing this assignment on the simultaneous dynamical similarity of the component to other components in the cluster. This is in contrast to the more familiar task of clustering a set of time-series based on global measures of their similarity. The model is based on a Dirichlet Mixture of Linear Gaussian State-Space models (LGSSMs), in which each LGSSM is treated with a prior to encourage the simplest explanation. The resulting model is approximated using a ‘collapsed’ variational Bayes implementation.

PDF Web DOI [BibTex]

Manifold Denoising

Hein, M., Maier, M.

In Advances in Neural Information Processing Systems 19, pages: 561-568, (Editors: Schölkopf, B., J. Platt, T. Hofmann), MIT Press, Cambridge, MA, USA, Twentieth Annual Conference on Neural Information Processing Systems (NIPS), September 2007 (inproceedings)

Abstract
We consider the problem of denoising a noisily sampled submanifold $M$ in $R^d$, where the submanifold $M$ is a priori unknown and we are only given a noisy point sample. The presented denoising algorithm is based on a graph-based diffusion process of the point sample. We analyze this diffusion process using recent results about the convergence of graph Laplacians. In the experiments we show that our method is capable of dealing with non-trivial high-dimensional noise. Moreover, using the denoising algorithm as a pre-processing method, we can improve the results of a semi-supervised learning algorithm.
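The graph-based diffusion can be illustrated on a toy sample. The paper builds its diffusion from graph Laplacians on neighborhood graphs; the dense Gaussian-weight graph, the single explicit step, and the 2-D points below are simplifications chosen only for illustration:

```python
import math

def denoise_step(points, sigma=1.0, step=0.5):
    # One explicit diffusion step: move each point toward the weighted
    # average of the sample, x_i <- x_i + step * ((D^-1 W x)_i - x_i),
    # with Gaussian weights W_ij = exp(-|x_i - x_j|^2 / (2 sigma^2)).
    n, dim = len(points), len(points[0])
    w = [[math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                   / (2 * sigma ** 2)) for j in range(n)] for i in range(n)]
    new = []
    for i in range(n):
        d = sum(w[i])
        # D^{-1} W averages each point with its neighbours; stepping toward
        # that average contracts the noisy sample onto its underlying shape.
        avg = [sum(w[i][j] * points[j][c] for j in range(n)) / d
               for c in range(dim)]
        new.append(tuple(p + step * (a - p) for p, a in zip(points[i], avg)))
    return new

# Noisy samples from a horizontal line: one diffusion step shrinks the
# vertical (noise) spread.
pts = [(0.0, 0.3), (1.0, -0.3), (2.0, 0.2), (3.0, -0.2)]
smoothed = denoise_step(pts)
def y_spread(ps):
    return max(p[1] for p in ps) - min(p[1] for p in ps)
assert y_spread(smoothed) < y_spread(pts)
```

Iterating such steps too long contracts the whole sample to a point, which is why the paper's analysis of the diffusion process (and when to stop it) is the substantive contribution.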

PDF Web [BibTex]

How to Find Interesting Locations in Video: A Spatiotemporal Interest Point Detector Learned from Human Eye movements

Kienzle, W., Schölkopf, B., Wichmann, F., Franz, M.

In Pattern Recognition, pages: 405-414, (Editors: FA Hamprecht and C Schnörr and B Jähne), Springer, Berlin, Germany, 29th Annual Symposium of the German Association for Pattern Recognition (DAGM), September 2007 (inproceedings)

Abstract
Interest point detection in still images is a well-studied topic in computer vision. In the spatiotemporal domain, however, it is still unclear which features indicate useful interest points. In this paper we approach the problem by learning a detector from examples: we record eye movements of human subjects watching video sequences and train a neural network to predict which locations are likely to become eye movement targets. We show that our detector outperforms current spatiotemporal interest point architectures on a standard classification dataset.

PDF Web DOI [BibTex]

Bayesian Inference for Sparse Generalized Linear Models

Seeger, M., Gerwinn, S., Bethge, M.

In ECML 2007, pages: 298-309, Lecture Notes in Computer Science 4701, (Editors: Kok, J. N., J. Koronacki, R. Lopez de Mantaras, S. Matwin, D. Mladenic, A. Skowron), Springer, Berlin, Germany, 18th European Conference on Machine Learning, September 2007 (inproceedings)

Abstract
We present a framework for efficient, accurate approximate Bayesian inference in generalized linear models (GLMs), based on the expectation propagation (EP) technique. The parameters can be endowed with a factorizing prior distribution, encoding properties such as sparsity or non-negativity. The central role of posterior log-concavity in Bayesian GLMs is emphasized and related to stability issues in EP. In particular, we use our technique to infer the parameters of a point process model for neuronal spiking data from multiple electrodes, demonstrating significantly superior predictive performance when a sparsity assumption is enforced via a Laplace prior distribution.

PDF DOI [BibTex]

Implicit Surfaces with Globally Regularised and Compactly Supported Basis Functions

Walder, C., Schölkopf, B., Chapelle, O.

In Advances in Neural Information Processing Systems 19, pages: 273-280, (Editors: B Schölkopf and J Platt and T Hofmann), MIT Press, Cambridge, MA, USA, 20th Annual Conference on Neural Information Processing Systems (NIPS), September 2007 (inproceedings)

Abstract
We consider the problem of constructing a function whose zero set is to represent a surface, given sample points with surface normal vectors. The contributions include a novel means of regularising multi-scale compactly supported basis functions that leads to the desirable properties previously only associated with fully supported bases, and show equivalence to a Gaussian process with modified covariance function. We also provide a regularisation framework for simpler and more direct treatment of surface normals, along with a corresponding generalisation of the representer theorem. We demonstrate the techniques on 3D problems of up to 14 million data points, as well as 4D time series data.

PDF Web [BibTex]

Comparison of Adaptive Spatial Filters with Heuristic and Optimized Region of Interest for EEG Based Brain-Computer-Interfaces

Liefhold, C., Grosse-Wentrup, M., Gramann, K., Buss, M.

In Pattern Recognition, pages: 274-283, (Editors: Hamprecht, F. A., C. Schnörr, B. Jähne), Springer, Berlin, Germany, 29th Annual Symposium of the German Association for Pattern Recognition, September 2007 (inproceedings)

Abstract
Research on EEG based brain-computer-interfaces (BCIs) aims at steering devices by thought. Even for simple applications, BCIs require extremely effective data processing to work properly because of the low signal-to-noise ratio (SNR) of EEG signals. Spatial filtering is one successful preprocessing method, which extracts the EEG components carrying the most relevant information. Unlike spatial filtering with Common Spatial Patterns (CSP), Adaptive Spatial Filtering (ASF) can be adapted to freely selectable regions of interest (ROI), and with this, artifacts can be actively suppressed. In this context, we compare the performance of ASF with ROIs selected using anatomical a-priori information and ASF with numerically optimized ROIs. To this end, we introduce a method for data-driven spatial filter adaptation and apply the resulting filters to the classification of EEG data recorded during imaginary movements of the left and right hand of four subjects. The results show that, for artifact-free datasets, ASFs with numerically optimized ROIs achieve classification rates of up to 97.7 % while ASFs with ROIs defined by anatomical heuristics stay at 93.7 % on the same data. With noisy datasets, however, the former break down (66.7 %) while the latter reach 95.7 %.

PDF Web DOI [BibTex]

Machine Learning and Applications in Biology

Shin, H.

In BioKorea 2007, pages: 337-366, BioKorea, September 2007 (inproceedings)

Abstract
The emergence of the fields of computational biology and bioinformatics has alleviated the burden of solving many biological problems, saving the time and cost required for experiments and also providing predictions that guide new experiments. Within computational biology, machine learning algorithms have played a central role in dealing with the flood of biological data. The goal of this tutorial is to raise awareness and comprehension of machine learning so that biologists can properly match the task at hand to the corresponding analytical approach. We start by categorizing biological problem settings and introduce the general machine learning schemes that fit best to each of these categories. We then explore representative models in further detail, from traditional statistical models to recent kernel models, presenting several up-to-date research projects in bioinformatics to exemplify how biological questions can benefit from a machine learning approach. Finally, we discuss how cooperation between biologists and machine learners might be made smoother.

[BibTex]

Real-Time Fetal Heart Monitoring in Biomagnetic Measurements Using Adaptive Real-Time ICA

Waldert, S., Bensch, M., Bogdan, M., Rosenstiel, W., Schölkopf, B., Lowery, C., Eswaran, H., Preissl, H.

IEEE Transactions on Biomedical Engineering, 54(10):1867-1874, September 2007 (article)

Abstract
Electrophysiological signals of the developing fetal brain and heart can be investigated by fetal magnetoencephalography (fMEG). During such investigations, the fetal heart activity and that of the mother should be monitored continuously to provide an important indication of current well-being. Due to physical constraints of an fMEG system, it is not possible to use clinically established heart monitors for this purpose. Considering this constraint, we developed a real-time heart monitoring system for biomagnetic measurements and showed its reliability and applicability in research and for clinical examinations. The developed system consists of real-time access to fMEG data, an algorithm based on Independent Component Analysis (ICA), and a graphical user interface (GUI). The algorithm extracts the current fetal and maternal heart signal from a noisy and artifact-contaminated data stream in real-time and is able to adapt automatically to continuously varying environmental parameters. This algorithm has been named Adaptive Real-time ICA (ARICA) and is applicable to real-time artifact removal as well as to related blind signal separation problems.

PDF Web DOI [BibTex]

A Nonparametric Approach to Bottom-Up Visual Saliency

Kienzle, W., Wichmann, F., Schölkopf, B., Franz, M.

In Advances in Neural Information Processing Systems 19, pages: 689-696, (Editors: B Schölkopf and J Platt and T Hofmann), MIT Press, Cambridge, MA, USA, 20th Annual Conference on Neural Information Processing Systems (NIPS), September 2007 (inproceedings)

Abstract
This paper addresses the bottom-up influence of local image information on human eye movements. Most existing computational models use a set of biologically plausible linear filters, e.g., Gabor or Difference-of-Gaussians filters as a front-end, the outputs of which are nonlinearly combined into a real number that indicates visual saliency. Unfortunately, this requires many design parameters such as the number, type, and size of the front-end filters, as well as the choice of nonlinearities, weighting and normalization schemes etc., for which biological plausibility cannot always be justified. As a result, these parameters have to be chosen in a more or less ad hoc way. Here, we propose to learn a visual saliency model directly from human eye movement data. The model is rather simplistic and essentially parameter-free, and therefore contrasts recent developments in the field that usually aim at higher prediction rates at the cost of additional parameters and increasing model complexity. Experimental results show that - despite the lack of any biological prior knowledge - our model performs comparably to existing approaches, and in fact learns image features that resemble findings from several previous studies. In particular, its maximally excitatory stimuli have center-surround structure, similar to receptive fields in the early human visual system.

PDF Web [BibTex]

Information Bottleneck for Non Co-Occurrence Data

Seldin, Y., Slonim, N., Tishby, N.

In Advances in Neural Information Processing Systems 19, pages: 1241-1248, (Editors: Schölkopf, B., J. Platt, T. Hofmann), MIT Press, Cambridge, MA, USA, Twentieth Annual Conference on Neural Information Processing Systems (NIPS), September 2007 (inproceedings)

Abstract
We present a general model-independent approach to the analysis of data in cases when these data do not appear in the form of co-occurrence of two variables X, Y, but rather as a sample of values of an unknown (stochastic) function Z(X,Y). For example, in gene expression data, the expression level Z is a function of gene X and condition Y; or in movie ratings data the rating Z is a function of viewer X and movie Y. The approach represents a consistent extension of the Information Bottleneck method, which has previously relied on the availability of co-occurrence statistics. By altering the relevance variable we eliminate the need for a sample of the joint distribution of all input variables. This new formulation also enables simple MDL-like model complexity control and prediction of missing values of Z. The approach is analyzed and shown to be on a par with the best known clustering algorithms for a wide range of domains. For the prediction of missing values (collaborative filtering) it improves on the currently best known results.

PDF Web [BibTex]

Learning with Hypergraphs: Clustering, Classification, and Embedding

Zhou, D., Huang, J., Schölkopf, B.

In Advances in Neural Information Processing Systems 19, pages: 1601-1608, (Editors: B Schölkopf and J Platt and T Hofmann), MIT Press, Cambridge, MA, USA, 20th Annual Conference on Neural Information Processing Systems (NIPS), September 2007 (inproceedings)

Abstract
We usually endow the objects under investigation with pairwise relationships, which can be illustrated as graphs. In many real-world problems, however, the relationships among the objects of interest are more complex than pairwise. Naively squeezing these complex relationships into pairwise ones inevitably leads to a loss of information that may be valuable for the learning task. We therefore consider using hypergraphs to fully represent complex relationships among the objects of interest, which raises the problem of learning with hypergraphs. Our main contribution in this paper is to generalize the powerful methodology of spectral clustering, which originally operates on undirected graphs, to hypergraphs, and to further develop algorithms for hypergraph embedding and transductive classification on the basis of the spectral hypergraph clustering approach. Our experiments on a number of benchmarks show the advantages of hypergraphs over ordinary graphs.
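
The spectral construction at the heart of this approach can be sketched in a few lines. The example below builds the normalized hypergraph Laplacian Delta = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} for a small vertex-hyperedge incidence matrix; the hypergraph and edge weights are made up for illustration.

```python
import numpy as np

# Normalized hypergraph Laplacian:
#   Delta = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}
# H: |V| x |E| incidence matrix, W: hyperedge weights,
# De: hyperedge degrees (sizes), Dv: vertex degrees.
H = np.array([[1, 0, 1],    # 5 vertices, 3 hyperedges (made up)
              [1, 1, 0],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)
w = np.array([1.0, 2.0, 1.0])               # hyperedge weights
De_inv = np.diag(1.0 / H.sum(axis=0))       # inverse hyperedge sizes
dv = H @ w                                  # vertex degrees
Dv_isqrt = np.diag(dv ** -0.5)
Theta = Dv_isqrt @ H @ np.diag(w) @ De_inv @ H.T @ Dv_isqrt
Delta = np.eye(H.shape[0]) - Theta          # spectrum of Delta yields the
eigvals = np.linalg.eigvalsh(Delta)         # clustering/embedding coordinates
```

As with the ordinary graph Laplacian, the vector Dv^{1/2} 1 lies in the null space of Delta, mirroring the constant eigenvector in the graph case.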

ei

PDF Web [BibTex]

Fast Computation of Graph Kernels

Vishwanathan, SVN., Borgwardt, KM., Schraudolph, N.

In Advances in Neural Information Processing Systems 19, pages: 1449-1456, (Editors: Schölkopf, B., J. Platt, T. Hofmann), MIT Press, Cambridge, MA, USA, Twentieth Annual Conference on Neural Information Processing Systems (NIPS), September 2007 (inproceedings)

Abstract
Using extensions of linear algebra concepts to Reproducing Kernel Hilbert Spaces (RKHS), we define a unifying framework for random walk kernels on graphs. Reduction to a Sylvester equation allows us to compute many of these kernels in O(n^3) worst-case time. This includes kernels whose previous worst-case time complexity was O(n^6), such as the geometric kernels of Gärtner et al. [1] and the marginal graph kernels of Kashima et al. [2]. Our algebra in RKHS allows us to exploit sparsity in directed and undirected graphs more effectively than previous methods, yielding sub-cubic computational complexity when combined with conjugate gradient solvers or fixed-point iterations. Experiments on graphs from bioinformatics and other application domains show that our algorithms are often more than 1000 times faster than existing approaches.
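
For the geometric random walk kernel in particular, the closed form p^T (I - lam Ax)^{-1} q on the direct product graph can be checked against its defining power series. The two toy graphs below are made up for illustration, and the sketch uses a dense solver rather than the Sylvester-equation or conjugate-gradient routes described in the abstract.

```python
import numpy as np

# Geometric random-walk kernel via the direct product graph:
#   k(G1, G2) = p^T (I - lam * Ax)^{-1} q,  Ax = A1 (x) A2 (Kronecker product)
A1 = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)  # triangle
A2 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path
Ax = np.kron(A1, A2)                  # adjacency of the product graph
n = Ax.shape[0]
p = q = np.ones(n) / n                # uniform start/stop distributions
lam = 0.1                             # decay factor; needs lam * ||Ax|| < 1

# Closed form via one linear solve (cubic in the product graph size).
k_closed = p @ np.linalg.solve(np.eye(n) - lam * Ax, q)

# The same value from the truncated power series sum_k lam^k p^T Ax^k q.
k_series, v = 0.0, q.copy()
for _ in range(50):
    k_series += p @ v
    v = lam * (Ax @ v)
```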

ei

PDF Web [BibTex]

Correcting Sample Selection Bias by Unlabeled Data

Huang, J., Smola, A., Gretton, A., Borgwardt, K., Schölkopf, B.

In Advances in Neural Information Processing Systems 19, pages: 601-608, (Editors: B Schölkopf and J Platt and T Hofmann), MIT Press, Cambridge, MA, USA, 20th Annual Conference on Neural Information Processing Systems (NIPS), September 2007 (inproceedings)

Abstract
We consider the scenario where training and test data are drawn from different distributions, commonly referred to as sample selection bias. Most algorithms for this setting try to first recover sampling distributions and then make appropriate corrections based on the distribution estimate. We present a nonparametric method which directly produces resampling weights without distribution estimation. Our method works by matching distributions between training and testing sets in feature space. Experimental results demonstrate that our method works well in practice.
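
The mean-matching idea can be sketched as follows. The actual method solves a constrained quadratic program; the snippet below uses a simplified ridge-regularized, unconstrained variant and then clips and renormalizes the weights, so it is only a rough illustration on synthetic data.

```python
import numpy as np

def gauss_kernel(A, B, sigma=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * sigma**2))

def kmm_weights(Xtr, Xte, sigma=1.0, ridge=0.1):
    """Simplified kernel mean matching: solve (K + ridge*I) w = kappa, where
    kappa compares each training point to the test sample, then clip to
    nonnegative weights and renormalize (the paper solves a QP instead)."""
    m = len(Xtr)
    K = gauss_kernel(Xtr, Xtr, sigma)
    kappa = (m / len(Xte)) * gauss_kernel(Xtr, Xte, sigma).sum(axis=1)
    w = np.linalg.solve(K + ridge * np.eye(m), kappa)
    w = np.clip(w, 0.0, None)
    return w * m / w.sum()            # weights average to one

rng = np.random.default_rng(2)
Xtr = rng.normal(0.0, 1.0, size=(200, 1))   # training density centered at 0
Xte = rng.normal(1.0, 0.5, size=(200, 1))   # test density centered at 1
w = kmm_weights(Xtr, Xte)
```

Reweighting the biased training sample with w moves its (kernel) mean towards that of the test sample, which is exactly the correction the abstract describes.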

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Overcomplete Independent Component Analysis via Linearly Constrained Minimum Variance Spatial Filtering

Grosse-Wentrup, M., Buss, M.

Journal of VLSI Signal Processing, 48(1-2):161-171, August 2007 (article)

Abstract
Independent Component Analysis (ICA) designed for complete bases is used in a variety of applications with great success, despite the often questionable assumption of having N sensors and M sources with N ≥ M. In this article, we assume a source model with more sources than sensors (M > N), only L < N of which are assumed to have a non-Gaussian distribution. We argue that this is a realistic source model for a variety of applications, and prove that for ICA algorithms designed for complete bases (i.e., algorithms assuming N = M) based on mutual information, the mixture coefficients of the L non-Gaussian sources can be reconstructed in spite of the overcomplete mixture model. Further, it is shown that the reconstructed temporal activity of non-Gaussian sources is arbitrarily mixed with Gaussian sources. To obtain estimates of the temporal activity of the non-Gaussian sources, we use the correctly reconstructed mixture coefficients in conjunction with linearly constrained minimum variance spatial filtering. This results in estimates of the non-Gaussian sources that minimize the variance of the interference of other sources. The approach is applied to the denoising of Event Related Fields recorded by MEG, and it is shown to outperform ordinary ICA.
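
The linearly constrained minimum variance (LCMV) step admits a compact sketch: given a mixing (forward) vector a and sensor covariance C, the filter w = C^{-1} a / (a^T C^{-1} a) passes the source of interest with unit gain while minimizing output variance. The mixing vectors and sources below are synthetic.

```python
import numpy as np

def lcmv_filter(C, a):
    """LCMV spatial filter: unit gain on mixing vector a, minimum variance
    from everything else: w = C^{-1} a / (a^T C^{-1} a)."""
    Cinv_a = np.linalg.solve(C, a)
    return Cinv_a / (a @ Cinv_a)

rng = np.random.default_rng(3)
a = np.array([1.0, 0.5, -0.3, 0.8])          # mixing vector of wanted source
b = np.array([0.2, -1.0, 0.4, 0.1])          # mixing vector of an interferer
s = rng.laplace(size=1000)                   # non-Gaussian source of interest
g = rng.normal(size=1000)                    # Gaussian interfering source
X = np.outer(s, a) + np.outer(g, b) + 0.05 * rng.normal(size=(1000, 4))
C = np.cov(X.T)                              # sensor covariance
w = lcmv_filter(C, a)
s_hat = X @ w                                # variance-minimizing estimate of s
```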

ei

PDF PDF DOI [BibTex]


Collaborative Filtering via Ensembles of Matrix Factorizations

Wu, M.

In KDD Cup and Workshop 2007, pages: 43-47, KDD Cup and Workshop, August 2007 (inproceedings)

Abstract
We present a Matrix Factorization (MF) based approach for the Netflix Prize competition. MF-based algorithms are currently popular and have proved successful for collaborative filtering tasks. For the Netflix Prize competition, we adopt three different types of MF algorithms: regularized MF, maximum margin MF, and non-negative MF. Furthermore, for each MF algorithm, instead of selecting the optimal parameters, we combine the results obtained with several parameters. With this method, we achieve a performance that is more than 6% better than Netflix's own system.
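
A regularized MF component of such an ensemble can be sketched with plain SGD on the observed entries. The ratings matrix, hyperparameters, and held-out entry below are made up; the actual system combines several MF variants and parameter settings.

```python
import numpy as np

def factorize(R, mask, k=2, lam=0.05, lr=0.02, epochs=200, seed=0):
    """Regularized MF, R ~ U V^T, fitted by SGD on the observed entries."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.normal(size=(R.shape[0], k))
    V = 0.1 * rng.normal(size=(R.shape[1], k))
    for _ in range(epochs):
        for i, j in np.argwhere(mask):
            err = R[i, j] - U[i] @ V[j]
            U[i] += lr * (err * V[j] - lam * U[i])
            V[j] += lr * (err * U[i] - lam * V[j])
    return U, V

# Tiny synthetic ratings matrix; one entry (true value 1) is held out.
R = np.array([[5, 4, 1, 1],
              [4, 5, 1, 2],
              [1, 1, 5, 4],
              [2, 1, 4, 5]], dtype=float)
mask = np.ones_like(R, dtype=bool)
mask[0, 3] = False
U, V = factorize(R, mask)
pred = U[0] @ V[3]                     # prediction for the held-out rating
```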

ei

PDF Web [BibTex]

Fusion of spectral and spatial information by a novel SVM classification technique

Bruzzone, L., Marconcini, M., Persello, C.

In pages: 4838-4841, IEEE, Piscataway, NJ, USA, IEEE International Geoscience and Remote Sensing Symposium (IGARSS), July 2007 (inproceedings)

Abstract
A novel context-sensitive semisupervised classification technique based on support vector machines is proposed. This technique aims at exploiting the SVM method for image classification by properly fusing spectral information with spatial-context information. This results in: i) an increased robustness to noisy training sets in the learning phase of the classifier; ii) a higher and more stable classification accuracy with respect to the specific patterns included in the training set; and iii) a regularized classification map. The main property of the proposed context-sensitive semisupervised SVM (CS4VM) is to adaptively exploit the contextual information in the training phase of the classifier, without any critical assumption on the expected labels of the pixels included in the same neighborhood system. This is done by defining a novel context-sensitive term in the objective function used in the learning of the classifier. In addition, the proposed CS4VM can be integrated with a Markov random field (MRF) approach to exploit the contextual information also for regularizing the classification map. Experiments carried out on very high geometrical resolution images confirmed the effectiveness of the proposed technique.

ei

Web DOI [BibTex]

Online-Computation Approach to Optimal Control of Noise-Affected Nonlinear Systems with Continuous State and Control Spaces

Deisenroth, MP., Weissel, F., Ohtsuka, T., Hanebeck, UD.

In ECC'07, pages: 3664-3671, 9th European Control Conference, July 2007 (inproceedings)

Abstract
A novel online-computation approach to optimal control of nonlinear, noise-affected systems with continuous state and control spaces is presented. In the proposed algorithm, system noise is explicitly incorporated into the control decision. This leads to superior results compared to state-of-the-art nonlinear controllers that neglect this influence. The solution of an optimal nonlinear controller for a corresponding deterministic system is employed to find a meaningful state space restriction. This restriction is obtained by means of approximate state prediction using the noisy system equation. Within this constrained state space, an optimal closed-loop solution for a finite decision-making horizon (prediction horizon) is determined within an adaptively restricted optimization space. Interleaving stochastic dynamic programming and value function approximation yields a solution to the considered optimal control problem. The enhanced performance of the proposed discrete-time controller is illustrated by means of a scalar example system. Nonlinear model predictive control is applied to address approximate treatment of infinite-horizon problems by the finite-horizon controller.

ei

PDF Web [BibTex]

Feature Selection for Trouble Shooting in Complex Assembly Lines

Pfingsten, T., Herrmann, D., Schnitzler, T., Feustel, A., Schölkopf, B.

IEEE Transactions on Automation Science and Engineering, 4(3):465-469, July 2007 (article)

Abstract
The final properties of sophisticated products can be affected by many unapparent dependencies within the manufacturing process, and the products’ integrity can often only be checked in a final measurement. Troubleshooting can therefore be very tedious, if not impossible, in large assembly lines. In this paper we show that Feature Selection is an efficient tool for serial-grouped lines to reveal causes for irregularities in product attributes. We compare the performance of several methods for Feature Selection on real-world problems in mass-production of semiconductor devices. Note to Practitioners: We present a data-based procedure to localize flaws in large production lines: using the results of final quality inspections and information about which machines processed which batches, we are able to identify machines which cause low yield.

ei

PDF Web DOI [BibTex]

A Kernel Approach to Comparing Distributions

Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B., Smola, A.

In Proceedings of the 22nd AAAI Conference on Artificial Intelligence, pages: 1637-1641, AAAI Press, Menlo Park, CA, USA, Twenty-Second AAAI Conference on Artificial Intelligence (AAAI), July 2007 (inproceedings)

Abstract
We describe a technique for comparing distributions without the need for density estimation as an intermediate step. Our approach relies on mapping the distributions into a Reproducing Kernel Hilbert Space. We apply this technique to construct a two-sample test, which is used for determining whether two sets of observations arise from the same distribution. We use this test in attribute matching for databases using the Hungarian marriage method, where it performs strongly. We also demonstrate excellent performance when comparing distributions over graphs, for which no alternative tests currently exist.
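
The test statistic underlying this approach (the kernel distance between distribution means, often called MMD) can be sketched directly. The Gaussian kernel and its bandwidth below are assumptions for illustration, and the data is synthetic.

```python
import numpy as np

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of squared MMD with a Gaussian kernel."""
    def k(A, B):
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-d2 / (2 * sigma**2))
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))  # drop diagonal
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))  # for unbiasedness
    return term_x + term_y - 2 * Kxy.mean()

rng = np.random.default_rng(0)
same = mmd2_unbiased(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
diff = mmd2_unbiased(rng.normal(size=(200, 2)),
                     rng.normal(3.0, 1.0, size=(200, 2)))
```

For samples from the same distribution the statistic is close to zero; for samples from shifted distributions it is clearly positive, which is what the two-sample test thresholds.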

ei

PDF Web [BibTex]

Gene selection via the BAHSIC family of algorithms

Song, L., Bedo, J., Borgwardt, K., Gretton, A., Smola, A.

Bioinformatics, 23(13: ISMB/ECCB 2007 Conference Proceedings):i490-i498, July 2007 (article)

Abstract
Motivation: Identifying significant genes among thousands of sequences on a microarray is a central challenge for cancer research in bioinformatics. The ultimate goal is to detect the genes that are involved in disease outbreak and progression. A multitude of methods have been proposed for this task of feature selection, yet the selected gene lists differ greatly between different methods. To accomplish biologically meaningful gene selection from microarray data, we have to understand the theoretical connections and the differences between these methods. In this article, we define a kernel-based framework for feature selection based on the Hilbert–Schmidt independence criterion and backward elimination, called BAHSIC. We show that several well-known feature selectors are instances of BAHSIC, thereby clarifying their relationship. Furthermore, by choosing a different kernel, BAHSIC allows us to easily define novel feature selection algorithms. As a further advantage, feature selection via BAHSIC works directly on multiclass problems. Results: In a broad experimental evaluation, the members of the BAHSIC family reach high levels of accuracy and robustness when compared to other feature selection techniques. Experiments show that features selected with a linear kernel provide the best classification performance in general, but if strong non-linearities are present in the data then non-linear kernels can be more suitable.
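
The HSIC score at the core of BAHSIC has a simple empirical form, trace(K H L H) / (n - 1)^2 with H the centering matrix. The sketch below only ranks two synthetic features with a linear kernel; BAHSIC proper runs backward elimination, repeatedly discarding the feature whose removal least reduces HSIC.

```python
import numpy as np

def hsic(K, L):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2, H centering."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def linear_kernel(x):
    x = x.reshape(-1, 1)
    return x @ x.T

rng = np.random.default_rng(1)
n = 100
y = rng.normal(size=n)                      # labels
relevant = y + 0.1 * rng.normal(size=n)     # feature dependent on the labels
noise = rng.normal(size=n)                  # feature independent of the labels
L = linear_kernel(y)
scores = {"relevant": hsic(linear_kernel(relevant), L),
          "noise": hsic(linear_kernel(noise), L)}
```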

ei

Web DOI [BibTex]

Phenotyping of Chondrocytes In Vivo and In Vitro Using cDNA Array Technology

Zien, A., Gebhard, P., Fundel, K., Aigner, T.

Clinical Orthopaedics and Related Research, 460, pages: 226-233, July 2007 (article)

Abstract
The cDNA array technology is a powerful tool to analyze a high number of genes in parallel. We investigated whether large-scale gene expression analysis allows clustering and identification of cellular phenotypes of chondrocytes in different in vivo and in vitro conditions. In 100% of cases, clustering analysis distinguished between in vivo and in vitro samples, suggesting fundamental differences in chondrocytes in situ and in vitro regardless of the culture conditions or disease status. It also allowed us to differentiate between healthy and osteoarthritic cartilage. The clustering also revealed the relative importance of the investigated culturing conditions (stimulation agent, stimulation time, bead/monolayer). We augmented the cluster analysis with a statistical search for genes showing differential expression. The identified genes provided hints to the molecular basis of the differences between the sample classes. Our approach shows the power of modern bioinformatic algorithms for understanding and classifying chondrocytic phenotypes in vivo and in vitro. Although it does not generate new experimental data per se, it provides valuable information regarding the biology of chondrocytes and may provide tools for diagnosing and staging the osteoarthritic disease process.

ei

DOI [BibTex]

Manifold Denoising as Preprocessing for Finding Natural Representations of Data

Hein, M., Maier, M.

In AAAI-07, pages: 1646-1649, AAAI Press, Menlo Park, CA, USA, Twenty-Second AAAI Conference on Artificial Intelligence (AAAI-07), July 2007 (inproceedings)

Abstract
A natural representation of data is the set of parameters which generated the data. If the parameter space is continuous we can regard it as a manifold. In practice we usually do not know this manifold; we just have some representation of the data, often in a very high-dimensional feature space. Since the number of internal parameters does not change with the representation, the data will effectively lie on a low-dimensional submanifold in feature space. Due to measurement errors this data is usually corrupted by noise, which, particularly in high-dimensional feature spaces, makes it almost impossible to find the manifold structure. This paper reviews a method called Manifold Denoising, which projects the data onto the submanifold using a diffusion process on a graph generated by the data. We demonstrate that the method is capable of dealing with non-trivial high-dimensional noise. Moreover, we show that using the method as a preprocessing step can significantly improve the results of a semi-supervised learning algorithm.
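
One implicit diffusion step of such a graph-based denoiser can be sketched as follows: build a Gaussian neighborhood graph on the noisy sample and solve (I + dt L) X_new = X with the graph Laplacian L. The circle data, bandwidth, and step size below are made up for illustration.

```python
import numpy as np

def denoise_step(X, sigma=0.3, dt=0.05):
    """One implicit diffusion step: solve (I + dt * L) X_new = X, with L the
    unnormalized Laplacian of a Gaussian-weighted neighborhood graph."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-d2 / (2 * sigma**2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.solve(np.eye(len(X)) + dt * L, X)

rng = np.random.default_rng(5)
t = np.linspace(0.0, 2 * np.pi, 200)
clean = np.c_[np.cos(t), np.sin(t)]          # a 1D manifold: the unit circle
noisy = clean + 0.1 * rng.normal(size=clean.shape)
denoised = denoise_step(noisy)
```

The step damps the high-frequency (noise) modes of the graph much more strongly than the low-frequency modes carrying the manifold shape, so the scatter of the points around the circle shrinks.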

ei

PDF Web [BibTex]

Graph-based Protein Functional Classification

Shin, H., Lisewski, A., Lichtarge, O.

In BIOCOMP'07, pages: 738-744, (Editors: Arabnia, H. R., M. Q. Yang, J. Y. Yang), CSREA Press, Las Vegas, NV, USA, 2007 International Conference on Bioinformatics and Computational Biology, July 2007 (inproceedings)

ei

Web [BibTex]

Common Sequence Polymorphisms Shaping Genetic Diversity in Arabidopsis thaliana

Clark, R., Schweikert, G., Toomajian, C., Ossowski, S., Zeller, G., Shinn, P., Warthmann, N., Hu, T., Fu, G., Hinds, D., Chen, H., Frazer, K., Huson, D., Schölkopf, B., Nordborg, M., Rätsch, G., Ecker, J., Weigel, D.

Science, 317(5836):338-342, July 2007 (article)

Abstract
The genomes of individuals from the same species vary in sequence as a result of different evolutionary processes. To examine the patterns of, and the forces shaping, sequence variation in Arabidopsis thaliana, we performed high-density array resequencing of 20 diverse strains (accessions). More than 1 million nonredundant single-nucleotide polymorphisms (SNPs) were identified at moderate false discovery rates (FDRs), and ~4% of the genome was identified as being highly dissimilar or deleted relative to the reference genome sequence. Patterns of polymorphism are highly nonrandom among gene families, with genes mediating interaction with the biotic environment having exceptional polymorphism levels. At the chromosomal scale, regional variation in polymorphism was readily apparent. A scan for recent selective sweeps revealed several candidate regions, including a notable example in which almost all variation was removed in a 500-kilobase window. Analyzing the polymorphisms we describe in larger sets of accessions will enable a detailed understanding of forces shaping population-wide sequence variation in A. thaliana.

ei

PDF DOI [BibTex]

Supervised Feature Selection via Dependence Estimation

Song, L., Smola, A., Gretton, A., Borgwardt, K., Bedo, J.

In Proceedings of the 24th Annual International Conference on Machine Learning (ICML 2007), pages: 823-830, (Editors: Ghahramani, Z.), ACM Press, New York, NY, USA, Twenty-Fourth Annual International Conference on Machine Learning (ICML), June 2007 (inproceedings)

Abstract
We introduce a framework for filtering features that employs the Hilbert-Schmidt Independence Criterion (HSIC) as a measure of dependence between the features and the labels. The key idea is that good features should maximise such dependence. Feature selection for various supervised learning problems (including classification and regression) is unified under this framework, and the solutions can be approximated using a backward-elimination algorithm. We demonstrate the usefulness of our method on both artificial and real world datasets.

ei

PDF Web DOI [BibTex]

A Kernel-Based Causal Learning Algorithm

Sun, X., Janzing, D., Schölkopf, B., Fukumizu, K.

In Proceedings of the 24th International Conference on Machine Learning, pages: 855-862, (Editors: Z Ghahramani), ACM Press, New York, NY, USA, ICML, June 2007 (inproceedings)

Abstract
We describe a causal learning method that measures the strength of statistical dependences in terms of the Hilbert-Schmidt norm of kernel-based cross-covariance operators. Following the common faithfulness assumption of constraint-based causal learning, our approach assumes that a variable Z is likely to be a common effect of X and Y if conditioning on Z increases the dependence between X and Y. Based on this assumption, we collect "votes" for hypothetical causal directions and orient the edges by the majority principle. In most experiments with known causal structures, our method provided plausible results and outperformed the conventional constraint-based PC algorithm.

ei

PDF Web DOI [BibTex]

Entire Regularization Paths for Graph Data

Tsuda, K.

In ICML 2007, pages: 919-926, (Editors: Ghahramani, Z.), ACM Press, New York, NY, USA, 24th Annual International Conference on Machine Learning, June 2007 (inproceedings)

Abstract
Graph data such as chemical compounds and XML documents are becoming more common in many application domains. A main difficulty of graph data processing lies in the intrinsic high dimensionality of graphs: when a graph is represented as a binary feature vector of indicators of all possible subgraph patterns, the dimensionality becomes too large for the usual statistical methods. We propose an efficient method to select a small number of salient patterns by regularization path tracking. The generation of useless patterns is minimized by progressive extension of the search space. In experiments, we show that our technique is considerably more efficient than a simpler approach based on frequent substructure mining.

ei

PDF Web DOI [BibTex]

Les Représentations Prédictives des États et des Politiques

Boularias, A., Chaib-Draa, B.

In MFI 2007, pages: 37-48, Quatrièmes Journées Francophones Modèles Formels de l'Interaction, June 2007 (inproceedings)

Abstract
In this article, we propose a new approach for representing policies (strategies) in stochastic, partially observable environments. We are particularly interested in multi-agent systems, where each agent knows only its own policies and must choose the best one among them according to its belief state over the policies of the other agents. Our model uses fewer parameters than the usual representation methods, such as decision trees or stochastic finite-state controllers, and thus speeds up planning algorithms. We also show how this model can be used efficiently for cooperative multi-agent planning without communication; the empirical results are compared with the DEC-POMDP model (Decentralized Partially Observable Markov Decision Process).

ei

PDF Web [BibTex]

Graph Laplacians and their Convergence on Random Neighborhood Graphs

Hein, M., Audibert, J., von Luxburg, U.

Journal of Machine Learning Research, 8, pages: 1325-1370, June 2007 (article)

Abstract
Given a sample from a probability measure with support on a submanifold in Euclidean space, one can construct a neighborhood graph which can be seen as an approximation of the submanifold. The graph Laplacian of such a graph is used in several machine learning methods such as semi-supervised learning, dimensionality reduction, and clustering. In this paper we determine the pointwise limit of three different graph Laplacians used in the literature as the sample size increases and the neighborhood size approaches zero. We show that for a uniform measure on the submanifold all graph Laplacians have the same limit up to constants. However, in the case of a non-uniform measure on the submanifold, only the so-called random walk graph Laplacian converges to the weighted Laplace-Beltrami operator.
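
For reference, the three graph Laplacians compared here can be written down directly from a weight matrix W with degree matrix D: the unnormalized L = D - W, the normalized L = I - D^{-1/2} W D^{-1/2}, and the random walk L = I - D^{-1} W. The Gaussian-weighted graph below is synthetic.

```python
import numpy as np

# Build a Gaussian neighborhood graph on a random sample, then form the
# three graph Laplacians whose pointwise limits the paper analyzes.
rng = np.random.default_rng(4)
X = rng.normal(size=(30, 2))
d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
W = np.exp(-d2 / 2.0)
np.fill_diagonal(W, 0.0)

deg = W.sum(axis=1)
L_un = np.diag(deg) - W                                   # unnormalized
L_sym = np.eye(len(X)) - (W / np.sqrt(deg)[:, None]) / np.sqrt(deg)[None, :]
L_rw = np.eye(len(X)) - W / deg[:, None]                  # random walk
```

The unnormalized and random walk Laplacians annihilate constant functions, and the normalized one is symmetric, which is why it is the variant used for spectral embeddings.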

ei

PDF PDF [BibTex]

An Extensible Probabilistic Transformation-based Approach to the Third Recognizing Textual Entailment Challenge

Harmeling, S.

In TextEntail 2007, pages: 137-142, ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, June 2007 (inproceedings)

Abstract
We introduce a system for textual entailment that is based on a probabilistic model of entailment. The model is defined using some calculus of transformations on dependency trees, which is characterized by the fact that derivations in that calculus preserve the truth only with a certain probability. We also describe a possible set of transformations (and with it implicitly a calculus) that was successfully applied to the RTE3 challenge data. However, our system can be improved in many ways and we see it as the starting point for a promising new approach to textual entailment.

ei

Web [BibTex]

Weighted Substructure Mining for Image Analysis

Nowozin, S., Tsuda, K., Uno, T., Kudo, T., BakIr, G.

In CVPR 2007, pages: 1-8, IEEE Computer Society, Los Alamitos, CA, USA, 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 2007 (inproceedings)

Abstract
In web-related applications of image categorization, it is desirable to derive an interpretable classification rule with high accuracy. Using the bag-of-words representation and the linear support vector machine, one can partly fulfill the goal, but the accuracy of linear classifiers is not high and the obtained features are not informative for users. We propose to combine item set mining and large margin classifiers to select features from the power set of all visual words. Our resulting classification rule is easier to browse and simpler to understand, because each feature has richer information. As a next step, each image is represented as a graph where nodes correspond to local image features and edges encode geometric relations between features. Combining graph mining and boosting, we can obtain a classification rule based on subgraph features that contain more information than the set features. We evaluate our algorithm in a web-retrieval ranking task where the goal is to reject outliers from a set of images returned for a keyword query. Furthermore, it is evaluated on the supervised classification tasks with the challenging VOC2005 data set. Our approach yields excellent accuracy in the unsupervised ranking task compared to a recently proposed probabilistic model and competitive results in the supervised classification task.

ei

PDF DOI [BibTex]

Local Learning Projections

Wu, M., Yu, K., Yu, S., Schölkopf, B.

In Proceedings of the 24th International Conference on Machine Learning, pages: 1039-1046, (Editors: Z Ghahramani), ACM Press, New York, NY, USA, ICML, June 2007 (inproceedings)

Abstract
This paper presents a Local Learning Projection (LLP) approach for linear dimensionality reduction. We first point out that the well known Principal Component Analysis (PCA) essentially seeks the projection that has the minimal global estimation error. Then we propose a dimensionality reduction algorithm that leads to the projection with the minimal local estimation error, and elucidate its advantages for classification tasks. We also indicate that LLP keeps the local information in the sense that the projection value of each point can be well estimated based on its neighbors and their projection values. Experimental results are provided to validate the effectiveness of the proposed algorithm.

ei

PDF Web DOI [BibTex]

Training and Approximation of a Primal Multiclass Support Vector Machine

Zien, A., Bona, F., Ong, C.

In ASMDA 2007, pages: 1-8, (Editors: Skiadas, C. H.), 12th International Conference on Applied Stochastic Models and Data Analysis, June 2007 (inproceedings)

Abstract
We revisit the multiclass support vector machine (SVM) and generalize the formulation to convex loss functions and joint feature maps. Motivated by recent work [Chapelle, 2006] we use logistic loss and softmax to enable gradient based primal optimization. Kernels are incorporated via kernel principal component analysis (KPCA), which naturally leads to approximation methods for large scale problems. We investigate similarities and differences to previous multiclass SVM approaches. Experimental comparisons to previous approaches and to the popular one-vs-rest SVM are presented on several different datasets.
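
The gradient-based primal training with softmax and logistic loss can be sketched for the linear case; in the paper, kernels enter through a KPCA feature map, which is omitted here. All data and hyperparameters below are made up for illustration.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)      # numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def train_softmax(X, y, n_classes, lam=1e-2, lr=0.5, steps=500):
    """Plain gradient descent on the regularized multiclass logistic loss."""
    W = np.zeros((X.shape[1], n_classes))
    Y = np.eye(n_classes)[y]                  # one-hot labels
    for _ in range(steps):
        P = softmax(X @ W)
        grad = X.T @ (P - Y) / len(X) + lam * W
        W -= lr * grad
    return W

rng = np.random.default_rng(6)
centers = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]])
X = np.vstack([rng.normal(c, 0.3, size=(40, 2)) for c in centers])
Xb = np.c_[X, np.ones(len(X))]                # append a bias feature
y = np.repeat([0, 1, 2], 40)
W = train_softmax(Xb, y, 3)
acc = np.mean((Xb @ W).argmax(axis=1) == y)
```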

ei

PDF PostScript Web [BibTex]

Nonlinear independent component analysis with minimum nonlinear distortion

Zhang, K., Chan, L.

In ICML ’07: Proceedings of the 24th international conference on Machine learning, pages: 1127-1134, (Editors: Z Ghahramani), ACM, New York, NY, USA, 24th International Conference on Machine Learning (ICML), June 2007 (inproceedings)

ei

PDF [BibTex]