Header logo is


2008


no image
Episodic Reinforcement Learning by Logistic Reward-Weighted Regression

Wierstra, D., Schaul, T., Peters, J., Schmidhuber, J.

In ICANN 2008, pages: 407-416, (Editors: Kurkova-Pohlova, V. , R. Neruda, J. Koutnik), Springer, Berlin, Germany, 18th International Conference on Artificial Neural Networks, September 2008 (inproceedings)

Abstract
It has been a long-standing goal in the adaptive control community to reduce the generically difficult, general reinforcement learning (RL) problem to simpler problems solvable by supervised learning. While this approach is today’s standard for value function-based methods, fewer approaches are known that apply similar reductions to policy search methods. Recently, it has been shown that immediate RL problems can be solved by reward-weighted regression, and that the resulting algorithm is an expectation maximization (EM) algorithm with strong guarantees. In this paper, we extend this algorithm to the episodic case and show that it can be used in the context of LSTM recurrent neural networks (RNNs). The resulting RNN training algorithm is equivalent to a weighted self-modeling supervised learning technique. We focus on partially observable Markov decision problems (POMDPs) where it is essential that the policy is nonstationary in order to be optimal. We show that this new reward-weighted logistic regression u sed in conjunction with an RNN architecture can solve standard benchmark POMDPs with ease.

ei

PDF Web DOI [BibTex]

2008


PDF Web DOI [BibTex]


no image
Exact Dynamic Programming for Decentralized POMDPs with Lossless Policy Compression

Boularias, A., Chaib-Draa, B.

In ICAPS 2008, pages: 20-27, (Editors: Rintanen, J. , B. Nebel, J. C. Beck, E. A. Hansen), AAAI Press, Menlo Park, CA, USA, Eighteenth International Conference on Automated Planning and Scheduling, September 2008 (inproceedings)

Abstract
High dimensionality of belief space in DEC-POMDPs is one of the major causes that makes the optimal joint policy computation intractable. The belief state for a given agent is a probability distribution over the system states and the policies of other agents. Belief compression is an efficient POMDP approach that speeds up planning algorithms by projecting the belief state space to a low-dimensional one. In this paper, we introduce a new method for solving DEC-POMDP problems, based on the compression of the policy belief space. The reduced policy space contains sequences of actions and observations that are linearly independent. We tested our approach on two benchmark problems, and the preliminary results confirm that Dynamic Programming algorithm scales up better when the policy belief is compressed.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Similarity, Kernels, and the Triangle Inequality

Jäkel, F., Schölkopf, B., Wichmann, F.

Journal of Mathematical Psychology, 52(5):297-303, September 2008 (article)

Abstract
Similarity is used as an explanatory construct throughout psychology and multidimensional scaling (MDS) is the most popular way to assess similarity. In MDS, similarity is intimately connected to the idea of a geometric representation of stimuli in a perceptual space. Whilst connecting similarity and closeness of stimuli in a geometric representation may be intuitively plausible, Tversky and Gati [Tversky, A., Gati, I. (1982). Similarity, separability, and the triangle inequality. Psychological Review, 89(2), 123–154] have reported data which are inconsistent with the usual geometric representations that are based on segmental additivity. We show that similarity measures based on Shepard’s universal law of generalization [Shepard, R. N. (1987). Toward a universal law of generalization for psychologica science. Science, 237(4820), 1317–1323] lead to an inner product representation in a reproducing kernel Hilbert space. In such a space stimuli are represented by their similarity to all other stimuli. This representation, based on Shepard’s law, has a natural metric that does not have additive segments whilst still retaining the intuitive notion of connecting similarity and distance between stimuli. Furthermore, this representation has the psychologically appealing property that the distance between stimuli is bounded.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Comparison of Pattern Recognition Methods in Classifying High-resolution BOLD Signals Obtained at High Magnetic Field in Monkeys

Ku, S., Gretton, A., Macke, J., Logothetis, N.

Magnetic Resonance Imaging, 26(7):1007-1014, September 2008 (article)

Abstract
Pattern recognition methods have shown that functional magnetic resonance imaging (fMRI) data can reveal significant information about brain activity. For example, in the debate of how object categories are represented in the brain, multivariate analysis has been used to provide evidence of a distributed encoding scheme [Science 293:5539 (2001) 2425–2430]. Many follow-up studies have employed different methods to analyze human fMRI data with varying degrees of success [Nature reviews 7:7 (2006) 523–534]. In this study, we compare four popular pattern recognition methods: correlation analysis, support-vector machines (SVM), linear discriminant analysis (LDA) and Gaussian naïve Bayes (GNB), using data collected at high field (7 Tesla) with higher resolution than usual fMRI studies. We investigate prediction performance on single trials and for averages across varying numbers of stimulus presentations. The performance of the various algorithms depends on the nature of the brain activity being categorized: for several tasks, many of the methods work well, whereas for others, no method performs above chance level. An important factor in overall classification performance is careful preprocessing of the data, including dimensionality reduction, voxel selection and outlier elimination.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Local Gaussian Processes Regression for Real-time Model-based Robot Control

Nguyen-Tuong, D., Peters, J.

In IROS 2008, pages: 380-385, IEEE Service Center, Piscataway, NJ, USA, 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, September 2008 (inproceedings)

Abstract
High performance and compliant robot control require accurate dynamics models which cannot be obtained analytically for sufficiently complex robot systems. In such cases, machine learning offers a promising alternative for approximating the robot dynamics using measured data. This approach offers a natural framework to incorporate unknown nonlinearities as well as to continually adapt online for changes in the robot dynamics. However, the most accurate regression methods, e.g. Gaussian processes regression (GPR) and support vector regression (SVR), suffer from exceptional high computational complexity which prevents their usage for large numbers of samples or online learning to date. Inspired by locally linear regression techniques, we propose an approximation to the standard GPR using local Gaussian processes models. Due to reduced computational cost, local Gaussian processes (LGP) can be applied for larger sample-sizes and online learning. Comparisons with other nonparametric regressions, e.g. standard GPR, nu-SVR and locally weighted projection regression (LWPR), show that LGP has higher accuracy than LWPR close to the performance of standard GPR and nu-SVR while being sufficiently fast for online learning.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Colored Maximum Variance Unfolding

Song, L., Smola, A., Borgwardt, K., Gretton, A.

In Advances in neural information processing systems 20, pages: 1385-1392, (Editors: Platt, J. C., D. Koller, Y. Singer, S. Roweis), Curran, Red Hook, NY, USA, Twenty-First Annual Conference on Neural Information Processing Systems (NIPS), September 2008 (inproceedings)

Abstract
Maximum variance unfolding (MVU) is an effective heuristic for dimensionality reduction. It produces a low-dimensional representation of the data by maximizing the variance of their embeddings while preserving the local distances of the original data. We show that MVU also optimizes a statistical dependence measure which aims to retain the identity of individual observations under the distancepreserving constraints. This general view allows us to design "colored" variants of MVU, which produce low-dimensional representations for a given task, e.g. subject to class labels or other side information.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Assessing Nonlinear Granger Causality from Multivariate Time Series

Sun, X.

In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2008, pages: 440-455, (Editors: Daelemans, W. , B. Goethals, K. Morik), Springer, Berlin, Germany, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), September 2008 (inproceedings)

Abstract
A straightforward nonlinear extension of Granger’s concept of causality in the kernel framework is suggested. The kernel-based approach to assessing nonlinear Granger causality in multivariate time series enables us to determine, in a model-free way, whether the causal relation between two time series is present or not and whether it is direct or mediated by other processes. The trace norm of the so-called covariance operator in feature space is used to measure the prediction error. Relying on this measure, we test the improvement of predictability between time series by subsampling-based multiple testing. The distributional properties of the resulting p-values reveal the direction of Granger causality. Experiments with simulated and real-world data show that our method provides encouraging results.

ei

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
Planning in Decentralized POMDPs with Predictive Policy Representations

Boularias, A., Chaib-Draa, B.

In MASPLAN 2008, pages: 1-7, ICAPS Multiagent Planning Workshop, September 2008 (inproceedings)

Abstract
We discuss the problem of policy representation in stochastic and partially observable systems, and address the case where the policy is a hidden parameter of the planning problem. We propose an adaptation of the Predictive State Representations (PSRs) to this problem by introducing tests (sequences of actions and observations) on policies. The new model, called the Predictive Policy Representations (PPRs), is potentially more compact than the other representations, such as decision trees or Finite-State Controllers (FSCs). In this paper, we show how PPRs can be used to improve the performances of a point-based algorithm for DEC-POMDP.

ei

PDF [BibTex]

PDF [BibTex]


no image
Near-Maximum Entropy Models for Binary Neural Representations of Natural Images

Bethge, M., Berens, P.

In Advances in neural information processing systems 20, pages: 97-104, (Editors: Platt, J. C., D. Koller, Y. Singer, S. Roweis), Curran, Red Hook, NY, USA, Twenty-First Annual Conference on Neural Information Processing Systems (NIPS), September 2008 (inproceedings)

Abstract
Maximum entropy analysis of binary variables provides an elegant way for studying the role of pairwise correlations in neural populations. Unfortunately, these approaches suffer from their poor scalability to high dimensions. In sensory coding, however, high-dimensional data is ubiquitous. Here, we introduce a new approach using a near-maximum entropy model, that makes this type of analysis feasible for very high-dimensional data - the model parameters can be derived in closed form and sampling is easy. We demonstrate its usefulness by studying a simple neural representation model of natural images. For the first time, we are able to directly compare predictions from a pairwise maximum entropy model not only in small groups of neurons, but also in larger populations of more than thousand units. Our results indicate that in such larger networks interactions exist that are not predicted by pairwise correlations, despite the fact that pairwise correlations explain the lower-dimensional marginal statistics extrem ely well up to the limit of dimensionality where estimation of the full joint distribution is feasible.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Learning Perceptual Coupling for Motor Primitives

Kober, J., Mohler, B., Peters, J.

In IROS 2008, pages: 834-839, IEEE Service Center, Piscataway, NJ, USA, 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, September 2008 (inproceedings)

Abstract
Dynamic system-based motor primitives [1] have enabled robots to learn complex tasks ranging from Tennisswings to locomotion. However, to date there have been only few extensions which have incorporated perceptual coupling to variables of external focus, and, furthermore, these modifications have relied upon handcrafted solutions. Humans learn how to couple their movement primitives with external variables. Clearly, such a solution is needed in robotics. In this paper, we propose an augmented version of the dynamic systems motor primitives which incorporates perceptual coupling to an external variable. The resulting perceptually driven motor primitives include the previous primitives as a special case and can inherit some of their interesting properties. We show that these motor primitives can perform complex tasks such as Ball-in-a-Cup or Kendama task even with large variances in the initial conditions where a skilled human player would be challenged. For doing so, we initialize the motor primitives in the traditional way by imitation learning without perceptual coupling. Subsequently, we improve the motor primitives using a novel reinforcement learning method which is particularly well-suited for motor primitives.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Receptive Fields without Spike-Triggering

Macke, J., Zeck, G., Bethge, M.

In Advances in neural information processing systems 20, pages: 969-976, (Editors: Platt, J. C., D. Koller, Y. Singer, S. Roweis), Curran, Red Hook, NY, USA, Twenty-First Annual Conference on Neural Information Processing Systems (NIPS), September 2008 (inproceedings)

Abstract
Stimulus selectivity of sensory neurons is often characterized by estimating their receptive field properties such as orientation selectivity. Receptive fields are usually derived from the mean (or covariance) of the spike-triggered stimulus ensemble. This approach treats each spike as an independent message but does not take into account that information might be conveyed through patterns of neural activity that are distributed across space or time. Can we find a concise description for the processing of a whole population of neurons analogous to the receptive field for single neurons? Here, we present a generalization of the linear receptive field which is not bound to be triggered on individual spikes but can be meaningfully linked to distributed response patterns. More precisely, we seek to identify those stimulus features and the corresponding patterns of neural activity that are most reliably coupled. We use an extension of reverse-correlation methods based on canonical correlation analysis. The resulting population receptive fields span the subspace of stimuli that is most informative about the population response. We evaluate our approach using both neuronal models and multi-electrode recordings from rabbit retinal ganglion cells. We show how the model can be extended to capture nonlinear stimulus-response relationships using kernel canonical correlation analysis, which makes it possible to test different coding mechanisms. Our technique can also be used to calculate receptive fields from multi-dimensional neural measurements such as those obtained from dynamic imaging methods.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
An Automated Combination of Kernels for Predicting Protein Subcellular Localization

Ong, C., Zien, A.

In WABI 2008, pages: 186-197, (Editors: Crandall, K. A., J. Lagergren), Springer, Berlin, Germany, 8th Workshop on Algorithms in Bioinformatics, September 2008 (inproceedings)

Abstract
Protein subcellular localization is a crucial ingredient to many important inferences about cellular processes, including prediction of protein function and protein interactions. While many predictive computational tools have been proposed, they tend to have complicated architectures and require many design decisions from the developer. Here we utilize the multiclass support vector machine (m-SVM) method to directly solve protein subcellular localization without resorting to the common approach of splitting the problem into several binary classification problems. We further propose a general class of protein sequence kernels which considers all motifs, including motifs with gaps. Instead of heuristically selecting one or a few kernels from this family, we utilize a recent extension of SVMs that optimizes over multiple kernels simultaneously. This way, we automatically search over families of possible amino acid motifs. We compare our automated approach to three other predictors on four different datasets, and show that we perform better than the current state of the art. Further, our method provides some insights as to which sequence motifs are most useful for determining subcellular ocalization, which are in agreement with biological reasoning.

ei

PDF PDF Web DOI [BibTex]

PDF PDF Web DOI [BibTex]


no image
Discriminative K-means for Clustering

Ye, J., Zhao, Z., Wu, M.

In Advances in neural information processing systems 20, pages: 1649-1656, (Editors: Platt, J. C., D. Koller, Y. Singer, S. Roweis), Curran, Red Hook, NY, USA, Twenty-First Annual Conference on Neural Information Processing Systems (NIPS), September 2008 (inproceedings)

Abstract
We present a theoretical study on the discriminative clustering framework, recently proposed for simultaneous subspace selection via linear discriminant analysis (LDA) and clustering. Empirical results have shown its favorable performance in comparison with several other popular clustering algorithms. However, the inherent relationship between subspace selection and clustering in this framework is not well understood, due to the iterative nature of the algorithm. We show in this paper that this iterative subspace selection and clustering is equivalent to kernel K-means with a specific kernel Gram matrix. This provides significant and new insights into the nature of this subspace selection procedure. Based on this equivalence relationship, we propose the Discriminative K-means (DisKmeans) algorithm for simultaneous LDA subspace selection and clustering, as well as an automatic parameter estimation procedure. We also present the nonlinear extension of DisKmeans using kernels. We show that the learning of the ke rnel matrix over a convex set of pre-specified kernel matrices can be incorporated into the clustering formulation. The connection between DisKmeans and several other clustering algorithms is also analyzed. The presented theories and algorithms are evaluated through experiments on a collection of benchmark data sets.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Bayesian Inference for Spiking Neuron Models with a Sparsity Prior

Gerwinn, S., Macke, J., Seeger, M., Bethge, M.

In Advances in neural information processing systems 20, pages: 529-536, (Editors: Platt, J. C., D. Koller, Y. Singer, S. Roweis), Curran, Red Hook, NY, USA, Twenty-First Annual Conference on Neural Information Processing Systems (NIPS), September 2008 (inproceedings)

Abstract
Generalized linear models are the most commonly used tools to describe the stimulus selectivity of sensory neurons. Here we present a Bayesian treatment of such models. Using the expectation propagation algorithm, we are able to approximate the full posterior distribution over all weights. In addition, we use a Laplacian prior to favor sparse solutions. Therefore, stimulus features that do not critically influence neural activity will be assigned zero weights and thus be effectively excluded by the model. This feature selection mechanism facilitates both the interpretation of the neuron model as well as its predictive abilities. The posterior distribution can be used to obtain confidence intervals which makes it possible to assess the statistical significance of the solution. In neural data analysis, the available amount of experimental measurements is often limited whereas the parameter space is large. In such a situation, both regularization by a sparsity prior and uncertainty estimates for the model parameters are essential. We apply our method to multi-electrode recordings of retinal ganglion cells and use our uncertainty estimate to test the statistical significance of functional couplings between neurons. Furthermore we used the sparsity of the Laplace prior to select those filters from a spike-triggered covariance analysis that are most informative about the neural response.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Policy Gradients with Parameter-based Exploration for Control

Sehnke, F., Osendorfer, C., Rückstiess, T., Graves, A., Peters, J., Schmidhuber, J.

In ICANN 2008, pages: 387-396, (Editors: Kurkova-Pohlova, V. , R. Neruda, J. Koutnik), Springer, Berlin, Germany, 18th International Conference on Artificial Neural Networks, September 2008 (inproceedings)

Abstract
We present a model-free reinforcement learning method for partially observable Markov decision problems. Our method estimates a likelihood gradient by sampling directly in parameter space, which leads to lower variance gradient estimates than those obtained by policy gradient methods such as REINFORCE. For several complex control tasks, including robust standing with a humanoid robot, we show that our method outperforms well-known algorithms from the fields of policy gradients, finite difference methods and population based heuristics. We also provide a detailed analysis of the differences between our method and the other algorithms.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Distribution-free Learning of Bayesian Network Structure

Sun, X.

In ECML PKDD 2008, pages: 423-439, (Editors: Daelemans, W. , B. Goethals, K. Morik), Springer, Berlin, Germany, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, September 2008 (inproceedings)

Abstract
We present an independence-based method for learning Bayesian network (BN) structure without making any assumptions on the probability distribution of the domain. This is mainly useful for continuous domains. Even mixed continuous-categorical domains and structures containing vectorial variables can be handled. We address the problem by developing a non-parametric conditional independence test based on the so-called kernel dependence measure, which can be readily used by any existing independence-based BN structure learning algorithm. We demonstrate the structure learning of graphical models in continuous and mixed domains from real-world data without distributional assumptions. We also experimentally show that our test is a good alternative, in particular in case of small sample sizes, compared to existing tests, which can only be used in purely categorical or continuous domains.

ei

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
Consistent Minimization of Clustering Objective Functions

von Luxburg, U., Bubeck, S., Jegelka, S., Kaufmann, M.

In Advances in neural information processing systems 20, pages: 961-968, (Editors: Platt, J. C., D. Koller, Y. Singer, S. Roweis), Curran, Red Hook, NY, USA, Twenty-First Annual Conference on Neural Information Processing Systems (NIPS), September 2008 (inproceedings)

Abstract
Clustering is often formulated as a discrete optimization problem. The objective is to find, among all partitions of the data set, the best one according to some quality measure. However, in the statistical setting where we assume that the finite data set has been sampled from some underlying space, the goal is not to find the best partition of the given sample, but to approximate the true partition of the underlying space. We argue that the discrete optimization approach usually does not achieve this goal. As an alternative, we suggest the paradigm of nearest neighbor clustering‘‘. Instead of selecting the best out of all partitions of the sample, it only considers partitions in some restricted function class. Using tools from statistical learning theory we prove that nearest neighbor clustering is statistically consistent. Moreover, its worst case complexity is polynomial by co nstructi on, and it can b e implem ented wi th small average case co mplexity using b ranch an d bound.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Enhancement of Kinesthetic Perception for Microsurgical Teleoperation

Son, HI., Lee, DY.

In KSME Conference on Bioengineering, pages: 259-260, KSME Bioengineering Conference, September 2008 (inproceedings)

ei

PDF [BibTex]

PDF [BibTex]


no image
A Single-shot Measurement of the Energy of Product States in a Translation Invariant Spin Chain Can Replace Any Quantum Computation

Janzing, D., Wocjan, P., Zhang, S.

New Journal of Physics, 10(093004):1-18, September 2008 (article)

Abstract
In measurement-based quantum computation, quantum algorithms are implemented via sequences of measurements. We describe a translationally invariant finite-range interaction on a one-dimensional qudit chain and prove that a single-shot measurement of the energy of an appropriate computational basis state with respect to this Hamiltonian provides the output of any quantum circuit. The required measurement accuracy scales inverse polynomially with the size of the simulated quantum circuit. This shows that the implementation of energy measurements on generic qudit chains is as hard as the realization of quantum computation. Here, a ‘measurement‘ is any procedure that samples from the spectral measurement induced by the observable and the state under consideration. As opposed to measurement-based quantum computation, the post-measurement state is irrelevant.

ei

PDF DOI [BibTex]


no image
A Kernel Statistical Test of Independence

Gretton, A., Fukumizu, K., Teo, C., Song, L., Schölkopf, B., Smola, A.

In Advances in neural information processing systems 20, pages: 585-592, (Editors: JC Platt and D Koller and Y Singer and S Roweis), Curran, Red Hook, NY, USA, 21st Annual Conference on Neural Information Processing Systems (NIPS), September 2008 (inproceedings)

Abstract
Whereas kernel measures of independence have been widely applied in machine learning (notably in kernel ICA), there is as yet no method to determine whether they have detected statistically significant dependence. We provide a novel test of the independence hypothesis for one particular kernel independence measure, the Hilbert-Schmidt independence criterion (HSIC). The resulting test costs O(m^2), where m is the sample size. We demonstrate that this test outperforms established contingency table-based tests. Finally, we show the HSIC test also applies to text (and to structured data more generally), for which no other independence test presently exists.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Fitness Expectation Maximization

Wierstra, D., Schaul, T., Peters, J., Schmidhuber, J.

In PPSN 2008, pages: 337-346, (Editors: Rudolph, G. , T. Jansen, S. Lucas, C. Poloni, N. Beume), Springer, Berlin, Germany, 10th International Conference on Parallel Problem Solving From Nature, September 2008 (inproceedings)

Abstract
We present Fitness Expectation Maximization (FEM), a novel method for performing ‘black box’ function optimization. FEM searches the fitness landscape of an objective function using an instantiation of the well-known Expectation Maximization algorithm, producing search points to match the sample distribution weighted according to higher expected fitness. FEM updates both candidate solution parameters and the search policy, which is represented as a multinormal distribution. Inheriting EM’s stability and strong guarantees, the method is both elegant and competitive with some of the best heuristic search methods in the field, and performs well on a number of unimodal and multimodal benchmark tasks. To illustrate the potential practical applications of the approach, we also show experiments on finding the parameters for a controller of the challenging non-Markovian double pole balancing task.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Voluntary Brain Regulation and Communication with ECoG-Signals

Hinterberger, T., Widmann, G., Lal, T., Hill, J., Tangermann, M., Rosenstiel, W., Schölkopf, B., Elger, C., Birbaumer, N.

Epilepsy and Behavior, 13(2):300-306, August 2008 (article)

Abstract
Brain–computer interfaces (BCIs) can be used for communication in writing without muscular activity or for learning to control seizures by voluntary regulation of brain signals such as the electroencephalogram (EEG). Three of five patients with epilepsy were able to spell their names with electrocorticogram (ECoG) signals derived from motor-related areas within only one or two training sessions. Imagery of finger or tongue movements was classified with support-vector classification of autoregressive coefficients derived from the ECoG signals. After training of the classifier, binary classification responses were used to select letters from a computer-generated menu. Offline analysis showed increased theta activity in the unsuccessful patients, whereas the successful patients exhibited dominant sensorimotor rhythms that they could control. The high spatial resolution and increased signal-to-noise ratio in ECoG signals, combined with short training periods, may offer an alternative for communication in complete paralysis, locked-in syndrome, and motor restoration.

ei

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Partial Least Squares Regression for Graph Mining

Saigo, H., Krämer, N., Tsuda, K.

In KDD2008, pages: 578-586, (Editors: Li, Y. , B. Liu, S. Sarawagi), ACM Press, New York, NY, USA, 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2008 (inproceedings)

Abstract
Attributed graphs are increasingly more common in many application domains such as chemistry, biology and text processing. A central issue in graph mining is how to collect informative subgraph patterns for a given learning task. We propose an iterative mining method based on partial least squares regression (PLS). To apply PLS to graph data, a sparse version of PLS is developed first and then it is combined with a weighted pattern mining algorithm. The mining algorithm is iteratively called with different weight vectors, creating one latent component per one mining call. Our method, graph PLS, is efficient and easy to implement, because the weight vector is updated with elementary matrix calculations. In experiments, our graph PLS algorithm showed competitive prediction accuracies in many chemical datasets and its efficiency was significantly superior to graph boosting (gboost) and the naive method based on frequent graph mining.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Enhancement of Kinesthetic Perception for Microsurgical Teleoperation using Impedance-Shaping

Son, HI., Lee, DY.

In International Conference of the IEEE Engineering in Medicine and Biology Society, pages: 1939-1942, IEEE, Piscataway, NJ, USA, 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS), August 2008 (inproceedings)

Abstract
A new control scheme is developed in this paper for a bilateral teleoperation system for microsurgical applications. The main objective of the proposed control scheme is to enhance the kinesthetic perception of the operator. First, the kinesthetic perception, based on psychophysics, is classified into three metrics of detection, sensitivity of detection, and discrimination. Additionally, a new performance index is introduced as a combination of these three metrics to quantify the kinesthetic performance. Second, modified macro-micro bilateral control system using an impedance-shaping method is proposed. The proposed controller can increase kinesthetic perception by shaping and magnifying the transmitted impedance to the operator. Finally, the performance of the proposed controller is verified in a comparison with the two-channel position-position (PP) controller, the two-channel force-position (FP) controller, and the four-channel transparency- optimized controller.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
Semi-Supervised Laplacian Regularization of Kernel Canonical Correlation Analysis

Blaschko, M., Lampert, C., Gretton, A.

In ECML PKDD 2008, pages: 133-145, (Editors: Daelemans, W. , B. Goethals, K. Morik), Springer, Berlin, Germany, 19th European Conference on Machine Learning, August 2008 (inproceedings)

Abstract
Kernel canonical correlation analysis (KCCA) is a dimensionality reduction technique for paired data. By finding directions that maximize correlation, KCCA learns representations that are more closely tied to the underlying semantics of the data rather than noise. However, meaningful directions are not only those that have high correlation to another modality, but also those that capture the manifold structure of the data. We propose a method that is simultaneously able to find highly correlated directions that are also located on high variance directions along the data manifold. This is achieved by the use of semi-supervised Laplacian regularization of KCCA. We show experimentally that Laplacian regularized training improves class separation over KCCA with only Tikhonov regularization, while causing no degradation in the correlation between modalities. We propose a model selection criterion based on the Hilbert-Schmidt norm of the semi-supervised Laplacian regularized cross-covariance operator, which we compute in closed form.

ei

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
RKHS Representation of Measures Applied to Homogeneity, Independence, and Fourier Optics

Schölkopf, B., Sriperumbudur, B., Gretton, A., Fukumizu, K.

In OWR 2008, pages: 42-44, (Editors: K Jetter and S Smale and D-X Zhou), Mathematisches Forschungsinstitut, Oberwolfach-Walke, Germany, 30. Oberwolfach Report, August 2008 (inproceedings)

ei

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
Predicting phenotypic effects of gene perturbations in C. elegans using an integrated network model

Borgwardt, K.

BioEssays, 30(8):707–710, August 2008 (article)

Abstract
Predicting the phenotype of an organism from its genotype is a central question in genetics. Most importantly, we would like to find out if the perturbation of a single gene may be the cause of a disease. However, our current ability to predict the phenotypic effects of perturbations of individual genes is limited. Network models of genes are one tool for tackling this problem. In a recent study, (Lee et al.) it has been shown that network models covering the majority of genes of an organism can be used for accurately predicting phenotypic effects of gene perturbations in multicellular organisms.

ei

Web DOI [BibTex]


no image
Learning Robot Dynamics for Computed Torque Control Using Local Gaussian Processes Regression

Nguyen-Tuong, D., Peters, J.

In LAB-RS 2008, pages: 59-64, (Editors: Stoica, A. , E. Tunstel, T. Huntsberger, T. Arslan, S. Vijayakumar, A. O. El-Rayis), IEEE Computer Society, Los Alamitos, CA, USA, 2008 ECSIS Symposium on Learning and Adaptive Behaviors for Robotic Systems, August 2008 (inproceedings)

Abstract
Accurate models of the robot dynamics allow the design of significantly more precise, energy-efficient and more compliant computed torque control for robots. However, in some cases the accuracy of rigid-body models does not suffice for sound control performance due to unmodeled nonlinearities such as hydraulic cables, complex friction, or actuator dynamics. In such cases, learning the models from data poses an interesting alternative and estimating the dynamics model using regression techniques becomes an important problem. However, the most accurate regression methods, e.g. Gaussian processes regression (GPR) and support vector regression (SVR), suffer from exceptional high computational complexity which prevents their usage for large numbers of samples or online learning to date. We proposed an approximation to the standard GPR using local Gaussian processes models. Due to reduced computational cost, local Gaussian processes (LGP) is capable for an online learning. Comparisons with other nonparametric regre ssions, e.g. standard GPR, SVR and locally weighted projection regression (LWPR), show that LGP has higher accuracy than LWPR and close to the performance of standard GPR and SVR while being sufficiently fast for online learning.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Multi-class Common Spatial Pattern and Information Theoretic Feature Extraction

Grosse-Wentrup, M., Buss, M.

IEEE Transactions on Biomedical Engineering, 55(8):1991-2000, August 2008 (article)

Abstract
We address two shortcomings of the common spatial patterns (CSP) algorithm for spatial filtering in the context of brain--computer interfaces (BCIs) based on electroencephalography/magnetoencephalography (EEG/MEG): First, the question of optimality of CSP in terms of the minimal achievable classification error remains unsolved. Second, CSP has been initially proposed for two-class paradigms. Extensions to multiclass paradigms have been suggested, but are based on heuristics. We address these shortcomings in the framework of information theoretic feature extraction (ITFE). We show that for two-class paradigms, CSP maximizes an approximation of mutual information of extracted EEG/MEG components and class labels. This establishes a link between CSP and the minimal classification error. For multiclass paradigms, we point out that CSP by joint approximate diagonalization (JAD) is equivalent to independent component analysis (ICA), and provide a method to choose those independent components (ICs) that approximately maximize mutual information of ICs and class labels. This eliminates the need for heuristics in multiclass CSP, and allows incorporating prior class probabilities. The proposed method is applied to the dataset IIIa of the third BCI competition, and is shown to increase the mean classification accuracy by 23.4% in comparison to multiclass CSP.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Haptic Control with Environment Estimation for Telesurgery

Bhattacharjee, T., Son, HI., Lee, DY.

In International Conference of the IEEE Engineering in Medicine and Biology Society, pages: 3241-3244, IEEE, Piscataway, NJ, USA, 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS), August 2008 (inproceedings)

Abstract
Success of telesurgical operations depends on better position tracking ability of the slave device. Improved position tracking of the slave device can lead to safer and less strenuous telesurgical operations. The two-channel force-position control architecture is widely used for better position tracking ability. This architecture requires force sensors for direct force feedback. Force sensors may not be a good choice in the telesurgical environment because of the inherent noise, and limitation in the deployable place and space. Hence, environment force estimation is developed using the concept of the robot function parameter matrix and a recursive least squares method. Simulation results show efficacy of the proposed method. The slave device successfully tracks the position of the master device, and the estimation error quickly becomes negligible.

ei

Web DOI [BibTex]

Web DOI [BibTex]


Thumb xl jnm
A non-parametric Bayesian alternative to spike sorting

Wood, F., Black, M. J.

J. Neuroscience Methods, 173(1):1–12, August 2008 (article)

Abstract
The analysis of extra-cellular neural recordings typically begins with careful spike sorting and all analysis of the data then rests on the correctness of the resulting spike trains. In many situations this is unproblematic as experimental and spike sorting procedures often focus on well isolated units. There is evidence in the literature, however, that errors in spike sorting can occur even with carefully collected and selected data. Additionally, chronically implanted electrodes and arrays with fixed electrodes cannot be easily adjusted to provide well isolated units. In these situations, multiple units may be recorded and the assignment of waveforms to units may be ambiguous. At the same time, analysis of such data may be both scientifically important and clinically relevant. In this paper we address this issue using a novel probabilistic model that accounts for several important sources of uncertainty and error in spike sorting. In lieu of sorting neural data to produce a single best spike train, we estimate a probabilistic model of spike trains given the observed data. We show how such a distribution over spike sortings can support standard neuroscientific questions while providing a representation of uncertainty in the analysis. As a representative illustration of the approach, we analyzed primary motor cortical tuning with respect to hand movement in data recorded with a chronic multi-electrode array in non-human primates.We found that the probabilistic analysis generally agrees with human sorters but suggests the presence of tuned units not detected by humans.

ps

pdf preprint pdf from publisher PubMed [BibTex]

pdf preprint pdf from publisher PubMed [BibTex]


no image
A Novel Protocol for Accuracy Assessment in Classification of Very High Resolution Multispectral and SAR Images

Bruzzone, L., Persello, C.

In pages: II-265-II-268 , IEEE, Piscataway, NJ, USA, IEEE International Geoscience and Remote Sensing Symposium (IGARSS), July 2008 (inproceedings)

Abstract
This paper presents a novel protocol for the accuracy assessment of thematic maps obtained by the classification of very high resolution images. As the thematic accuracy alone is not sufficient to adequately characterize the geometrical properties of classification maps, we propose a novel protocol that is based on the analysis of two families of indexes: (i) the traditional thematic accuracy indexes, and (ii) a set of geometric indexes that characterize different geometric properties of the objects recognized in the map. These indexes can be used in the training phase of a classifier for identifying the parameters values that optimize classification results on the basis of a multi-objective criterion. Experimental results obtained on Quickbird images show the effectiveness of the proposed protocol in selecting classification maps characterized by better tradeoff between thematic and geometric accuracy with respect to standard accuracy measures.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
A Decoupled Approach to Exemplar-based Unsupervised Learning

Nowozin, S., BakIr, G.

In ICML 2008, pages: 704-711, (Editors: Cohen, W. W., A. McCallum, S. Roweis), ACM Press, New York, NY, USA, 25th International Conference on Machine Learning, July 2008 (inproceedings)

Abstract
A recent trend in exemplar based unsupervised learning is to formulate the learning problem as a convex optimization problem. Convexity is achieved by restricting the set of possible prototypes to training exemplars. In particular, this has been done for clustering, vector quantization and mixture model density estimation. In this paper we propose a novel algorithm that is theoretically and practically superior to these convex formulations. This is possible by posing the unsupervised learning problem as a single convex master problem" with non-convex subproblems. We show that for the above learning tasks the subproblems are extremely wellbehaved and can be solved efficiently.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
A Novel Approach to the Selection of Robust and Invariant Features for Classification of Hyperspectral Images

Bruzzone, L., Persello, C.

In pages: I-66-I-69 , IEEE, Piscataway, NJ, USA, IEEE International Geoscience and Remote Sensing Symposium (IGARSS), July 2008 (inproceedings)

Abstract
This paper presents a novel approach to feature selection for the classification of hyperspectral images. The proposed approach aims at selecting a subset of the original set of features that exhibits two main properties:( i) high capability to discriminate among the considered classes, (ii) high invariance (stationarity) in the spatial domain of the investigated scene. The feature selection is accomplished by defining a multi-objective criterion that considers two terms: (i) a term that assesses the class separability, (ii) a term that evaluates the spatial invariance of the selected features. The multi-objective problem is solved by an evolutionary algorithm that estimates the Pareto-optimal solutions. Experiments carried out on a hyperspectral image acquired by the Hyperion sensor confirmed the effectiveness of the proposed technique.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
The skew spectrum of graphs

Kondor, R., Borgwardt, K.

In pages: 496-503, (Editors: Cohen, W.W. , A. McCallum, S.T. Roweis), ACM Press, New York, NY, USA, Twenty-Fifth International Conference on Machine Learning (ICML), July 2008 (inproceedings)

Abstract
The central issue in representing graph-structured data instances in learning algorithms is designing features which are invariant to permuting the numbering of the vertices. We present a new system of invariant graph features which we call the skew spectrum of graphs. The skew spectrum is based on mapping the adjacency matrix of any (weigted, directed, unlabeled) graph to a function on the symmetric group and computing bispectral invariants. The reduced form of the skew spectrum is computable in O(n3) time, and experiments show that on several benchmark datasets it can outperform state of the art graph kernels.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
Relating clustering stability to properties of cluster boundaries

Ben-David, S., von Luxburg, U.

In COLT 2008, pages: 379-390, (Editors: Servedio, R. A., T. Zhang), Omnipress, Madison, WI, USA, 21st Annual Conference on Learning Theory, July 2008 (inproceedings)

Abstract
In this paper, we investigate stability-based methods for cluster model selection, in particular to select the number K of clusters. The scenario under consideration is that clustering is performed by minimizing a certain clustering quality function, and that a unique global minimizer exists. On the one hand we show that stability can be upper bounded by certain properties of the optimal clustering, namely by the mass in a small tube around the cluster boundaries. On the other hand, we provide counterexamples which show that a reverse statement is not true in general. Finally, we give some examples and arguments why, from a theoretic point of view, using clustering stability in a high sample setting can be problematic. It can be seen that distribution-free guarantees bounding the difference between the finite sample stability and the “true stability” cannot exist, unless one makes strong assumptions on the underlying distribution.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
At-TAX: A Whole Genome Tiling Array Resource for Developmental Expression Analysis and Transcript Identification in Arabidopsis thaliana

Laubinger, S., Zeller, G., Henz, S., Sachsenberg, T., Widmer, C., Naouar, N., Vuylsteke, M., Schölkopf, B., Rätsch, G., Weigel, D.

Genome Biology, 9(7: R112):1-16, July 2008 (article)

Abstract
Gene expression maps for model organisms, including Arabidopsis thaliana, have typically been created using gene-centric expression arrays. Here, we describe a comprehensive expression atlas, Arabidopsis thaliana Tiling Array Express (At-TAX), which is based on whole-genome tiling arrays. We demonstrate that tiling arrays are accurate tools for gene expression analysis and identified more than 1,000 unannotated transcribed regions. Visualizations of gene expression estimates, transcribed regions, and tiling probe measurements are accessible online at the At-TAX homepage.

ei

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Compressed Sensing and Bayesian Experimental Design

Seeger, M., Nickisch, H.

In ICML 2008, pages: 912-919, (Editors: Cohen, W. W., A. McCallum, S. Roweis), ACM Press, New York, NY, USA, 25th International Conference on Machine Learning, July 2008 (inproceedings)

Abstract
We relate compressed sensing (CS) with Bayesian experimental design and provide a novel efficient approximate method for the latter, based on expectation propagation. In a large comparative study about linearly measuring natural images, we show that the simple standard heuristic of measuring wavelet coefficients top-down systematically outperforms CS methods using random measurements; the sequential projection optimisation approach of (Ji & Carin, 2007) performs even worse. We also show that our own approximate Bayesian method is able to learn measurement filters on full images efficiently which ouperform the wavelet heuristic. To our knowledge, ours is the first successful attempt at "learning compressed sensing" for images of realistic size. In contrast to common CS methods, our framework is not restricted to sparse signals, but can readily be applied to other notions of signal complexity or noise models. We give concrete ideas how our method can be scaled up to large signal representations.

ei

PDF PDF Web DOI [BibTex]

PDF PDF Web DOI [BibTex]


no image
Tailoring density estimation via reproducing kernel moment matching

Song, L., Zhang, X., Smola, A., Gretton, A., Schölkopf, B.

In Proceedings of the 25th International Conference onMachine Learning, pages: 992-999, (Editors: WW Cohen and A McCallum and S Roweis), ACM Press, New York, NY, USA, ICML, July 2008 (inproceedings)

Abstract
Moment matching is a popular means of parametric density estimation. We extend this technique to nonparametric estimation of mixture models. Our approach works by embedding distributions into a reproducing kernel Hilbert space, and performing moment matching in that space. This allows us to tailor density estimators to a function class of interest (i.e., for which we would like to compute expectations). We show our density estimation approach is useful in applications such as message compression in graphical models, and image classification and retrieval.

ei

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
Injective Hilbert Space Embeddings of Probability Measures

Sriperumbudur, B., Gretton, A., Fukumizu, K., Lanckriet, G., Schölkopf, B.

In Proceedings of the 21st Annual Conference on Learning Theory, pages: 111-122, (Editors: RA Servedio and T Zhang), Omnipress, Madison, WI, USA, 21st Annual Conference on Learning Theory (COLT), July 2008 (inproceedings)

Abstract
A Hilbert space embedding for probability measures has recently been proposed, with applications including dimensionality reduction, homogeneity testing and independence testing. This embedding represents any probability measure as a mean element in a reproducing kernel Hilbert space (RKHS). The embedding function has been proven to be injective when the reproducing kernel is universal. In this case, the embedding induces a metric on the space of probability distributions defined on compact metric spaces. In the present work, we consider more broadly the problem of specifying characteristic kernels, defined as kernels for which the RKHS embedding of probability measures is injective. In particular, characteristic kernels can include non-universal kernels. We restrict ourselves to translation-invariant kernels on Euclidean space, and define the associated metric on probability measures in terms of the Fourier spectrum of the kernel and characteristic functions of these measures. The support of the kernel spectrum is important in finding whether a kernel is characteristic: in particular, the embedding is injective if and only if the kernel spectrum has the entire domain as its support. Characteristic kernels may nonetheless have difficulty in distinguishing certain distributions on the basis of finite samples, again due to the interaction of the kernel spectrum and the characteristic functions of the measures.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
A Hilbert-Schmidt Dependence Maximization Approach to Unsupervised Structure Discovery

Blaschko, M., Gretton, A.

In MLG 2008, pages: 1-3, 6th International Workshop on Mining and Learning with Graphs, July 2008 (inproceedings)

Abstract
In recent work by (Song et al., 2007), it has been proposed to perform clustering by maximizing a Hilbert-Schmidt independence criterion with respect to a predefined cluster structure Y , by solving for the partition matrix, II. We extend this approach here to the case where the cluster structure Y is not fixed, but is a quantity to be optimized; and we use an independence criterion which has been shown to be more sensitive at small sample sizes (the Hilbert-Schmidt Normalized Information Criterion, or HSNIC, Fukumizu et al., 2008). We demonstrate the use of this framework in two scenarios. In the first, we adopt a cluster structure selection approach in which the HSNIC is used to select a structure from several candidates. In the second, we consider the case where we discover structure by directly optimizing Y.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Adaptive Importance Sampling with Automatic Model Selection in Value Function Approximation

Hachiya, H., Akiyama, T., Sugiyama, M., Peters, J.

In AAAI 2008, pages: 1351-1356, (Editors: Fox, D. , C. P. Gomes), AAAI Press, Menlo Park, CA, USA, Twenty-Third Conference on Artificial Intelligence, July 2008 (inproceedings)

Abstract
Off-policy reinforcement learning is aimed at efficiently reusing data samples gathered in the past, which is an essential problem for physically grounded AI as experiments are usually prohibitively expensive. A common approach is to use importance sampling techniques for compensating for the bias caused by the difference between data-sampling policies and the target policy. However, existing off-policy methods do not often take the variance of value function estimators explicitly into account and therefore their performance tends to be unstable. To cope with this problem, we propose using an adaptive importance sampling technique which allows us to actively control the trade-off between bias and variance. We further provide a method for optimally determining the trade-off parameter based on a variant of cross-validation. We demonstrate the usefulness of the proposed approach through simulations.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Sparse Multiscale Gaussian Process Regression

Walder, C., Kim, K., Schölkopf, B.

In Proceedings of the 25th International Conference on Machine Learning, pages: 1112-1119, (Editors: WW Cohen and A McCallum and S Roweis), ACM Press, New York, NY, USA, ICML, July 2008 (inproceedings)

Abstract
Most existing sparse Gaussian process (g.p.) models seek computational advantages by basing their computations on a set of m basis functions that are the covariance function of the g.p. with one of its two inputs fixed. We generalise this for the case of Gaussian covariance function, by basing our computations on m Gaussian basis functions with arbitrary diagonal covariance matrices (or length scales). For a fixed number of basis functions and any given criteria, this additional flexibility permits approximations no worse and typically better than was previously possible. We perform gradient based optimisation of the marginal likelihood, which costs O(m2n) time where n is the number of data points, and compare the method to various other sparse g.p. methods. Although we focus on g.p. regression, the central idea is applicable to all kernel based algorithms, and we also provide some results for the support vector machine (s.v.m.) and kernel ridge regression (k.r.r.). Our approach outperforms the other methods, particularly for the case of very few basis functions, i.e. a very high sparsity ratio.

ei

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
Graphical Analysis of NMR Structural Quality and Interactive Contact Map of NOE Assignments in ARIA

Bardiaux, B., Bernard, A., Rieping, W., Habeck, M., Malliavin, T., Nilges, M.

BMC Structural Biology, 8(30):1-5, June 2008 (article)

Abstract
BACKGROUND: The Ambiguous Restraints for Iterative Assignment (ARIA) approach is widely used for NMR structure determination. It is based on simultaneously calculating structures and assigning NOE through an iterative protocol. The final solution consists of a set of conformers and a list of most probable assignments for the input NOE peak list. RESULTS: ARIA was extended with a series of graphical tools to facilitate a detailed analysis of the intermediate and final results of the ARIA protocol. These additional features provide (i) an interactive contact map, serving as a tool for the analysis of assignments, and (ii) graphical representations of structure quality scores and restraint statistics. The interactive contact map between residues can be clicked to obtain information about the restraints and their contributions. Profiles of quality scores are plotted along the protein sequence, and contact maps provide information of the agreement with the data on a residue pair level. CONCLUSIONS: The g raphical tools and outputs described here significantly extend the validation and analysis possibilities of NOE assignments given by ARIA as well as the analysis of the quality of the final structure ensemble. These tools are included in the latest version of ARIA, which is available at http://aria.pasteur.fr. The Web site also contains an installation guide, a user manual and example calculations.

ei

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Example-Based Learning for Single-Image Super-Resolution

Kim, K., Kwon, Y.

In DAGM 2008, pages: 456-463, (Editors: Rigoll, G. ), Springer, Berlin, Germany, 30th Annual Symposium of the German Association for Pattern Recognition, June 2008 (inproceedings)

Abstract
This paper proposes a regression-based method for single-image super-resolution. Kernel ridge regression (KRR) is used to estimate the high-frequency details of the underlying high-resolution image. A sparse solution of KRR is found by combining the ideas of kernel matching pursuit and gradient descent, which allows time-complexity to be kept to a moderate level. To resolve the problem of ringing artifacts occurring due to the regularization effect, the regression results are post-processed using a prior model of a generic image class. Experimental results demonstrate the effectiveness of the proposed method.

ei

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
A Multiple Kernel Learning Approach to Joint Multi-Class Object Detection

Lampert, C., Blaschko, M.

In DAGM 2008, pages: 31-40, (Editors: Rigoll, G. ), Springer, Berlin, Germany, 30th Annual Symposium of the German Association for Pattern Recognition, June 2008, Main Award DAGM 2008 (inproceedings)

Abstract
Most current methods for multi-class object classification and localization work as independent 1-vs-rest classifiers. They decide whether and where an object is visible in an image purely on a per-class basis. Joint learning of more than one object class would generally be preferable, since this would allow the use of contextual information such as co-occurrence between classes. However, this approach is usually not employed because of its computational cost. In this paper we propose a method to combine the efficiency of single class localization with a subsequent decision process that works jointly for all given object classes. By following a multiple kernel learning (MKL) approach, we automatically obtain a sparse dependency graph of relevant object classes on which to base the decision. Experiments on the PASCAL VOC 2006 and 2007 datasets show that the subsequent joint decision step clearly improves the accuracy compared to single class detection.

ei

PDF ZIP Web DOI [BibTex]

PDF ZIP Web DOI [BibTex]


no image
Natural Evolution Strategies

Wierstra, D., Schaul, T., Peters, J., Schmidhuber, J.

In CEC 2008, pages: 3381-3387, IEEE, Piscataway, NJ, USA, IEEE Congress on Evolutionary Computation, June 2008 (inproceedings)

Abstract
This paper presents natural evolution strategies (NES), a novel algorithm for performing real-valued dasiablack boxpsila function optimization: optimizing an unknown objective function where algorithm-selected function measurements constitute the only information accessible to the method. Natural evolution strategies search the fitness landscape using a multivariate normal distribution with a self-adapting mutation matrix to generate correlated mutations in promising regions. NES shares this property with covariance matrix adaption (CMA), an evolution strategy (ES) which has been shown to perform well on a variety of high-precision optimization tasks. The natural evolution strategies algorithm, however, is simpler, less ad-hoc and more principled. Self-adaptation of the mutation matrix is derived using a Monte Carlo estimate of the natural gradient towards better expected fitness. By following the natural gradient instead of the dasiavanillapsila gradient, we can ensure efficient update steps while preventing early convergence due to overly greedy updates, resulting in reduced sensitivity to local suboptima. We show NES has competitive performance with CMA on unimodal tasks, while outperforming it on several multimodal tasks that are rich in deceptive local optima.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Operational Space Control: A Theoretical and Empirical Comparison

Nakanishi, J., Cory, R., Mistry, M., Peters, J., Schaal, S.

International Journal of Robotics Research, 27(6):737-757, June 2008 (article)

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Kernel Methods in Machine Learning

Hofmann, T., Schölkopf, B., Smola, A.

Annals of Statistics, 36(3):1171-1220, June 2008 (article)

Abstract
We review machine learning methods employing positive definite kernels. These methods formulate learning and estimation problems in a reproducing kernel Hilbert space (RKHS) of functions defined on the data domain, expanded in terms of a kernel. Working in linear spaces of function has the benefit of facilitating the construction and analysis of learning algorithms while at the same time allowing large classes of functions. The latter include nonlinear functions as well as functions defined on nonvectorial data.

ei

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
Cross-validation Optimization for Large Scale Structured Classification Kernel Methods

Seeger, M.

Journal of Machine Learning Research, 9, pages: 1147-1178, June 2008 (article)

Abstract
We propose a highly efficient framework for penalized likelihood kernel methods applied to multi-class models with a large, structured set of classes. As opposed to many previous approaches which try to decompose the fitting problem into many smaller ones, we focus on a Newton optimization of the complete model, making use of model structure and linear conjugate gradients in order to approximate Newton search directions. Crucially, our learning method is based entirely on matrix-vector multiplication primitives with the kernel matrices and their derivatives, allowing straightforward specialization to new kernels, and focusing code optimization efforts to these primitives only. Kernel parameters are learned automatically, by maximizing the cross-validation log likelihood in a gradient-based way, and predictive probabilities are estimated. We demonstrate our approach on large scale text classification tasks with hierarchical structure on thousands of classes, achieving state-of-the-art results in an order of magnitude less time than previous work.

ei

PDF PDF [BibTex]

PDF PDF [BibTex]