Header logo is


2012


no image
Investigating the Neural Basis of Brain-Computer Interface (BCI)-based Stroke Rehabilitation

Meyer, T., Peters, J., Zander, T., Brötz, D., Soekadar, S., Schölkopf, B., Grosse-Wentrup, M.

In International Conference on NeuroRehabilitation (ICNR) , pages: 617-621, (Editors: JL Pons, D Torricelli, and M Pajaro), Springer, Berlin, Germany, ICNR, 2012 (inproceedings)

ei

PDF [BibTex]

2012


PDF [BibTex]


no image
A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function

Ortega, P., Grau-Moya, J., Genewein, T., Balduzzi, D., Braun, D.

In Advances in Neural Information Processing Systems 25, pages: 3014-3022, (Editors: P Bartlett and FCN Pereira and CJC. Burges and L Bottou and KQ Weinberger), Curran Associates Inc., 26th Annual Conference on Neural Information Processing Systems (NIPS), 2012 (inproceedings)

ei

PDF [BibTex]

PDF [BibTex]


no image
Algorithms for Learning Markov Field Policies

Boularias, A., Kroemer, O., Peters, J.

In Advances in Neural Information Processing Systems 25, pages: 2186-2194, (Editors: P Bartlett and FCN Pereira and CJC. Burges and L Bottou and KQ Weinberger), Curran Associates Inc., 26th Annual Conference on Neural Information Processing Systems (NIPS), 2012 (inproceedings)

ei

PDF [BibTex]

PDF [BibTex]


no image
Semi-Supervised Domain Adaptation with Copulas

Lopez-Paz, D., Hernandez-Lobato, J., Schölkopf, B.

In Advances in Neural Information Processing Systems 25, pages: 674-682, (Editors: P Bartlett, FCN Pereira, CJC. Burges, L Bottou, and KQ Weinberger), Curran Associates Inc., 26th Annual Conference on Neural Information Processing Systems (NIPS), 2012 (inproceedings)

ei

PDF [BibTex]

PDF [BibTex]


no image
Gradient Weights help Nonparametric Regressors

Kpotufe, S., Boularias, A.

In Advances in Neural Information Processing Systems 25, pages: 2870-2878, (Editors: P Bartlett and FCN Pereira and CJC. Burges and L Bottou and KQ Weinberger), 26th Annual Conference on Neural Information Processing Systems (NIPS), 2012 (inproceedings)

ei

PDF [BibTex]

PDF [BibTex]


no image
A Blind Deconvolution Approach for Pseudo CT Prediction from MR Image Pairs

Hirsch, M., Hofmann, M., Mantlik, F., Pichler, B., Schölkopf, B., Habeck, M.

In 19th IEEE International Conference on Image Processing (ICIP) , pages: 2953 -2956, IEEE, ICIP, 2012 (inproceedings)

ei

DOI [BibTex]

DOI [BibTex]


no image
A mixed model approach for joint genetic analysis of alternatively spliced transcript isoforms using RNA-Seq data

Rakitsch, B., Lippert, C., Topa, H., Borgwardt, KM., Honkela, A., Stegle, O.

In 2012 (inproceedings) Submitted

ei

Web [BibTex]

Web [BibTex]


no image
Evaluation of marginal likelihoods via the density of states

Habeck, M.

In Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2012) , 22, pages: 486-494, (Editors: N Lawrence and M Girolami), JMLR: W&CP 22, AISTATS, 2012 (inproceedings)

Abstract
Bayesian model comparison involves the evaluation of the marginal likelihood, the expectation of the likelihood under the prior distribution. Typically, this high-dimensional integral over all model parameters is approximated using Markov chain Monte Carlo methods. Thermodynamic integration is a popular method to estimate the marginal likelihood by using samples from annealed posteriors. Here we show that there exists a robust and flexible alternative. The new method estimates the density of states, which counts the number of states associated with a particular value of the likelihood. If the density of states is known, computation of the marginal likelihood reduces to a one- dimensional integral. We outline a maximum likelihood procedure to estimate the density of states from annealed posterior samples. We apply our method to various likelihoods and show that it is superior to thermodynamic integration in that it is more flexible with regard to the annealing schedule and the family of bridging distributions. Finally, we discuss the relation of our method with Skilling's nested sampling.

ei

PDF [BibTex]

PDF [BibTex]


no image
Distributed multisensory signals acquisition and analysis in dyadic interactions

Tawari, A., Tran, C., Doshi, A., Zander, TO.

In Proceedings of the 2012 ACM Annual Conference on Human Factors in Computing Systems Extended Abstracts, pages: 2261-2266, (Editors: JA Konstan and EH Chi and K Höök), ACM, New York, NY, USA, CHI, 2012 (inproceedings)

ei

DOI [BibTex]

DOI [BibTex]


no image
Measuring Cognitive Load by means of EEG-data - how detailed is the picture we can get?

Scharinger, C., Cierniak, G., Walter, C., Zander, TO., Gerjets, P.

In Meeting of the EARLI SIG 22 Neuroscience and Education, 2012 (inproceedings)

ei

[BibTex]

[BibTex]


no image
Optimal kernel choice for large-scale two-sample tests

Gretton, A., Sriperumbudur, B., Sejdinovic, D., Strathmann, H., Balakrishnan, S., Pontil, M., Fukumizu, K.

In Advances in Neural Information Processing Systems 25, pages: 1214-1222, (Editors: P Bartlett and FCN Pereira and CJC. Burges and L Bottou and KQ Weinberger), Curran Associates Inc., 26th Annual Conference on Neural Information Processing Systems (NIPS), 2012 (inproceedings)

ei

PDF [BibTex]

PDF [BibTex]


no image
Encoding of Periodic and their Transient Motions by a Single Dynamic Movement Primitive

Ernesti, J., Righetti, L., Do, M., Asfour, T., Schaal, S.

In 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012), pages: 57-64, IEEE, Osaka, Japan, November 2012 (inproceedings)

am mg

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Adaptive Coding of Actions and Observations

Ortega, PA, Braun, DA

pages: 1-4, NIPS Workshop on Information in Perception and Action, December 2012 (conference)

Abstract
The application of expected utility theory to construct adaptive agents is both computationally intractable and statistically questionable. To overcome these difficulties, agents need the ability to delay the choice of the optimal policy to a later stage when they have learned more about the environment. How should agents do this optimally? An information-theoretic answer to this question is given by the Bayesian control rule—the solution to the adaptive coding problem when there are not only observations but also actions. This paper reviews the central ideas behind the Bayesian control rule.

ei

link (url) [BibTex]

link (url) [BibTex]


no image
Learning Force Control Policies for Compliant Robotic Manipulation

Kalakrishnan, M., Righetti, L., Pastor, P., Schaal, S.

In ICML’12 Proceedings of the 29th International Coference on International Conference on Machine Learning, pages: 49-50, Edinburgh, Scotland, 2012 (inproceedings)

am mg

[BibTex]

[BibTex]


no image
Free Energy and the Generalized Optimality Equations for Sequential Decision Making

Ortega, PA, Braun, DA

pages: 1-10, 10th European Workshop on Reinforcement Learning (EWRL), July 2012 (conference)

Abstract
The free energy functional has recently been proposed as a variational principle for bounded rational decision-making, since it instantiates a natural trade-off between utility gains and information processing costs that can be axiomatically derived. Here we apply the free energy principle to general decision trees that include both adversarial and stochastic environments. We derive generalized sequential optimality equations that not only include the Bellman optimality equations as a limit case, but also lead to well-known decision-rules such as Expectimax, Minimax and Expectiminimax. We show how these decision-rules can be derived from a single free energy principle that assigns a resource parameter to each node in the decision tree. These resource parameters express a concrete computational cost that can be measured as the amount of samples that are needed from the distribution that belongs to each node. The free energy principle therefore provides the normative basis for generalized optimality equations that account for both adversarial and stochastic environments.

ei

link (url) [BibTex]

link (url) [BibTex]


no image
Quadratic programming for inverse dynamics with optimal distribution of contact forces

Righetti, L., Schaal, S.

In 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012), pages: 538-543, IEEE, Osaka, Japan, November 2012 (inproceedings)

Abstract
In this contribution we propose an inverse dynamics controller for a humanoid robot that exploits torque redundancy to minimize any combination of linear and quadratic costs in the contact forces and the commands. In addition the controller satisfies linear equality and inequality constraints in the contact forces and the commands such as torque limits, unilateral contacts or friction cones limits. The originality of our approach resides in the formulation of the problem as a quadratic program where we only need to solve for the control commands and where the contact forces are optimized implicitly. Furthermore, we do not need a structured representation of the dynamics of the robot (i.e. an explicit computation of the inertia matrix). It is in contrast with existing methods based on quadratic programs. The controller is then robust to uncertainty in the estimation of the dynamics model and the optimization is fast enough to be implemented in high bandwidth torque control loops that are increasingly available on humanoid platforms. We demonstrate properties of our controller with simulations of a human size humanoid robot.

am mg

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Towards Associative Skill Memories

Pastor, P., Kalakrishnan, M., Righetti, L., Schaal, S.

In 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012), pages: 309-315, IEEE, Osaka, Japan, November 2012 (inproceedings)

Abstract
Movement primitives as basis of movement planning and control have become a popular topic in recent years. The key idea of movement primitives is that a rather small set of stereotypical movements should suffice to create a large set of complex manipulation skills. An interesting side effect of stereotypical movement is that it also creates stereotypical sensory events, e.g., in terms of kinesthetic variables, haptic variables, or, if processed appropriately, visual variables. Thus, a movement primitive executed towards a particular object in the environment will associate a large number of sensory variables that are typical for this manipulation skill. These association can be used to increase robustness towards perturbations, and they also allow failure detection and switching towards other behaviors. We call such movement primitives augmented with sensory associations Associative Skill Memories (ASM). This paper addresses how ASMs can be acquired by imitation learning and how they can create robust manipulation skill by determining subsequent ASMs online to achieve a particular manipulation goal. Evaluation for grasping and manipulation with a Barrett WAM/Hand illustrate our approach.

am mg

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Template-based learning of grasp selection

Herzog, A., Pastor, P., Kalakrishnan, M., Righetti, L., Asfour, T., Schaal, S.

In 2012 IEEE International Conference on Robotics and Automation, pages: 2379-2384, IEEE, Saint Paul, USA, 2012 (inproceedings)

Abstract
The ability to grasp unknown objects is an important skill for personal robots, which has been addressed by many present and past research projects, but still remains an open problem. A crucial aspect of grasping is choosing an appropriate grasp configuration, i.e. the 6d pose of the hand relative to the object and its finger configuration. Finding feasible grasp configurations for novel objects, however, is challenging because of the huge variety in shape and size of these objects. Moreover, possible configurations also depend on the specific kinematics of the robotic arm and hand in use. In this paper, we introduce a new grasp selection algorithm able to find object grasp poses based on previously demonstrated grasps. Assuming that objects with similar shapes can be grasped in a similar way, we associate to each demonstrated grasp a grasp template. The template is a local shape descriptor for a possible grasp pose and is constructed using 3d information from depth sensors. For each new object to grasp, the algorithm then finds the best grasp candidate in the library of templates. The grasp selection is also able to improve over time using the information of previous grasp attempts to adapt the ranking of the templates. We tested the algorithm on two different platforms, the Willow Garage PR2 and the Barrett WAM arm which have very different hands. Our results show that the algorithm is able to find good grasp configurations for a large set of objects from a relatively small set of demonstrations, and does indeed improve its performance over time.

am mg

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Probabilistic depth image registration incorporating nonvisual information

Wüthrich, M., Pastor, P., Righetti, L., Billard, A., Schaal, S.

In 2012 IEEE International Conference on Robotics and Automation, pages: 3637-3644, IEEE, Saint Paul, USA, 2012 (inproceedings)

Abstract
In this paper, we derive a probabilistic registration algorithm for object modeling and tracking. In many robotics applications, such as manipulation tasks, nonvisual information about the movement of the object is available, which we will combine with the visual information. Furthermore we do not only consider observations of the object, but we also take space into account which has been observed to not be part of the object. Furthermore we are computing a posterior distribution over the relative alignment and not a point estimate as typically done in for example Iterative Closest Point (ICP). To our knowledge no existing algorithm meets these three conditions and we thus derive a novel registration algorithm in a Bayesian framework. Experimental results suggest that the proposed methods perform favorably in comparison to PCL [1] implementations of feature mapping and ICP, especially if nonvisual information is available.

am mg

link (url) DOI [BibTex]

link (url) DOI [BibTex]

2006


no image
Conformal Multi-Instance Kernels

Blaschko, M., Hofmann, T.

In NIPS 2006 Workshop on Learning to Compare Examples, pages: 1-6, NIPS Workshop on Learning to Compare Examples, December 2006 (inproceedings)

Abstract
In the multiple instance learning setting, each observation is a bag of feature vectors of which one or more vectors indicates membership in a class. The primary task is to identify if any vectors in the bag indicate class membership while ignoring vectors that do not. We describe here a kernel-based technique that defines a parametric family of kernels via conformal transformations and jointly learns a discriminant function over bags together with the optimal parameter settings of the kernel. Learning a conformal transformation effectively amounts to weighting regions in the feature space according to their contribution to classification accuracy; regions that are discriminative will be weighted higher than regions that are not. This allows the classifier to focus on regions contributing to classification accuracy while ignoring regions that correspond to vectors found both in positive and in negative bags. We show how parameters of this transformation can be learned for support vector machines by posing the problem as a multiple kernel learning problem. The resulting multiple instance classifier gives competitive accuracy for several multi-instance benchmark datasets from different domains.

ei

PDF Web [BibTex]

2006


PDF Web [BibTex]


no image
Adapting Spatial Filter Methods for Nonstationary BCIs

Tomioka, R., Hill, J., Blankertz, B., Aihara, K.

In IBIS 2006, pages: 65-70, 2006 Workshop on Information-Based Induction Sciences, November 2006 (inproceedings)

Abstract
A major challenge in applying machine learning methods to Brain-Computer Interfaces (BCIs) is to overcome the possible nonstationarity in the data from the datablock the method is trained on and that the method is applied to. Assuming the joint distributions of the whitened signal and the class label to be identical in two blocks, where the whitening is done in each block independently, we propose a simple adaptation formula that is applicable to a broad class of spatial filtering methods including ICA, CSP, and logistic regression classifiers. We characterize the class of linear transformations for which the above assumption holds. Experimental results on 60 BCI datasets show improved classification accuracy compared to (a) fixed spatial filter approach (no adaptation) and (b) fixed spatial pattern approach (proposed by Hill et al., 2006 [1]).

ei

PDF [BibTex]

PDF [BibTex]


no image
A Linear Programming Approach for Molecular QSAR analysis

Saigo, H., Kadowaki, T., Tsuda, K.

In MLG 2006, pages: 85-96, (Editors: Gärtner, T. , G. C. Garriga, T. Meinl), International Workshop on Mining and Learning with Graphs, September 2006, Best Paper Award (inproceedings)

Abstract
Small molecules in chemistry can be represented as graphs. In a quantitative structure-activity relationship (QSAR) analysis, the central task is to find a regression function that predicts the activity of the molecule in high accuracy. Setting a QSAR as a primal target, we propose a new linear programming approach to the graph-based regression problem. Our method extends the graph classification algorithm by Kudo et al. (NIPS 2004), which is a combination of boosting and graph mining. Instead of sequential multiplicative updates, we employ the linear programming boosting (LP) for regression. The LP approach allows to include inequality constraints for the parameter vector, which turns out to be particularly useful in QSAR tasks where activity values are sometimes unavailable. Furthermore, the efficiency is improved significantly by employing multiple pricing.

ei

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
Incremental Aspect Models for Mining Document Streams

Surendran, A., Sra, S.

In PKDD 2006, pages: 633-640, (Editors: Fürnkranz, J. , T. Scheffer, M. Spiliopoulou), Springer, Berlin, Germany, 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, September 2006 (inproceedings)

Abstract
In this paper we introduce a novel approach for incrementally building aspect models, and use it to dynamically discover underlying themes from document streams. Using the new approach we present an application which we call “query-line tracking” i.e., we automatically discover and summarize different themes or stories that appear over time, and that relate to a particular query. We present evaluation on news corpora to demonstrate the strength of our method for both query-line tracking, online indexing and clustering.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
PALMA: Perfect Alignments using Large Margin Algorithms

Rätsch, G., Hepp, B., Schulze, U., Ong, C.

In GCB 2006, pages: 104-113, (Editors: Huson, D. , O. Kohlbacher, A. Lupas, K. Nieselt, A. Zell), Gesellschaft für Informatik, Bonn, Germany, German Conference on Bioinformatics, September 2006 (inproceedings)

Abstract
Despite many years of research on how to properly align sequences in the presence of sequencing errors, alternative splicing and micro-exons, the correct alignment of mRNA sequences to genomic DNA is still a challenging task. We present a novel approach based on large margin learning that combines kernel based splice site predictions with common sequence alignment techniques. By solving a convex optimization problem, our algorithm -- called PALMA -- tunes the parameters of the model such that the true alignment scores higher than all other alignments. In an experimental study on the alignments of mRNAs containing artificially generated micro-exons, we show that our algorithm drastically outperforms all other methods: It perfectly aligns all 4358 sequences on an hold-out set, while the best other method misaligns at least 90 of them. Moreover, our algorithm is very robust against noise in the query sequence: when deleting, inserting, or mutating up to 50% of the query sequence, it still aligns 95% of all sequences correctly, while other methods achieve less than 36% accuracy. For datasets, additional results and a stand-alone alignment tool see http://www.fml.mpg.de/raetsch/projects/palma.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Graph Based Semi-Supervised Learning with Sharper Edges

Shin, H., Hill, N., Rätsch, G.

In ECML 2006, pages: 401-412, (Editors: Fürnkranz, J. , T. Scheffer, M. Spiliopoulou), Springer, Berlin, Germany, 17th European Conference on Machine Learning (ECML), September 2006 (inproceedings)

Abstract
In many graph-based semi-supervised learning algorithms, edge weights are assumed to be fixed and determined by the data points‘ (often symmetric)relationships in input space, without considering directionality. However, relationships may be more informative in one direction (e.g. from labelled to unlabelled) than in the reverse direction, and some relationships (e.g. strong weights between oppositely labelled points) are unhelpful in either direction. Undesirable edges may reduce the amount of influence an informative point can propagate to its neighbours -- the point and its outgoing edges have been ``blunted.‘‘ We present an approach to ``sharpening‘‘ in which weights are adjusted to meet an optimization criterion wherever they are directed towards labelled points. This principle can be applied to a wide variety of algorithms. In the current paper, we present one ad hoc solution satisfying the principle, in order to show that it can improve performance on a number of publicly available benchmark data sets.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Finite-Horizon Optimal State-Feedback Control of Nonlinear Stochastic Systems Based on a Minimum Principle

Deisenroth, MP., Ohtsuka, T., Weissel, F., Brunn, D., Hanebeck, UD.

In MFI 2006, pages: 371-376, (Editors: Hanebeck, U. D.), IEEE Service Center, Piscataway, NJ, USA, 6th IEEE International Conference on Multisensor Fusion and Integration, September 2006 (inproceedings)

Abstract
In this paper, an approach to the finite-horizon optimal state-feedback control problem of nonlinear, stochastic, discrete-time systems is presented. Starting from the dynamic programming equation, the value function will be approximated by means of Taylor series expansion up to second-order derivatives. Moreover, the problem will be reformulated, such that a minimum principle can be applied to the stochastic problem. Employing this minimum principle, the optimal control problem can be rewritten as a two-point boundary-value problem to be solved at each time step of a shrinking horizon. To avoid numerical problems, the two-point boundary-value problem will be solved by means of a continuation method. Thus, the curse of dimensionality of dynamic programming is avoided, and good candidates for the optimal state-feedback controls are obtained. The proposed approach will be evaluated by means of a scalar example system.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Uniform Convergence of Adaptive Graph-Based Regularization

Hein, M.

In COLT 2006, pages: 50-64, (Editors: Lugosi, G. , H.-U. Simon), Springer, Berlin, Germany, 19th Annual Conference on Learning Theory, September 2006 (inproceedings)

Abstract
The regularization functional induced by the graph Laplacian of a random neighborhood graph based on the data is adaptive in two ways. First it adapts to an underlying manifold structure and second to the density of the data-generating probability measure. We identify in this paper the limit of the regularizer and show uniform convergence over the space of Hoelder functions. As an intermediate step we derive upper bounds on the covering numbers of Hoelder functions on compact Riemannian manifolds, which are of independent interest for the theoretical analysis of manifold-based learning methods.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Regularised CSP for Sensor Selection in BCI

Farquhar, J., Hill, N., Lal, T., Schölkopf, B.

In Proceedings of the 3rd International Brain-Computer Interface Workshop and Training Course 2006, pages: 14-15, (Editors: GR Müller-Putz and C Brunner and R Leeb and R Scherer and A Schlögl and S Wriessnegger and G Pfurtscheller), Verlag der Technischen Universität Graz, Graz, Austria, 3rd International Brain-Computer Interface Workshop and Training Course, September 2006 (inproceedings)

Abstract
The Common Spatial Pattern (CSP) algorithm is a highly successful method for efficiently calculating spatial filters for brain signal classification. Spatial filtering can improve classification performance considerably, but demands that a large number of electrodes be mounted, which is inconvenient in day-to-day BCI usage. The CSP algorithm is also known for its tendency to overfit, i.e. to learn the noise in the training set rather than the signal. Both problems motivate an approach in which spatial filters are sparsified. We briefly sketch a reformulation of the problem which allows us to do this, using 1-norm regularisation. Focusing on the electrode selection issue, we present preliminary results on EEG data sets that suggest that effective spatial filters may be computed with as few as 10--20 electrodes, hence offering the potential to simplify the practical realisation of BCI systems significantly.

ei

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
Time-Dependent Demixing of Task-Relevant EEG Signals

Hill, N., Farquhar, J., Lal, T., Schölkopf, B.

In Proceedings of the 3rd International Brain-Computer Interface Workshop and Training Course 2006, pages: 20-21, (Editors: GR Müller-Putz and C Brunner and R Leeb and R Scherer and A Schlögl and S Wriessnegger and G Pfurtscheller), Verlag der Technischen Universität Graz, Graz, Austria, 3rd International Brain-Computer Interface Workshop and Training Course, September 2006 (inproceedings)

Abstract
Given a spatial filtering algorithm that has allowed us to identify task-relevant EEG sources, we present a simple approach for monitoring the activity of these sources while remaining relatively robust to changes in other (task-irrelevant) brain activity. The idea is to keep spatial *patterns* fixed rather than spatial filters, when transferring from training to test sessions or from one time window to another. We show that a fixed spatial pattern (FSP) approach, using a moving-window estimate of signal covariances, can be more robust to non-stationarity than a fixed spatial filter (FSF) approach.

ei

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
A Sober Look at Clustering Stability

Ben-David, S., von Luxburg, U., Pal, D.

In COLT 2006, pages: 5-19, (Editors: Lugosi, G. , H.-U. Simon), Springer, Berlin, Germany, 19th Annual Conference on Learning Theory, September 2006 (inproceedings)

Abstract
Stability is a common tool to verify the validity of sample based algorithms. In clustering it is widely used to tune the parameters of the algorithm, such as the number k of clusters. In spite of the popularity of stability in practical applications, there has been very little theoretical analysis of this notion. In this paper we provide a formal definition of stability and analyze some of its basic properties. Quite surprisingly, the conclusion of our analysis is that for large sample size, stability is fully determined by the behavior of the objective function which the clustering algorithm is aiming to minimize. If the objective function has a unique global minimizer, the algorithm is stable, otherwise it is unstable. In particular we conclude that stability is not a well-suited tool to determine the number of clusters - it is determined by the symmetries of the data which may be unrelated to clustering parameters. We prove our results for center-based clusterings and for spectral clustering, and support our conclusions by many examples in which the behavior of stability is counter-intuitive.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Information Marginalization on Subgraphs

Huang, J., Zhu, T., Rereiner, R., Zhou, D., Schuurmans, D.

In ECML/PKDD 2006, pages: 199-210, (Editors: Fürnkranz, J. , T. Scheffer, M. Spiliopoulou), Springer, Berlin, Germany, 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, September 2006 (inproceedings)

Abstract
Real-world data often involves objects that exhibit multiple relationships; for example, ‘papers’ and ‘authors’ exhibit both paper-author interactions and paper-paper citation relationships. A typical learning problem requires one to make inferences about a subclass of objects (e.g. ‘papers’), while using the remaining objects and relations to provide relevant information. We present a simple, unified mechanism for incorporating information from multiple object types and relations when learning on a targeted subset. In this scheme, all sources of relevant information are marginalized onto the target subclass via random walks. We show that marginalized random walks can be used as a general technique for combining multiple sources of information in relational data. With this approach, we formulate new algorithms for transduction and ranking in relational data, and quantify the performance of new schemes on real world data—achieving good results in many problems.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
Bayesian Active Learning for Sensitivity Analysis

Pfingsten, T.

In ECML 2006, pages: 353-364, (Editors: Fürnkranz, J. , T. Scheffer, M. Spiliopoulou), Springer, Berlin, Germany, 17th European Conference on Machine Learning, September 2006 (inproceedings)

Abstract
Designs of micro electro-mechanical devices need to be robust against fluctuations in mass production. Computer experiments with tens of parameters are used to explore the behavior of the system, and to compute sensitivity measures as expectations over the input distribution. Monte Carlo methods are a simple approach to estimate these integrals, but they are infeasible when the models are computationally expensive. Using a Gaussian processes prior, expensive simulation runs can be saved. This Bayesian quadrature allows for an active selection of inputs where the simulation promises to be most valuable, and the number of simulation runs can be reduced further. We present an active learning scheme for sensitivity analysis which is rigorously derived from the corresponding Bayesian expected loss. On three fully featured, high dimensional physical models of electro-mechanical sensors, we show that the learning rate in the active learning scheme is significantly better than for passive learning.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Supervised Probabilistic Principal Component Analysis

Yu, S., Yu, K., Tresp, V., Kriegel, H., Wu, M.

In KDD 2006, pages: 464-473, (Editors: Ungar, L. ), ACM Press, New York, NY, USA, 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2006 (inproceedings)

Abstract
Principal component analysis (PCA) has been extensively applied in data mining, pattern recognition and information retrieval for unsupervised dimensionality reduction. When labels of data are available, e.g.,~in a classification or regression task, PCA is however not able to use this information. The problem is more interesting if only part of the input data are labeled, i.e.,~in a semi-supervised setting. In this paper we propose a supervised PCA model called SPPCA and a semi-supervised PCA model called S$^2$PPCA, both of which are extensions of a probabilistic PCA model. The proposed models are able to incorporate the label information into the projection phase, and can naturally handle multiple outputs (i.e.,~in multi-task learning problems). We derive an efficient EM learning algorithm for both models, and also provide theoretical justifications of the model behaviors. SPPCA and S$^2$PPCA are compared with other supervised projection methods on various learning tasks, and show not only promising performance but also good scalability.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
A Continuation Method for Semi-Supervised SVMs

Chapelle, O., Chi, M., Zien, A.

In ICML 2006, pages: 185-192, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
Semi-Supervised Support Vector Machines (S3VMs) are an appealing method for using unlabeled data in classification: their objective function favors decision boundaries which do not cut clusters. However their main problem is that the optimization problem is non-convex and has many local minima, which often results in suboptimal performances. In this paper we propose to use a global optimization technique known as continuation to alleviate this problem. Compared to other algorithms minimizing the same objective function, our continuation method often leads to lower test errors.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Trading Convexity for Scalability

Collobert, R., Sinz, F., Weston, J., Bottou, L.

In ICML 2006, pages: 201-208, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
Convex learning algorithms, such as Support Vector Machines (SVMs), are often seen as highly desirable because they offer strong practical properties and are amenable to theoretical analysis. However, in this work we show how non-convexity can provide scalability advantages over convexity. We show how concave-convex programming can be applied to produce (i) faster SVMs where training errors are no longer support vectors, and (ii) much faster Transductive SVMs.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Personalized handwriting recognition via biased regularization

Kienzle, W., Chellapilla, K.

In ICML 2006, pages: 457-464, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
We present a new approach to personalized handwriting recognition. The problem, also known as writer adaptation, consists of converting a generic (user-independent) recognizer into a personalized (user-dependent) one, which has an improved recognition rate for a particular user. The adaptation step usually involves user-specific samples, which leads to the fundamental question of how to fuse this new information with that captured by the generic recognizer. We propose adapting the recognizer by minimizing a regularized risk functional (a modified SVM) where the prior knowledge from the generic recognizer enters through a modified regularization term. The result is a simple personalization framework with very good practical properties. Experiments on a 100 class real-world data set show that the number of errors can be reduced by over 40% with as few as five user samples per character.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Deterministic annealing for semi-supervised kernel machines

Sindhwani, V., Keerthi, S., Chapelle, O.

In ICML 2006, pages: 841-848, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
An intuitive approach to utilizing unlabeled data in kernel-based classification algorithms is to simply treat the unknown labels as additional optimization variables. For margin-based loss functions, one can view this approach as attempting to learn low-density separators. However, this is a hard optimization problem to solve in typical semi-supervised settings where unlabeled data is abundant. The popular Transductive SVM algorithm is a label-switching-retraining procedure that is known to be susceptible to local minima. In this paper, we present a global optimization framework for semi-supervised Kernel machines where an easier problem is parametrically deformed to the original hard problem and minimizers are smoothly tracked. Our approach is motivated from deterministic annealing techniques and involves a sequence of convex optimization problems that are exactly and efficiently solved. We present empirical results on several synthetic and real world datasets that demonstrate the effectiveness of our approach.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Clustering Graphs by Weighted Substructure Mining

Tsuda, K., Kudo, T.

In ICML 2006, pages: 953-960, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
Graph data is getting increasingly popular in, e.g., bioinformatics and text processing. A main difficulty of graph data processing lies in the intrinsic high dimensionality of graphs, namely, when a graph is represented as a binary feature vector of indicators of all possible subgraphs, the dimensionality gets too large for usual statistical methods. We propose an efficient method for learning a binomial mixture model in this feature space. Combining the $ell_1$ regularizer and the data structure called DFS code tree, the MAP estimate of non-zero parameters are computed efficiently by means of the EM algorithm. Our method is applied to the clustering of RNA graphs, and is compared favorably with graph kernels and the spectral graph distance.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
A Choice Model with Infinitely Many Latent Features

Görür, D., Jäkel, F., Rasmussen, C.

In ICML 2006, pages: 361-368, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
Elimination by aspects (EBA) is a probabilistic choice model describing how humans decide between several options. The options from which the choice is made are characterized by binary features and associated weights. For instance, when choosing which mobile phone to buy the features to consider may be: long lasting battery, color screen, etc. Existing methods for inferring the parameters of the model assume pre-specified features. However, the features that lead to the observed choices are not always known. Here, we present a non-parametric Bayesian model to infer the features of the options and the corresponding weights from choice data. We use the Indian buffet process (IBP) as a prior over the features. Inference using Markov chain Monte Carlo (MCMC) in conjugate IBP models has been previously described. The main contribution of this paper is an MCMC algorithm for the EBA model that can also be used in inference for other non-conjugate IBP models---this may broaden the use of IBP priors considerably.

ei

PostScript PDF Web DOI [BibTex]

PostScript PDF Web DOI [BibTex]


no image
Learning High-Order MRF Priors of Color Images

McAuley, J., Caetano, T., Smola, A., Franz, MO.

In ICML 2006, pages: 617-624, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
In this paper, we use large neighborhood Markov random fields to learn rich prior models of color images. Our approach extends the monochromatic Fields of Experts model (Roth and Blackwell, 2005) to color images. In the Fields of Experts model, the curse of dimensionality due to very large clique sizes is circumvented by parameterizing the potential functions according to a product of experts. We introduce several simplifications of the original approach by Roth and Black which allow us to cope with the increased clique size (typically 3x3x3 or 5x5x3 pixels) of color images. Experimental results are presented for image denoising which evidence improvements over state-of-the-art monochromatic image priors.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Inference with the Universum

Weston, J., Collobert, R., Sinz, F., Bottou, L., Vapnik, V.

In ICML 2006, pages: 1009-1016, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
WIn this paper we study a new framework introduced by Vapnik (1998) and Vapnik (2006) that is an alternative capacity concept to the large margin approach. In the particular case of binary classification, we are given a set of labeled examples, and a collection of "non-examples" that do not belong to either class of interest. This collection, called the Universum, allows one to encode prior knowledge by representing meaningful concepts in the same domain as the problem at hand. We describe an algorithm to leverage the Universum by maximizing the number of observed contradictions, and show experimentally that this approach delivers accuracy improvements over using labeled data alone.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Statistical Convergence of Kernel CCA

Fukumizu, K., Bach, F., Gretton, A.

In Advances in neural information processing systems 18, pages: 387-394, (Editors: Weiss, Y. , B. Schölkopf, J. Platt), MIT Press, Cambridge, MA, USA, Nineteenth Annual Conference on Neural Information Processing Systems (NIPS), May 2006 (inproceedings)

Abstract
While kernel canonical correlation analysis (kernel CCA) has been applied in many problems, the asymptotic convergence of the functions estimated from a finite sample to the true functions has not yet been established. This paper gives a rigorous proof of the statistical convergence of kernel CCA and a related method (NOCCO), which provides a theoretical justification for these methods. The result also gives a sufficient condition on the decay of the regularization coefficient in the methods to ensure convergence.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Products of "Edge-perts"

Gehler, PV., Welling, M.

In Advances in neural information processing systems 18, pages: 419-426, (Editors: Weiss, Y. , B. Schölkopf, J. Platt), MIT Press, Cambridge, MA, USA, Nineteenth Annual Conference on Neural Information Processing Systems (NIPS), May 2006 (inproceedings)

Abstract
Images represent an important and abundant source of data. Understanding their statistical structure has important applications such as image compression and restoration. In this paper we propose a particular kind of probabilistic model, dubbed the “products of edge-perts model” to describe the structure of wavelet transformed images. We develop a practical denoising algorithm based on a single edge-pert and show state-ofthe-art denoising performance on benchmark images.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Assessing Approximations for Gaussian Process Classification

Kuss, M., Rasmussen, C.

In Advances in neural information processing systems 18, pages: 699-706, (Editors: Weiss, Y. , B. Schölkopf, J. Platt), MIT Press, Cambridge, MA, USA, Nineteenth Annual Conference on Neural Information Processing Systems (NIPS), May 2006 (inproceedings)

Abstract
Gaussian processes are attractive models for probabilistic classification but unfortunately exact inference is analytically intractable. We compare Laplace‘s method and Expectation Propagation (EP) focusing on marginal likelihood estimates and predictive performance. We explain theoretically and corroborate empirically that EP is superior to Laplace. We also compare to a sophisticated MCMC scheme and show that EP is surprisingly accurate.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Learning an Interest Operator from Human Eye Movements

Kienzle, W., Wichmann, F., Schölkopf, B., Franz, M.

In CVPWR 2006, pages: page 24, (Editors: C Schmid and S Soatto and C Tomasi), IEEE Computer Society, Los Alamitos, CA, USA, 2006 Conference on Computer Vision and Pattern Recognition Workshop, April 2006 (inproceedings)

Abstract
We present an approach for designing interest operators that are based on human eye movement statistics. In contrast to existing methods which use hand-crafted saliency measures, we use machine learning methods to infer an interest operator directly from eye movement data. That way, the operator provides a measure of biologically plausible interestingness. We describe the data collection, training, and evaluation process, and show that our learned saliency measure significantly accounts for human eye movements. Furthermore, we illustrate connections to existing interest operators, and present a multi-scale interest point detector based on the learned function.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Evaluating Predictive Uncertainty Challenge

Quinonero Candela, J., Rasmussen, C., Sinz, F., Bousquet, O., Schölkopf, B.

In Machine Learning Challenges: Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Tectual Entailment, pages: 1-27, (Editors: J Quiñonero Candela and I Dagan and B Magnini and F d’Alché-Buc), Springer, Berlin, Germany, First PASCAL Machine Learning Challenges Workshop (MLCW), April 2006 (inproceedings)

Abstract
This Chapter presents the PASCAL Evaluating Predictive Uncertainty Challenge, introduces the contributed Chapters by the participants who obtained outstanding results, and provides a discussion with some lessons to be learnt. The Challenge was set up to evaluate the ability of Machine Learning algorithms to provide good “probabilistic predictions”, rather than just the usual “point predictions” with no measure of uncertainty, in regression and classification problems. Parti-cipants had to compete on a number of regression and classification tasks, and were evaluated by both traditional losses that only take into account point predictions and losses we proposed that evaluate the quality of the probabilistic predictions.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Estimating Predictive Variances with Kernel Ridge Regression

Cawley, G., Talbot, N., Chapelle, O.

In MLCW 2005, pages: 56-77, (Editors: Quinonero-Candela, J. , I. Dagan, B. Magnini, F. D‘Alché-Buc), Springer, Berlin, Germany, First PASCAL Machine Learning Challenges Workshop, April 2006 (inproceedings)

Abstract
In many regression tasks, in addition to an accurate estimate of the conditional mean of the target distribution, an indication of the predictive uncertainty is also required. There are two principal sources of this uncertainty: the noise process contaminating the data and the uncertainty in estimating the model parameters based on a limited sample of training data. Both of them can be summarised in the predictive variance which can then be used to give confidence intervals. In this paper, we present various schemes for providing predictive variances for kernel ridge regression, especially in the case of a heteroscedastic regression, where the variance of the noise process contaminating the data is a smooth function of the explanatory variables. The use of leave-one-out cross-validation is shown to eliminate the bias inherent in estimates of the predictive variance. Results obtained on all three regression tasks comprising the predictive uncertainty challenge demonstrate the value of this approach.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Machine Learning Methods For Estimating Operator Equations

Steinke, F., Schölkopf, B.

In Proceedings of the 14th IFAC Symposium on System Identification (SYSID 2006), pages: 6, (Editors: B Ninness and H Hjalmarsson), Elsevier, Oxford, United Kingdom, 14th IFAC Symposium on System Identification (SYSID), March 2006 (inproceedings)

Abstract
We consider the problem of fitting a linear operator induced equation to point sampled data. In order to do so we systematically exploit the duality between minimizing a regularization functional derived from an operator and kernel regression methods. Standard machine learning model selection algorithms can then be interpreted as a search of the equation best fitting given data points. For many kernels this operator induced equation is a linear differential equation. Thus, we link a continuous-time system identification task with common machine learning methods. The presented link opens up a wide variety of methods to be applied to this system identification problem. In a series of experiments we demonstrate an example algorithm working on non-uniformly spaced data, giving special focus to the problem of identifying one system from multiple data recordings.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Implicit Volterra and Wiener Series for Higher-Order Image Analysis

Franz, M., Schölkopf, B.

In Advances in Data Analysis: Proceedings of the 30th Annual Conference of The Gesellschaft für Klassifikation, 30, pages: 1, March 2006 (inproceedings)

Abstract
The computation of classical higher-order statistics such as higher-order moments or spectra is difficult for images due to the huge number of terms to be estimated and interpreted. We propose an alternative approach in which multiplicative pixel interactions are described by a series of Wiener functionals. Since the functionals are estimated implicitly via polynomial kernels, the combinatorial explosion associated with the classical higher-order statistics is avoided. In addition, the kernel framework allows for estimating infinite series expansions and for the regularized estimation of the Wiener series. First results show that image structures such as lines or corners can be predicted correctly, and that pixel interactions up to the order of five play an important role in natural images.

ei

PDF [BibTex]

PDF [BibTex]


no image
Causal Inference by Choosing Graphs with Most Plausible Markov Kernels

Sun, X., Janzing, D., Schölkopf, B.

In Proceedings of the 9th International Symposium on Artificial Intelligence and Mathematics, pages: 1-11, ISAIM, January 2006 (inproceedings)

Abstract
We propose a new inference rule for estimating causal structure that underlies the observed statistical dependencies among n random variables. Our method is based on comparing the conditional distributions of variables given their direct causes (the so-called Markov kernels") for all hypothetical causal directions and choosing the most plausible one. We consider those Markov kernels most plausible, which maximize the (conditional) entropies constrained by their observed first moment (expectation) and second moments (variance and covariance with its direct causes) based on their given domain. In this paper, we discuss our inference rule for causal relationships between two variables in detail, apply it to a real-world temperature data set with known causality and show that our method provides a correct result for the example.

ei

PDF Web [BibTex]

PDF Web [BibTex]