

2007


Entire Regularization Paths for Graph Data

Tsuda, K.

In ICML 2007, pages: 919-926, (Editors: Ghahramani, Z. ), ACM Press, New York, NY, USA, 24th Annual International Conference on Machine Learning, June 2007 (inproceedings)

Abstract
Graph data such as chemical compounds and XML documents are getting more common in many application domains. A main difficulty of graph data processing lies in the intrinsic high dimensionality of graphs, namely, when a graph is represented as a binary feature vector of indicators of all possible subgraph patterns, the dimensionality gets too large for usual statistical methods. We propose an efficient method to select a small number of salient patterns by regularization path tracking. The generation of useless patterns is minimized by progressive extension of the search space. In experiments, it is shown that our technique is considerably more efficient than a simpler approach based on frequent substructure mining.

ei

PDF Web DOI [BibTex]

Les Représentations Prédictives des États et des Politiques

Boularias, A., Chaib-Draa, B.

In MFI 2007, pages: 37-48, Quatrièmes Journées Francophones Modèles Formels de l'Interaction, June 2007 (inproceedings)

Abstract
In this paper we propose a new approach for representing policies (strategies) in stochastic, partially observable environments. We are particularly interested in multi-agent systems, where each agent knows only its own policies and must choose the best among them according to its belief state over the policies of the other agents. Our model uses fewer parameters than the usual representation methods, such as decision trees or stochastic finite-state controllers, which speeds up planning algorithms. We also show how this model can be used efficiently for cooperative multi-agent planning without communication; the empirical results are compared with the DEC-POMDP (Decentralized Partially Observable Markov Decision Process) model.

ei

PDF Web [BibTex]

Graph Laplacians and their Convergence on Random Neighborhood Graphs

Hein, M., Audibert, J., von Luxburg, U.

Journal of Machine Learning Research, 8, pages: 1325-1370, June 2007 (article)

Abstract
Given a sample from a probability measure with support on a submanifold in Euclidean space, one can construct a neighborhood graph which can be seen as an approximation of the submanifold. The graph Laplacian of such a graph is used in several machine learning methods like semi-supervised learning, dimensionality reduction and clustering. In this paper we determine the pointwise limit of three different graph Laplacians used in the literature as the sample size increases and the neighborhood size approaches zero. We show that for a uniform measure on the submanifold all graph Laplacians have the same limit up to constants. However, in the case of a non-uniform measure on the submanifold, only the so-called random walk graph Laplacian converges to the weighted Laplace-Beltrami operator.
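As a rough illustration of the objects this abstract discusses, the sketch below builds a Gaussian-weighted neighborhood graph on points sampled from a circle and forms three graph Laplacians. The abstract names only the random walk Laplacian; taking the other two to be the unnormalized and symmetric normalized variants is an assumption here, as are the function names, the Gaussian weighting, and the bandwidth `h`.

```python
import numpy as np

def gaussian_weights(X, h):
    """Dense Gaussian-weighted graph; h (hypothetical name) plays the role of the neighborhood size."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * h ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def graph_laplacians(W):
    """Return the unnormalized, random walk, and symmetric normalized Laplacians of W."""
    d = W.sum(axis=1)                       # vertex degrees
    n = W.shape[0]
    L_un = np.diag(d) - W                   # D - W
    L_rw = np.eye(n) - W / d[:, None]       # I - D^{-1} W
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # I - D^{-1/2} W D^{-1/2}
    return L_un, L_rw, L_sym

# Sample from a one-dimensional submanifold (the unit circle) embedded in the plane.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=50)
X = np.column_stack([np.cos(theta), np.sin(theta)])

W = gaussian_weights(X, h=0.3)
L_un, L_rw, L_sym = graph_laplacians(W)
```

The rows of `L_un` and `L_rw` sum to zero, so both annihilate constant functions; they differ in how each row is scaled by the vertex degree.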

ei

PDF PDF [BibTex]

An Extensible Probabilistic Transformation-based Approach to the Third Recognizing Textual Entailment Challenge

Harmeling, S.

In TextEntail 2007, pages: 137-142, ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, June 2007 (inproceedings)

Abstract
We introduce a system for textual entailment that is based on a probabilistic model of entailment. The model is defined using some calculus of transformations on dependency trees, which is characterized by the fact that derivations in that calculus preserve the truth only with a certain probability. We also describe a possible set of transformations (and with it implicitly a calculus) that was successfully applied to the RTE3 challenge data. However, our system can be improved in many ways and we see it as the starting point for a promising new approach to textual entailment.

ei

Web [BibTex]

Weighted Substructure Mining for Image Analysis

Nowozin, S., Tsuda, K., Uno, T., Kudo, T., Bakır, G.

In CVPR 2007, pages: 1-8, IEEE Computer Society, Los Alamitos, CA, USA, 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 2007 (inproceedings)

Abstract
In web-related applications of image categorization, it is desirable to derive an interpretable classification rule with high accuracy. Using the bag-of-words representation and the linear support vector machine, one can partly fulfill the goal, but the accuracy of linear classifiers is not high and the obtained features are not informative for users. We propose to combine item set mining and large margin classifiers to select features from the power set of all visual words. Our resulting classification rule is easier to browse and simpler to understand, because each feature has richer information. As a next step, each image is represented as a graph where nodes correspond to local image features and edges encode geometric relations between features. Combining graph mining and boosting, we can obtain a classification rule based on subgraph features that contain more information than the set features. We evaluate our algorithm in a web-retrieval ranking task where the goal is to reject outliers from a set of images returned for a keyword query. Furthermore, it is evaluated on the supervised classification tasks with the challenging VOC2005 data set. Our approach yields excellent accuracy in the unsupervised ranking task compared to a recently proposed probabilistic model and competitive results in the supervised classification task.

ei

PDF DOI [BibTex]

Local Learning Projections

Wu, M., Yu, K., Yu, S., Schölkopf, B.

In Proceedings of the 24th International Conference on Machine Learning, pages: 1039-1046, (Editors: Z Ghahramani), ACM Press, New York, NY, USA, ICML, June 2007 (inproceedings)

Abstract
This paper presents a Local Learning Projection (LLP) approach for linear dimensionality reduction. We first point out that the well known Principal Component Analysis (PCA) essentially seeks the projection that has the minimal global estimation error. Then we propose a dimensionality reduction algorithm that leads to the projection with the minimal local estimation error, and elucidate its advantages for classification tasks. We also indicate that LLP keeps the local information in the sense that the projection value of each point can be well estimated based on its neighbors and their projection values. Experimental results are provided to validate the effectiveness of the proposed algorithm.

ei

PDF Web DOI [BibTex]

Training and Approximation of a Primal Multiclass Support Vector Machine

Zien, A., Bona, F., Ong, C.

In ASMDA 2007, pages: 1-8, (Editors: Skiadas, C. H.), 12th International Conference on Applied Stochastic Models and Data Analysis, June 2007 (inproceedings)

Abstract
We revisit the multiclass support vector machine (SVM) and generalize the formulation to convex loss functions and joint feature maps. Motivated by recent work [Chapelle, 2006] we use logistic loss and softmax to enable gradient based primal optimization. Kernels are incorporated via kernel principal component analysis (KPCA), which naturally leads to approximation methods for large scale problems. We investigate similarities and differences to previous multiclass SVM approaches. Experimental comparisons to previous approaches and to the popular one-vs-rest SVM are presented on several different datasets.

ei

PDF PostScript Web [BibTex]

Nonlinear independent component analysis with minimum nonlinear distortion

Zhang, K., Chan, L.

In ICML ’07: Proceedings of the 24th international conference on Machine learning, pages: 1127-1134, (Editors: Z Ghahramani), ACM, New York, NY, USA, 24th International Conference on Machine Learning (ICML), June 2007 (inproceedings)

ei

PDF [BibTex]

Information-theoretic Metric Learning

Davis, J., Kulis, B., Jain, P., Sra, S., Dhillon, I.

In ICML 2007, pages: 209-216, (Editors: Ghahramani, Z. ), ACM Press, New York, NY, USA, 24th Annual International Conference on Machine Learning, June 2007 (inproceedings)

Abstract
In this paper, we present an information-theoretic approach to learning a Mahalanobis distance function. We formulate the problem as that of minimizing the differential relative entropy between two multivariate Gaussians under constraints on the distance function. We express this problem as a particular Bregman optimization problem: that of minimizing the LogDet divergence subject to linear constraints. Our resulting algorithm has several advantages over existing methods. First, our method can handle a wide variety of constraints and can optionally incorporate a prior on the distance function. Second, it is fast and scalable. Unlike most existing methods, no eigenvalue computations or semi-definite programming are required. We also present an online version and derive regret bounds for the resulting algorithm. Finally, we evaluate our method on a recent error reporting system for software called Clarify, in the context of metric learning for nearest neighbor classification, as well as on standard data sets.
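For reference, the two quantities named in this abstract can be written in their standard form. The notation is assumed here rather than taken from the paper: $A$ is the learned positive semi-definite matrix, $A_0$ a prior matrix, and $d$ the dimension of the data.

```latex
\begin{align*}
  d_A(x, y) &= (x - y)^\top A \,(x - y), \\
  D_{\ell d}(A, A_0) &= \operatorname{tr}\!\left(A A_0^{-1}\right) - \log\det\!\left(A A_0^{-1}\right) - d .
\end{align*}
```

The LogDet divergence is zero exactly when $A = A_0$, which is what lets $A_0$ act as a prior on the distance function.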

ei

Web DOI [BibTex]

A Dependence Maximization View of Clustering

Song, L., Smola, A., Gretton, A., Borgwardt, K.

In Proceedings of the 24th Annual International Conference on Machine Learning (ICML 2007), pages: 815-822, (Editors: Ghahramani, Z. ), ACM Press, New York, NY, USA, Twenty-Fourth Annual International Conference on Machine Learning (ICML), June 2007 (inproceedings)

Abstract
We propose a family of clustering algorithms based on the maximization of dependence between the input variables and their cluster labels, as expressed by the Hilbert-Schmidt Independence Criterion (HSIC). Under this framework, we unify the geometric, spectral, and statistical dependence views of clustering, and subsume many existing algorithms as special cases (e.g. k-means and spectral clustering). Distinctive to our framework is that kernels can also be applied on the labels, which can endow them with particular structures. We also obtain a perturbation bound on the change in k-means clustering.
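To make the dependence criterion concrete, here is a minimal sketch of the commonly used biased empirical HSIC estimator, tr(KHLH)/(n-1)^2, evaluated for a correct and a shuffled cluster labeling. It is an illustration under stated assumptions, not the paper's clustering algorithm: the RBF input kernel, the one-hot linear label kernel, the toy data, and all function names are chosen here for the example.

```python
import numpy as np

def rbf_kernel(X, sigma):
    """Gaussian RBF kernel matrix on the inputs (sigma is an assumed bandwidth)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def label_kernel(y, n_clusters):
    """Linear kernel on one-hot cluster labels: entry is 1 if two points share a cluster."""
    Y = np.eye(n_clusters)[y]
    return Y @ Y.T

def hsic(K, L):
    """Biased empirical HSIC estimate: tr(K H L H) / (n - 1)^2 with centering matrix H."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Two well-separated Gaussian blobs as toy data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3.0, 0.5, size=(20, 2)),
               rng.normal(3.0, 0.5, size=(20, 2))])
true_labels = np.repeat([0, 1], 20)
shuffled_labels = rng.permutation(true_labels)

K = rbf_kernel(X, sigma=2.0)
hsic_true = hsic(K, label_kernel(true_labels, 2))
hsic_shuffled = hsic(K, label_kernel(shuffled_labels, 2))
```

A labeling that follows the cluster structure gives a larger HSIC value than a shuffled one, which is the quantity a dependence-maximizing clustering method would try to increase.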

ei

PDF Web DOI [BibTex]

Multiclass Multiple Kernel Learning

Zien, A., Ong, C.

In ICML 2007, pages: 1191-1198, (Editors: Ghahramani, Z. ), ACM Press, New York, NY, USA, 24th International Conference on Machine Learning, June 2007 (inproceedings)

Abstract
In many applications it is desirable to learn from several kernels. “Multiple kernel learning” (MKL) allows the practitioner to optimize over linear combinations of kernels. By enforcing sparse coefficients, it also generalizes feature selection to kernel selection. We propose MKL for joint feature maps. This provides a convenient and principled way for MKL with multiclass problems. In addition, we can exploit the joint feature map to learn kernels on output spaces. We show the equivalence of several different primal formulations including different regularizers. We present several optimization methods, and compare a convex quadratically constrained quadratic program (QCQP) and two semi-infinite linear programs (SILPs) on toy data, showing that the SILPs are faster than the QCQP. We then demonstrate the utility of our method by applying the SILP to three real world datasets.

ei

PDF Web DOI [BibTex]

Transductive Support Vector Machines for Structured Variables

Zien, A., Brefeld, U., Scheffer, T.

In ICML 2007, pages: 1183-1190, (Editors: Ghahramani, Z. ), ACM Press, New York, NY, USA, 24th International Conference on Machine Learning, June 2007 (inproceedings)

Abstract
We study the problem of learning kernel machines transductively for structured output variables. Transductive learning can be reduced to combinatorial optimization problems over all possible labelings of the unlabeled data. In order to scale transductive learning to structured variables, we transform the corresponding non-convex, combinatorial, constrained optimization problems into continuous, unconstrained optimization problems. The discrete optimization parameters are eliminated and the resulting differentiable problems can be optimized efficiently. We study the effectiveness of the generalized TSVM on multiclass classification and label-sequence learning problems empirically.

ei

PDF Web DOI [BibTex]

Detailed human shape and pose from images

Balan, A., Sigal, L., Black, M. J., Davis, J., Haussecker, H.

In IEEE Conf. on Computer Vision and Pattern Recognition, CVPR, pages: 1-8, Minneapolis, June 2007 (inproceedings)

ps

pdf YouTube [BibTex]

Learning static Gestalt laws through dynamic experience

Ostrovsky, Y., Wulff, J., Sinha, P.

Journal of Vision, 7(9):315-315, ARVO, June 2007 (article)

Abstract
The Gestalt laws (Wertheimer 1923) are widely regarded as the rules that help us parse the world into objects. However, it is unclear as to how these laws are acquired by an infant's visual system. Classically, these “laws” have been presumed to be innate (Kellman and Spelke 1983). But, more recent work in infant development, showing the protracted time-course over which these grouping principles emerge (e.g., Johnson and Aslin 1995; Craton 1996), suggests that visual experience might play a role in their genesis. Specifically, our studies of patients with late-onset vision (Project Prakash; VSS 2006) and evidence from infant development both point to an early role of common motion cues for object grouping. Here we explore the possibility that the privileged status of motion in the developmental timeline is not happenstance, but rather serves to bootstrap the learning of static Gestalt cues. Our approach involves computational analyses of real-world motion sequences to investigate whether primitive optic flow information is correlated with static figural cues that could eventually come to serve as proxies for grouping in the form of Gestalt principles. We calculated local optic flow maps and then examined how similarity of motion across image patches co-varied with similarity of certain figural properties in static frames. Results indicate that patches with similar motion are much more likely to have similar luminance, color, and orientation as compared to patches with dissimilar motion vectors. This regularity suggests that, in principle, common motion extracted from dynamic visual experience can provide enough information to bootstrap region grouping based on luminance and color and contour continuation mechanisms in static scenes. These observations, coupled with the cited experimental studies, lend credence to the hypothesis that static Gestalt laws might be learned through a bootstrapping process based on early dynamic experience.

ps

link (url) DOI [BibTex]

The power of external mentors for women pursuing academic careers in engineering and science: Stories of MentorNet ACE and its Proteges and Mentors

Muller, C. B., Smith, E. H. B., Chou-Green, J., Daniels-Race, T., Drummond, A., Kuchenbecker, K. J.

In Proc. Women in Engineering Programs and Advocates Network (WEPAN) National Conference, Lake Buena Vista, Florida, USA, June 2007, Oral presentation given by Muller (inproceedings)

hi

[BibTex]

Effects of Visual and Proprioceptive Position Feedback on Human Control of Targeted Movement

Kuchenbecker, K. J., Gurari, N., Okamura, A. M.

In Proc. IEEE International Conference on Rehabilitation Robotics, pages: 513-524, Noordwijk, Netherlands, June 2007, Oral and poster presentations given by Kuchenbecker (inproceedings)

hi

[BibTex]

Asymptotic stability of the solution of the M/MB/1 queueing model

Haji, A., Radl, A.

Computers and Mathematics with Applications, 53(9):1411-1420, May 2007 (article)

ei

PDF DOI [BibTex]

Competition and Coordination in Stochastic Games

Burkov, A., Boularias, A., Chaib-Draa, B.

In Canadian AI 2007, pages: 26-37, (Editors: Kobti, Z. , D. Wu), Springer, Berlin, Germany, 20th Conference of the Canadian Society for Computational Studies of Intelligence, May 2007 (inproceedings)

Abstract
Agent competition and coordination are two classical and important tasks in multiagent systems. In recent years, a number of learning algorithms have been proposed to solve such problems. Among them is an important class, called adaptive learning algorithms, that have been shown to converge in self-play to a solution in a wide variety of repeated matrix games. Although certain algorithms of this class, such as Infinitesimal Gradient Ascent (IGA), Policy Hill-Climbing (PHC) and Adaptive Play Q-learning (APQ), have been widely studied in the recent literature, the question of how these algorithms perform against each other in general-form stochastic games remains little studied. In this work we try to answer this question. To do so, we analyse these algorithms in detail and give a comparative analysis of their behavior on a set of competition and coordination stochastic games. We also introduce a new multiagent learning algorithm, called ModIGA. It is an extension of the IGA algorithm that is able to estimate the strategy of its opponents in cases where they do not explicitly play mixed strategies (e.g., APQ) and that can be applied to games with more than two actions.

ei

PDF Web DOI [BibTex]

MR Angiography of Dural Arteriovenous Fistulas: Diagnosis and Follow-Up after Treatment Using a Time-Resolved 3D Contrast-Enhanced Technique

Meckel, S., Maier, M., San Millan Ruiz, D., Yilmaz, H., Scheffler, K., Radü, E., Wetzel, S.

American Journal of Neuroradiology, 28(5):877-884, May 2007 (article)

Abstract
BACKGROUND AND PURPOSE: Digital subtraction angiography (DSA) is the method of reference for imaging of dural arteriovenous fistula (DAVF). The goal of this study was to analyze the value of different MR images including 3D contrast-enhanced MR angiography (MRA) with a high temporal resolution in diagnostic and follow-up imaging of DAVFs. MATERIALS AND METHODS: A total of 18 MR/MRA examinations from 14 patients with untreated (n = 9) and/or treated (n = 9) DAVFs were evaluated. Two observers assessed all MR and MRA investigations for signs indicating the presence of a DAVF, for fistula characteristics such as fistula grading, location of fistulous point, and fistula obliteration after treatment. All results were compared with DSA findings. RESULTS: On time-resolved 3D contrast-enhanced (TR 3D) MRA, the side and presence of all patent fistulas (n = 13) were correctly indicated, and no false-positive findings were observed in occluded DAVFs (n = 5). Grading of fistulas with this imaging technique was correct in 77% and 85% of patent fistulas for both readers, respectively. On T2-weighted images, signs indicative of a DAVF were encountered only in fistulas with cortical venous reflux (56%), whereas on 3D time-of-flight (TOF) MRA, most fistulas (88%) were correctly detected. In complete fistula occlusion, false-positive findings were encountered on both T2-weighted images and on TOF MRA images. CONCLUSION: In this study, TR 3D MRA proved reliable in detecting DAVFs and suitable for follow-up imaging. The technique allowed DAVFs to be graded, within limitations. Although 3D TOF MRA can depict signs of DAVFs, its value for follow-up imaging is limited.

ei

Web [BibTex]

Bayesian Reconstruction of the Density of States

Habeck, M.

Physical Review Letters, 98(20, 200601):1-4, May 2007 (article)

Abstract
A Bayesian framework is developed to reconstruct the density of states from multiple canonical simulations. The framework encompasses the histogram reweighting method of Ferrenberg and Swendsen. The new approach applies to nonparametric as well as parametric models and does not require simulation data to be discretized. It offers a means to assess the precision of the reconstructed density of states and of derived thermodynamic quantities.

ei

Web DOI [BibTex]

PALMA: mRNA to Genome Alignments using Large Margin Algorithms

Schulze, U., Hepp, B., Ong, C., Rätsch, G.

Bioinformatics, 23(15):1892-1900, May 2007 (article)

Abstract
Motivation: Despite many years of research on how to properly align sequences in the presence of sequencing errors, alternative splicing and micro-exons, the correct alignment of mRNA sequences to genomic DNA is still a challenging task. Results: We present a novel approach based on large margin learning that combines accurate splice site predictions with common sequence alignment techniques. By solving a convex optimization problem, our algorithm – called PALMA – tunes the parameters of the model such that true alignments score higher than other alignments. We study the accuracy of alignments of mRNAs containing artificially generated micro-exons to genomic DNA. In a carefully designed experiment, we show that our algorithm accurately identifies the intron boundaries as well as boundaries of the optimal local alignment. It outperforms all other methods: for 5702 artificially shortened EST sequences from C. elegans and human it correctly identifies the intron boundaries in all except two cases. The best other method is a recently proposed method called exalin which misaligns 37 of the sequences. Our method also demonstrates robustness to mutations, insertions and deletions, retaining accuracy even at high noise levels. Availability: Datasets for training, evaluation and testing, additional results and a stand-alone alignment tool implemented in C++ and python are available at http://www.fml.mpg.de/raetsch/projects/palma.

ei

Web DOI [BibTex]

Decoding grasp aperture from motor-cortical population activity

Artemiadis, P., Shakhnarovich, G., Vargas-Irwin, C., Donoghue, J. P., Black, M. J.

In The 3rd International IEEE EMBS Conference on Neural Engineering, pages: 518-521, May 2007 (inproceedings)

ps

pdf [BibTex]

Multi-state decoding of point-and-click control signals from motor cortical activity in a human with tetraplegia

Kim, S., Simeral, J., Hochberg, L., Donoghue, J. P., Friehs, G., Black, M. J.

In The 3rd International IEEE EMBS Conference on Neural Engineering, pages: 486-489, May 2007 (inproceedings)

Abstract
Basic neural-prosthetic control of a computer cursor has been recently demonstrated by Hochberg et al. [1] using the BrainGate system (Cyberkinetics Neurotechnology Systems, Inc.). While these results demonstrate the feasibility of intracortically-driven prostheses for humans with paralysis, a practical cursor-based computer interface requires more precise cursor control and the ability to “click” on areas of interest. Here we present a practical point and click device that decodes both continuous states (e.g. cursor kinematics) and discrete states (e.g. click state) from a single neural population in human motor cortex. We describe a probabilistic multi-state decoder and the necessary training paradigms that enable point and click cursor control by a human with tetraplegia using an implanted microelectrode array. We present results from multiple recording sessions and quantify the point and click performance.

ps

pdf [BibTex]

The role of the striatum in adaptation learning: a computational model

Grosse-Wentrup, M., Contreras-Vidal, J.

Biological Cybernetics, 96(4):377-388, April 2007 (article)

Abstract
To investigate the functional role of the striatum in visuo-motor adaptation, we extend the DIRECT-model for visuo-motor reaching movements formulated by Bullock et al. (J Cogn Neurosci 5:408–435, 1993) through two parallel loops, each modeling a distinct contribution of the cortico–cerebellar–thalamo–cortical and the cortico–striato–thalamo–cortical networks to visuo-motor adaptation. Based on evidence of Robertson and Miall (Neuroreport 10(5): 1029–1034, 1999), we implement the function of the cortico–cerebellar–thalamo–cortical loop as a module that gradually adapts to small changes in sensorimotor relationships. The cortico–striato–thalamo–cortical loop on the other hand is hypothesized to act as an adaptive search element, guessing new sensorimotor-transformations and reinforcing successful guesses while punishing unsuccessful ones. In a first step, we show that the model reproduces trajectories and error curves of healthy subjects in a two dimensional center-out reaching task with rotated screen cursor visual feedback. In a second step, we disable learning processes in the cortico–striato–thalamo–cortical loop to simulate subjects with Parkinson’s disease (PD), and show that this leads to error curves typical of subjects with PD. We conclude that the results support our hypothesis, i.e., that the role of the cortico–striato–thalamo–cortical loop in visuo-motor adaptation is that of an adaptive search element.

ei

PDF PDF DOI [BibTex]

Change-Point Detection using Krylov Subspace Learning

Ide, T., Tsuda, K.

In SDM 2007, pages: 515-520, (Editors: Apte, C. ), Society for Industrial and Applied Mathematics, Pittsburgh, PA, USA, SIAM International Conference on Data Mining, April 2007 (inproceedings)

Abstract
We propose an efficient algorithm for principal component analysis (PCA) that is applicable when only the inner product with a given vector is needed. We show that Krylov subspace learning works well both in matrix compression and implicit calculation of the inner product by taking full advantage of the arbitrariness of the seed vector. We apply our algorithm to a PCA-based change-point detection algorithm, and show that it results in about 50 times improvement in computational time.

ei

PDF Web [BibTex]

A Bayesian Approach to Nonlinear Parameter Identification for Rigid Body Dynamics

Ting, J., Mistry, M., Peters, J., Schaal, S., Nakanishi, J.

In RSS 2006, pages: 247-254, (Editors: Sukhatme, G. S., S. Schaal, W. Burgard, D. Fox), MIT Press, Cambridge, MA, USA, Robotics: Science and Systems II (RSS ), April 2007 (inproceedings)

Abstract
For robots of increasing complexity such as humanoid robots, conventional identification of rigid body dynamics models based on CAD data and actuator models becomes difficult and inaccurate due to the large number of additional nonlinear effects in these systems, e.g., stemming from stiff wires, hydraulic hoses, protective shells, skin, etc. Data driven parameter estimation offers an alternative model identification method, but it is often burdened by various other problems, such as significant noise in all measured or inferred variables of the robot. The danger of physically inconsistent results also exists due to unmodeled nonlinearities or insufficiently rich data. In this paper, we address all these problems by developing a Bayesian parameter identification method that can automatically detect noise in both input and output data for the regression algorithm that performs system identification. A post-processing step ensures physically consistent rigid body parameters by nonlinearly projecting the result of the Bayesian estimation onto constraints given by positive definite inertia matrices and the parallel axis theorem. We demonstrate on synthetic and actual robot data that our technique performs parameter identification with 5 to 20% higher accuracy than traditional methods. Due to the resulting physically consistent parameters, our algorithm enables us to apply advanced control methods that algebraically require physical consistency on robotic platforms.

ei

PDF Web [BibTex]

A robust fetal ECG detection method for abdominal recordings

Martens, SMM., Rabotti, C., Mischi, M., Sluijter, RJ.

Physiological Measurement, 28(4):373-388, April 2007, Martin Black Prize for best paper Physiological Measurement 2007 (article)

Abstract
In this paper, we propose a new method for FECG detection in abdominal recordings. The method consists of a sequential analysis approach, in which the a priori information about the interference signals is used for the detection of the FECG. Our method is evaluated on a set of 20 abdominal recordings from pregnant women with different gestational ages. Its performance in terms of fetal heart rate (FHR) detection success is compared with that of independent component analysis (ICA). The results show that our sequential estimation method outperforms ICA, with an FHR detection rate of 85% versus 60% for ICA. The superior performance of our method is especially evident in recordings with a low signal-to-noise ratio (SNR). This indicates that our method is more robust than ICA for FECG detection.

ei

DOI [BibTex]

Learning causality by identifying common effects with kernel-based dependence measures

Sun, X., Janzing, D.

In ESANN 2007, pages: 453-458, D-Side, Evere, Belgium, 15th European Symposium on Artificial Neural Networks, April 2007 (inproceedings)

Abstract
We describe a method for causal inference that measures the strength of statistical dependence by the Hilbert-Schmidt norm of kernel-based conditional cross-covariance operators. We consider the increase of the dependence of two variables X and Y by conditioning on a third variable Z as a hint for Z being a common effect of X and Y. Based on this assumption, we collect "votes" for hypothetical causal directions and orient the edges according to the majority vote. For most of our experiments with artificial and real-world data our method has outperformed the conventional constraint-based inductive causation (IC) algorithm.

ei

PDF Web [BibTex]

Exploring the causal order of binary variables via exponential hierarchies of Markov kernels

Sun, X., Janzing, D.

In ESANN 2007, pages: 465-470, D-Side, Evere, Belgium, 15th European Symposium on Artificial Neural Networks, April 2007 (inproceedings)

Abstract
We propose a new algorithm for estimating the causal structure that underlies the observed dependence among n (n>=4) binary variables X_1,...,X_n. Our inference principle states that the factorization of the joint probability into conditional probabilities for X_j given X_1,...,X_{j-1} often leads to simpler terms if the order of variables is compatible with the directed acyclic graph representing the causal structure. We study joint measures of OR/AND gates and show that the complexity of the conditional probabilities (the so-called Markov kernels), defined by a hierarchy of exponential models, depends on the order of the variables. Some toy and real-data experiments support our inference rule.

ei

PDF Web [BibTex]

Applying the Episodic Natural Actor-Critic Architecture to Motor Primitive Learning

Peters, J., Schaal, S.

In Proceedings of the 15th European Symposium on Artificial Neural Networks (ESANN 2007), pages: 295-300, D-Side, Evere, Belgium, 15th European Symposium on Artificial Neural Networks (ESANN), April 2007 (inproceedings)

Abstract
In this paper, we investigate motor primitive learning with the Natural Actor-Critic approach. The Natural Actor-Critic consists of actor updates which are achieved using natural stochastic policy gradients while the critic obtains the natural policy gradient by linear regression. We show that this architecture can be used to learn the “building blocks of movement generation”, called motor primitives. Motor primitives are parameterized control policies such as splines or nonlinear differential equations with desired attractor properties. We show that our most modern algorithm, the Episodic Natural Actor-Critic, outperforms previous algorithms by at least an order of magnitude. We demonstrate the efficiency of this reinforcement learning method in the application of learning to hit a baseball with an anthropomorphic robot arm.

ei

PDF Web [BibTex]

Fast Newton-type Methods for the Least Squares Nonnegative Matrix Approximation Problem

Kim, D., Sra, S., Dhillon, I.

In SDM 2007, pages: 343-354, (Editors: Apte, C. ), Society for Industrial and Applied Mathematics, Pittsburgh, PA, USA, SIAM International Conference on Data Mining, April 2007 (inproceedings)

Abstract
Nonnegative Matrix Approximation is an effective matrix decomposition technique that has proven to be useful for a wide variety of applications ranging from document analysis and image processing to bioinformatics. There exist a few algorithms for nonnegative matrix approximation (NNMA), for example, Lee & Seung’s multiplicative updates, alternating least squares, and certain gradient descent based procedures. All of these procedures suffer from either slow convergence, numerical instabilities, or at worst, theoretical unsoundness. In this paper we present new and improved algorithms for the least-squares NNMA problem, which are not only theoretically well-founded, but also overcome many of the deficiencies of other methods. In particular, we use non-diagonal gradient scaling to obtain rapid convergence. Our methods provide numerical results superior to both Lee & Seung’s method and the alternating least squares (ALS) heuristic, which is known to work well in some situations but has no theoretical guarantees (Berry et al. 2006). Our approach extends naturally to include regularization and box-constraints, without sacrificing convergence guarantees. We present experimental results on both synthetic and real-world datasets to demonstrate the superiority of our methods, in terms of better approximations as well as efficiency.

ei

PDF Web [BibTex]

Distinguishing Between Cause and Effect via Kernel-Based Complexity Measures for Conditional Distributions

Sun, X., Janzing, D., Schölkopf, B.

In Proceedings of the 15th European Symposium on Artificial Neural Networks , pages: 441-446, (Editors: M Verleysen), D-Side Publications, Evere, Belgium, ESANN, April 2007 (inproceedings)

Abstract
We propose a method to evaluate the complexity of probability measures from data that is based on a reproducing kernel Hilbert space seminorm of the logarithm of conditional probability densities. The motivation is to provide a tool for a causal inference method which assumes that conditional probabilities for effects given their causes are typically simpler and smoother than vice-versa. We present experiments with toy data where the quantitative results are consistent with our intuitive understanding of complexity and smoothness. Also in some examples with real-world data the probability measure corresponding to the true causal direction turned out to be less complex than those of the reversed order.

ei

PDF Web [BibTex]

Deterministic Annealing for Multiple-Instance Learning

Gehler, P., Chapelle, O.

In JMLR Workshop and Conference Proceedings Volume 2: AISTATS 2007, pages: 123-130, (Editors: Meila, M. , X. Shen), MIT Press, Cambridge, MA, USA, 11th International Conference on Artificial Intelligence and Statistics, March 2007 (inproceedings)

Abstract
In this paper we demonstrate how deterministic annealing can be applied to different SVM formulations of the multiple-instance learning (MIL) problem. Our results show that we find better local minima than the heuristic methods with which those problems are usually solved. However, this does not always translate into a better test error, suggesting an inadequacy of the objective function. Based on this finding, we propose a new objective function which, together with the deterministic annealing algorithm, finds better local minima and achieves better performance on a set of benchmark datasets. Furthermore, the results also show how the structure of MIL datasets influences the performance of MIL algorithms, and we discuss how future benchmark datasets for the MIL problem should be designed.

ei

PDF Web [BibTex]

Bayesian Inference and Optimal Design in the Sparse Linear Model

Seeger, M., Steinke, F., Tsuda, K.

In JMLR Workshop and Conference Proceedings Volume 2: AISTATS 2007, pages: 444-451, (Editors: Meila, M. , X. Shen), JMLR, Cambridge, MA, USA, 11th International Conference on Artificial Intelligence and Statistics, March 2007 (inproceedings)

Abstract
The sparse linear model has seen many successful applications in Statistics, Machine Learning, and Computational Biology, such as identification of gene regulatory networks from micro-array expression data. Prior work has either approximated Bayesian inference by expensive Markov chain Monte Carlo, or replaced it by point estimation. We show how to obtain a good approximation to Bayesian analysis efficiently, using the Expectation Propagation method. We also address the problems of optimal design and hyperparameter estimation. We demonstrate our framework on a gene network identification task.

ei

PDF Web [BibTex]

Stick-breaking Construction for the Indian Buffet Process

Teh, Y., Görür, D., Ghahramani, Z.

In JMLR Workshop and Conference Proceedings Volume 2: AISTATS 2007, pages: 556-563, (Editors: Meila, M. , X. Shen), MIT Press, Cambridge, MA, USA, 11th International Conference on Artificial Intelligence and Statistics, March 2007 (inproceedings)

Abstract
The Indian buffet process (IBP) is a Bayesian nonparametric distribution whereby objects are modelled using an unbounded number of latent features. In this paper we derive a stick-breaking representation for the IBP. Based on this new representation, we develop slice samplers for the IBP that are efficient, easy to implement and are more generally applicable than the currently available Gibbs sampler. This representation, along with the work of Thibaux and Jordan [17], also illuminates interesting theoretical connections between the IBP, Chinese restaurant processes, Beta processes and Dirichlet processes.

ei

PDF Web [BibTex]

Fast Kernel ICA using an Approximate Newton Method

Shen, H., Jegelka, S., Gretton, A.

In JMLR Workshop and Conference Proceedings Volume 2: AISTATS 2007, pages: 476-483, (Editors: Meila, M. , X. Shen), MIT Press, Cambridge, MA, USA, 11th International Conference on Artificial Intelligence and Statistics, March 2007 (inproceedings)

Abstract
Recent approaches to independent component analysis (ICA) have used kernel independence measures to obtain very good performance, particularly where classical methods experience difficulty (for instance, sources with near-zero kurtosis). We present Fast Kernel ICA (FastKICA), a novel optimisation technique for one such kernel independence measure, the Hilbert-Schmidt independence criterion (HSIC). Our search procedure uses an approximate Newton method on the special orthogonal group, where we estimate the Hessian locally about independence. We employ incomplete Cholesky decomposition to efficiently compute the gradient and approximate Hessian. FastKICA results in more accurate solutions at a given cost compared with gradient descent, and is relatively insensitive to local minima when initialised far from independence. These properties allow kernel approaches to be extended to problems with larger numbers of sources and observations. Our method is competitive with other modern and classical ICA approaches in both speed and accuracy.

ei

PDF Web [BibTex]

Neighborhood Property based Pattern Selection for Support Vector Machines

Shin, H., Cho, S.

Neural Computation, 19(3):816-855, March 2007 (article)

Abstract
The support vector machine (SVM) has been spotlighted in the machine learning community because of its theoretical soundness and practical performance. When applied to a large data set, however, it requires a large memory and a long time for training. To cope with the practical difficulty, we propose a pattern selection algorithm based on neighborhood properties. The idea is to select only the patterns that are likely to be located near the decision boundary. Those patterns are expected to be more informative than the randomly selected patterns. The experimental results provide promising evidence that it is possible to successfully employ the proposed algorithm ahead of SVM training.

ei

PDF Web DOI [BibTex]

Training a Support Vector Machine in the Primal

Chapelle, O.

Neural Computation, 19(5):1155-1178, March 2007 (article)

Abstract
Most literature on Support Vector Machines (SVMs) concentrates on the dual optimization problem. In this paper, we would like to point out that the primal problem can also be solved efficiently, both for linear and non-linear SVMs, and that there is no reason for ignoring this possibility. On the contrary, from the primal point of view, new families of algorithms for large scale SVM training can be investigated.

ei

PDF Web DOI [BibTex]

Transductive Classification via Local Learning Regularization

Wu, M., Schölkopf, B.

In JMLR Workshop and Conference Proceedings Volume 2: AISTATS 2007, pages: 628-635, (Editors: M Meila and X Shen), 11th International Conference on Artificial Intelligence and Statistics, March 2007 (inproceedings)

Abstract
The idea of local learning, classifying a particular point based on its neighbors, has been successfully applied to supervised learning problems. In this paper, we adapt it for Transductive Classification (TC) problems. Specifically, we formulate a Local Learning Regularizer (LL-Reg) which leads to a solution with the property that the label of each data point can be well predicted based on its neighbors and their labels. For model selection, an efficient way to compute the leave-one-out classification error is provided for the proposed and related algorithms. Experimental results using several benchmark datasets illustrate the effectiveness of the proposed approach.

ei

PDF Web [BibTex]

Quantifying the value of visual and haptic position feedback in force-based motion control

Kuchenbecker, K. J., Gurari, N., Okamura, A. M.

In Proc. IEEE World Haptics Conference, pages: 561-562, Tsukuba, Japan, March 2007, Poster presentation given by Kuchenbecker (inproceedings)

hi

[BibTex]

Shaping event-based haptic transients via an improved understanding of real contact dynamics

Fiene, J. P., Kuchenbecker, K. J.

In Proc. IEEE World Haptics Conference, pages: 170-175, Tsukuba, Japan, March 2007, Oral presentation given by Fiene. Best Haptic Technology Paper Award (inproceedings)

hi

[BibTex]

Improving the Caenorhabditis elegans Genome Annotation Using Machine Learning

Rätsch, G., Sonnenburg, S., Srinivasan, J., Witte, H., Müller, K., Sommer, R., Schölkopf, B.

PLoS Computational Biology, 3(2, e20):0313-0322, February 2007 (article)

ei

PDF DOI [BibTex]

The Independent Components of Natural Images are Perceptually Dependent

Bethge, M., Wiecki, T., Wichmann, F.

In Human Vision and Electronic Imaging XII, pages: 1-12, (Editors: Rogowitz, B. E.), SPIE, Bellingham, WA, USA, SPIE Human Vision and Electronic Imaging Conference, February 2007 (inproceedings)

Abstract
The independent components of natural images are a set of linear filters which are optimized for statistical independence. With such a set of filters, images can be represented without loss of information. Intriguingly, the filter shapes are localized, oriented, and bandpass, resembling important properties of V1 simple cell receptive fields. Here we address the question of whether the independent components of natural images are also perceptually less dependent than other image components. We compared the pixel basis, the ICA basis and the discrete cosine basis by asking subjects to interactively predict missing pixels (for the pixel basis) or to predict the coefficients of ICA and DCT basis functions in patches of natural images. Like Kersten (1987), we find the pixel basis to be perceptually highly redundant but, perhaps surprisingly, the ICA basis showed significantly higher perceptual dependencies than the DCT basis. This shows a dissociation between statistical and perceptual dependence measures.

ei

PDF Web DOI [BibTex]

Statistical Consistency of Kernel Canonical Correlation Analysis

Fukumizu, K., Bach, F., Gretton, A.

Journal of Machine Learning Research, 8, pages: 361-383, February 2007 (article)

Abstract
While kernel canonical correlation analysis (CCA) has been applied in many contexts, the convergence of finite sample estimates of the associated functions to their population counterparts has not yet been established. This paper gives a mathematical proof of the statistical convergence of kernel CCA, providing a theoretical justification for the method. The proof uses covariance operators defined on reproducing kernel Hilbert spaces, and analyzes the convergence of their empirical estimates of finite rank to their population counterparts, which can have infinite rank. The result also gives a sufficient condition for convergence on the regularization coefficient involved in kernel CCA: this should decrease as n^{-1/3}, where n is the number of data.

ei

PDF [BibTex]

Unsupervised learning of a steerable basis for invariant image representations

Bethge, M., Gerwinn, S., Macke, J.

In Human Vision and Electronic Imaging XII, pages: 1-12, (Editors: Rogowitz, B. E.), SPIE, Bellingham, WA, USA, SPIE Human Vision and Electronic Imaging Conference, February 2007 (inproceedings)

Abstract
There are two aspects to unsupervised learning of invariant representations of images: First, we can reduce the dimensionality of the representation by finding an optimal trade-off between temporal stability and informativeness. We show that the answer to this optimization problem is generally not unique so that there is still considerable freedom in choosing a suitable basis. Which of the many optimal representations should be selected? Here, we focus on this second aspect, and seek to find representations that are invariant under geometrical transformations occurring in sequences of natural images. We utilize ideas of steerability and Lie groups, which have been developed in the context of filter design. In particular, we show how an anti-symmetric version of canonical correlation analysis can be used to learn a full-rank image basis which is steerable with respect to rotations. We provide a geometric interpretation of this algorithm by showing that it finds the two-dimensional eigensubspaces of the average bivector. For data which exhibits a variety of transformations, we develop a bivector clustering algorithm, which we use to learn a basis of generalized quadrature pairs (i.e. complex cells) from sequences of natural images.

ei

PDF Web DOI [BibTex]

A Subspace Kernel for Nonlinear Feature Extraction

Wu, M., Farquhar, J.

In IJCAI-07, pages: 1125-1130, (Editors: Veloso, M. M.), AAAI Press, Menlo Park, CA, USA, International Joint Conference on Artificial Intelligence, January 2007 (inproceedings)

Abstract
Kernel based nonlinear Feature Extraction (KFE) or dimensionality reduction is a widely used pre-processing step in pattern classification and data mining tasks. Given a positive definite kernel function, it is well known that the input data are implicitly mapped to a feature space with usually very high dimensionality. The goal of KFE is to find a low dimensional subspace of this feature space, which retains most of the information needed for classification or data analysis. In this paper, we propose a subspace kernel based on which the feature extraction problem is transformed to a kernel parameter learning problem. The key observation is that when projecting data into a low dimensional subspace of the feature space, the parameters that are used for describing this subspace can be regarded as the parameters of the kernel function between the projected data. Therefore current kernel parameter learning methods can be adapted to optimize this parameterized kernel function. Experimental results are provided to validate the effectiveness of the proposed approach.

ei

PDF Web [BibTex]

Graph kernels for disease outcome prediction from protein-protein interaction networks

Borgwardt, KM., Vishwanathan, SVN., Schraudolph, N., Kriegel, H-P.

In pages: 4-15, (Editors: Altman, R.B. A.K. Dunker, L. Hunter, T. Murray, T.E. Klein), World Scientific, Hackensack, NJ, USA, Pacific Symposium on Biocomputing (PSB), January 2007 (inproceedings)

Abstract
It is widely believed that comparing discrepancies in the protein-protein interaction (PPI) networks of individuals will become an important tool in understanding and preventing diseases. Currently PPI networks for individuals are not available, but gene expression data is becoming easier to obtain and allows us to represent individuals by a co-integrated gene expression/protein interaction network. Two major problems hamper the application of graph kernels – state-of-the-art methods for whole-graph comparison – to compare PPI networks. First, these methods do not scale to graphs of the size of a PPI network. Second, missing edges in these interaction networks are biologically relevant for detecting discrepancies, yet these methods do not take this into account. In this article we present graph kernels for biological network comparison that are fast to compute and take into account missing interactions. We evaluate their practical performance on two datasets of co-integrated gene expression/PPI networks.

ei

PDF [BibTex]

Some observations on the pedestal effect

Henning, G., Wichmann, F.

Journal of Vision, 7(1:3):1-15, January 2007 (article)

Abstract
The pedestal or dipper effect is the large improvement in the detectability of a sinusoidal grating observed when it is added to a masking or pedestal grating of the same spatial frequency, orientation, and phase. We measured the pedestal effect in both broadband and notched noise, that is, noise from which a 1.5-octave band centered on the signal frequency had been removed. Although the pedestal effect persists in broadband noise, it almost disappears in the notched noise. Furthermore, the pedestal effect is substantial when either high- or low-pass masking noise is used. We conclude that the pedestal effect in the absence of notched noise results principally from the use of information derived from channels with peak sensitivities at spatial frequencies different from that of the signal and the pedestal. We speculate that the spatial-frequency components of the notched noise above and below the spatial frequency of the signal and the pedestal prevent “off-frequency looking,” that is, prevent the use of information about changes in contrast carried in channels tuned to spatial frequencies that are very much different from that of the signal and the pedestal. Thus, the pedestal or dipper effect measured without notched noise appears not to be a characteristic of individual spatial-frequency-tuned channels.

ei

PDF Web DOI [BibTex]
