2009

Notes on Graph Cuts with Submodular Edge Weights

Jegelka, S., Bilmes, J.

In pages: 1-6, NIPS Workshop on Discrete Optimization in Machine Learning: Submodularity, Sparsity & Polyhedra (DISCML), December 2009 (inproceedings)

Abstract
Generalizing the cost in the standard min-cut problem to a submodular cost function immediately makes the problem harder. Not only do we prove NP hardness even for nonnegative submodular costs, but also show a lower bound of $\Omega(|V|^{1/3})$ on the approximation factor for the (s, t) cut version of the problem. On the positive side, we propose and compare three approximation algorithms with an overall approximation factor of $O(\min\{|V|, \sqrt{|E|} \log |V|\})$ that appear to do well in practice.

PDF Web [BibTex]

Guest editorial: special issue on structured prediction

Parker, C., Altun, Y., Tadepalli, P.

Machine Learning, 77(2-3):161-164, December 2009 (article)

PDF DOI [BibTex]

Structured prediction by joint kernel support estimation

Lampert, CH., Blaschko, MB.

Machine Learning, 77(2-3):249-269, December 2009 (article)

PDF DOI [BibTex]

Learning new basic Movements for Robotics

Kober, J., Peters, J.

In AMS 2009, pages: 105-112, (Editors: Dillmann, R. , J. Beyerer, C. Stiller, M. Zöllner, T. Gindele), Springer, Berlin, Germany, Autonome Mobile Systeme, December 2009 (inproceedings)

Abstract
Obtaining novel skills is one of the most important problems in robotics. Machine learning techniques may be a promising approach for automatic and autonomous acquisition of movement policies. However, this requires both an appropriate policy representation and suitable learning algorithms. Employing the most recent form of the dynamical systems motor primitives originally introduced by Ijspeert et al. [1], we show how both discrete and rhythmic tasks can be learned using a concerted approach of both imitation and reinforcement learning, and present our current best performing learning algorithms. Finally, we show that it is possible to include a start-up phase in rhythmic primitives. We apply our approach to two elementary movements, i.e., Ball-in-a-Cup and Ball-Paddling, which can be learned on a real Barrett WAM robot arm at a pace similar to human learning.

PDF Web DOI [BibTex]

From Motor Learning to Interaction Learning in Robots

Sigaud, O., Peters, J.

In Proceedings of 7ème Journées Nationales de la Recherche en Robotique, pages: 189-195, JNRR, November 2009 (inproceedings)

Abstract
The number of advanced robot systems has been increasing in recent years yielding a large variety of versatile designs with many degrees of freedom. These robots have the potential of being applicable in uncertain tasks outside well-structured industrial settings. However, the complexity of both systems and tasks is often beyond the reach of classical robot programming methods. As a result, a more autonomous solution for robot task acquisition is needed where robots adaptively adjust their behaviour to the encountered situations and required tasks. Learning approaches pose one of the most appealing ways to achieve this goal. However, while learning approaches are of high importance for robotics, we cannot simply use off-the-shelf methods from the machine learning community as these usually do not scale into the domains of robotics due to excessive computational cost as well as a lack of scalability. Instead, domain appropriate approaches are needed. We focus here on several core domains of robot learning. For accurate task execution, we need motor learning capabilities. For fast learning of the motor tasks, imitation learning offers the most promising approach. Self improvement requires reinforcement learning approaches that scale into the domain of complex robots. Finally, for efficient interaction of humans with robot systems, we will need a form of interaction learning. This contribution provides a general introduction to these issues and briefly presents the contributions of the related book chapters to the corresponding research topics.

PDF Web [BibTex]

A note on ethical aspects of BCI

Haselager, P., Vlek, R., Hill, J., Nijboer, F.

Neural Networks, 22(9):1352-1357, November 2009 (article)

Abstract
This paper focuses on ethical aspects of BCI, as a research and a clinical tool, that are challenging for practitioners currently working in the field. Specifically, the difficulties involved in acquiring informed consent from locked-in patients are investigated, in combination with an analysis of the shared moral responsibility in BCI teams, and the complications encountered in establishing effective communication with media.

Web DOI [BibTex]

Model Learning with Local Gaussian Process Regression

Nguyen-Tuong, D., Seeger, M., Peters, J.

Advanced Robotics, 23(15):2015-2034, November 2009 (article)

Abstract
Precise models of robot inverse dynamics allow the design of significantly more accurate, energy-efficient and compliant robot control. However, in some cases the accuracy of rigid-body models does not suffice for sound control performance due to unmodeled nonlinearities arising from hydraulic cable dynamics, complex friction or actuator dynamics. In such cases, estimating the inverse dynamics model from measured data poses an interesting alternative. Nonparametric regression methods, such as Gaussian process regression (GPR) or locally weighted projection regression (LWPR), are not as restrictive as parametric models and, thus, offer a more flexible framework for approximating unknown nonlinearities. In this paper, we propose a local approximation to the standard GPR, called local GPR (LGP), for real-time model online learning by combining the strengths of both regression methods, i.e., the high accuracy of GPR and the fast speed of LWPR. The approach is shown to have competitive learning performance for high-dimensional data while being sufficiently fast for real-time learning. The effectiveness of LGP is exhibited by a comparison with the state-of-the-art regression techniques, such as GPR, LWPR and ν-support vector regression. The applicability of the proposed LGP method is demonstrated by real-time online learning of the inverse dynamics model for robot model-based control on a Barrett WAM robot arm.

PDF Web DOI [BibTex]

Detecting Objects in Large Image Collections and Videos by Efficient Subimage Retrieval

Lampert, CH.

In ICCV 2009, pages: 987-994, IEEE Computer Society, Piscataway, NJ, USA, Twelfth IEEE International Conference on Computer Vision, October 2009 (inproceedings)

Abstract
We study the task of detecting the occurrence of objects in large image collections or in videos, a problem that combines aspects of content based image retrieval and object localization. While most previous approaches are either limited to special kinds of queries, or do not scale to large image sets, we propose a new method, efficient subimage retrieval (ESR), which is at the same time very flexible and very efficient. Relying on a two-layered branch-and-bound setup, ESR performs object-based image retrieval in sets of 100,000 or more images within seconds. An extensive evaluation on several datasets shows that ESR is not only very fast, but it also achieves detection accuracies that are on par with or superior to previously published methods for object-based image retrieval.

PDF Web DOI [BibTex]

Inferring textual entailment with a probabilistically sound calculus

Harmeling, S.

Natural Language Engineering, 15(4):459-477, October 2009 (article)

Abstract
We introduce a system for textual entailment that is based on a probabilistic model of entailment. The model is defined using a calculus of transformations on dependency trees, which is characterized by the fact that derivations in that calculus preserve the truth only with a certain probability. The calculus is successfully evaluated on the datasets of the PASCAL Challenge on Recognizing Textual Entailment.

PDF Web DOI [BibTex]

Modeling and Visualizing Uncertainty in Gene Expression Clusters using Dirichlet Process Mixtures

Rasmussen, CE., de la Cruz, BJ., Ghahramani, Z., Wild, DL.

IEEE/ACM Transactions on Computational Biology and Bioinformatics, 6(4):615-628, October 2009 (article)

Abstract
Although the use of clustering methods has rapidly become one of the standard computational approaches in the literature of microarray gene expression data, little attention has been paid to uncertainty in the results obtained. Dirichlet process mixture models provide a non-parametric Bayesian alternative to the bootstrap approach to modeling uncertainty in gene expression clustering. Most previously published applications of Bayesian model based clustering methods have been to short time series data. In this paper we present a case study of the application of non-parametric Bayesian clustering methods to the clustering of high-dimensional non-time series gene expression data using full Gaussian covariances. We use the probability that two genes belong to the same cluster in a Dirichlet process mixture model as a measure of the similarity of these gene expression profiles. Conversely, this probability can be used to define a dissimilarity measure, which, for the purposes of visualization, can be input to one of the standard linkage algorithms used for hierarchical clustering. Biologically plausible results are obtained from the Rosetta compendium of expression profiles which extend previously published cluster analyses of this data.

PDF Web DOI [BibTex]

A new non-monotonic algorithm for PET image reconstruction

Sra, S., Kim, D., Dhillon, I., Schölkopf, B.

In IEEE - Nuclear Science Symposium Conference Record (NSS/MIC), 2009, pages: 2500-2502, (Editors: B Yu), IEEE, Piscataway, NJ, USA, IEEE Nuclear Science Symposium and Medical Imaging Conference, October 2009 (inproceedings)

Abstract
Maximizing some form of Poisson likelihood (either with or without penalization) is central to image reconstruction algorithms in emission tomography. In this paper we introduce NMML, a non-monotonic algorithm for maximum likelihood PET image reconstruction. NMML offers a simple and flexible procedure that also easily incorporates standard convex regularization for doing penalized likelihood estimation. A vast number of image reconstruction algorithms have been developed for PET, and new ones continue to be designed. Among these, methods based on the expectation maximization (EM) and ordered-subsets (OS) framework seem to have enjoyed the greatest popularity. Our method NMML differs fundamentally from methods based on EM: i) it does not depend on the concept of optimization transfer (or surrogate functions); and ii) it is a rapidly converging nonmonotonic descent procedure. The greatest strengths of NMML, however, are its simplicity, efficiency, and scalability, which make it especially attractive for tomographic reconstruction. We provide a theoretical analysis of NMML, and empirically observe it to outperform standard EM based methods, sometimes by orders of magnitude. NMML seamlessly allows integration of penalties (regularizers) in the likelihood. This ability can prove to be crucial, especially because with the rapidly rising importance of combined PET/MR scanners, one will want to include more “prior” knowledge into the reconstruction.

PDF DOI [BibTex]

Approximation Algorithms for Tensor Clustering

Jegelka, S., Sra, S., Banerjee, A.

In Algorithmic Learning Theory: 20th International Conference, pages: 368-383, (Editors: Gavalda, R. , G. Lugosi, T. Zeugmann, S. Zilles), Springer, Berlin, Germany, ALT, October 2009 (inproceedings)

Abstract
We present the first (to our knowledge) approximation algorithm for tensor clustering—a powerful generalization to basic 1D clustering. Tensors are increasingly common in modern applications dealing with complex heterogeneous data and clustering them is a fundamental tool for data analysis and pattern discovery. Akin to their 1D cousins, common tensor clustering formulations are NP-hard to optimize. But, unlike the 1D case no approximation algorithms seem to be known. We address this imbalance and build on recent co-clustering work to derive a tensor clustering algorithm with approximation guarantees, allowing metrics and divergences (e.g., Bregman) as objective functions. Therewith, we answer two open questions by Anagnostopoulos et al. (2008). Our analysis yields a constant approximation factor independent of data size; a worst-case example shows this factor to be tight for Euclidean co-clustering. However, empirically the approximation factor is observed to be conservative, so our method can also be used in practice.

PDF Web DOI [BibTex]

Active learning using mean shift optimization for robot grasping

Kroemer, O., Detry, R., Piater, J., Peters, J.

In Proceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2009), pages: 2610-2615, IEEE Service Center, Piscataway, NJ, USA, 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), October 2009 (inproceedings)

Abstract
When children learn to grasp a new object, they often know several possible grasping points from observing a parent's demonstration and subsequently learn better grasps by trial and error. From a machine learning point of view, this process is an active learning approach. In this paper, we present a new robot learning framework for reproducing this ability in robot grasping. For doing so, we chose a straightforward approach: first, the robot observes a few good grasps by demonstration and learns a value function for these grasps using Gaussian process regression. Subsequently, it chooses grasps which are optimal with respect to this value function using a mean-shift optimization approach, and tries them out on the real system. Upon every completed trial, the value function is updated, and in the following trials it is more likely to choose even better grasping points. This method exhibits fast learning due to the data-efficiency of the Gaussian process regression framework and the fact that the mean-shift method provides maxima of this cost function. Experiments were repeatedly carried out successfully on a real robot system. After less than sixty trials, our system has adapted its grasping policy to consistently exhibit successful grasps.

PDF Web DOI [BibTex]

Sparse online model learning for robot control with support vector regression

Nguyen-Tuong, D., Schölkopf, B., Peters, J.

In Proceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2009), pages: 3121-3126, IEEE Service Center, Piscataway, NJ, USA, 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), October 2009 (inproceedings)

Abstract
The increasing complexity of modern robots makes it prohibitively hard to accurately model such systems as required by many applications. In such cases, machine learning methods offer a promising alternative for approximating such models using measured data. To date, high computational demands have largely restricted machine learning techniques to mostly offline applications. However, making the robots adaptive to changes in the dynamics and to cope with unexplored areas of the state space requires online learning. In this paper, we propose an approximation of the support vector regression (SVR) by sparsification based on the linear independency of training data. As a result, we obtain a method which is applicable in real-time online learning. It exhibits competitive learning accuracy when compared with standard regression techniques, such as nu-SVR, Gaussian process regression (GPR) and locally weighted projection regression (LWPR).

Web DOI [BibTex]

Causality Discovery with Additive Disturbances: An Information-Theoretical Perspective

Zhang, K., Hyvärinen, A.

In Machine Learning and Knowledge Discovery in Databases, pages: 570-585, (Editors: Buntine, W. , M. Grobelnik, D. Mladenić, J. Shawe-Taylor ), Springer, Berlin, Germany, European Conference on Machine Learning and Knowledge Discovery in Databases: Part II (ECML PKDD '09), September 2009 (inproceedings)

Abstract
We consider causally sufficient acyclic causal models in which the relationship among the variables is nonlinear while disturbances have linear effects, and show that three principles, namely, the causal Markov condition (together with the independence between each disturbance and the corresponding parents), minimum disturbance entropy, and mutual independence of the disturbances, are equivalent. This motivates new and more efficient methods for some causal discovery problems. In particular, we propose to use multichannel blind deconvolution, an extension of independent component analysis, to do Granger causality analysis with instantaneous effects. This approach gives more accurate estimates of the parameters and can easily incorporate sparsity constraints. For additive disturbance-based nonlinear causal discovery, we first make use of the conditional independence relationships to obtain the equivalence class; undetermined causal directions are then found by nonlinear regression and pairwise independence tests. This avoids the brute-force search and greatly reduces the computational load.

PDF PDF DOI [BibTex]

Thermodynamic efficiency of information and heat flow

Allahverdyan, A., Janzing, D., Mahler, G.

Journal of Statistical Mechanics: Theory and Experiment, 2009(09):P09011, September 2009 (article)

Abstract
A basic task of information processing is information transfer (flow). Here we study a pair of Brownian particles each coupled to a thermal bath at temperatures $T_1$ and $T_2$. The information flow in such a system is defined via the time-shifted mutual information. The information flow nullifies at equilibrium, and its efficiency is defined as the ratio of the flow to the total entropy production in the system. For a stationary state the information flows from higher to lower temperatures, and its efficiency is bounded from above by $\max[T_1, T_2] / |T_1 - T_2|$. This upper bound is imposed by the second law and it quantifies the thermodynamic cost for information flow in the present class of systems. It can be reached in the adiabatic situation, where the particles have widely different characteristic times. The efficiency of heat flow—defined as the heat flow over the total amount of dissipated heat—is limited from above by the same factor. There is a complementarity between heat and information flow: the set-up which is most efficient for the former is the least efficient for the latter and vice versa. The above bound for the efficiency can be (transiently) overcome in certain non-stationary situations, but the efficiency is still limited from above. We study yet another measure of information processing (transfer entropy) proposed in the literature. Though this measure does not require any thermodynamic cost, the information flow and transfer entropy are shown to be intimately related for stationary states.

PDF DOI [BibTex]

Does Cognitive Science Need Kernels?

Jäkel, F., Schölkopf, B., Wichmann, F.

Trends in Cognitive Sciences, 13(9):381-388, September 2009 (article)

Abstract
Kernel methods are among the most successful tools in machine learning and are used in challenging data analysis problems in many disciplines. Here we provide examples where kernel methods have proven to be powerful tools for analyzing behavioral data, especially for identifying features in categorization experiments. We also demonstrate that kernel methods relate to perceptrons and exemplar models of categorization. Hence, we argue that kernel methods have neural and psychological plausibility, and theoretical results concerning their behavior are therefore potentially relevant for human category learning. In particular, we believe kernel methods have the potential to provide explanations ranging from the implementational via the algorithmic to the computational level.

PDF Web DOI [BibTex]

Implicit Wiener Series Analysis of Epileptic Seizure Recordings

Barbero, A., Franz, M., Drongelen, W., Dorronsoro, J., Schölkopf, B., Grosse-Wentrup, M.

In EMBC 2009, pages: 5304-5307, (Editors: Y Kim and B He and G Worrell and X Pan), IEEE Service Center, Piscataway, NJ, USA, 31st Annual International Conference of the IEEE Engineering in Medicine and Biology Society, September 2009 (inproceedings)

Abstract
Implicit Wiener series are a powerful tool to build Volterra representations of time series with any degree of nonlinearity. A natural question is then whether higher order representations yield more useful models. In this work we shall study this question for ECoG data channel relationships in epileptic seizure recordings, considering whether quadratic representations yield more accurate classifiers than linear ones. To do so we first show how to derive statistical information on the Volterra coefficient distribution and how to construct seizure classification patterns over that information. As our results illustrate, a quadratic model seems to provide no advantages over a linear one. Nevertheless, we shall also show that the interpretability of the implicit Wiener series provides insights into the inter-channel relationships of the recordings.

PDF Web DOI [BibTex]

Incorporating Prior Knowledge on Class Probabilities into Local Similarity Measures for Intermodality Image Registration

Hofmann, M., Schölkopf, B., Bezrukov, I., Cahill, N.

In Proceedings of the MICCAI 2009 Workshop on Probabilistic Models for Medical Image Analysis, pages: 220-231, (Editors: W Wells and S Joshi and K Pohl), PMMIA, September 2009 (inproceedings)

Abstract
We present a methodology for incorporating prior knowledge on class probabilities into the registration process. By using knowledge from the imaging modality, pre-segmentations, and/or probabilistic atlases, we construct vectors of class probabilities for each image voxel. By defining new image similarity measures for distribution-valued images, we show how the class probability images can be nonrigidly registered in a variational framework. An experiment on nonrigid registration of MR and CT full-body scans illustrates that the proposed technique outperforms standard mutual information (MI) and normalized mutual information (NMI) based registration techniques when measured in terms of target registration error (TRE) of manually labeled fiducials.

PDF Web [BibTex]

Inference algorithms and learning theory for Bayesian sparse factor analysis

Rattray, M., Stegle, O., Sharp, K., Winn, J.

Journal of Physics: Conference Series, IW-SMI 2009, 197(1: International Workshop on Statistical-Mechanical Informatics 2009):1-10, (Editors: Inoue, M. , S. Ishii, Y. Kabashima, M. Okada), Institute of Physics, Bristol, UK, International Workshop on Statistical-Mechanical Informatics (IW-SMI), September 2009 (article)

Abstract
Bayesian sparse factor analysis has many applications; for example, it has been applied to the problem of inferring a sparse regulatory network from gene expression data. We describe a number of inference algorithms for Bayesian sparse factor analysis using a slab and spike mixture prior. These include well-established Markov chain Monte Carlo (MCMC) and variational Bayes (VB) algorithms as well as a novel hybrid of VB and Expectation Propagation (EP). For the case of a single latent factor we derive a theory for learning performance using the replica method. We compare the MCMC and VB/EP algorithm results with simulated data to the theoretical prediction. The results for MCMC agree closely with the theory as expected. Results for VB/EP are slightly sub-optimal but show that the new algorithm is effective for sparse inference. In large-scale problems MCMC is infeasible due to computational limitations and the VB/EP algorithm then provides a very useful computationally efficient alternative.

PDF Web DOI [BibTex]

Finite-time output stabilization with second order sliding modes

Dinuzzo, F., Ferrara, A.

Automatica, 45(9):2169-2171, September 2009 (article)

Abstract
In this note, a class of discontinuous feedback laws that switch over branches of parabolas in the auxiliary state plane is analyzed. Conditions are provided under which controllers belonging to this class are second order sliding-mode algorithms: they ensure uniform global finite-time output stability for uncertain systems of relative degree two.

Web DOI [BibTex]

Markerless 3D Face Tracking (DAGM 2009)

Walder, C., Breidt, M., Bülthoff, H., Schölkopf, B., Curio, C.

In Pattern Recognition, Lecture Notes in Computer Science, Vol. 5748, pages: 41-50, (Editors: J Denzler and G Notni and H Süsse), Springer, Berlin, Germany, 31st Symposium of the German Association for Pattern Recognition (DAGM), September 2009 (inproceedings)

Abstract
We present a novel algorithm for the markerless tracking of deforming surfaces such as faces. We acquire a sequence of 3D scans along with color images at 40Hz. The data is then represented by implicit surface and color functions, using a novel partition-of-unity type method of efficiently combining local regressors using nearest neighbor searches. Both these functions act on the 4D space of 3D plus time, and use temporal information to handle the noise in individual scans. After interactive registration of a template mesh to the first frame, it is then automatically deformed to track the scanned surface, using the variation of both shape and color as features in a dynamic energy minimization problem. Our prototype system yields high-quality animated 3D models in correspondence, at a rate of approximately twenty seconds per timestep. Tracking results for faces and other objects are presented.

PDF Web DOI [BibTex]

Robot Learning

Peters, J., Morimoto, J., Tedrake, R., Roy, N.

IEEE Robotics and Automation Magazine, 16(3):19-20, September 2009 (article)

Abstract
Creating autonomous robots that can learn to act in unpredictable environments has been a long-standing goal of robotics, artificial intelligence, and the cognitive sciences. In contrast, current commercially available industrial and service robots mostly execute fixed tasks and exhibit little adaptability. To bridge this gap, machine learning offers a myriad set of methods, some of which have already been applied with great success to robotics problems. As a result, there is an increasing interest in machine learning and statistics within the robotics community. At the same time, there has been a growth in the learning community in using robots as motivating applications for new algorithms and formalisms. Considerable evidence of this exists in the use of learning in high-profile competitions such as RoboCup and the Defense Advanced Research Projects Agency (DARPA) challenges, and the growing number of research programs funded by governments around the world.

PDF Web DOI [BibTex]

Object Localization with Global and Local Context Kernels

Blaschko, M., Lampert, C.

In British Machine Vision Conference 2009, pages: 1-11, BMVC, September 2009 (inproceedings)

Abstract
Recent research has shown that the use of contextual cues significantly improves performance in sliding window type localization systems. In this work, we propose a method that incorporates both global and local context information through appropriately defined kernel functions. In particular, we make use of a weighted combination of kernels defined over local spatial regions, as well as a global context kernel. The relative importance of the context contributions is learned automatically, and the resulting discriminant function is of a form such that localization at test time can be solved efficiently using a branch and bound optimization scheme. By specifying context directly with a kernel learning approach, we achieve high localization accuracy with a simple and efficient representation. This is in contrast to other systems that incorporate context for which expensive inference needs to be done at test time. We show experimentally on the PASCAL VOC datasets that the inclusion of context can significantly improve localization performance, provided the relative contributions of context cues are learned appropriately.

PDF Web [BibTex]

Efficient Sample Reuse in EM-Based Policy Search

Hachiya, H., Peters, J., Sugiyama, M.

In 16th European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pages: 469-484, (Editors: Buntine, W. , M. Grobelnik, D. Mladenic, J. Shawe-Taylor), Springer, Berlin, Germany, ECML PKDD, September 2009 (inproceedings)

Abstract
Direct policy search is a promising reinforcement learning framework, in particular for control in continuous, high-dimensional systems such as anthropomorphic robots. Policy search often requires a large number of samples for obtaining a stable policy update estimator due to its high flexibility. However, this is prohibitive when the sampling cost is expensive. In this paper, we extend an EM-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, called Reward-weighted Regression with sample Reuse, is demonstrated through a robot learning experiment.

PDF Web DOI [BibTex]

Active Structured Learning for High-Speed Object Detection

Lampert, C., Peters, J.

In DAGM 2009, pages: 221-231, (Editors: Denzler, J. , G. Notni, H. Süsse), Springer, Berlin, Germany, 31st Annual Symposium of the German Association for Pattern Recognition, September 2009 (inproceedings)

Abstract
High-speed smooth and accurate visual tracking of objects in arbitrary, unstructured environments is essential for robotics and human motion analysis. However, building a system that can adapt to arbitrary objects and a wide range of lighting conditions is a challenging problem, especially if hard real-time constraints apply, as in robotics scenarios. In this work, we introduce a method for learning a discriminative object tracking system based on the recent structured regression framework for object localization. Using a kernel function that allows fast evaluation on the GPU, the resulting system can process video streams at a speed of 100 frames per second or more. Consecutive frames in high speed video sequences are typically very redundant, and for training an object detection system, it is sufficient to have training labels from only a subset of all images. We propose an active learning method that selects training examples in a data-driven way, thereby minimizing the required number of training labels. Experiments on realistic data show that the active learning is superior to previously used methods for dataset subsampling for this task.

PDF Web DOI [BibTex]

Higher order sliding mode controllers with optimal reaching

Dinuzzo, F., Ferrara, A.

IEEE Transactions on Automatic Control, 54(9):2126-2136, September 2009 (article)

Abstract
Higher order sliding mode (HOSM) control design is considered for systems with a known permanent relative degree. In this paper, we introduce the robust Fuller's problem that is a robust generalization of the Fuller's problem, a standard optimal control problem for a chain of integrators with bounded control. By solving the robust Fuller's problem it is possible to obtain feedback laws that are HOSM algorithms of generic order and, in addition, provide optimal finite-time reaching of the sliding manifold. A common difficulty in the use of existing HOSM algorithms is the tuning of design parameters: our methodology proves useful for the tuning of HOSM controller parameters in order to assure desired performances and prevent instabilities. The convergence and stability properties of the proposed family of controllers are theoretically analyzed. Simulation evidence demonstrates their effectiveness.

ei

DOI [BibTex]

Kernel Methods in Computer Vision

Lampert, CH.

Foundations and Trends in Computer Graphics and Vision, 4(3):193-285, September 2009 (article)

Abstract
Over recent years, kernel methods have established themselves as powerful tools for computer vision researchers as well as for practitioners. In this tutorial, we give an introduction to kernel methods in computer vision from a geometric perspective, introducing not only the ubiquitous support vector machines, but also lesser-known techniques for regression, dimensionality reduction, outlier detection and clustering. Additionally, we give an outlook on very recent, non-classical techniques for the prediction of structured data, for the estimation of statistical dependency and for learning the kernel function itself. All methods are illustrated with examples of successful application from the recent computer vision research literature.

ei

Web DOI [BibTex]

Generalized Clustering via Kernel Embeddings

Jegelka, S., Gretton, A., Schölkopf, B., Sriperumbudur, B., von Luxburg, U.

In KI 2009: AI and Automation, Lecture Notes in Computer Science, Vol. 5803, pages: 144-152, (Editors: B Mertsching and M Hund and Z Aziz), Springer, Berlin, Germany, 32nd Annual Conference on Artificial Intelligence (KI), September 2009 (inproceedings)

Abstract
We generalize traditional goals of clustering towards distinguishing components in a non-parametric mixture model. The clusters are not necessarily based on point locations, but on higher order criteria. This framework can be implemented by embedding probability distributions in a Hilbert space. The corresponding clustering objective is very general and relates to a range of common clustering concepts.

ei

PDF PDF Web DOI [BibTex]

Discovering Temporal Patterns of Differential Gene Expression in Microarray Time Series

Stegle, O., Denby, KJ., Wild, DL., McHattie, S., Mead, A., Ghahramani, Z., Borgwardt, KM.

In Proceedings of the German Conference on Bioinformatics 2009 (GCB 2009), pages: 133-142, (Editors: Grosse, I., S. Neumann, S. Posch, F. Schreiber, P. F. Stadler), Gesellschaft für Informatik, Bonn, Germany, German Conference on Bioinformatics (GCB '09), September 2009 (inproceedings)

Abstract
A wealth of time series of microarray measurements have become available over recent years. Several two-sample tests for detecting differential gene expression in these time series have been defined, but they can only answer the question whether a gene is differentially expressed across the whole time series, not in which intervals it is differentially expressed. In this article, we propose a Gaussian process based approach for studying these dynamics of differential gene expression. In experiments on Arabidopsis thaliana gene expression levels, our novel technique helps us to uncover that the family of WRKY transcription factors appears to be involved in the early response to infection by a fungal pathogen.

ei

PDF Web [BibTex]

A Novel Approach to the Selection of Spatially Invariant Features for the Classification of Hyperspectral Images with Improved Generalization Capability

Bruzzone, L., Persello, C.

IEEE Transactions on Geoscience and Remote Sensing, 47(9):3180-3191, September 2009 (article)

Abstract
This paper presents a novel approach to feature selection for the classification of hyperspectral images. The proposed approach aims at selecting a subset of the original set of features that exhibits at the same time high capability to discriminate among the considered classes and high invariance in the spatial domain of the investigated scene. This approach results in a more robust classification system with improved generalization properties with respect to standard feature-selection methods. The feature selection is accomplished by defining a multiobjective criterion function made up of two terms: (1) a term that measures the class separability and (2) a term that evaluates the spatial invariance of the selected features. In order to assess the spatial invariance of the feature subset, we propose both a supervised method (which assumes that training samples acquired in two or more spatially disjoint areas are available) and a semisupervised method (which requires only a standard training set acquired in a single area of the scene and takes advantage of unlabeled samples selected in portions of the scene spatially disjoint from the training set). The choice for the supervised or semisupervised method depends on the available reference data. The multiobjective problem is solved by an evolutionary algorithm that estimates the set of Pareto-optimal solutions. Experiments carried out on a hyperspectral image acquired by the Hyperion sensor on a complex area confirmed the effectiveness of the proposed approach.

ei

Web DOI [BibTex]


Fast Kernel-Based Independent Component Analysis

Shen, H., Jegelka, S., Gretton, A.

IEEE Transactions on Signal Processing, 57(9):3498-3511, September 2009 (article)

Abstract
Recent approaches to independent component analysis (ICA) have used kernel independence measures to obtain highly accurate solutions, particularly where classical methods experience difficulty (for instance, sources with near-zero kurtosis). FastKICA (fast HSIC-based kernel ICA) is a new optimization method for one such kernel independence measure, the Hilbert-Schmidt Independence Criterion (HSIC). The high computational efficiency of this approach is achieved by combining geometric optimization techniques, specifically an approximate Newton-like method on the orthogonal group, with accurate estimates of the gradient and Hessian based on an incomplete Cholesky decomposition. In contrast to other efficient kernel-based ICA algorithms, FastKICA is applicable to any twice differentiable kernel function. Experimental results for problems with large numbers of sources and observations indicate that FastKICA provides more accurate solutions at a given cost than gradient descent on HSIC. Compared with other recently published ICA methods, FastKICA is competitive in terms of accuracy, relatively insensitive to local minima when initialized far from independence, and more robust towards outliers. An analysis of the local convergence properties of FastKICA is provided.

ei

PDF Web DOI [BibTex]

Qualia: The Geometry of Integrated Information

Balduzzi, D., Tononi, G.

PLoS Computational Biology, 5(8):1-24, August 2009 (article)

Abstract
According to the integrated information theory, the quantity of consciousness is the amount of integrated information generated by a complex of elements, and the quality of experience is specified by the informational relationships it generates. This paper outlines a framework for characterizing the informational relationships generated by such systems. Qualia space (Q) is a space having an axis for each possible state (activity pattern) of a complex. Within Q, each submechanism specifies a point corresponding to a repertoire of system states. Arrows between repertoires in Q define informational relationships. Together, these arrows specify a quale—a shape that completely and univocally characterizes the quality of a conscious experience. Φ— the height of this shape—is the quantity of consciousness associated with the experience. Entanglement measures how irreducible informational relationships are to their component relationships, specifying concepts and modes. Several corollaries follow from these premises. The quale is determined by both the mechanism and state of the system. Thus, two different systems having identical activity patterns may generate different qualia. Conversely, the same quale may be generated by two systems that differ in both activity and connectivity. Both active and inactive elements specify a quale, but elements that are inactivated do not. Also, the activation of an element affects experience by changing the shape of the quale. The subdivision of experience into modalities and submodalities corresponds to subshapes in Q. In principle, different aspects of experience may be classified as different shapes in Q, and the similarity between experiences reduces to similarities between shapes. Finally, specific qualities, such as the “redness” of red, while generated by a local mechanism, cannot be reduced to it, but require considering the entire quale. 
Ultimately, the present framework may offer a principled way for translating qualitative properties of experience into mathematics.

ei

Web DOI [BibTex]

Guest editorial: Special issue on robot learning, Part B

Peters, J., Ng, A.

Autonomous Robots, 27(2):91-92, August 2009 (article)

ei

PDF PDF DOI [BibTex]

Policy Search for Motor Primitives

Peters, J., Kober, J.

KI - Zeitschrift Künstliche Intelligenz, 23(3):38-40, August 2009 (article)

Abstract
Many motor skills in humanoid robotics can be learned using parametrized motor primitives from demonstrations. However, most interesting motor learning problems require self-improvement often beyond the reach of current reinforcement learning methods due to the high dimensionality of the state-space. We develop an EM-inspired algorithm applicable to complex motor learning tasks. We compare this algorithm to several well-known parametrized policy search methods and show that it outperforms them. We apply it to motor learning problems and show that it can learn the complex Ball-in-a-Cup task using a real Barrett WAM robot arm.

ei

Web [BibTex]

Control Design Based on Analytical Stability Criteria for Optimized Kinesthetic Perception in Scaled Teleoperation

Son, HI., Bhattacharjee, T., Lee, DY.

In ICCAS-SICE International Joint Conference, pages: 3365-3370, IEEE, Piscataway, NJ, USA, ICCAS-SICE International Joint Conference, August 2009 (inproceedings)

Abstract
This paper considers kinesthetic perception as the main performance objective for a scaled teleoperation system, and devises a scheme to optimize it with constraints of position tracking and absolute stability. Analytical criteria for monitoring stability have been derived for position-position, force-position, and four-channel control architectures using Llewellyn's absolute stability criteria. This helps to reduce the optimization complexity and provides an easy and effective design guideline for selecting control gains amongst the range. Optimization results indicate that trade-offs exist among different control architectures. This paper provides guidelines based on application-dependent selection of control scheme.

ei

Web [BibTex]

A neurophysiologically plausible population code model for human contrast discrimination

Goris, R., Wichmann, F., Henning, G.

Journal of Vision, 9(7):1-22, July 2009 (article)

Abstract
The pedestal effect is the improvement in the detectability of a sinusoidal grating in the presence of another grating of the same orientation, spatial frequency, and phase—usually called the pedestal. Recent evidence has demonstrated that the pedestal effect is differently modified by spectrally flat and notch-filtered noise: The pedestal effect is reduced in flat noise but virtually disappears in the presence of notched noise (G. B. Henning & F. A. Wichmann, 2007). Here we consider a network consisting of units whose contrast response functions resemble those of the cortical cells believed to underlie human pattern vision and demonstrate that, when the outputs of multiple units are combined by simple weighted summation—a heuristic decision rule that resembles optimal information combination and produces a contrast-dependent weighting profile—the network produces contrast-discrimination data consistent with psychophysical observations: The pedestal effect is present without noise, reduced in broadband noise, but almost disappears in notched noise. These findings follow naturally from the normalization model of simple cells in primary visual cortex, followed by response-based pooling, and suggest that in processing even low-contrast sinusoidal gratings, the visual system may combine information across neurons tuned to different spatial frequencies and orientations.

ei

Web DOI [BibTex]

A Novel Context-Sensitive Semisupervised SVM Classifier Robust to Mislabeled Training Samples

Bruzzone, L., Persello, C.

IEEE Transactions on Geoscience and Remote Sensing, 47(7):2142-2154, July 2009 (article)

Abstract
This paper presents a novel context-sensitive semisupervised support vector machine (CS4VM) classifier, which is aimed at addressing classification problems where the available training set is not fully reliable, i.e., some labeled samples may be associated with the wrong information class (mislabeled patterns). Unlike standard context-sensitive methods, the proposed CS4VM classifier exploits the contextual information of the pixels belonging to the neighborhood system of each training sample in the learning phase to improve the robustness to possible mislabeled training patterns. This is achieved according to both the design of a semisupervised procedure and the definition of a novel contextual term in the cost function associated with the learning of the classifier. In order to assess the effectiveness of the proposed CS4VM and to understand the impact of the addressed problem in real applications, we also present an extensive experimental analysis carried out on training sets that include different percentages of mislabeled patterns having different distributions on the classes. In the analysis, we also study the robustness to mislabeled training patterns of some widely used supervised and semisupervised classification algorithms (i.e., conventional support vector machine (SVM), progressive semisupervised SVM, maximum likelihood, and k-nearest neighbor). Results obtained on a very high resolution image and on a medium resolution image confirm both the robustness and the effectiveness of the proposed CS4VM with respect to standard classification algorithms and allow us to derive interesting conclusions on the effects of mislabeled patterns on different classifiers.

ei

Web DOI [BibTex]

Falsificationism and Statistical Learning Theory: Comparing the Popper and Vapnik-Chervonenkis Dimensions

Corfield, D., Schölkopf, B., Vapnik, V.

Journal for General Philosophy of Science, 40(1):51-58, July 2009 (article)

Abstract
We compare Karl Popper’s ideas concerning the falsifiability of a theory with similar notions from the part of statistical learning theory known as VC-theory. Popper’s notion of the dimension of a theory is contrasted with the apparently very similar VC-dimension. Having located some divergences, we discuss how best to view Popper’s work from the perspective of statistical learning theory, either as a precursor or as aiming to capture a different learning activity.

ei

PDF DOI [BibTex]

Active learning for classification of remote sensing images

Bruzzone, L., Persello, C.

In pages: III-693-III-696, IEEE, Piscataway, NJ, USA, IEEE International Geoscience and Remote Sensing Symposium (IGARSS), July 2009 (inproceedings)

Abstract
This paper presents an analysis of active learning techniques for the classification of remote sensing images and proposes a novel active learning method based on support vector machines (SVMs). The proposed method exploits a query function for the inclusion of batches of unlabeled samples in the training set, which is based on the evaluation of two criteria: uncertainty and diversity. This query function adopts a stochastic approach to the selection of unlabeled samples, which is based on a function of uncertainty estimated from the distribution of errors on the validation set (which is assumed available for the model selection of the SVM classifier). Experiments carried out on a very high resolution image confirm the effectiveness of the proposed active learning technique, which proves more accurate than standard methods.

ei

Web DOI [BibTex]

A novel approach to the selection of spatially invariant features for classification of hyperspectral images

Persello, C., Bruzzone, L.

In pages: II-61-II-64, IEEE, Piscataway, NJ, USA, IEEE International Geoscience and Remote Sensing Symposium (IGARSS), July 2009 (inproceedings)

Abstract
This paper presents a novel approach to feature selection for the classification of hyperspectral images. The proposed approach aims at selecting a subset of the original set of features that exhibits two main properties: i) high capability to discriminate among the considered classes, ii) high invariance in the spatial domain of the investigated scene. This approach results in a more robust classification system with improved generalization properties with respect to standard feature-selection methods. The feature selection is accomplished by defining a multi-objective criterion function made up of two terms: i) a term that measures the class separability, ii) a term that evaluates the spatial invariance of the selected features. In order to assess the spatial invariance of the feature subset, we propose both a supervised method and a semisupervised method (the choice between them depends on the available reference data). The multi-objective problem is solved by an evolutionary algorithm that estimates the set of Pareto-optimal solutions. Experiments carried out on a hyperspectral image acquired by the Hyperion sensor on a complex area confirmed the effectiveness of the proposed approach.

ei

Web DOI [BibTex]

Guest editorial: Special issue on robot learning, Part A

Peters, J., Ng, A.

Autonomous Robots, 27(1):1-2, July 2009 (article)

ei

PDF PDF DOI [BibTex]

A Geometric Approach to Confidence Sets for Ratios: Fieller’s Theorem, Generalizations, and Bootstrap

von Luxburg, U., Franz, V.

Statistica Sinica, 19(3):1095-1117, July 2009 (article)

Abstract
We present a geometric method to determine confidence sets for the ratio E(Y)/E(X) of the means of random variables X and Y. This method reduces the problem of constructing confidence sets for the ratio of two random variables to the problem of constructing confidence sets for the means of one-dimensional random variables. It is valid in a large variety of circumstances. In the case of normally distributed random variables, the confidence sets so constructed coincide with the standard Fieller confidence sets. Generalizations of our construction lead to definitions of exact and conservative confidence sets for very general classes of distributions, provided the joint expectation of (X,Y) exists and the linear combinations of the form aX + bY are well-behaved. Finally, our geometric method allows us to derive a very simple bootstrap approach for constructing conservative confidence sets for ratios which perform favorably in certain situations, in particular in the asymmetric heavy-tailed regime.

ei

PDF PDF Web [BibTex]


Varieties of Justification in Machine Learning

Corfield, D.

In Proceedings of Multiplicity and Unification in Statistics and Probability, pages: 1-10, Multiplicity and Unification in Statistics and Probability, June 2009 (inproceedings)

Abstract
The field of machine learning has flourished over the past couple of decades. With huge amounts of data available, efficient algorithms can learn to extrapolate from their training sets to become very accurate classifiers. For example, it is straightforward now to develop classifiers which achieve accuracies of around 99% on databases of handwritten digits. Now these algorithms have been devised by theorists who arrive at the problem of machine learning with a range of different philosophical outlooks on the subject of inductive reasoning. This has led to a wide range of theoretical rationales for their work. In this talk I shall classify the different forms of justification for inductive machine learning into four kinds, and make some comparisons between them. With little by way of theoretical knowledge to aid in the learning tasks, while the relevance of these justificatory approaches for the inductive reasoning of the natural sciences is questionable, certain issues surrounding the presuppositions of inductive reasoning are brought sharply into focus. In particular, Frequentist, Bayesian and MDL outlooks can be compared.

ei

PDF Web [BibTex]

Effects of Stimulus Type and of Error-Correcting Code Design on BCI Speller Performance

Hill, J., Farquhar, J., Martens, S., Biessmann, F., Schölkopf, B.

In Advances in neural information processing systems 21, pages: 665-672, (Editors: D Koller and D Schuurmans and Y Bengio and L Bottou), Curran, Red Hook, NY, USA, 22nd Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
From an information-theoretic perspective, a noisy transmission system such as a visual Brain-Computer Interface (BCI) speller could benefit from the use of error-correcting codes. However, optimizing the code solely according to the maximal minimum-Hamming-distance criterion tends to lead to an overall increase in target frequency of target stimuli, and hence a significantly reduced average target-to-target interval (TTI), leading to difficulties in classifying the individual event-related potentials (ERPs) due to overlap and refractory effects. Clearly any change to the stimulus setup must also respect the possible psychophysiological consequences. Here we report new EEG data from experiments in which we explore stimulus types and codebooks in a within-subject design, finding an interaction between the two factors. Our data demonstrate that the traditional, row-column code has particular spatial properties that lead to better performance than one would expect from its TTIs and Hamming-distances alone, but nonetheless error-correcting codes can improve performance provided the right stimulus type is used.

ei

PDF PDF Web [BibTex]

Influence of graph construction on graph-based clustering measures

Maier, M., von Luxburg, U., Hein, M.

In Advances in neural information processing systems 21, pages: 1025-1032, (Editors: Koller, D., D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
Graph clustering methods such as spectral clustering are defined for general weighted graphs. In machine learning, however, data often is not given in the form of a graph, but in terms of similarity (or distance) values between points. In this case, first a neighborhood graph is constructed using the similarities between the points and then a graph clustering algorithm is applied to this graph. In this paper we investigate the influence of the construction of the similarity graph on the clustering results. We first study the convergence of graph clustering criteria such as the normalized cut (Ncut) as the sample size tends to infinity. We find that the limit expressions are different for different types of graphs, for example the r-neighborhood graph or the k-nearest neighbor graph. In plain words: Ncut on a kNN graph does something systematically different than Ncut on an r-neighborhood graph! This finding shows that graph clustering criteria cannot be studied independently of the kind of graph they are applied to. We also provide examples which show that these differences can be observed for toy and real data already for rather small sample sizes.

ei

PDF Web [BibTex]

Local Gaussian Process Regression for Real Time Online Model Learning and Control

Nguyen-Tuong, D., Seeger, M., Peters, J.

In Advances in neural information processing systems 21, pages: 1193-1200, (Editors: Koller, D., D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
Learning in real-time applications, e.g., online approximation of the inverse dynamics model for model-based robot control, requires fast online regression techniques. Inspired by local learning, we propose a method to speed up standard Gaussian Process regression (GPR) with local GP models (LGP). The training data is partitioned into local regions, for each of which an individual GP model is trained. The prediction for a query point is performed by weighted estimation using nearby local models. Unlike other GP approximations, such as mixtures of experts, we use a distance-based measure for partitioning of the data and weighted prediction. The proposed method achieves online learning and prediction in real-time. Comparisons with other nonparametric regression methods show that LGP has higher accuracy than LWPR and comes close to the performance of standard GPR and nu-SVR.

ei

PDF Web [BibTex]

Detecting the Direction of Causal Time Series

Peters, J., Janzing, D., Gretton, A., Schölkopf, B.

In Proceedings of the 26th International Conference on Machine Learning, pages: 801-808, (Editors: A Danyluk and L Bottou and ML Littman), ACM Press, New York, NY, USA, ICML, June 2009 (inproceedings)

Abstract
We propose a method that detects the true direction of time series, by fitting an autoregressive moving average model to the data. Whenever the noise is independent of the previous samples for one ordering of the observations, but dependent for the opposite ordering, we infer the former direction to be the true one. We prove that our method works in the population case as long as the noise of the process is not normally distributed (for the latter case, the direction is not identifiable). A new and important implication of our result is that it confirms a fundamental conjecture in causal reasoning - if after regression the noise is independent of signal for one direction and dependent for the other, then the former represents the true causal direction - in the case of time series. We test our approach on two types of data: simulated data sets conforming to our modeling assumptions, and real world EEG time series. Our method makes a decision for a significant fraction of both data sets, and these decisions are mostly correct. For real world data, our approach outperforms alternative solutions to the problem of time direction recovery.

ei

PDF Web DOI [BibTex]

Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer

Lampert, C., Nickisch, H., Harmeling, S.

In CVPR 2009, pages: 951-958, IEEE Service Center, Piscataway, NJ, USA, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 2009 (inproceedings)

Abstract
We study the problem of object classification when training and test classes are disjoint, i.e. no training examples of the target classes are available. This setup has hardly been studied in computer vision research, but it is the rule rather than the exception, because the world contains tens of thousands of different object classes and for only a very few of them image collections have been formed and annotated with suitable class labels. In this paper, we tackle the problem by introducing attribute-based classification. It performs object detection based on a human-specified high-level description of the target objects instead of training images. The description consists of arbitrary semantic attributes, like shape, color or even geographic information. Because such properties transcend the specific learning task at hand, they can be pre-learned, e.g. from image datasets unrelated to the current task. Afterwards, new classes can be detected based on their attribute representation, without the need for a new training phase. In order to evaluate our method and to facilitate research in this area, we have assembled a new large-scale dataset, “Animals with Attributes”, of over 30,000 animal images that match the 50 classes in Osherson's classic table of how strongly humans associate 85 semantic attributes with animal classes. Our experiments show that by using an attribute layer it is indeed possible to build a learning object detection system that does not require any training images of the target classes.

ei

PDF Web DOI [BibTex]

Learning Taxonomies by Dependence Maximization

Blaschko, M., Gretton, A.

In Advances in neural information processing systems 21, pages: 153-160, (Editors: Koller, D., D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
We introduce a family of unsupervised algorithms, numerical taxonomy clustering, to simultaneously cluster data, and to learn a taxonomy that encodes the relationship between the clusters. The algorithms work by maximizing the dependence between the taxonomy and the original data. The resulting taxonomy is a more informative visualization of complex data than simple clustering; in addition, taking into account the relations between different clusters is shown to substantially improve the quality of the clustering, when compared with state-of-the-art algorithms in the literature (both spectral clustering and a previous dependence maximization approach). We demonstrate our algorithm on image and text data.

ei

PDF Web [BibTex]
