In Advances in Neural Information Processing Systems 25, pages: 189-196, (Editors: P Bartlett, FCN Pereira, CJC. Burges, L Bottou, and KQ Weinberger), Curran Associates Inc., 26th Annual Conference on Neural Information Processing Systems (NIPS), 2012 (inproceedings)
In Advances in Neural Information Processing Systems 25, pages: 10-18, (Editors: P Bartlett, FCN Pereira, CJC. Burges, L Bottou, and KQ Weinberger), Curran Associates Inc., 26th Annual Conference on Neural Information Processing Systems (NIPS), 2012 (inproceedings)
Del Favero, S., Varagnolo, D., Dinuzzo, F., Schenato, L., Pillonetto, G.
In pages: 3210-3215, IEEE, Piscataway, NJ, USA, 50th IEEE Conference on Decision and Control and European Control Conference (CDC - ECC), December 2011 (inproceedings)
We analyze the problem of data sets reduction for support vector classification. The work is also motivated by distributed problems, where sensors collect binary measurements at different locations moving inside an environment that needs to be divided into a collection of regions labeled in two different ways. The scope is to let each agent retain and exchange only those measurements that are mostly informative for the collective reconstruction of the decision boundary. For the case of separable classes, we provide the exact conditions and an efficient algorithm to determine if an element in the training set can become a support vector when new data arrive. The analysis is then extended to the non-separable case deriving a sufficient discardability condition and a general data selection
scheme for classification. Numerical experiments relative to the distributed problem show that the proposed procedure allows the agents to exchange a small amount of the collected data to obtain a highly predictive decision boundary.
In JMLR Workshop and Conference Proceedings Volume 20, pages: 181-196, (Editors: Hsu, C.-N. , W.S. Lee), JMLR, Cambridge, MA, USA, 3rd Asian Conference on Machine Learning (ACML) , November 2011 (inproceedings)
Output kernel learning techniques allow to simultaneously learn a vector-valued function and a positive semidefinite matrix which describes the relationships between the outputs. In this paper, we introduce a new formulation that imposes a low-rank constraint on the output kernel and operates directly on a factor of the kernel matrix. First, we investigate the connection between output kernel learning and a regularization problem for an architecture
with two layers. Then, we show that a variety of methods such as nuclear norm regularized regression, reduced-rank regression, principal component analysis, and low rank matrix approximation can be seen as special cases of the output kernel learning framework. Finally, we introduce a block coordinate descent strategy for learning low-rank output kernels.
IEEE Transactions on Neural Networks, 22(10):1576-1587, October 2011 (article)
In this paper, we analyze the convergence of two general classes of optimization algorithms for regularized kernel methods with convex loss function and quadratic norm regularization. The first methodology is a new class of algorithms based on fixed-point iterations that are well-suited for a parallel implementation and can be used with any convex loss function. The second methodology is based on coordinate descent, and generalizes some techniques previously proposed for linear support vector machines. It exploits the structure of additively separable loss functions to compute solutions of line searches in closed form. The two methodologies are both very easy to implement. In this paper, we also show how to remove non-differentiability of the objective functional by exactly reformulating a convex regularization problem as an unconstrained differentiable stabilization problem.
In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages: 49-56, ICML ’11, (Editors: Getoor, Lise and Scheffer, Tobias), ACM, New York, NY, USA, ICML, June 2011 (inproceedings)
IEEE Transactions on Neural Networks, 22(2):290-303, February 2011 (article)
A client-server architecture to simultaneously solve multiple learning tasks from distributed datasets is described. In such architecture, each client corresponds to an individual learning task and the associated dataset of examples. The goal of the architecture is to perform information fusion from multiple datasets while preserving privacy of individual data. The role of the server is to collect data in real time from the clients and codify the information in a common database. Such information can be used by all the clients to solve their individual learning task, so that each client can exploit the information content of all the datasets without actually having access to private data of others. The proposed algorithmic framework, based on regularization and kernel methods, uses a suitable class of “mixed effect” kernels. The methodology is illustrated through a simulated recommendation system, as well as an experiment involving pharmacological data coming from a multicentric clinical trial.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(2):193-205, February 2010 (article)
Standard single-task kernel methods have recently been extended to the case of multitask learning in the context of regularization theory. There are experimental results, especially in biomedicine, showing the benefit of the multitask approach compared to the single-task one. However, a possible drawback is computational complexity. For instance, when regularization networks are used, complexity scales as the cube of the overall number of training data, which may be large when several tasks are involved. The aim of this paper is to derive an efficient computational scheme for an important class of multitask kernels. More precisely, a quadratic loss is assumed and each task consists of the sum of a common term and a task-specific one. Within a Bayesian setting, a recursive online algorithm is obtained, which updates both estimates and confidence intervals as new data become available. The algorithm is tested on two simulated problems and a real data set relative to xenobiotics administration in human patients.
In this paper, we review some recent research directions regarding the synthesis of functions from data using kernel methods. We start by highlighting the central role of the representer theorem and then outline some recent advances in large scale optimization, learning the kernel, and multi-task learning.
In pages: 6715-6719, IEEE, Piscataway, NJ, USA, 48th IEEE Conference on Decision and Control (CDC), December 2009 (inproceedings)
It is presented a discontinuous controller that ensure uniform finite-time zero stabilization of the output for uncertain SISO systems of relative degree two, while keeping the auxiliary system state within a prescribed convex polygon. The proposed method extends applicability of second order sliding modes controllers to the case of uncertain dynamical systems with constraints.
Automatica, 45(9):2169-2171, September 2009 (article)
In this note, a class of discontinuous feedback laws that switch over branches of parabolas in the auxiliary state plane is analyzed. Conditions are provided under which controllers belonging to this class are second order sliding-mode algorithms: they ensure uniform global finite-time output stability for uncertain systems of relative degree two.
IEEE Transactions on Automatic Control, 54(9):2126-2136, September 2009 (article)
Higher order sliding mode (HOSM) control design is considered for systems with a known permanent relative degree. In this paper, we introduce the robust Fuller's problem that is a robust generalization of the Fuller's problem, a standard optimal control problem for a chain of integrators with bounded control. By solving the robust Fuller's problem it is possible to obtain feedback laws that are HOSM algorithms of generic order and, in addition, provide optimal finite-time reaching of the sliding manifold. A common difficulty in the use of existing HOSM algorithms is the tuning of design parameters: our methodology proves useful for the tuning of HOSM controller parameters in order to assure desired performances and prevent instabilities. The convergence and stability properties of the proposed family of controllers are theoretically analyzed. Simulation evidence demonstrates their effectiveness.
Machine Learning, 74(3):315-345, March 2009 (article)
The representer theorem for kernel methods states that the solution of the associated variational problem can be expressed as the linear combination of a finite number of kernel functions. However, for non-smooth loss functions, the analytic characterization of the coefficients poses nontrivial problems. Standard approaches resort to constrained optimization reformulations which, in general, lack a closed-form solution. Herein, by a proper change of variable, it is shown that, for any convex loss function, the coefficients satisfy a system of algebraic equations in a fixed-point form, which may be directly obtained from the primal formulation. The algebraic characterization is specialized to regression and classification methods and the fixed-point equations are explicitly characterized for many loss functions of practical interest. The consequences of the main result are then investigated along two directions. First, the existence of an unconstrained smooth reformulation of the original non-smooth problem is proven. Second, in the context of SURE (Stein’s Unbiased Risk Estimation), a general formula for the degrees of freedom of kernel regression methods is derived.
In pages: 4517-4522, IEEE Service Center, Piscataway, NJ, USA, 2008 American Control Conference (ACC), June 2008 (inproceedings)
Recently, standard single-task kernel methods have been extended to the case of multi-task learning under the framework of regularization. Experimental results have shown that such an approach can perform much better than single-task techniques, especially when few examples per task are available. However, a possible drawback may be computational complexity. For instance, when using regularization networks, complexity scales as the cube of the overall number of data associated with all the tasks. In this paper, an efficient computational scheme is derived for a widely applied class of multi-task kernels. More precisely, a quadratic loss is assumed and the multi-task kernel is the sum of a common term and a task-specific one. The proposed algorithm performs online learning recursively updating the estimates as new data become available. The learning problem is formulated in a Bayesian setting. The optimal estimates are obtained by solving a sequence of subproblems which involve projection of random variables onto suitable subspaces. The algorithm is tested on a simulated data set.
Journal of Machine Learning Research, 8, pages: 2467-2495, October 2007 (article)
Support Vector Regression (SVR) for discrete data is considered. An alternative formulation of the representer theorem is derived. This result is based on the newly introduced notion of pseudoresidual and the use of subdifferential calculus. The representer theorem is exploited to analyze the sensitivity properties of ε-insensitive SVR and introduce the notion of approximate degrees of freedom. The degrees of freedom are shown to play a key role in the evaluation of the optimism, that is the difference between the expected in-sample error and the expected empirical risk. In this way, it is possible to define a Cp-like statistic that can be used for tuning the parameters of SVR. The proposed tuning procedure is tested on a simulated benchmark problem and on a real world problem (Boston Housing data set).
Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments and to use this understanding to design future systems