2006


Statistical Properties of Kernel Principal Component Analysis

Blanchard, G., Bousquet, O., Zwald, L.

Machine Learning, 66(2-3):259-294, March 2006 (article)

Abstract
We study the properties of the eigenvalues of Gram matrices in a non-asymptotic setting. Using local Rademacher averages, we provide data-dependent and tight bounds for their convergence towards eigenvalues of the corresponding kernel operator. We perform these computations in a functional analytic framework which allows us to deal implicitly with reproducing kernel Hilbert spaces of infinite dimension. This can have applications to various kernel algorithms, such as Support Vector Machines (SVM). We focus on Kernel Principal Component Analysis (KPCA) and, using such techniques, we obtain sharp excess risk bounds for the reconstruction error. In these bounds, the dependence on the decay of the spectrum and on the closeness of successive eigenvalues is made explicit.
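The quantities named in this abstract can be illustrated with a small sketch. It is not taken from the paper: the RBF kernel, the six sample points, and all function names are assumptions chosen for illustration, and the Gram matrix is left uncentered. The sketch computes the largest eigenvalue of the normalized Gram matrix K/n by power iteration, and uses the fact that the empirical reconstruction error of a projection onto the top d principal directions equals the sum of the remaining eigenvalues of K/n (here d = 1).

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel between two points given as tuples (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def gram_matrix(points, kernel):
    """Normalized, uncentered Gram matrix K/n."""
    n = len(points)
    return [[kernel(p, q) / n for q in points] for p in points]

def top_eigenvalue(matrix, iterations=500):
    """Largest eigenvalue of a symmetric positive semi-definite matrix (power iteration)."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n  # unit-norm starting vector
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(n)) for row in matrix]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
        eigenvalue = norm  # ||M v|| approaches the top eigenvalue as v converges
    return eigenvalue

# Hypothetical sample: three loose clusters in the plane.
points = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (1.1, 0.9), (2.0, 0.0), (2.1, 0.1)]
K = gram_matrix(points, rbf_kernel)

trace = sum(K[i][i] for i in range(len(points)))  # sum of all eigenvalues of K/n
lam1 = top_eigenvalue(K)
recon_error_d1 = trace - lam1  # empirical reconstruction error for d = 1

print("top eigenvalue of K/n:", lam1)
print("reconstruction error with one component:", recon_error_d1)
```

The bounds studied in the paper concern how far such empirical eigenvalues and reconstruction errors can deviate from their population counterparts; the sketch only shows how the empirical side is computed.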


PDF PDF DOI [BibTex]



Network-based de-noising improves prediction from microarray data

Kato, T., Murata, Y., Miura, K., Asai, K., Horton, P., Tsuda, K., Fujibuchi, W.

BMC Bioinformatics, 7(Suppl. 1):S4-S4, March 2006 (article)

Abstract
Prediction of human cell response to anti-cancer drugs (compounds) from microarray data is a challenging problem, due to the noise properties of microarrays as well as the high variance of living cell responses to drugs. Hence there is a strong need for more practical and robust methods than standard methods for real-value prediction. We devised an extended version of the off-subspace noise-reduction (de-noising) method to incorporate heterogeneous network data such as sequence similarity or protein-protein interactions into a single framework. Using that method, we first de-noise the gene expression data for training and test data and also the drug-response data for training data. Then we predict the unknown responses of each drug from the de-noised input data. To ascertain whether de-noising improves prediction, we carry out 12-fold cross-validation to assess the prediction performance. We use Pearson's correlation coefficient between the true and predicted response values as the measure of prediction performance. De-noising improves the prediction performance for 65% of drugs. Furthermore, we found that this noise reduction method is robust and effective even when a large amount of artificial noise is added to the input data. We found that our extended off-subspace noise-reduction method combining heterogeneous biological data is successful and quite useful to improve prediction of human cell cancer drug responses from microarray data.
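The evaluation protocol described above (12-fold cross-validation scored by Pearson's correlation between true and predicted responses) can be sketched as follows. This is only an illustration of the protocol: the one-dimensional least-squares predictor stands in for the paper's de-noising and prediction pipeline, and the data and all function names are made up for the example.

```python
import math

def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept (stand-in predictor)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def cross_validated_predictions(xs, ys, k):
    """Predict every sample from a model trained on the other k - 1 folds."""
    n = len(xs)
    predictions = [0.0] * n
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        slope, intercept = fit_line([xs[i] for i in train_idx],
                                    [ys[i] for i in train_idx])
        for i in test_idx:
            predictions[i] = slope * xs[i] + intercept
    return predictions

# Hypothetical data: a linear response with a small deterministic perturbation.
inputs = [float(i) for i in range(24)]
responses = [2.0 * x + 1.0 + 0.5 * ((7 * i) % 5 - 2) for i, x in enumerate(inputs)]

predicted = cross_validated_predictions(inputs, responses, k=12)
score = pearson(responses, predicted)
print("cross-validated Pearson correlation:", score)
```

Comparing this score with and without a de-noising step applied to the inputs is the kind of comparison the abstract reports per drug.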


PDF PDF DOI [BibTex]


Model-based Design Analysis and Yield Optimization

Pfingsten, T., Herrmann, D., Rasmussen, C.

IEEE Transactions on Semiconductor Manufacturing, 19(4):475-486, February 2006 (article)

Abstract
Fluctuations are inherent to any fabrication process. Integrated circuits and micro-electro-mechanical systems are particularly affected by these variations, and due to high quality requirements the effect on the devices’ performance has to be understood quantitatively. In recent years it has become possible to model the performance of such complex systems on the basis of design specifications, and model-based Sensitivity Analysis has made its way into industrial engineering. We show how an efficient Bayesian approach, using a Gaussian process prior, can replace the commonly used brute-force Monte Carlo scheme, making it possible to apply the analysis to computationally costly models. We introduce a number of global, statistically justified sensitivity measures for design analysis and optimization. Two models of integrated systems serve as case studies to introduce the analysis and to assess its convergence properties. We show that the Bayesian Monte Carlo scheme can save costly simulation runs and can ensure a reliable accuracy of the analysis.
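For orientation, the brute-force Monte Carlo baseline that the abstract contrasts with can be sketched as below. Everything here is an assumption for illustration: the two-parameter device model, the Gaussian fluctuations, the specification limit, and the function names are hypothetical and not taken from the paper. Each sample costs one call to the performance model, which is what becomes prohibitive when that model is an expensive simulator.

```python
import random

def performance(width, thickness):
    """Hypothetical device model; stands in for a costly simulator."""
    return width + thickness

def monte_carlo_yield(n_samples, spec_limit, seed=0):
    """Fraction of simulated devices whose performance stays within the spec."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    passed = 0
    for _ in range(n_samples):
        # Assumed process fluctuations: independent standard normal deviations.
        width = rng.gauss(0.0, 1.0)
        thickness = rng.gauss(0.0, 1.0)
        if abs(performance(width, thickness)) <= spec_limit:
            passed += 1
    return passed / n_samples

# The sum of two standard normals has standard deviation sqrt(2), so this
# limit corresponds to a true yield of about 95%.
SPEC_LIMIT = 1.96 * 2 ** 0.5
estimated_yield = monte_carlo_yield(20000, SPEC_LIMIT)
print("estimated yield:", estimated_yield)
```

The approach described in the abstract replaces most of these model calls with a Gaussian process fitted to a small number of simulation runs; that surrogate step is not shown here.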


PDF Web DOI [BibTex]


Weighting of experimental evidence in macromolecular structure determination

Habeck, M., Rieping, W., Nilges, M.

Proceedings of the National Academy of Sciences of the United States of America, 103(6):1756-1761, February 2006 (article)

Abstract
The determination of macromolecular structures requires weighting of experimental evidence relative to prior physical information. Although it can critically affect the quality of the calculated structures, experimental data are routinely weighted on an empirical basis. At present, cross-validation is the most rigorous method to determine the best weight. We describe a general method to adaptively weight experimental data in the course of structure calculation. It is further shown that the necessity to define weights for the data can be completely alleviated. We demonstrate the method on a structure calculation from NMR data and find that the resulting structures are optimal in terms of accuracy and structural quality. Our method is devoid of the bias imposed by an empirical choice of the weight and has some advantages over estimating the weight by cross-validation.


Web DOI [BibTex]


Classification of Faces in Man and Machine

Graf, A., Wichmann, F., Bülthoff, H., Schölkopf, B.

Neural Computation, 18(1):143-165, January 2006 (article)


PDF Web [BibTex]


Gaussian Processes for Machine Learning

Rasmussen, C. E., Williams, C. K. I.

pages: 248, Adaptive Computation and Machine Learning, MIT Press, Cambridge, MA, USA, January 2006 (book)

Abstract
Gaussian processes (GPs) provide a principled, practical, probabilistic approach to learning in kernel machines. GPs have received increased attention in the machine-learning community over the past decade, and this book provides a long-needed systematic and unified treatment of theoretical and practical aspects of GPs in machine learning. The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics. The book deals with the supervised-learning problem for both regression and classification, and includes detailed algorithms. A wide variety of covariance (kernel) functions are presented and their properties discussed. Model selection is discussed both from a Bayesian and a classical perspective. Many connections to other well-known techniques from machine learning and statistics are discussed, including support-vector machines, neural networks, splines, regularization networks, relevance vector machines and others. Theoretical issues including learning curves and the PAC-Bayesian framework are treated, and several approximation methods for learning with large datasets are discussed. The book contains illustrative examples and exercises, and code and datasets are available on the Web. Appendixes provide mathematical background and a discussion of Gaussian Markov processes.
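As a minimal sketch of the regression setting the book covers, the following computes the GP posterior mean and variance at a test input using a Cholesky factorization of the noisy kernel matrix. The squared-exponential covariance, the hyperparameter values, the sine-curve training data, and the function names are assumptions made for this example; the code is not taken from the book's accompanying software.

```python
import math

def sq_exp(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A, for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    return y

def solve_lower_transposed(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(x_train, y_train, x_star, noise_var=1e-2):
    """Posterior mean and latent-function variance at a single test input."""
    K = [[sq_exp(a, b) + (noise_var if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    L = cholesky(K)
    alpha = solve_lower_transposed(L, solve_lower(L, y_train))  # (K + noise I)^-1 y
    k_star = [sq_exp(a, x_star) for a in x_train]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    variance = sq_exp(x_star, x_star) - sum(c * c for c in v)
    return mean, variance

# Hypothetical training data sampled from a sine curve.
x_train = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_train = [math.sin(x) for x in x_train]

mean_near, var_near = gp_predict(x_train, y_train, 0.5)
mean_far, var_far = gp_predict(x_train, y_train, 10.0)
print("prediction at 0.5:", mean_near, "variance:", var_near)
print("prediction at 10.0:", mean_far, "variance:", var_far)
```

Far from the training inputs the prediction falls back to the zero prior mean and the prior variance, which is the behaviour that makes the predictive uncertainty informative.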


Web [BibTex]