My scientific interests are in the field of machine learning and inference from empirical data. In particular, I study kernel methods for extracting regularities from possibly high-dimensional data. These regularities are usually statistical ones; however, in recent years I have also become interested in methods for finding causal structures that underlie statistical dependences. I have worked on a number of different applications of machine learning - in our field, you get "to play in everyone's backyard." Most recently, I have been trying to play in the backyard of astronomers and photographers.
With the growing interest in (how to make money with) big data, machine learning has significantly gained in popularity. We have published an article in the German newspaper FAZ in January 2015, discussing some of the implications. Disclaimer: the newspaper added some text that appears above our names - this was not written or approved by us.
In March 2018, I published an article about the cybernetic revolution in the German newspaper SZ. It starts with the thesis that the current revolution is about processing (generating, converting, industrializing) information in much the same way the first two industrial revolutions dealt with processing (generating, converting, industrializing) energy. I have occasionally put forward this thesis (but I'm sure I am not the only one who thinks of it this way), for instance during a NYU symposium on the future of AI in January 2016 (here are some notes written by Max Tegmark). The article also provides recommendations on what Europe should do to keep up with these developments.
My department and/or members of the department (incl. myself) receive funding from a number of sources including Max Planck, the DFG, the Alexander-von-Humboldt foundation, Amazon, Google, Bosch, Facebook, the BMBF (German Ministry of Science), the EU, the ETH Zürich, the Land Baden-Wuerttemberg, the Koerber foundation, CIFAR, and the Stanford Center on Philanthropy and Civil Society.
M.Sc. in mathematics and Lionel Cooper Memorial Prize, University of London (1992)
Diplom in physics (Tübingen, 1994)
Doctorate in computer science from the Technical University Berlin (1997); thesis on Support Vector Learning (main advisor: V. Vapnik, AT&T Bell Labs) won the annual dissertation prize of the German Association for Computer Science (GI)
If you'd like to contact me, please consider these two notes:
1. I recently became co-editor-in-chief of JMLR. I work for JMLR because I believe in its open access model, but it takes a lot of time. During my JMLR term, please don't ask me to take on additional journal or grant reviewing duties.
2. I am not very organized with my e-mail so if you want to apply for a position in my lab, please send your application only to Sekretariat-Schoelkopf@tuebingen.mpg.de. Note that we do not respond to non-personalized applications that look like they are being sent to a large number of places simultaneously.
We are always happy to receive outstanding applications for PhD positions and postdocs.
In Proceedings of the Twenty-Ninth Annual Conference on Uncertainty in Artificial Intelligence, pages: 440-448, (Editors: A Nicholson and P Smyth), AUAI Press, Corvallis, Oregon, UAI, 2013 (inproceedings)
Journal of Nuclear Medicine, 54(10):1768-1774, 2013 (article)
Hybrid PET/MR systems have recently entered clinical practice. Thus, the accuracy of MR-based attenuation correction in simultaneously acquired data can now be investigated. We assessed the accuracy of 4 methods of MR-based attenuation correction in lesions within soft tissue, bone, and MR susceptibility artifacts: 2 segmentation-based methods (SEG1, provided by the manufacturer, and SEG2, a method with atlas-based susceptibility artifact correction); an atlas- and pattern recognition–based method (AT&PR), which also used artifact correction; and a new method combining AT&PR and SEG2 (SEG2wBONE). Methods: Attenuation maps were calculated for the PET/MR datasets of 10 patients acquired on a whole-body PET/MR system, allowing for simultaneous acquisition of PET and MR data. Eighty percent iso-contour volumes of interest were placed on lesions in soft tissue (n = 21), in bone (n = 20), near bone (n = 19), and within or near MR susceptibility artifacts (n = 9). Relative mean volume-of-interest differences were calculated with CT-based attenuation correction as a reference. Results: For soft-tissue lesions, none of the methods revealed a significant difference in PET standardized uptake value relative to CT-based attenuation correction (SEG1, −2.6% ± 5.8%; SEG2, −1.6% ± 4.9%; AT&PR, −4.7% ± 6.5%; SEG2wBONE, 0.2% ± 5.3%). For bone lesions, underestimation of PET standardized uptake values was found for all methods, with minimized error for the atlas-based approaches (SEG1, −16.1% ± 9.7%; SEG2, −11.0% ± 6.7%; AT&PR, −6.6% ± 5.0%; SEG2wBONE, −4.7% ± 4.4%). For lesions near bone, underestimations of lower magnitude were observed (SEG1, −12.0% ± 7.4%; SEG2, −9.2% ± 6.5%; AT&PR, −4.6% ± 7.8%; SEG2wBONE, −4.2% ± 6.2%). For lesions affected by MR susceptibility artifacts, quantification errors could be reduced using the atlas-based artifact correction (SEG1, −54.0% ± 38.4%; SEG2, −15.0% ± 12.2%; AT&PR, −4.1% ± 11.2%; SEG2wBONE, 0.6% ± 11.1%). 
Conclusion: For soft-tissue lesions, none of the evaluated methods showed statistically significant errors. For bone lesions, significant underestimations of −16% and −11% occurred for methods in which bone tissue was ignored (SEG1 and SEG2). In the present attenuation correction schemes, uncorrected MR susceptibility artifacts typically result in reduced attenuation values, potentially leading to highly reduced PET standardized uptake values, rendering lesions indistinguishable from background. While AT&PR and SEG2wBONE show accurate results in both soft tissue and bone, SEG2wBONE uses a two-step approach for tissue classification, which increases the robustness of prediction and can be applied retrospectively if more precision in bone areas is needed.
In Advances in Neural Information Processing Systems 26, pages: 2535-2543, (Editors: C.J.C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger), 27th Annual Conference on Neural Information Processing Systems (NIPS), 2013 (inproceedings)
In Proceedings of the Fifth International Brain-Computer Interface Meeting: Defining the Future, pages: Article ID: 086, (Editors: J.d.R. Millán, S. Gao, R. Müller-Putz, J.R. Wolpaw, and J.E. Huggins), Verlag der Technischen Universität Graz, 5th International Brain-Computer Interface Meeting, 2013, Article ID: 086 (inproceedings)
In Advances in Neural Information Processing Systems 26, pages: 154-162, (Editors: C.J.C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger), 27th Annual Conference on Neural Information Processing Systems (NIPS), 2013 (inproceedings)
Kam-Thong, T., Azencott, C., Cayton, L., Pütz, B., Altmann, A., Karbalai, N., Sämann, P., Schölkopf, B., Müller-Myhsok, B., Borgwardt, K.
Human Heredity, 73(4):220-236, September 2012 (article)
Due to recent advances in genotyping technologies, mapping phenotypes to single loci in the genome has become a standard technique in statistical genetics. However, one-locus mapping fails to explain much of the phenotypic variance in complex traits. Here, we present GLIDE, which maps phenotypes to pairs of genetic loci and systematically searches for the epistatic interactions expected to reveal part of this missing heritability. GLIDE makes use of the computational power of consumer-grade graphics cards to detect such interactions via linear regression. This enabled us to conduct a systematic two-locus mapping study on seven disease data sets from the Wellcome Trust Case Control Consortium and on in-house hippocampal volume data in 6 h per data set, while current single CPU-based approaches require more than a year’s time to complete the same task.
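The two-locus scan GLIDE performs can be illustrated in miniature. The sketch below is not GLIDE itself (no GPU acceleration; the toy sizes, simulated genotypes, and phenotype model are all illustrative assumptions): for every SNP pair it fits a linear regression with an interaction term and ranks pairs by the interaction coefficient's |t|-statistic.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 6                                          # individuals, SNPs (toy sizes)
snps = rng.integers(0, 3, size=(n, p)).astype(float)   # genotypes coded 0/1/2

# Simulated phenotype with a true epistatic interaction between SNPs 0 and 1.
y = 1.0 * snps[:, 0] * snps[:, 1] + rng.normal(0.0, 1.0, n)

def interaction_t(x1, x2, y):
    """|t|-statistic of the interaction term in y ~ 1 + x1 + x2 + x1*x2."""
    X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])     # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)              # coefficient covariance
    return abs(beta[3]) / np.sqrt(cov[3, 3])

# Exhaustive scan over all SNP pairs; GLIDE parallelizes exactly this loop.
scores = {(i, j): interaction_t(snps[:, i], snps[:, j], y)
          for i, j in itertools.combinations(range(p), 2)}
best = max(scores, key=scores.get)
print(best)
```

With the strong simulated effect, the top-scoring pair recovers the interacting loci; the real computational challenge GLIDE addresses is running this regression for hundreds of billions of pairs.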
20th Annual Scientific Meeting ISMRM, May 2012 (poster)
Patient motion in the scanner is one of the most challenging problems in MRI. We propose a new retrospective motion correction method for which no tracking devices or specialized sequences are required. We seek the motion parameters such that the image gradients in the spatial domain become sparse. We then use these parameters to invert the motion and recover the sharp image. In our experiments we acquired 2D TSE images and 3D FLASH/MPRAGE volumes of the human head. Major quality improvements are possible in the 2D case and substantial improvements in the 3D case.
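A minimal sketch of the sparsity objective behind this approach (not the paper's algorithm; the toy image, the ghosting simulation, and all parameters are illustrative assumptions): motion-inconsistent k-space data produce ghosting artifacts, which increase the L1 norm of the spatial image gradients, so searching for motion parameters that minimize this norm favors the artifact-free image.

```python
import numpy as np

def total_variation(img):
    """L1 norm of the spatial image gradients (the sparsity objective)."""
    gx = np.abs(np.diff(img, axis=0)).sum()
    gy = np.abs(np.diff(img, axis=1)).sum()
    return gx + gy

# Toy "anatomy": a bright square on a dark background.
sharp = np.zeros((64, 64))
sharp[24:40, 24:40] = 1.0

# Simulate motion inconsistency: a phase error on every other k-space line,
# which produces the classic ghost replica in the image domain.
k = np.fft.fft2(sharp)
k[1::2, :] *= np.exp(1j * np.pi / 2)
ghosted = np.abs(np.fft.ifft2(k))

print(total_variation(sharp), total_variation(ghosted))
```

The corrupted image has a markedly larger gradient L1 norm than the sharp one, which is what makes this criterion usable as an objective for retrospective motion-parameter search.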
Artificial Intelligence, 182-183, pages: 1-31, May 2012 (article)
While conventional approaches to causal inference are mainly based on conditional (in)dependences, recent methods also account for the shape of (conditional) distributions. The idea is that the causal hypothesis “X causes Y” imposes that the marginal distribution PX and the conditional distribution PY|X represent independent mechanisms of nature. Recently it has been postulated that the shortest description of the joint distribution PX,Y should therefore be given by separate descriptions of PX and PY|X. Since description length in the sense of Kolmogorov complexity is uncomputable, practical implementations rely on other notions of independence. Here we define independence via orthogonality in information space. This way, we can explicitly describe the kind of dependence that occurs between PY and PX|Y, making the causal hypothesis “Y causes X” implausible. Remarkably, this asymmetry between cause and effect becomes particularly simple if X and Y are deterministically related. We present an inference method that works in this case. We also discuss some theoretical results for the non-deterministic case, although it is not clear how to employ them for a more general inference method.
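The description-length postulate mentioned in the abstract can be sketched as follows (notation following the abstract; this is a hedged summary, with equality and inequality holding only up to additive constants independent of the distributions):

```latex
% If X causes Y, mechanism P_{Y|X} and input P_X are postulated to be
% algorithmically independent, so the joint distribution admits no
% shorter description than the two parts separately:
K(P_{X,Y}) \stackrel{+}{=} K(P_X) + K(P_{Y|X})
% The resulting inference rule prefers the causal direction with the
% shorter total description:
K(P_X) + K(P_{Y|X}) \stackrel{+}{\leq} K(P_Y) + K(P_{X|Y})
```

Since Kolmogorov complexity K is uncomputable, the paper replaces it with a computable surrogate, defining independence via orthogonality in information space.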
Journal of Neural Engineering, 9(4):046001, May 2012 (article)
Subjects operating a brain–computer interface (BCI) based on sensorimotor rhythms exhibit large variations in performance over the course of an experimental session. Here, we show that high-frequency γ-oscillations, originating in fronto-parietal networks, predict such variations on a trial-to-trial basis. We interpret this finding as empirical support for an influence of attentional networks on BCI performance via modulation of the sensorimotor rhythm.
Journal of Machine Learning Research, 13, pages: 723-773, March 2012 (article)
We propose a framework for analyzing and comparing distributions, which we use to construct statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS), and is called the maximum mean discrepancy (MMD). We present two distribution-free tests based on large deviation bounds for the MMD, and a third test based on the asymptotic distribution of this statistic. The MMD can be computed in quadratic time, although efficient linear time approximations are available. Our statistic is an instance of an integral probability metric, and various classical metrics on distributions are obtained when alternative function classes are used in place of an RKHS. We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
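A minimal sketch of the quadratic-time biased MMD² estimate with a Gaussian RBF kernel (the bandwidth, sample sizes, and toy distributions are arbitrary illustrative choices, not from the paper):

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian RBF kernel matrix between 1-D samples a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def mmd2_biased(x, y, bandwidth=1.0):
    """Biased quadratic-time estimate of the squared MMD."""
    kxx = rbf_kernel(x, x, bandwidth)
    kyy = rbf_kernel(y, y, bandwidth)
    kxy = rbf_kernel(x, y, bandwidth)
    return kxx.mean() + kyy.mean() - 2.0 * kxy.mean()

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 200)
y_same = rng.normal(0.0, 1.0, 200)   # drawn from the same distribution as x
y_diff = rng.normal(1.0, 1.0, 200)   # mean-shifted distribution

print(mmd2_biased(x, y_same), mmd2_biased(x, y_diff))
```

The statistic is near zero for two samples from the same distribution and clearly positive for the shifted pair; the tests in the paper turn this into a formal decision by thresholding against large-deviation bounds or the statistic's asymptotic null distribution.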
Journal of Neural Engineering, 9(2):026011, February 2012 (article)
We report on the development and online testing of an electroencephalogram-based brain–computer interface (BCI) that aims to be usable by completely paralysed users—for whom visual or motor-system-based BCIs may not be suitable, and among whom reports of successful BCI use have so far been very rare. The current approach exploits covert shifts of attention to auditory stimuli in a dichotic-listening stimulus design. To compare the efficacy of event-related potentials (ERPs) and steady-state auditory evoked potentials (SSAEPs), the stimuli were designed such that they elicited both ERPs and SSAEPs simultaneously. Trial-by-trial feedback was provided online, based on subjects' modulation of N1 and P3 ERP components measured during single 5 s stimulation intervals. All 13 healthy subjects were able to use the BCI, with performance in a binary left/right choice task ranging from 75% to 96% correct across subjects (mean 85%). BCI classification was based on the contrast between stimuli in the attended stream and stimuli in the unattended stream, making use of every stimulus, rather than contrasting frequent standard and rare 'oddball' stimuli. SSAEPs were assessed offline: for all subjects, spectral components at the two exactly known modulation frequencies allowed discrimination of pre-stimulus from stimulus intervals, and of left-only stimuli from right-only stimuli when one side of the dichotic stimulus pair was muted. However, attention modulation of SSAEPs was not sufficient for single-trial BCI communication, even when the subject's attention was clearly focused well enough to allow classification of the same trials via ERPs. ERPs clearly provided a superior basis for BCI. The ERP results are a promising step towards the development of a simple-to-use, reliable yes/no communication system for users in the most severely paralysed states, as well as potential attention-monitoring and -training applications outside the context of assistive technology.
Technical Report No. 3, Max-Planck-Institut für Intelligente Systeme, Tübingen, February 2012 (techreport)
Subjects operating a brain-computer interface (BCI) based on sensorimotor rhythms exhibit large variations in performance over the course of an experimental session. Here, we show that high-frequency gamma-oscillations, originating in fronto-parietal networks, predict such variations on a trial-to-trial basis. We interpret this finding as empirical support for an influence of attentional networks on BCI performance via modulation of the sensorimotor rhythm.
In Computer Vision - ECCV 2012, LNCS Vol. 7574, pages: 187-200, (Editors: A Fitzgibbon, S Lazebnik, P Perona, Y Sato, and C Schmid), Springer, Berlin, Germany, 12th European Conference on Computer Vision, ECCV, 2012 (inproceedings)
Camera lenses are a critical component of optical imaging systems, and lens imperfections compromise image quality. While traditionally, sophisticated lens design and quality control aim at limiting optical aberrations, recent works [1,2,3] promote the correction of optical flaws by computational means. These approaches rely on elaborate measurement procedures to characterize an optical system, and perform image correction by non-blind deconvolution.
In this paper, we present a method that utilizes physically plausible assumptions to estimate non-stationary lens aberrations blindly, and thus can correct images without knowledge of specifics of camera and lens. The blur estimation features a novel preconditioning step that enables fast deconvolution. We obtain results that are competitive with state-of-the-art non-blind approaches.
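For contrast with the blind setting addressed here, the non-blind correction step, where the blur is already known, can be as simple as a Wiener filter in the Fourier domain. The sketch below illustrates only that non-blind step (the toy image, the synthetic circular motion blur, and the regularization constant are all assumptions for illustration):

```python
import numpy as np

def wiener_deconvolve(blurred, kernel, eps=1e-4):
    """Non-blind deconvolution in the Fourier domain (Wiener filter)."""
    H = np.fft.fft2(kernel, s=blurred.shape)   # zero-padded kernel spectrum
    G = np.fft.fft2(blurred)
    F = np.conj(H) * G / (np.abs(H) ** 2 + eps)  # regularized inverse filter
    return np.real(np.fft.ifft2(F))

rng = np.random.default_rng(2)
sharp = rng.random((32, 32))
kernel = np.zeros((5, 5))
kernel[2, :] = 1.0 / 5.0                       # horizontal box motion blur

# Circular convolution via FFT keeps this toy example exactly invertible.
blurred = np.real(np.fft.ifft2(np.fft.fft2(sharp) *
                               np.fft.fft2(kernel, s=sharp.shape)))
restored = wiener_deconvolve(blurred, kernel)

print(np.abs(blurred - sharp).mean(), np.abs(restored - sharp).mean())
```

The blind problem is much harder because the kernel H, here non-stationary across the image, must itself be estimated before any such inversion can be applied.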
In Advances in Neural Information Processing Systems 25, pages: 189-196, (Editors: P Bartlett, FCN Pereira, CJC. Burges, L Bottou, and KQ Weinberger), Curran Associates Inc., 26th Annual Conference on Neural Information Processing Systems (NIPS), 2012 (inproceedings)
In Proceedings of Robotics: Science and Systems VIII, pages: 8, R:SS, 2012 (inproceedings)
Inference of human intention may be an essential step towards understanding human actions and is hence important for realizing efficient human-robot interaction. In this paper, we propose the Intention-Driven Dynamics Model (IDDM), a latent variable model for inferring unknown human intentions. We train the model based on observed human behaviors/actions and introduce an approximate inference algorithm to efficiently infer the human’s intention from an ongoing action. We verify the feasibility of the IDDM in two scenarios, i.e., target inference in robot table tennis and action recognition for interactive humanoid robots. In both tasks, the IDDM achieves substantial improvements over state-of-the-art regression and classification approaches.
In Advances in Neural Information Processing Systems 25, pages: 10-18, (Editors: P Bartlett, FCN Pereira, CJC. Burges, L Bottou, and KQ Weinberger), Curran Associates Inc., 26th Annual Conference on Neural Information Processing Systems (NIPS), 2012 (inproceedings)
In Computer Vision - ECCV 2012, LNCS Vol. 7578, pages: 27-40, (Editors: A. Fitzgibbon, S. Lazebnik, P. Perona, Y. Sato, and C. Schmid), Springer, Berlin, Germany, 12th European Conference on Computer Vision, ECCV, 2012 (inproceedings)
Motion blur due to camera shake is one of the predominant sources of degradation in handheld photography. Single image blind deconvolution (BD) or motion deblurring aims at restoring a sharp latent image from the blurred recorded picture without knowing the camera motion that took place during the exposure. BD is a long-standing problem, but has attracted much attention recently, culminating in several algorithms able to restore photos degraded by real camera motion in high quality. In this paper, we present a benchmark dataset for motion deblurring that allows quantitative performance evaluation and comparison of recent approaches featuring non-uniform blur models. To this end, we record and analyse real camera motion, which is played back on a robot platform such that we can record a sequence of sharp images sampling the six-dimensional camera motion trajectory. The goal of deblurring is to recover one of these sharp images, and our dataset contains all information to assess how closely various algorithms approximate that goal. In a comprehensive comparison, we evaluate state-of-the-art single image BD algorithms incorporating uniform and non-uniform blur models.
Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments, and to use this understanding to design future systems.