
Positional Oligomer Importance Matrices




At the heart of many important bioinformatics problems, such as gene finding and function prediction, is the classification of biological sequences, above all DNA and proteins. In many cases, the most accurate classifiers are obtained by training SVMs with complex sequence kernels, for instance for detecting transcription starts or splice sites. However, a frequently criticized downside of SVMs with complex kernels is that it is very hard for humans to understand the learned decision rules and to derive biological insights from them. To close this gap, we introduce the concept of positional oligomer importance matrices (POIMs) and develop an efficient algorithm for their computation. We demonstrate how they overcome the limitations of sequence logos, and how they can be used to find relevant motifs for different biological phenomena in a straightforward way. Note that the concept of POIMs is not limited to interpreting SVMs, but is applicable to general k-mer based scoring systems.

Author(s): Sonnenburg, S. and Zien, A. and Philips, P. and Rätsch, G.
Year: 2007
Month: December

Department(s): Empirical Inference
Bibtex Type: Talk (talk)

Digital: 0
Event Name: NIPS 2007 Workshop on Machine Learning in Computational Biology
Event Place: Whistler, BC, Canada
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik



  title = {Positional Oligomer Importance Matrices},
  author = {Sonnenburg, S. and Zien, A. and Philips, P. and R{\"a}tsch, G.},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  month = dec,
  year = {2007},
  month_numeric = {12}