Clustering Graphs by Weighted Substructure Mining
2006
Conference Paper
ei
Graph data is getting increasingly popular in, e.g., bioinformatics and text processing. A main difficulty of graph data processing lies in the intrinsic high dimensionality of graphs, namely, when a graph is represented as a binary feature vector of indicators of all possible subgraphs, the dimensionality gets too large for usual statistical methods. We propose an efficient method for learning a binomial mixture model in this feature space. Combining the $ell_1$ regularizer and the data structure called DFS code tree, the MAP estimate of non-zero parameters are computed efficiently by means of the EM algorithm. Our method is applied to the clustering of RNA graphs, and is compared favorably with graph kernels and the spectral graph distance.
Author(s): | Tsuda, K. and Kudo, T. |
Book Title: | ICML 2006 |
Journal: | Proceedings of the 23rd International Conference on Machine Learning (ICML 2006) |
Pages: | 953-960 |
Year: | 2006 |
Month: | June |
Day: | 0 |
Editors: | Cohen, W. W., A. Moore |
Publisher: | ACM Press |
Department(s): | Empirische Inferenz |
Bibtex Type: | Conference Paper (inproceedings) |
DOI: | 10.1145/1143844.1143964 |
Event Name: | 23rd International Conference on Machine Learning |
Event Place: | Pittsburgh, PA, USA |
Address: | New York, NY, USA |
Digital: | 0 |
Language: | en |
Organization: | Max-Planck-Gesellschaft |
School: | Biologische Kybernetik |
Links: |
PDF
Web |
BibTex @inproceedings{4068, title = {Clustering Graphs by Weighted Substructure Mining}, author = {Tsuda, K. and Kudo, T.}, journal = {Proceedings of the 23rd International Conference on Machine Learning (ICML 2006)}, booktitle = {ICML 2006}, pages = {953-960}, editors = {Cohen, W. W., A. Moore}, publisher = {ACM Press}, organization = {Max-Planck-Gesellschaft}, school = {Biologische Kybernetik}, address = {New York, NY, USA}, month = jun, year = {2006}, doi = {10.1145/1143844.1143964}, month_numeric = {6} } |