April 2012 Featured Scientist – Mitsunori Ogihara, PhD

Overview – Dr. Mitsunori Ogihara is Director of the Data Mining Group in the Center for Computational Science (CCS). He is a Professor of Computer Science in the College of Arts and Sciences and a Professor of Electrical and Computer Engineering in the College of Engineering. On January 1, 2012, he began serving as Associate Dean for Digital Library Innovations in the College of Arts and Sciences, a position supported jointly by the Richter Library and the College of Arts and Sciences. Prior to joining the University of Miami, he served as chair of Computer Science at the University of Rochester for eight years.

Dr. Ogihara holds a PhD in Information Sciences from the Tokyo Institute of Technology. His PhD thesis was a theoretical study of computational complexity classes in relation to sets of low density (sparse and tally sets). In his PhD work, he obtained the definitive answer to a decade-long open question about sparse sets derived from the Berman-Hartmanis Conjecture. Since then he has published more than 100 papers in theoretical computer science. Currently, he serves on the editorial boards of two journals that cover foundational research: Theory of Computing Systems and International Journal of Foundations of Computer Science.

While continuing his theoretical research, he developed an interest in scholarly inquiries that use computation as a tool for discovery and has begun publishing in such areas as data mining, bioinformatics, and music information retrieval. In data mining, Dr. Ogihara is interested in developing techniques for high-dimensional data, in particular data whose attributes are discrete. In his recent work he developed algorithms for estimating the entropy of network flows with respect to the frequencies of IP packet addresses (in collaboration with groups from Georgia Tech, AT&T, and Denison University). A key issue in designing such network traffic monitoring algorithms is that there is not enough space to record all the packets. The entropy estimation algorithms sample packets to obtain a summary that, on average, gives a close-to-reality representation of the packet distribution.
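The idea of estimating entropy from a bounded-size sample can be illustrated with a minimal Python sketch. It is not the algorithm from the work described above: it assumes plain reservoir sampling followed by a plug-in entropy estimate, and the addresses, weights, stream length, and sample size are all made up for illustration.

```python
import math
import random
from collections import Counter

def reservoir_sample(stream, k, rng):
    """Keep a uniform random sample of k items using O(k) memory."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample

def empirical_entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    counts = Counter(items)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def make_stream(n, rng):
    """Hypothetical packet stream: four addresses with skewed frequencies."""
    addresses = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
    weights = [0.5, 0.25, 0.125, 0.125]  # true entropy is 1.75 bits
    return rng.choices(addresses, weights=weights, k=n)

rng = random.Random(0)
packets = make_stream(100_000, rng)

# Entropy computed from every packet (needs the full stream).
exact = empirical_entropy(packets)
# Entropy estimated from a 2,000-packet reservoir (bounded memory).
estimate = empirical_entropy(reservoir_sample(packets, 2_000, rng))
```

In this toy setting the sampled estimate stays close to the full-stream value while storing only a small fraction of the packets. A production monitor would also have to cope with far larger address spaces, where a plain plug-in estimate on a small sample becomes biased.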

Research Interests

Document Analysis

Another area of interest in his data mining research is document analysis. Summarizing a text into a few keywords or a single sentence is an important problem when dealing with large volumes of poorly annotated text.

In a recent paper with Zhang et al., he used a parsing technique (a method for computationally identifying parts of speech in given sentences) to identify the writer’s sentiment (positive or negative feelings toward the subject of the writing) in a collection of technical columns that appeared in The Wall Street Journal (Fig. 1).

Fig. 1 A parsing example in sentiment analysis

High-Dimensional Biological Data

In bioinformatics, Dr. Ogihara’s research has focused on algorithms for high-dimensional biological data. His group in the CCS is currently developing methods for estimating abundance of transcriptomes from high-throughput sequencing data using fast solutions to the least squares problem.
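To make the least squares framing concrete, here is a minimal Python sketch. The formulation is an assumption, not the group's method: a hypothetical 0/1 matrix A records which transcript (column) covers which genomic region (row), b holds the observed read coverage per region, and the abundances x are chosen to minimize ||Ax - b||². The solver uses the normal equations, which is adequate for a toy example but not a fast method for large data.

```python
def solve_least_squares(A, b):
    """Minimize ||A x - b||^2 by solving the normal equations (A^T A) x = A^T b."""
    m, n = len(A), len(A[0])
    # Build the augmented system [A^T A | A^T b].
    M = [
        [sum(A[r][i] * A[r][j] for r in range(m)) for j in range(n)]
        + [sum(A[r][i] * b[r] for r in range(m))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("A^T A is singular; abundances are not identifiable")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Hypothetical example: two transcripts over three regions.
# Region 0 belongs only to transcript 0, region 2 only to transcript 1,
# and region 1 is shared by both.
A = [[1, 0],
     [1, 1],
     [0, 1]]
b = [3.0, 8.0, 5.0]  # made-up coverage values

abundances = solve_least_squares(A, b)
```

A realistic estimator would also constrain the abundances to be non-negative and would need a numerically sturdier solver; both are omitted here to keep the sketch short.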

Fig. 2 Soft clustering of activation profiles

Also, with Vineet Gupta (Department of Medicine at UM) and a former CCS scientist, Qiong Cheng (University of Illinois), he is developing methods for inferring the dynamics of protein-protein interaction by integrating interaction data (Y2H data, for example) with activation data. In this study, an input network is examined to identify sets of interactions that are likely to occur at different times, and that information is then used to group activation patterns into clusters (Fig. 2).

Music Information Retrieval

In music information retrieval, he is interested in integrating various types of music data (metadata, lyrics, acoustic data, listening patterns, and user social networks) for classification and recommendation. In a recent paper with graduate student Yajie Hu, he studied how to model transitions from one genre to another in music listening behavior, using a notion of freshness in listeners’ memories that decays exponentially in time, so as to make meaningful recommendations for the “next piece” to listen to. Also, with CCS scientist Dingding Wang, he studied the problem of exploring “Twitter follower” networks to improve automatic genre/style classification of artists. His recent book with Tao Li (Florida International University) and George Tzanetakis (University of Victoria), “Music Data Mining,” surveys a variety of problems and techniques that appear in large-scale music data analysis.
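The exponentially decaying freshness mentioned above can be sketched in a few lines of Python. This is only an illustration of exponential decay, not the model from the paper: the function name, the decay rate, the time units, and the listening history are all hypothetical.

```python
import math

def genre_freshness(history, now, decay_rate):
    """Sum exponentially decayed weights of past listens, per genre.

    history is a list of (timestamp, genre) pairs; a listen that happened
    `age` time units ago contributes exp(-decay_rate * age).
    """
    scores = {}
    for timestamp, genre in history:
        age = now - timestamp
        scores[genre] = scores.get(genre, 0.0) + math.exp(-decay_rate * age)
    return scores

# Made-up listening history: (time in minutes, genre).
history = [(0.0, "jazz"), (50.0, "rock"), (58.0, "rock"), (59.0, "jazz")]
scores = genre_freshness(history, now=60.0, decay_rate=0.1)
```

Here the two recent rock listens outweigh one recent and one old jazz listen, so rock ends up "fresher" in memory. How such scores should drive the choice of the next piece (favoring fresh genres or deliberately moving away from them) is a modeling decision that this sketch leaves open.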