Document Analysis

Document or Content Analysis is “a wide and heterogeneous set of manual or computer-assisted techniques for contextualized interpretations of documents produced by communication processes in the strict sense of that phrase (any kind of text, written, iconic, multimedia, etc.) or signification processes (traces and artifacts), having as ultimate goal the production of valid and trustworthy inferences. Though the locution ‘content analysis’ has come to be a sort of ‘umbrella term’ referring to an almost boundless set of quite diverse research approaches and techniques, it is still today in use in the Social and Computer Science domains and in the Humanities to identify methods for studying and/or retrieving meaningful information from documents.”

Summarizing texts into a few keywords or a single sentence is an important problem when dealing with large volumes of poorly annotated text. In a recent paper with Qi Zhang et al. (“Mining Product Reviews Based on Shallow Dependency Parsing”), Program Director Dr. Mitsunori Ogihara used a parsing technique (a method for computationally identifying the grammatical structure of a sentence, such as its parts of speech and the relations between its words) to identify the writer’s sentiment (positive or negative feelings toward the subject of the writing) in a collection of technical columns that appeared in The Wall Street Journal (Fig. 1).


Fig. 1  A parsing example in sentiment analysis
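
As a rough illustration of what identifying a writer's sentiment can look like in code, the sketch below scores a sentence with a small word list and a simple negation rule. It is not the method used in the paper: the word lists, the function names, and the two-token negation window are all assumptions made for this example, and a real system would replace the window heuristic with relations produced by a dependency parser.

```python
import re

# Hypothetical toy lexicon (an assumption for this sketch); a real
# system would rely on a much larger sentiment resource.
POSITIVE = {"good", "great", "excellent", "reliable", "fast"}
NEGATIVE = {"bad", "poor", "slow", "terrible", "unreliable"}
NEGATORS = {"not", "never", "no", "hardly"}


def tokenize(sentence):
    """Lowercase the sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def sentence_sentiment(sentence, window=2):
    """Return 'positive', 'negative', or 'neutral' for one sentence.

    Each opinion word contributes +1 or -1. The sign is flipped when a
    negator appears within `window` tokens before the opinion word,
    which is a crude stand-in for a parser's negation relation.
    """
    tokens = tokenize(sentence)
    score = 0
    for i, tok in enumerate(tokens):
        if tok in POSITIVE:
            polarity = 1
        elif tok in NEGATIVE:
            polarity = -1
        else:
            continue
        preceding = tokens[max(0, i - window):i]
        if any(t in NEGATORS for t in preceding):
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for text in ("The new laptop is fast and reliable.",
                 "The battery life is not good."):
        print(text, "->", sentence_sentiment(text))
```

The fixed window is the weakest part of this sketch: it misses negators that sit far from the opinion word and can wrongly flip a word that merely follows an unrelated "not". Parsing addresses exactly this by linking each opinion word to the words that grammatically modify it.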
