From CS260Wiki

Research Question

What does an effective interface for studying stylistic adjective use in a text look like?

Related Work

Interfaces for text exploration are an active area of inquiry within human-computer interaction. Laypeople want to browse personal documents, journalists and decision makers want detailed news media analytics, and lately, scholars in the humanities want to analyze recently digitized sources. We will first describe the most closely related work in the digital humanities, and then describe motivating literature from natural language processing and human-computer interaction.

Digital Humanities

In the digital humanities, the closest work to our project comes from two well-known text analytics efforts. The first is the MONK project [1], which incorporates the SEASR analysis toolkits [2]; it is a collaborative effort among universities and institutions in the United States and Canada, supported by the Mellon Foundation. These projects offer two computational linguistics tools in addition to word distribution and frequency statistics: tagging words with their parts of speech and extracting named entities. Users can visualize occurrence patterns of word sequences within a chosen text, and plot networks of how often named entities occur near each other. A subset of these researchers, some from the HCI Lab at UMD, produced visual text-mining analyses of Emily Dickinson’s correspondence [3] and of Gertrude Stein’s “The Making of Americans” [4], as well as an interface for exploring the parts of speech used near query words of interest [5].

The second is Voyeur, a project from Stefan Sinclair’s group at McMaster University [6]. Voyeur operates entirely at the word level. It allows users to plot word frequencies, see concordances (contexts in which words occur) and create tag clouds.
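To make the word-level operations concrete, the following is a minimal sketch of word frequency counting and a keyword-in-context concordance. It is not Voyeur's implementation; the function names `word_frequencies` and `concordance`, the `width` parameter, and the simple regular-expression tokenizer are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of letters and apostrophes count as words.
TOKEN_RE = re.compile(r"[A-Za-z']+")


def word_frequencies(text):
    """Count case-insensitive word occurrences in text."""
    return Counter(tok.lower() for tok in TOKEN_RE.findall(text))


def concordance(text, query, width=3):
    """Return (left context, keyword, right context) for each hit of query.

    width is the number of words of context kept on each side.
    """
    tokens = TOKEN_RE.findall(text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == query.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


if __name__ == "__main__":
    sample = "The old house stood on the old road near the sea."
    print(word_frequencies(sample).most_common(2))
    for left, keyword, right in concordance(sample, "old", width=2):
        print(f"{left:>10} [{keyword}] {right}")
```

A tag cloud could be driven by the same frequency counts, with font size scaled by count.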

Other digital humanities projects have used more advanced language processing, but have not developed these techniques into user interfaces or combined them with visualizations. At the recently formed Humanities Computing Lab at Stanford University, topic modeling is being applied to 19th-century British and American novels [7]. These novels were also the subjects of cutting-edge computational linguistics research at Columbia University that showed how to automatically extract social networks from free text [8]. At the University of Washington, researchers are applying topic modeling to the compendium of Danish, Norwegian, and Swedish folklore collected by Evald Tang Kristensen, and have won an award to continue analysis on Google Books’ Scandinavian collection [9]. In the field of visualization, applications to text in the humanities have been limited to word clouds and node-and-link diagrams of named entities and co-occurrences.

Outside the realm of text, but in the domain of comparative exploration, Amin et al. created LISA, a comparison search interface for cultural heritage artifacts [10].

Computer Science

The digital humanities work described above applies ideas from human-computer interaction and natural language processing. From human-computer interaction, we are informed by the general principles of search user interface design described by Hearst [11], especially those about visualizing linguistic information, and by the principles of visual exploration of large data sets described by Shneiderman [12]. The idea of visually representing text as a block of tiles goes as far back as 1995, when Hearst used it to show the distribution of search terms in retrieved documents [13].
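The tile idea can be illustrated with a rough sketch: split a document into segments and record how often each search term occurs in each one. This is only an approximation for illustration. It assumes fixed-size tiles of `tile_size` words rather than any segmentation used in [13], and the names `tile_counts` and `render_row` are hypothetical.

```python
import re


def tile_counts(text, terms, tile_size=5):
    """Map each term to a list of its counts per fixed-size tile of words."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z']+", text)]
    tiles = [tokens[i:i + tile_size] for i in range(0, len(tokens), tile_size)]
    return {term: [tile.count(term.lower()) for tile in tiles] for term in terms}


def render_row(counts):
    """Render counts as characters; denser characters mean more hits.

    Counts of 3 or more all map to the darkest character.
    """
    shades = " .:#"
    return "".join(shades[min(c, len(shades) - 1)] for c in counts)


if __name__ == "__main__":
    doc = "cat dog cat bird fish dog dog dog eel cat"
    for term, counts in tile_counts(doc, ["cat", "dog"]).items():
        print(f"{term:>4} |{render_row(counts)}|")
```

Each printed row corresponds to one search term and each character to one tile, so a reader can see at a glance where in the document the terms cluster.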

Interfaces for exploring the grammatical relationships between words are a newer area of research, probably because natural language parsers have only recently become fast enough to process large amounts of text in a reasonable time span. In this sub-field, we are only aware of the work of van Ham, Wattenberg and Viégas on Phrase Nets [14], which we adapt for comparing narratives side-by-side.
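As a rough illustration of the kind of data behind such a visualization, the sketch below collects weighted, directed word pairs that are joined by a connecting word such as "and". It is a simplified assumption of how pattern matching of this sort might work, using a regular expression rather than a parser, and it is not the implementation from [14]; the function name `phrase_net_edges` and the `connector` parameter are hypothetical.

```python
import re
from collections import Counter


def phrase_net_edges(text, connector="and"):
    """Count directed (left word, right word) pairs joined by connector.

    The right-hand word is captured inside a lookahead so that chains such
    as "A and B and C" yield both (A, B) and (B, C).
    """
    pattern = re.compile(
        r"\b(\w+)\s+" + re.escape(connector) + r"\s+(?=(\w+))",
        re.IGNORECASE,
    )
    edges = Counter()
    for match in pattern.finditer(text):
        edges[(match.group(1).lower(), match.group(2).lower())] += 1
    return edges


if __name__ == "__main__":
    sample = "Pride and prejudice and zombies. Pride and prejudice."
    for (left, right), weight in phrase_net_edges(sample).most_common():
        print(f"{left} -> {right} (x{weight})")
```

Running the same function over two narratives and comparing the resulting edge counts is one simple way to set them side by side.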

In natural language processing, two areas of research are related to our work: sentiment analysis and meme tracking, both of which aim to give users a feeling for the contents of a large body of text. Sentiment analysis involves extracting relevant features of an item of interest from product reviews, news articles, and other streams of text, and categorizing the language used to describe them as positive or negative. State-of-the-art methods by Popescu and Etzioni [15] use pattern-based information extraction [16][17] and dependency parsing to extract adjectives that apply to, for example, various features of consumer products.

Meme tracking involves either modeling text as a bag of words generated by topics that vary over time, an approach due to Blei and Lafferty [18], or tracking the distribution of variations on popular phrases, as done by Leskovec et al. [19].


[1] “The MONK Project.” [Online]. Available: [Accessed: 29-Oct-2010].

[2] “SEASR.” [Online]. Available: [Accessed: 29-Oct-2010].

[3] C. Plaisant et al., “Exploring erotics in Emily Dickinson's correspondence with text mining and visual interfaces,” in Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries, pp. 141-150, 2006.

[4] A. Don et al., “Discovering interesting usage patterns in text collections: integrating text mining with visualization,” in Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, pp. 213-222, 2007.

[5] R. Vuillemot, T. Clement, C. Plaisant, and A. Kumar, “What's being said near “Martha”? Exploring name entities in literary text collections,” in Visual Analytics Science and Technology, 2009. VAST 2009. IEEE Symposium on, pp. 107–114, 2009.

[6] “Voyeur Tools: See Through Your Texts | – The Rhetoric of Text Analysis.” [Online]. Available: [Accessed: 29-Oct-2010].

[7] M. Jockers, “What is a Literature Lab: Not Grunts and Dullards | Matthew L. Jockers.” [Online]. Available: [Accessed: 29-Oct-2010].

[8] D. K. Elson, N. Dames, and K. R. McKeown, “Extracting social networks from literary fiction,” in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 138–147, 2010.

[9] “Google Books Grant to Fund Research on Swedish, Danish and Norwegian texts | CultureMob.” [Online]. Available: [Accessed: 29-Oct-2010].

[10] A. K. Amin, M. Hildebrand, J. van Ossenbruggen, and L. Hardman, “Designing a thesaurus-based comparison search interface for linked cultural heritage sources,” in Proceeding of the 14th international conference on Intelligent user interfaces, pp. 249–258, 2010.

[11] M. Hearst, Search User Interfaces. Cambridge University Press, 2009.

[12] B. Shneiderman, “The Eyes Have it: A Task by Data Type Taxonomy for Information Visualizations,” 1996. [Online]. Available: [Accessed: 22-Apr-2010].

[13] M. A. Hearst, “TileBars: visualization of term distribution information in full text information access,” in Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 59–66, 1995.

[14] F. Van Ham, M. Wattenberg, and F. B. Viégas, “Mapping text with phrase nets,” Visualization and Computer Graphics, IEEE Transactions on, vol. 15, no. 6, pp. 1169–1176, 2009.

[15] A. Popescu and O. Etzioni, “Extracting Product Features and Opinions from Reviews,” presented at the EMNLP, 2005.

[16] M. A. Hearst, “Automatic acquisition of hyponyms from large text corpora,” in Proceedings of the 14th conference on Computational linguistics-Volume 2, pp. 539–545, 1992.

[17] A. Yates, M. Cafarella, M. Banko, O. Etzioni, M. Broadhead, and S. Soderland, “TextRunner: open information extraction on the web,” in Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations on XX, pp. 25–26, 2007.

[18] D. M. Blei and J. D. Lafferty, “Dynamic topic models,” in Proceedings of the 23rd international conference on Machine learning, pp. 113–120, 2006.

[19] J. Leskovec, L. Backstrom, and J. Kleinberg, “Meme-tracking and the dynamics of the news cycle,” in Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 497–506, 2009.