Beverly J. Silver (Professor and Chair, Sociology Department; Director, Arrighi Center for Global Studies)
Sahan Savas Karatasli (Sociology and Arrighi Center for Global Studies)
Christopher Nealon (Professor and Chair, English Department)
The purpose of this seed proposal is to develop methods to semi-automate the collection of data on protests and other events from newspapers and similar sources. The goal is both to reduce the time required to code event information (e.g., location, actors, actions, demands) and to increase the accuracy of that coding. Most existing social science research in this area automates data collection, but at the cost of an unacceptable level of false positives and of failing to take advantage of the rich, detailed information in the newspaper articles themselves. Our current NSF-funded research on Global Social Protest uses search strings to extract relevant articles from digitized newspaper archives and uses a custom-built website for data coding and analysis. However, to avoid the pitfalls described above, it relies on human coding of articles, which is time-consuming. The seed project seeks to develop natural language processing tools that allow for a middle path between full automation and manual coding. In addition to English-language newspapers, we will run pilots on French, Japanese, Korean and Spanish newspapers. Extending the project to these languages will also allow us to widen and deepen ongoing international research collaborations.