POSTPONED: Impact Assessment of Information Products and Data Provenance

Jana Diesner

Due to unforeseen circumstances, this talk is POSTPONED. We apologize for the inconvenience and will reschedule it at a later date.


Wednesdays@NICO | 12:00-1:00 PM, April 26, 2017 | Chambers Hall, Lower Level

Jana Diesner - Assistant Professor, School of Information Sciences, University of Illinois at Urbana-Champaign

The emerging field of human-centered data science has led to several transformative advances in research and technology. With groups of people generating digital data, some social effects can be measured instead of having to be estimated. Also, the availability of such data may allow us to listen to people's signals instead of having to ask them questions. Finally, both the structure and content of human interactions can be considered for data analysis, and applying mixed methods to such data is becoming a routine approach.

These advances have broadened the scope of possibilities in impact assessment research, among other fields. I present our work on developing new computational solutions for identifying the impact of information products on people by leveraging theories from linguistics and the social sciences as well as methods from natural language processing and machine learning. I focus on a study where we developed and evaluated a theoretically grounded categorization schema, codebook, corpus annotation, and prediction model for detecting multiple practically relevant types of impact that documentary films can have on individuals, such as change versus reaffirmation of people's behavior, cognition, and emotions. This work uses reviews as a form of user-generated content. We use linguistic, lexical, and psychological features for supervised learning, achieving an accuracy rate of about 81% (F1).

The outlined advances also imply several challenges. Verifying the accuracy of large-scale data is crucial for enabling collaborations, sharing data, and generating reliable results, but is challenging if the data provenance process lacks transparency. While choices about data collection, preparation, and analysis are increasingly embedded in datasets and technologies, we still have a poor understanding of the impact of these decisions on research results and further actions. I present our work on entity resolution of social network data, highlight the impact of common strategies and shortcomings on node-level and graph-level properties, and discuss implications of biased results for decision and policy making.


Jana Diesner is an Assistant Professor at the iSchool/School of Information Sciences at the University of Illinois at Urbana-Champaign.

She conducts research in human-centered data science by combining network analysis, natural language processing, and machine learning into computational, mixed-methods solutions that are grounded in theories from linguistics and the social sciences. With her lab, she has been addressing the following problems: 1) Impact assessment I: How can we assess the impact of information products on people beyond relying on count metrics? 2) Impact assessment II: How do limitations in data quality and data provenance impact research findings? 3) NLP for building and enhancing graph data and theory: How can we use user-generated content to construct, infer, and refine network data? 4) Ethics and regulations for working with human-centered and online data: How can we be rule-compliant and still innovate?

Jana holds a PhD from the Computation, Organizations and Society program (now Societal Computing) at Carnegie Mellon's School of Computer Science. She was a 2015 faculty fellow at the National Center for Supercomputing Applications (NCSA) at Illinois, and a 2016 research fellow in the Dori J. Maynard Senior Research Fellows program.

