Events: Northwestern Institute on Complex Systems

Events

Details

Speaker:

Momin Malik - Data Science Postdoctoral Fellow, Berkman Klein Center for Internet & Society, Harvard University

Title:

Revisiting "All Models are Wrong": Addressing Limitations in Big Data, Machine Learning, and Computational Social Science

Abstract:

In the immortal words of George E. P. Box (1979), "All models are wrong, but some are useful." This is an important lesson to recall amidst hopes and claims that digital trace data, the high-dimension and low-assumption models of machine learning, and advancements in computational social science are overcoming the limitations of the past. In this talk, I review the fundamental limitations with which all quantitative research must grapple, and discuss how these limitations manifest today.

Larger data captures more heterogeneity and allows for studying finer and finer subpopulations and phenomena, but as I demonstrate with geotagged tweets, selection bias still makes results fail to generalize to larger populations. The platforms from which we gather data are not research utilities, and I model the introduction of Facebook's "People You May Know" recommender system to show how social media platforms' efforts to solicit desirable behavior from users change what we think we observe. Through considering co-location data via mobile phone sensors versus friendship self-report, I consider how new forms of measurement do not necessarily supersede previous forms but capture different underlying constructs that can be fruitful opportunities for research. I conclude with a theoretical overview of limitations of forms of quantitative modeling, from the inevitable reliance on central tendencies in probability-based modeling through to how cross-validation can break down in the presence of dependencies.

This talk will serve as a useful overview of modeling limitations and critiques, as well as possible fixes, for researchers in and practitioners of data science, computational social science, social physics, statistics, and machine learning. It will also be useful as a primer for those outside these fields on the appropriate and inappropriate uses of techniques from them.

Speaker Bio:

Momin M. Malik is the Data Science Postdoctoral Fellow at the Berkman Klein Center for Internet & Society at Harvard University. He holds an undergraduate degree in history of science from Harvard, a master's from the Oxford Internet Institute, and a master's in Machine Learning and a PhD in Societal Computing from the School of Computer Science, Carnegie Mellon University, where his dissertation measured how much social media platform effects, demographic biases, and reliance on mobile phone sensor data can threaten generalizability of findings in computational social science. His current work bridges machine learning and science studies to understand the sources of both successes and failures in machine learning.

About the Speaker Series:

Wednesdays@NICO is a vibrant weekly seminar series focusing broadly on the topics of complex systems and data science. It brings together attendees ranging from graduate students to senior faculty who span all of the schools across Northwestern, from applied math to sociology to biology and every discipline in between.

Live Stream:

bluejeans.com/8474912528

Related Info

Time

Wednesday, February 5, 2020 at 12:00 PM - 1:00 PM

Location

Lower Level, Chambers Hall

Contact

Meghan Stagl

Calendar

Northwestern Institute on Complex Systems (NICO)

Details

MAY MEETING: Thursday, May 28, 2026 at 5:30pm (US Central)

LOCATION:
ESAM Conference Room, Tech M416
2145 Sheridan Road, Evanston, IL 60208

AGENDA:
5:30pm - Meet and greet with refreshments
6:00pm - Talk with Xudong Tang, PhD Student, Computer Science, NICO, and the Human-AI Collaboration Lab, Northwestern University

TALK TITLE:
Human and Machine Perception of Voice Similarity

ABSTRACT:
Modern voice cloning systems generate synthetic speech that listeners frequently cannot identify as being synthetic. But a voice can sound natural without sounding like the intended person, and what determines whether a clone is heard as a particular person is an open question. Here we report a large-scale preregistered experiment in which we collected 92,239 responses from 175 participants on their perception of pairs of real recordings, voice clones, and continuously morphed voices drawn from 100 contemporary celebrities across 20 speaker groups. We find that voice clones do not reliably preserve perceived speaker identity, reducing same-speaker judgments by 12.7 percentage points even though the clones are produced by a state-of-the-art text-to-speech model, while leaving different-speaker judgments unchanged. Using continuously morphed stimuli, we find that speakers vary substantially in how much variation their perceived identity tolerates, and that this variation is not predicted by speaker demographics. Speaker embeddings account for 58.9% (95% CI = [55.7, 61.9]) of variance in identity judgments, which is more than acoustic features, social attributes, and clone status combined. Once all these observed features are accounted for, clone status adds no additional predictive power. These results show that the perceptual impact of voice cloning is positional rather than categorical: we can model how listeners judge a voice by how close it falls to the perceptual boundary that defines each speaker's recognizable voice, applying the same criterion to real and synthetic speech alike.

DATA SCIENCE NIGHTS are monthly meetings featuring presentations and discussions about data-driven science and complex systems, organized by Northwestern University graduate students and scholars. Students and researchers of all levels are welcome! For more information: http://bit.ly/nico-dsn

FUTURE DATES:
Data Science Nights will return in September!

Related Info

Time

Thursday, May 28, 2026 at 5:30 PM - 7:00 PM

Location

M416, Technological Institute

Contact

Stefan Pate

Calendar

Northwestern Institute on Complex Systems (NICO)

NORTHWESTERN INSTITUTE ON COMPLEX SYSTEMS

Events

WED@NICO SEMINAR: Momin Malik, Harvard University "Revisiting 'All Models are Wrong': Addressing Limitations in Big Data, Machine Learning, and Computational Social Science"

Data Science Nights - MAY 2026 - Speaker: Xudong Tang, Computer Science and NICO