I2SC Lecture Series (Recording): Maria Antoniak (Computer Science, University of Colorado), From Stories to Sonnets: Data-Centered NLP for Creative Works
Date: January 10, 2025
Abstract:
In this talk, I'll share two recent studies that use natural language processing (NLP) techniques to model creative works such as stories and poetry. In the first part of the talk, I'll discuss NLP approaches for story detection and analysis, focusing on how NLP methods can help us study storytelling at large scales and across diverse contexts. In the second part, I'll discuss the poetic capabilities of large language models (LLMs), with an emphasis on audits of the vast pretraining datasets used to build these models. Both studies highlight the challenges in creating open evaluation datasets for creative works and the importance of interdisciplinary collaboration between NLP and the humanities.
You can watch the recording of the talk below.