I2SC Lecture Series (Recording): Jan Lause (Computational Neuroscience, University of Tübingen), "Delving into ChatGPT usage in academic writing through excess vocabulary"
Date: May 23, 2025
Abstract:
Recent large language models (LLMs) can generate and revise text with human-level performance and have been widely commercialized in systems like ChatGPT. These models come with clear limitations: they can produce inaccurate information, reinforce existing biases, and be easily misused. Yet many scientists have been using them to assist with their scholarly writing. How widespread is LLM usage in the academic literature today? To answer this question, we use an unbiased, large-scale approach that makes no assumptions about academic LLM usage. We study vocabulary changes in 14 million PubMed abstracts from 2010 to 2024 and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words. Our analysis of excess word usage suggests that at least 10% of 2024 abstracts were processed with LLMs. This lower bound differed across disciplines, countries, and journals, and was as high as 30% for some PubMed sub-corpora. We show that the appearance of LLM-based writing assistants has had an unprecedented impact on the scientific literature, surpassing the effect of major world events such as the COVID-19 pandemic.
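The abstract does not spell out how "excess" word usage is measured. As a rough illustration only, the sketch below assumes three things: a word's frequency is the fraction of abstracts containing it, the expected (counterfactual) frequency for the target year is a linear extrapolation from the two preceding years, and excess is reported as the gap and the ratio between observed and expected frequency. The function names and toy numbers are hypothetical; this is not the speaker's code.

```python
import re
from typing import Iterable


def word_frequency(abstracts: Iterable[str], word: str) -> float:
    """Fraction of abstracts that contain `word` at least once (case-insensitive)."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    docs = list(abstracts)
    if not docs:
        return 0.0
    return sum(1 for text in docs if pattern.search(text)) / len(docs)


def expected_frequency(freq_two_years_before: float, freq_one_year_before: float,
                       years_ahead: int = 2) -> float:
    """Assumed counterfactual: extend the linear trend between two earlier years,
    clipped at zero so the expected frequency is never negative."""
    slope = freq_one_year_before - freq_two_years_before
    return max(freq_one_year_before + years_ahead * slope, 0.0)


def excess_usage(observed: float, expected: float) -> tuple[float, float]:
    """Return (gap, ratio) between observed and expected frequency."""
    gap = observed - expected
    ratio = observed / expected if expected > 0 else float("inf")
    return gap, ratio


# Hypothetical toy corpus and frequencies, for illustration only.
print(word_frequency(["We delve into X.", "No match here.", "Delve deeper."], "delve"))
expected = expected_frequency(0.10, 0.12)  # e.g. two pre-LLM years -> target year
print(excess_usage(0.30, expected))
```

Under these assumptions, a ratio well above 1 flags a word whose usage grew much faster than its earlier trend would predict.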