I2SC Lecture Series (Recording): Jan Lause (Computational Neuroscience, University of Tübingen), "Delving into ChatGPT usage in academic writing through excess vocabulary"

May 30, 2025

Date: May 23, 2025

Abstract:

Recent large language models (LLMs) can generate and revise text with human-level performance and have been widely commercialized in systems like ChatGPT. These models come with clear limitations: they can produce inaccurate information, reinforce existing biases, and be easily misused. Yet, many scientists have been using them to assist their scholarly writing. How widespread is LLM usage in the academic literature currently? To answer this question, we use an unbiased, large-scale approach, free from any assumptions about academic LLM usage. We study vocabulary changes in 14 million PubMed abstracts from 2010-2024 and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words. Our analysis based on excess word usage suggests that at least 10% of 2024 abstracts were processed with LLMs. This lower bound differed across disciplines, countries, and journals, and was as high as 30% for some PubMed sub-corpora. We show that the appearance of LLM-based writing assistants has had an unprecedented impact on the scientific literature, surpassing the effect of major world events such as the COVID-19 pandemic.
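To make the idea of "excess word usage" concrete, here is a minimal sketch in Python. It is not the speaker's code. It assumes that a word's expected 2024 frequency can be approximated by linearly extrapolating its frequency from two pre-LLM years (2021 and 2022 are our choice here), and that the gap between observed and expected frequency is the share of abstracts that contain the word in excess of the trend. The function names and the toy numbers are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: the base years, function names, and numbers
# below are assumptions, not taken from the talk.

def expected_frequency(freq_by_year, target_year, base_years=(2021, 2022)):
    """Linearly extrapolate a word's frequency from two pre-LLM years."""
    y0, y1 = base_years
    slope = (freq_by_year[y1] - freq_by_year[y0]) / (y1 - y0)
    return freq_by_year[y1] + slope * (target_year - y1)


def excess_usage(freq_by_year, target_year=2024):
    """Return (excess gap, excess ratio) for one word in the target year.

    freq_by_year maps a year to the fraction of abstracts containing the word.
    """
    observed = freq_by_year[target_year]
    # A frequency cannot be negative, so clip the extrapolation at zero.
    expected = max(expected_frequency(freq_by_year, target_year), 0.0)
    gap = observed - expected
    ratio = observed / expected if expected > 0 else float("inf")
    return gap, ratio


# Hypothetical word found in 1.0% (2021), 1.2% (2022), and 4.6% (2024) of abstracts.
toy_word = {2021: 0.010, 2022: 0.012, 2024: 0.046}
gap, ratio = excess_usage(toy_word)
print(f"excess gap: {gap:.3f}, excess ratio: {ratio:.2f}")
```

Under these assumptions, the gap for a single word already gives a rough lower bound: at least that share of abstracts contains the word beyond what the earlier trend predicts. A real analysis would have to combine many words and deal with noise in rare words, which this sketch does not attempt.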

You can watch the recording of the talk below.

Check out our lecture series: https://www.i2sc.net/events/i2sc-lecture-series

Sign up for our mailing list so you don't miss upcoming talks.
