Blog: When Images Lie: Mastering Multimodal Fact-Checking to Combat Multimodal Misinformation

June 05, 2025

Picture this: It's Tuesday morning, and my inbox is flooded with reports about a viral social media post showing a dramatic graph claiming that COVID-19 infections have reached nearly 8 billion people by 2024. The post combines an alarming chart with text stating "This is the latest news of COVID-19." Within hours, it's been shared thousands of times, sparking panic and conspiracy theories. As a professional fact-checker, I know this is exactly the kind of multimodal misinformation that makes my job both crucial and challenging.

The Multimodal Misinformation Problem

Misinformation today isn't just false text. It also takes the form of misleading claims, manipulated images, memes, and deepfakes. What makes multimodal misinformation particularly dangerous is that text combined with images appears more convincing to audiences than text-only false information. When people see a professional-looking graph alongside authoritative-sounding text, they're more likely to believe and share it without verification.

The COVID-19 graph example perfectly illustrates this phenomenon. The visual element—a steep upward trending chart—immediately grabs attention and seems to provide concrete evidence for the text claim. However, as a fact-checker, I need to verify both the visual and textual components to determine the post's accuracy.

Why Traditional Fact-Checking Falls Short

Traditional unimodal fact-checking (text-only) is insufficient for hybrid content. When I encounter posts like the COVID-19 example, I can't just verify the text claim—I must also analyze the image, check for manipulation, verify data sources, and assess the relationship between visual and textual elements.

The challenges are significant: cross-modal misalignment, temporal and spatial complexity, and context dependency all make multimodal fact-checking difficult. An image might be real but used out of context, a graph might display accurate data on a misleading scale, or the text might make claims that contradict what the accompanying visual actually shows.

Modern Solutions

Fortunately, new technological approaches are changing how we tackle multimodal misinformation. One promising method has emerged from recent research:

MAFT (Multimodal Automated Fact-Checking via Textualization) converts all media types into text for analysis. For the COVID-19 post, MAFT would first convert the graph into a textual description, then extract specific claims, gather evidence through web searches, and generate a comprehensive verification report. In the example case, it discovered that actual global COVID-19 cases by early 2024 were approximately 704 million—not 8 billion as the graph suggested.
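
To make those steps concrete, here is a minimal Python sketch of a textualize-then-verify flow applied to the COVID-19 example. It is not MAFT's actual code: the names Claim, Verdict, textualize, verify, and report are hypothetical, and so is the 10% tolerance. The image description, the extracted claim, and the evidence figure are all typed in by hand, where a real system would use a chart-reading model, a claim extractor, and web search.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str             # the claim written out as a sentence
    claimed_value: float  # the number the post asserts


@dataclass
class Verdict:
    claim: Claim
    evidence_value: float  # the number found in trusted evidence
    ratio: float           # claimed_value / evidence_value
    supported: bool


def textualize(post_text: str, image_description: str) -> str:
    """Merge the post text and a description of the image into one text.

    Assumption: the description is supplied by hand here; an automated
    system would generate it with an image or chart captioning model.
    """
    return f"{post_text}\n[Image description] {image_description}"


def verify(claim: Claim, evidence_value: float, tolerance: float = 0.10) -> Verdict:
    """Compare the claimed number with the evidence number.

    The claim counts as supported only if it is within `tolerance`
    (a hypothetical 10% by default) of the evidence figure.
    """
    if evidence_value <= 0:
        raise ValueError("evidence_value must be positive")
    ratio = claim.claimed_value / evidence_value
    return Verdict(claim, evidence_value, ratio, abs(ratio - 1.0) <= tolerance)


def report(verdict: Verdict) -> str:
    """Render a one-line verification report."""
    label = "SUPPORTED" if verdict.supported else "NOT SUPPORTED"
    return (
        f"{label}: '{verdict.claim.text}' claims "
        f"{verdict.claim.claimed_value:,.0f}, evidence shows "
        f"{verdict.evidence_value:,.0f} "
        f"(claim is {verdict.ratio:.1f}x the evidence figure)"
    )


if __name__ == "__main__":
    # Step 1: turn the image into text and merge it with the post text.
    document = textualize(
        "This is the latest news of COVID-19.",
        "Line chart rising steeply to nearly 8 billion infections by 2024.",
    )
    # Step 2: extract the checkable claim (done by hand in this sketch).
    claim = Claim("COVID-19 infections reached nearly 8 billion by 2024", 8e9)
    # Steps 3 and 4: compare against the evidence figure and report.
    verdict = verify(claim, evidence_value=704e6)
    print(document)
    print(report(verdict))
```

With the figures from this post, the claimed 8 billion is more than eleven times the roughly 704 million cases found in the evidence, so the sketch labels the claim as not supported.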

The Future of Fact-Checking

Misinformation is multimodal, and our response must be equally sophisticated. While AI research like MAFT shows promising results—achieving significant improvements over previous methods—human expertise remains essential for contextual understanding and nuanced judgment.

- Written by Hewan Shrestha

References:

MAFT: Multimodal Automated Fact-Checking via Textualization

