Tim Müller – Data Scientist and Social Researcher


Exploring the intersection of data science and society.

Visualizing Migration-Related News in Germany 2023

Overview

This project examines migration-related news coverage in German nationwide and regional newspapers throughout 2023. Using a structured sampling approach, I collected headlines containing the term “migration” and processed them to uncover reporting trends and visualize frequently used words. The analysis highlights the complexity of data extraction, the prominence of migration in public discourse, and the shifts in debate topics throughout the year.

You can find the Jupyter Notebook of this project here.

Data Collection

  • Source: The GENIOS database, which indexes articles from German newspapers.
  • Sampling Strategy:
    • Two days per week were randomly selected.
    • Headlines containing “migration” from all available newspapers were retrieved.
  • Scope:
    • 104 text files (two sampled days for each of 52 weeks) containing metadata and headlines, which yielded a dataset of 21,546 individual records after cleaning.
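
The sampling step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function name `sample_days`, the fixed seed, and the choice to skip partial weeks at the year boundary (so that 52 full weeks give 104 days) are all assumptions.

```python
import random
from collections import defaultdict
from datetime import date, timedelta


def sample_days(year, per_week=2, seed=0):
    """Randomly pick `per_week` days from every full ISO week of `year`."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible

    # Group all days of the year by (ISO year, ISO week number).
    weeks = defaultdict(list)
    day = date(year, 1, 1)
    while day.year == year:
        iso = day.isocalendar()
        weeks[(iso[0], iso[1])].append(day)
        day += timedelta(days=1)

    sampled = []
    for key in sorted(weeks):
        days = weeks[key]
        if len(days) < 7:
            # Assumption: partial weeks at the year boundary are skipped.
            continue
        sampled.extend(rng.sample(days, per_week))
    return sorted(sampled)


sampled_days = sample_days(2023)
print(len(sampled_days))
```

For 2023, the only partial week is the one containing 1 January, so the sketch returns two days for each of the 52 remaining weeks.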

Data Processing

Processing the data required overcoming several challenges:

  1. Extracting Metadata:
    • Key fields (e.g., date, region, article type, number of results) were extracted and cleaned.
    • A normalized metric was created to calculate the share of migration-related articles as a percentage of total articles on sampled days.
  2. Headline Extraction:
    • Headlines were embedded in text blocks with inconsistent formatting.
    • Custom rules were applied for extraction based on string length and positional patterns.
    • Special handling was implemented for complex cases, such as identifying and removing author information or duplicate lines.
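
To make the headline rules more concrete, here is a hedged sketch of a length-and-position heuristic. The block layout (first line holds source and date), the "Von Firstname Lastname" author pattern, the length limits, and the sample text are all invented for illustration; the real export format and rules may differ.

```python
import re

# Assumption: author lines look like "Von Max Mustermann" (capitalised names only).
AUTHOR_LINE = re.compile(r"^Von\s+[A-ZÄÖÜ][\w.\-]*(\s+[A-ZÄÖÜ][\w.\-]*){0,3}$")


def extract_headline(block, min_len=20, max_len=150):
    """Return the first plausible headline in a text block, or None."""
    lines = [ln.strip() for ln in block.splitlines() if ln.strip()]

    candidates = []
    # Positional rule (assumed): line 0 is the source/date line, so skip it.
    for line in lines[1:]:
        if line in candidates:       # drop duplicate lines
            continue
        if AUTHOR_LINE.match(line):  # drop author information
            continue
        candidates.append(line)

    # Length rule: a headline is neither very short nor very long.
    for line in candidates:
        if min_len <= len(line) <= max_len:
            return line
    return None


# Invented example block, not taken from the dataset.
sample_block = """Beispiel-Zeitung, 14.03.2023
Von Max Mustermann
Länder fordern mehr Geld für Unterbringung
Länder fordern mehr Geld für Unterbringung
Kurzer Text"""

print(extract_headline(sample_block))
```

The author pattern only matches lines made up entirely of capitalised names, so a headline such as "Von der Leyen fordert mehr Geld für Kommunen" is not mistaken for a byline.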

Insights from Data

  • Descriptive Statistics:
    • On average, 224 articles/day contained “migration.”
    • Migration-related articles made up just 2.16% of all articles on the sampled days, and many of them were the same article reprinted across regional papers.
  • Trends in Reporting:
    • Peaks in coverage occurred in February, May, and autumn/winter 2023.
    • Sunday newspapers published fewer articles overall but often had a higher proportion of migration-related content.
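
The normalized share behind these figures can be illustrated with a short sketch. The record layout `(date, migration_hits, total_articles)` and the numbers in `toy_records` are made up for the example and are not values from the dataset.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def share_percent(hits, total):
    """Migration-related articles as a percentage of all articles."""
    return 100.0 * hits / total if total else 0.0


def share_by_weekday(records):
    """Aggregate hits and totals per weekday, then compute the share."""
    sums = defaultdict(lambda: [0, 0])  # weekday -> [hits, total]
    for day, hits, total in records:
        name = WEEKDAYS[day.weekday()]
        sums[name][0] += hits
        sums[name][1] += total
    return {name: share_percent(h, t) for name, (h, t) in sums.items()}


# Invented numbers: a weekday with many articles, a Sunday with few.
toy_records = [
    (date(2023, 3, 13), 20, 1000),  # a Monday
    (date(2023, 3, 19), 9, 300),    # a Sunday
]

shares = share_by_weekday(toy_records)
print(shares)
```

Dividing by the day's total is what allows a day with fewer articles overall, such as a Sunday, to still show a higher proportion of migration-related content.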

Visualizing Headline Content

To explore public discourse, I created word clouds visualizing the most frequently used words in migration-related headlines.
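
A word cloud is ultimately driven by word counts. The sketch below shows one way such counts could be computed; the tokenizer, the tiny stopword list, and the example headlines are illustrative assumptions rather than the notebook's actual preprocessing.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"der", "die", "das", "und", "in", "im", "für",
             "von", "mit", "zu", "den", "auf"}

# Runs of letters (including umlauts), optionally joined by hyphens.
TOKEN = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def word_frequencies(headlines, stopwords=STOPWORDS, min_len=3):
    """Count words across headlines, skipping stopwords and short tokens."""
    counts = Counter()
    for headline in headlines:
        for token in TOKEN.findall(headline):
            if len(token) >= min_len and token.lower() not in stopwords:
                counts[token] += 1  # original casing is kept (e.g. "AfD")
    return counts


# Invented example headlines, not taken from the dataset.
example_headlines = [
    "Migration: Länder fordern mehr Geld",
    "AfD und Migration im Bundestag",
]

freqs = word_frequencies(example_headlines)
print(freqs.most_common(3))
```

The resulting mapping could then be handed to a plotting library. The third-party `wordcloud` package, for example, accepts such counts via `WordCloud.generate_from_frequencies`; which library the notebook uses is not stated here.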

1. Yearly Overview

  • Key Words: “Deutschland” (Germany), “AfD,” “Migration,” “Geflüchtete,” and “Geld” (money) dominated the headlines.
  • Context:
    • Frequent mentions of “AfD” highlight the political discourse.
    • Words like “Geflüchtete” (refugees) reflect debates about funding for housing and schooling at the state and municipal levels.

2. Monthly Word Clouds

  • Method: Word clouds were generated for each month to track the evolution of the migration debate.
  • Observations:
    • January: Discussions centered on New Year’s Eve violence.
    • Spring: Coverage focused on state demands for additional funding and on the number of people with a migration background.
    • Summer: Border checks and asylum rights dominated headlines.
    • Autumn: Headlines covered the agreement on increased funding for municipalities and states.
    • December: News shifted to PISA education results.

Figures: Word clouds for January, March, August, and November 2023.
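
Tracking the debate month by month amounts to keeping one counter per month. A minimal sketch, assuming each record is a `(date, tokens)` pair that has already been tokenized; the function name and the example tokens are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def top_words_by_month(records, n=3):
    """Return the `n` most frequent tokens for each (year, month)."""
    monthly = defaultdict(Counter)
    for day, tokens in records:
        monthly[(day.year, day.month)].update(tokens)
    return {month: counter.most_common(n)
            for month, counter in sorted(monthly.items())}


# Invented example tokens, not taken from the dataset.
toy_tokens = [
    (date(2023, 1, 5), ["Silvester", "Gewalt", "Silvester"]),
    (date(2023, 12, 6), ["PISA", "Schule", "PISA"]),
]

monthly_top = top_words_by_month(toy_tokens)
for month, words in monthly_top.items():
    print(month, words)
```

Each month's counter can then be rendered as its own word cloud, which makes shifts in vocabulary visible across the year.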

Summary

This project demonstrates the potential of data visualization to reveal patterns in public discourse. By processing migration-related newspaper headlines, I:

  • Visualized the most frequent words and topics of 2023.
  • Tracked shifts in the migration debate over the year.
  • Highlighted the prominence of political parties and funding-related discussions.

Future Directions

This project provides a foundation for deeper exploration:

  • Regional Analysis: Examine reporting trends across local newspapers.
  • Topic Modelling: Identify thematic dimensions in headlines using NLP techniques.
  • Impact Assessment: Investigate correlations between news coverage and political opinion polls.