Russian propaganda floods Wikimedia Commons, shaping AI training data, Polish expert warns

24.06.2026 14:05
A Polish expert has warned that Russian state media content is flooding Wikimedia Commons, the open multimedia repository that serves as a major source for training artificial intelligence models.
FILE PHOTO: EPA/ALEX HOFFORD

Large volumes of Russian propaganda material, including content about the annexation of Crimea, are being uploaded to Wikimedia Commons in what one expert says is a deliberate effort to shape how artificial intelligence systems understand the world.

Marcin Żabiński, head of the Kybernetes Institute of Socio-Political Technologies, a Polish research organization, and a member of the foreign minister's advisory council on disinformation resilience, told Poland's PAP news agency that the pattern is intentional.

"Wikipedia and Wikimedia Commons are among the most important sources for training AI and for AI to query knowledge about the real world," he said.

A search of the repository for "annexation of Crimea" illustrates the problem: of 51 results, more than 40 originate from the official website of the Russian president or from a Russian state broadcaster called Independent Television Sevastopol.

The materials include footage of pro-annexation rallies and images from the signing of the treaty incorporating Crimea into Russia, along with Putin's statements on Crimean policy.

The Wikimedia Foundation's own data supports concerns about AI harvesting. According to the foundation, the bandwidth used to download multimedia content from Wikimedia Commons has grown by 50 percent since 2024, driven not by readers or media organizations but by "automated programs that crawl Wikimedia Commons' open-license image catalog and upload images to AI models."

Żabiński also flagged the risk posed by metadata and image descriptions attached to uploaded files.

"A much more interesting piece of meta-information is the short description of what is in the photo. And that is already a very large space for abuse, because it can influence how a recipient or a language model interprets the file," he said.

He further warned that images could carry hidden instructions embedded through steganography, a set of techniques that leave nothing visible to the human eye but remain readable by AI models, and that such instructions could alter how a model analyzes or responds to content.
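
For readers unfamiliar with the term, the sketch below illustrates one classic form of steganography, least-significant-bit (LSB) embedding. The report does not say which technique Żabiński had in mind, so the choice of LSB is an assumption, and the function names embed_lsb and extract_lsb are hypothetical. The example only shows how a message can be hidden in pixel values while changing each value by at most one step; it does not show that any AI model actually reads such data.

```python
def embed_lsb(pixels: bytes, message: bytes) -> bytes:
    """Hide `message` in the lowest bit of each pixel byte (illustrative only)."""
    # Flatten the message into individual bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("message is too long for this cover data")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the message bit.
        out[i] = (out[i] & 0xFE) | bit
    return bytes(out)


def extract_lsb(pixels: bytes, length: int) -> bytes:
    """Recover `length` bytes previously hidden by embed_lsb."""
    result = bytearray()
    for i in range(length):
        value = 0
        # Each hidden byte is spread across eight consecutive pixel bytes.
        for pixel in pixels[i * 8:(i + 1) * 8]:
            value = (value << 1) | (pixel & 1)
        result.append(value)
    return bytes(result)
```

Because only the lowest bit of each value changes, an image altered this way looks the same to a viewer, which is why such content is hard to spot by eye.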

Wikimedia Polska, the Polish chapter of the Wikimedia movement, offered a different reading of the situation.

Based on the examples provided and consultations with active volunteers, the association said it saw "no basis to speak of documented, coordinated Russian interference in Wikimedia Commons."

It attributed the high volume of Russian-sourced material to licensing: part of the Russian presidential website's content is published under a Creative Commons Attribution 4.0 International license, making it eligible for upload provided it also meets educational utility criteria.

A volunteer cited by the association said that the repository accepts uploads of external materials available under acceptable free licenses, and that hosting them does not imply endorsement of the originating institution's narratives.

Żabiński acknowledged the issue remains poorly understood. He warned that if searches on Wikimedia Commons consistently underrepresent Ukrainian suffering while foregrounding geopolitical framing, the effect is to relativize the human cost of the conflict.

"Meanwhile, the scale of Russian entities' activity in the Wikimedia repository will only intensify, reaching overwhelming proportions," he said.

(jh/gs)

Source: PAP