Companies that develop artificial intelligence are endangering Wikipedia. That cry for help came from the Wikimedia Foundation, the parent organization of the popular online encyclopedia, in a blog post published Tuesday.
The problem stems from the fast-growing popularity of Wikipedia and its sister site Wikimedia Commons, which hosts images, sound and video. But it is not primarily people who have been visiting the Wikimedia websites in noticeably larger numbers since last year and thereby heavily burdening the infrastructure. It is bots.
These computer programs are used by AI developers to crawl the internet automatically and collect as much data as possible for training their so-called large language models (LLMs). Those language models form the basis of systems such as ChatGPT from OpenAI, which can produce text in response to questions, and increasingly images and sound as well.
The huge collections of Wikipedia and Wikimedia Commons, brought together by volunteers, contain a wealth of reliable information and are free for everyone to use. That makes them an attractive source for the AI companies' systems.
A changing dynamic
Wikimedia's content, the three authors write in their blog post, has always been an important part of the results that search engines show to people. “These in turn led users back to our websites. But with the rise of artificial intelligence (AI), the dynamic is changing.”
The AI companies' automated systems flood Wikimedia's websites, but the chatbots then often do not mention where they got their information, and even when they do, this brings Wikimedia relatively few visitors. The bots' large-scale visits to the websites are a heavy burden on the infrastructure: bot traffic takes up a lot of bandwidth, causing delays for everyone, including human users.
Our content is free, but our infrastructure is not
“Our infrastructure is built to absorb a sudden increase in human visitors during special events,” says Wikimedia. “But the amount of traffic we get from bots that scrape the internet is unprecedented and entails major risks and costs.”
The risks arise because Wikimedia's technicians have their hands full accommodating the large-scale visits of bots that come to satisfy their hunger for data. As a result, the technicians have less time to make sure the additional human traffic to the websites is handled smoothly when unexpected major events occur.
The extra costs come from the need to deploy more technical staff and to make more bandwidth available. “We acknowledge that the entire internet uses our content, but it must happen in a way that is sustainable for us,” said Wikimedia. “Our content is free, but our infrastructure is not.” The Wikimedia Foundation is financed almost entirely by donations.
Server close by
Wikimedia uses data centers around the world to serve its users quickly. For example, if an article on Wikipedia is requested often by a user or group of users, the content is stored on a server close to those users so that it can be delivered quickly. Articles that are rarely requested remain on a central server. If they are requested after all, the request must first ‘travel’ all the way to that data center, and the article must then travel back to the user, which costs more time and money.
Human users often request the same articles, for example because the topics are in the news. But the AI companies' bots are interested in all content, including the less popular pages, for which they often have to be routed to the central servers, “which makes it more expensive for us,” said Wikimedia. “In this way, the AI bots are suffocating Wikipedia,” writes technology columnist Casey Newton in his newsletter Platformer.
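The difference between the two traffic patterns can be illustrated with a small simulation. The Python sketch below is only a toy model, not a description of Wikimedia's actual systems: the cache policy (least recently used), the cache size, the number of articles and the share of requests going to popular pages are all assumptions chosen for illustration. It compares how often a nearby server has to fall back on the central data center when requests concentrate on a few popular articles versus when they are spread evenly over everything.

```python
import random
from collections import OrderedDict


class EdgeCache:
    """Toy stand-in for a server close to the users (LRU policy is an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # article id -> placeholder content
        self.hits = 0
        self.misses = 0

    def request(self, article_id):
        if article_id in self.store:
            # Served from the nearby server; mark as recently used.
            self.store.move_to_end(article_id)
            self.hits += 1
        else:
            # Has to be fetched from the central data center.
            self.misses += 1
            self.store[article_id] = True
            if len(self.store) > self.capacity:
                self.store.popitem(last=False)  # evict least recently used

    def miss_rate(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


def simulate(pick_article, n_requests, capacity):
    """Run n_requests through a fresh cache and return its miss rate."""
    cache = EdgeCache(capacity)
    for _ in range(n_requests):
        cache.request(pick_article())
    return cache.miss_rate()


# All numbers below are hypothetical and chosen only for illustration.
N_ARTICLES = 10_000
N_POPULAR = 100
CAPACITY = 500
N_REQUESTS = 20_000
rng = random.Random(0)  # fixed seed so runs are repeatable


def human_pick():
    # Assumed: 90% of human requests go to a small set of popular articles.
    if rng.random() < 0.9:
        return rng.randrange(N_POPULAR)
    return rng.randrange(N_ARTICLES)


def bot_pick():
    # Assumed: a scraper requests every article with equal probability.
    return rng.randrange(N_ARTICLES)


human_miss = simulate(human_pick, N_REQUESTS, CAPACITY)
bot_miss = simulate(bot_pick, N_REQUESTS, CAPACITY)
print(f"human-like traffic miss rate: {human_miss:.1%}")
print(f"bot-like traffic miss rate:   {bot_miss:.1%}")
```

Under these assumptions, the bot-like traffic misses the nearby cache far more often than the human-like traffic, which is the effect Wikimedia describes: each miss means a trip to the central servers.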
As with news companies, Wikipedia also faces the problem that the AI companies are rapidly developing into competing sources of information. As the use of chatbots for gathering information becomes more established, the habit of consulting news websites and Wikipedia is gradually falling into disuse. “In the long run, there is a risk that the AI bots will make visiting websites such as Wikipedia unnecessary,” writes Newton, “and that Wikipedia itself can no longer be sustained.” The bots would then have dried up the very source from which they themselves drink.

