- A growing number of major news sites are blocking the Wayback Machine
- Some 23 organizations now block their content from appearing in the archive
- The blocks stem from fears that the Wayback Machine could be exploited for AI content scraping
The Wayback Machine is under serious threat (and not for the first time), as a growing number of major news sites appear to be blocking the archiving system.
If you’re not familiar with the Wayback Machine, it’s run by the nonprofit Internet Archive and is essentially a time machine that preserves the history of the web (and much more). This can be vital when dealing with, for example, historical searches or monitoring changes to websites.
As Wired reports (via 9to5Mac), there is a growing trend of online media outlets blocking the web crawler that the Internet Archive uses to gather its snapshots. Some 23 major news sites now do so, according to Originality AI, a company specializing in AI detection.
This includes the New York Times (based on a Nieman Lab report) and USA Today, with Wired noting that the latter recently published a report on how U.S. Immigration and Customs Enforcement delayed releasing key information about the impact of detention policies. This was a piece that made extensive use of the Wayback Machine in its research.
The irony of USA Today using this data in this way while preventing the Wayback Machine from accessing its own content (which could potentially keep the news site itself honest in the future) is not lost on Wayback Machine director Mark Graham.
Graham told Wired: “They’re able to gather their research on history because the Wayback Machine exists. At the same time, they’re blocking access.”
Of course, if more and more organizations start blocking the Wayback Machine, its ability to maintain a history of online content will be increasingly eroded.
Analysis: Blame AI (again)
So why is this happening? This isn’t about readers bypassing paywalls using the Wayback Machine, in case you thought that was the issue at hand. Would you be surprised to learn that it’s actually AI, in a roundabout way? Of course you wouldn’t, and predictably it seems the Internet Archive is caught up in the broad backlash against AI here.
What these news organizations say they object to is not the preservation of a historical record of their content, but the fact that these archives can be used by third-party AI companies to train their large language models (LLMs).
As Wired points out, New York Times spokesperson Graham James said: “The problem is that Times content on the Internet Archive is being used by AI companies in violation of copyright law to compete directly with us.”
In short, the concern for these companies is that even if they can block AI scrapers on their own sites, the scraping will still happen behind their backs via the Wayback Machine. It’s not just mainstream media outlets that have these concerns, but also social media platforms, including Reddit, which blocked the Wayback Machine web crawler for exactly the same reason.
Although there are other possible sources and means to indirectly extract news content, the Wayback Machine is the most obvious target for malicious AI operators because it maintains a very extensive library of web history.
So this is a complex issue related to AI scraping and many gray areas in terms of legality. However, the impact on what is an important resource for policing governments or media giants – and holding them accountable for what has been said in the past, or what has been removed from the web entirely in some cases – is clearly worrying.
Graham says: “There is no doubt that the blanket locking down of an increasing share of the public web is impacting society’s ability to understand what is happening in our world.”
A petition titled “Journalists Applaud Internet Archive’s Role in Preserving Public Records” was prepared and sent with over 100 signatures from working journalists. Meanwhile, dialogue is ongoing between the Internet Archive and these news publishers, so hope of finding a viable solution is not yet lost.





