- The Wayback Machine is threatened again by AI
- The AI boom has tripled the price of the large hard drives needed for these vast web archives
- This adds to the Wayback Machine's existing troubles: news sites are already blocking its crawler, which is also a consequence of AI
It’s an increasingly desperate time for those trying to keep track of the history of the web, as AI once again proves to be a serious obstacle to the efforts of organizations like the Internet Archive – and this time, it’s soaring hard drive prices.
You may remember that last month we covered another angle of the difficulties AI is causing for the Internet Archive’s Wayback Machine. This is the nonprofit’s history of the web, and the problem is that, as part of measures designed to thwart AI scrapers, online news sites are increasingly blocking the crawler that the Internet Archive uses to compile the snapshots of web pages that make up the archive.
And now, 404 Media reports (via Tom’s Hardware) that the Internet Archive is suffering from the AI-driven shortage of hard drives (as larger drives are needed in data centers for AI workloads).
Yes, the AI boom is not just about LLMs (Large Language Models) consuming your RAM and SSDs, but also about hard drives (along with indirect effects on other components).
The massive hard drives – on the order of 30TB – that the Internet Archive needs to house the Wayback Machine’s historical archives are now up to three times more expensive, or even completely out of stock. In this way, the AI boom is now a “very real problem that is costing us time and money,” Internet Archive founder Brewster Kahle commented to 404 Media.
With some 210 petabytes (210,000 TB) of web page snapshots in its library, which is growing by 100 TB per day, you can appreciate the extent of web archiving going on here.
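To put those figures in terms of hardware, here is a rough back-of-the-envelope sketch. It takes the 210,000 TB and 100 TB per day numbers quoted above and assumes every drive holds exactly 30 TB, in decimal units. It also ignores redundancy, replication and filesystem overhead, none of which the reporting details, so treat the results as a lower bound rather than a description of how the archive is actually built. The function name `drives_needed` is purely illustrative.

```python
import math

# Figures quoted in the article
ARCHIVE_TB = 210_000        # roughly 210 petabytes of snapshots
GROWTH_TB_PER_DAY = 100     # reported daily growth

# Assumption: every drive is exactly 30 TB ("on the order of 30TB" in the article)
DRIVE_TB = 30


def drives_needed(capacity_tb: float, drive_tb: float = DRIVE_TB) -> int:
    """Minimum number of whole drives needed to hold capacity_tb.

    Ignores redundancy, replication and formatting overhead.
    """
    return math.ceil(capacity_tb / drive_tb)


# Drives needed to hold the current archive once over
print(drives_needed(ARCHIVE_TB))                # 7000

# Extra drives needed to absorb one year of growth
print(drives_needed(GROWTH_TB_PER_DAY * 365))   # 1217
```

Even on these generous assumptions, that is thousands of drives already in service and more than a thousand new ones each year, which is why a tripling of the price is so painful.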
Wikipedia’s parent nonprofit, the Wikimedia Foundation, faces similar difficulties, as you might imagine. It has to host some 65 million articles, which take up a lot of disk space. A Wikimedia Foundation spokesperson told 404 Media that the main issues are “purchasing memory and hard drives”, along with server delivery times.
Analysis: There are workarounds, but what about the blocks?
So, is the Wayback Machine really in danger? Will we see the cogs of the “living history of the Internet” begin to come apart? Well, there’s no immediate peril, because apparently donors and the community around the Wayback Machine are banding together to get around the problem of spiraling drive costs.
Still, this is clearly a problem going forward, and the blocking of the Internet Archive’s web crawler is an even bigger one. News sites block AI scrapers, but those blocks can be circumvented if an AI company pulls the same content from the Wayback Machine instead, which is why the sites are now blocking the archive’s crawler too. This is a thorny issue, but talks are ongoing and we hope both sides will reach some sort of resolution.
And if you’re wondering why the Internet Archive can’t switch to tape as a storage medium, the problem is that this is a “living” archive of the web: it’s online, and users access its snapshots of web pages on demand. Hard drives are required for that access to be responsive. Tape just isn’t up to the task in terms of performance in this case.
The Internet Archive does use tape for longer-term content backups, but that’s only part of the puzzle. Hard drives are essential to the daily operation of the Wayback Machine as we know it, allowing the content users want to be delivered to them quickly online.





