The Wayback Machine faces another AI threat: ridiculously high prices for hard drives

The Wayback Machine is threatened again by AI
The AI boom has tripled the price of the large hard drives needed for these vast web archives
This adds to the Wayback Machine’s troubles: news sites are already blocking its crawler, a problem that also stems from AI

It’s an increasingly desperate time for those trying to preserve the history of the web, as AI once again proves to be a serious obstacle to the efforts of organizations like the Internet Archive – and this time, the culprit is soaring hard drive prices.

You may remember that last month we covered another AI-related difficulty facing the Internet Archive’s Wayback Machine, the nonprofit’s history of the web. The problem there is that, as part of measures designed to thwart AI scrapers harvesting their content, online news sites are increasingly blocking the crawler that the Internet Archive uses to capture the snapshots of web pages that make up the archive.

And now, 404 Media reports (via Tom’s Hardware) that the Internet Archive is also suffering from the AI-driven shortage of hard drives, as larger-capacity drives are in demand in data centers for AI workloads.

Analysis: There are many workarounds, but what about tape?

So, is the Wayback Machine really in danger? Will we see the cogs of the “living history of the Internet” begin to grind to a halt? Well, there’s no immediate peril, because apparently donors and the community around the Wayback Machine are banding together to get around the problem of spiraling drive costs.

Still, this is clearly a problem going forward – and the blocking of the Internet Archive’s web crawler is an even bigger one. News sites block AI scrapers, but those blocks can be circumvented if an AI company retrieves the same content through the Wayback Machine, which is why the sites are shutting out the archive’s crawler too. This is a thorny issue, but talks are ongoing and we hope both sides will reach some sort of resolution.

And for readers wondering why the Internet Archive can’t switch to tape as a storage medium, the problem is that this is a “living” archive of the web: it’s online, allowing users to access these snapshots of web pages on demand. As such, hard drives are required for this access to be responsive. Tape just isn’t up to the task in terms of performance in this case.

The Internet Archive does use tape for longer-term content backups, but that’s only one piece of the puzzle. Hard drives are essential to the daily operation of the Wayback Machine as we know it, allowing the content users request to be delivered quickly online.