News publishers are finally realizing the worth of their human-written content, and they don’t want to give it to AI giants for free. Getting the latest information for LLMs will not be easy now that Apple’s AI crawler is being blocked.
Highlights:
- Major news publishers, including The New York Times and the Financial Times, are blocking Apple’s AI crawler from training on their content.
- This comes after Apple released a new tool that lets websites stop their data from being used to train its AI models.
- It shows the growing tension between content publishers and AI companies over how to adapt to this new landscape.
Why Are Websites Blocking Apple’s AI?
So, this is what happened. Apple launched a tool that allows websites to opt their content out of being used to train its AI models. This is a great thing for website owners, but maybe not for the iPhone manufacturer. At least, that’s what a new report from WIRED says.
As we know, AI models like GPT-4o (the model behind ChatGPT) need information to train on, which they get by scraping content from websites across the internet. But they also need up-to-date information about current events to provide answers that are as accurate as possible.
So AI companies want information from news publishers too. But that doesn’t mean they get it for free: they need permission from these sites, otherwise it would be an infringement of intellectual property!
Big news publishers like The New York Times, The Financial Times, Vox Media, and The Atlantic have already chosen to block Apple’s AI crawlers from using their content.
The new crawler, called Applebot-Extended, gives publishers control over whether their content can be used to train Apple’s foundation models, which will later power generative AI features across its products.
The company said, “Allowing Applebot-Extended will help improve the capabilities and quality of Apple’s generative AI models over time.” But why should news publishers give away their hard-earned content for free?
Here’s Why News Publishers Are Doing It
News publishers have updated their robots.txt files to block the Applebot-Extended user agent, making sure their content isn’t used without their permission (a minimal example of such a robots.txt entry follows the quote below). Vox Media’s Lauren Starke said:
“We’re blocking Applebot-Extended across all of Vox Media’s properties, as we have done with many other AI scraping tools when we don’t have a commercial agreement with the other party.”
Lauren Starke, Vox Media
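For those curious what this looks like in practice, blocking a crawler comes down to a short robots.txt directive naming its user agent. A minimal entry refusing Applebot-Extended site-wide would look like this (the user agent name is Apple’s; the rest is the standard robots.txt pattern):

```txt
# Refuse Apple's AI-training crawler across the whole site
User-agent: Applebot-Extended
Disallow: /
```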
This could lead to changes in how AI models are trained and might also result in new deals where companies pay to access content. OpenAI has recently partnered with many news publishers, and it has struck a deal with Stack Overflow as well.
So, if the AI giants want content, they will have to pay for it; nothing comes free. The decisions being made now could have a big impact on how AI and online content work together in the future.
Note that the new user agent is an extension of Apple’s original web crawler, Applebot. While Applebot helps power search features like Siri, the extension lets websites choose whether their data can be used to train Apple’s AI models. News publishers don’t have a problem with Applebot, but they are not ready for Applebot-Extended.
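That distinction is easy to express in robots.txt: a publisher can keep serving the original Applebot for search while shutting out the training crawler. A sketch of such a file:

```txt
# Allow Apple's search crawler (powers features like Siri)
User-agent: Applebot
Disallow:

# Block only the AI-training crawler
User-agent: Applebot-Extended
Disallow: /
```

An empty Disallow line means the named crawler may fetch everything, so Applebot keeps working while Applebot-Extended is turned away.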
In a related study, Ben Welsh found that about 25% of the 1,167 news websites he surveyed (mainly from the US) are blocking AI crawlers as well.
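As a rough illustration of how such a survey could be automated (a hypothetical sketch, not Welsh’s actual methodology; the domain below is a placeholder), Python’s standard-library robotparser can read a site’s robots.txt and report whether it blocks a given user agent:

```python
# Hypothetical sketch: check whether sites' robots.txt files block Applebot-Extended.
from urllib import robotparser

SITES = ["https://www.example.com"]  # placeholder list, not from the study

for site in SITES:
    rp = robotparser.RobotFileParser()
    rp.set_url(f"{site}/robots.txt")
    rp.read()  # fetch and parse the robots.txt file
    blocked = not rp.can_fetch("Applebot-Extended", f"{site}/")
    print(f"{site}: {'blocks' if blocked else 'allows'} Applebot-Extended")
```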
Conclusion:
The fight over AI data scraping is heating up, with more websites blocking their content from being used to train LLMs. This could change how AI is developed and how online content is protected. The choices being made today will likely shape the future of AI and digital content.