{"id":6361,"date":"2024-09-03T11:19:34","date_gmt":"2024-09-03T11:19:34","guid":{"rendered":"https:\/\/favtutor.com\/articles\/?p=6361"},"modified":"2024-09-03T11:19:35","modified_gmt":"2024-09-03T11:19:35","slug":"news-publishers-apple-ai-block","status":"publish","type":"post","link":"https:\/\/favtutor.com\/articles\/news-publishers-apple-ai-block\/","title":{"rendered":"News Sites Push Back Against Apple\u2019s AI Crawlers: Here\u2019s Why"},"content":{"rendered":"\n<p>News publishers are finally recognizing the value of their human-written content, and they don&#8217;t want to give it away for free to AI giants. Getting the latest information into LLMs will become harder now that Apple&#8217;s AI crawler is being blocked.<\/p>\n\n\n\n<p><strong>Highlights:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Major news publishers, including The New York Times and the Financial Times, are blocking Apple from using their content for AI training.<\/li>\n\n\n\n<li>This follows Apple\u2019s release of a tool that lets websites stop their data from being used to train Apple\u2019s AI models.<\/li>\n\n\n\n<li>It shows the growing tension between content publishers and AI companies as both adjust to this new world.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Why Are Websites Blocking Apple\u2019s AI?<\/strong><\/h2>\n\n\n\n<p>So, this is what happened. Apple launched a tool that allows websites to opt out of having their content used to train Apple\u2019s AI models. This is a great thing for website owners but maybe not for the iPhone manufacturer. At least, that&#8217;s what <a href=\"https:\/\/www.wired.com\/story\/applebot-extended-apple-ai-scraping\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">WIRED&#8217;s new report<\/a> says.<\/p>\n\n\n\n<p>As we know, AI models like GPT-4o (the model behind ChatGPT) need large amounts of data to train on. They get it by collecting content from websites across the internet. 
But they also need up-to-date information about current events to give answers that are as accurate as possible.<\/p>\n\n\n\n<p>So they also want content from news publishers. But that doesn&#8217;t mean they get it for free. They need permission from these sites; otherwise, publishers argue, it would infringe their intellectual property.<\/p>\n\n\n\n<p><strong>Big news publishers like The New York Times, the Financial Times, Vox Media, and The Atlantic have already chosen to block Apple from using their content for AI training.<\/strong><\/p>\n\n\n\n<p>The new user agent, called <a href=\"https:\/\/support.apple.com\/en-us\/119829\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Applebot-Extended<\/a>, gives publishers control over whether their content can be used to train Apple\u2019s foundation models, which power generative AI features across its products.<\/p>\n\n\n\n<p>The company said, &#8220;<em>Allowing Applebot-Extended will help improve the capabilities and quality of Apple\u2019s generative AI models over time<\/em>.&#8221; But why should news publishers give away their hard-earned content for free?<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Here&#8217;s Why News Publishers Are Doing It<\/strong><\/h2>\n\n\n\n<p>News publishers have updated their robots.txt files to disallow the Applebot-Extended user agent. They\u2019re doing this to make sure their content isn\u2019t used without their permission. 
Vox Media\u2019s Lauren Starke said:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>&#8220;We\u2019re blocking Applebot-Extended across all of Vox Media\u2019s properties, as we have done with many other AI scraping tools when we don\u2019t have a commercial agreement with the other party.&#8221;<\/p>\n<cite>Lauren Starke, Vox Media<\/cite><\/blockquote>\n\n\n\n<p>This could change how AI models are trained and might also lead to new deals in which companies pay to access content. <a href=\"https:\/\/favtutor.com\/articles\/openai-deals-content-websites\/\">OpenAI has partnered with many news publishers<\/a> recently, and it has struck <a href=\"https:\/\/favtutor.com\/articles\/openai-stack-overflow-deal\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">a deal with Stack Overflow<\/a> as well.<\/p>\n\n\n\n<p>So, if the AI giants want content, they need to pay the price for it. Nothing comes free for them. The decisions being made now could have a big impact on how AI and online content work together in the future.<\/p>\n\n\n\n<p>Note that the new user agent is an extension of Apple\u2019s original web crawler, Applebot. While the latter helps power search features in products like Siri, the new extension lets websites choose whether their data can be used to train Apple\u2019s AI models. News publishers don&#8217;t have a problem with Applebot, but they are not ready to allow Applebot-Extended.<\/p>\n\n\n\n<p>In a related analysis, Ben Welsh found that about 25% of the 1,167 news websites he surveyed (mainly from the US) are blocking Applebot-Extended as well.<\/p>\n\n\n\n<p><strong>Conclusion:<br><\/strong>The fight over AI data scraping is heating up, with more websites blocking their content from being used to train LLMs. This could change how AI is developed and how online content is protected. 
The choices being made today will likely shape the future of AI and digital content.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Find out why many major news publishers are not allowing Apple&#8217;s Applebot crawlers to access their content for AI training.<\/p>\n","protected":false},"author":8,"featured_media":6362,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jnews-multi-image_gallery":[],"jnews_single_post":null,"jnews_primary_category":{"id":"","hide":""},"footnotes":""},"categories":[57],"tags":[56,97,332],"class_list":["post-6361","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","tag-ai","tag-apple","tag-applebot"],"_links":{"self":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts\/6361","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/comments?post=6361"}],"version-history":[{"count":2,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts\/6361\/revisions"}],"predecessor-version":[{"id":6365,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts\/6361\/revisions\/6365"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/media\/6362"}],"wp:attachment":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/media?parent=6361"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/categories?post=6361"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/tags?post=6361"}],"curies":[{"name":"wp","href":"https:\/\/api.w.
org\/{rel}","templated":true}]}}