OpenAI is currently developing Media Manager, a tool that will let creators and content owners control how their content is used in training AI models. OpenAI hopes to introduce the tool by 2025 as it works with creators, content owners, and regulators to set a standard across the AI industry.
Highlights:
- OpenAI is developing Media Manager to allow creators and content owners to control how their works are used in AI training.
- The tool aims to address the limitations of current protection solutions like robots.txt, which cannot handle content across multiple platforms.
- Media Manager’s development could be a response to criticism and lawsuits over OpenAI’s web scraping practices for AI training data.
Need for a tool like Media Manager
The need for OpenAI’s Media Manager arises from the limitations of current protections against AI data scraping. Creators can add directives to their website’s robots.txt file to tell web crawlers not to scrape it, but this solution falls short for those who publish work on third-party platforms or who wish to exempt only certain works from AI training data.
Last summer, OpenAI pioneered the use of web crawler permissions for AI, enabling publishers to express their preferences about the use of their content in AI systems. OpenAI takes these preferences into account when training new models. However, this solution is still neither efficient nor comprehensive.
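To make the mechanism concrete, here is a minimal sketch of how a robots.txt opt-out is interpreted by a compliant crawler, using Python’s standard `urllib.robotparser`. The `GPTBot` user agent and the `Disallow` directive follow OpenAI’s published guidance for blocking its crawler; the site URL and the second crawler name are hypothetical placeholders.

```python
# Minimal sketch of a robots.txt opt-out, parsed with Python's standard library.
# The GPTBot directives mirror OpenAI's published guidance; example.com and
# "SomeOtherBot" are placeholders for illustration only.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# GPTBot is asked to stay out entirely; other well-behaved crawlers may proceed.
print(parser.can_fetch("GPTBot", "https://example.com/articles/my-essay"))        # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/articles/my-essay"))  # True
```

Note that this control is all-or-nothing at the site level and only works for content the creator hosts themselves, which is exactly the gap Media Manager is meant to address.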
Many creators do not control the websites where their content appears, and their work is often quoted, reviewed, remixed, reposted, and used as inspiration across multiple domains. As a result, there is a need for an efficient and scalable solution that allows content owners to express their preferences about the use of their content in AI systems, regardless of where it is hosted or shared. OpenAI’s proposed Media Manager aims to offer more granular control and options.
What is Media Manager?
OpenAI is developing Media Manager, a tool that will enable creators and content owners to identify their works and specify how they want those works to be included or excluded from machine learning research and training. Over time, OpenAI plans to introduce additional choices and features within Media Manager.
This will require cutting-edge machine learning research to build the first-ever tool of its kind that can help identify copyrighted text, images, audio, and video across multiple sources and reflect creator preferences. OpenAI is collaborating with creators, content owners, and regulators as it develops Media Manager.
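OpenAI has not published any technical details or API for Media Manager, so any concrete format is speculation. Purely as a hypothetical illustration of what per-work, granular preferences could look like, in contrast to a site-wide robots.txt rule, consider a record like the following; every field name here is invented for the example.

```python
# Hypothetical illustration only: OpenAI has not published a Media Manager
# format or API. This sketches how per-work preferences might be expressed,
# rather than an all-or-nothing robots.txt rule.
from dataclasses import dataclass

@dataclass
class WorkPreference:
    """One creator-declared preference for a single work (all field names invented)."""
    work_url: str                 # where the work can be found
    content_hash: str             # fingerprint so reposted copies could be matched
    allow_training: bool = False  # opt in/out of model training
    allow_research: bool = True   # opt in/out of non-commercial ML research
    notes: str = ""               # free-form context for reviewers

preferences = [
    WorkPreference(
        work_url="https://example.com/gallery/illustration-01.png",
        content_hash="sha256:placeholder",
        allow_training=False,
        allow_research=True,
        notes="May be quoted or reviewed, but not used to train generative models.",
    ),
]

for pref in preferences:
    status = "excluded from" if not pref.allow_training else "included in"
    print(f"{pref.work_url} -> {status} training")
```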
How Media Manager could counter legal actions against OpenAI
The development of Media Manager might be a response to criticism and legal action against OpenAI for scraping the web for training data without explicit permission, consent, or compensation from creators. OpenAI has defended its practices by pointing to the long-standing acceptance of web crawling and scraping by many companies, but criticism of its heavy reliance on publicly available data scraped from the web has continued to grow.
Recently, eight prominent US newspapers, including the Chicago Tribune, sued OpenAI for copyright infringement, accusing the company of pilfering articles to train its generative AI models for commercial use without compensation or credit to the source publications.
Generative AI models like OpenAI’s are trained on massive datasets, typically sourced from public websites and online repositories. OpenAI and other AI vendors argue that scraping this public data for model training falls under fair use, the legal doctrine permitting the transformative use of copyrighted material. However, this position is facing increasing scrutiny.
Recently, OpenAI claimed in a blog post that developing capable AI models would be infeasible without leveraging copyrighted works, further underscoring the contentious nature of this issue and the need for a balanced solution that considers the interests of both AI developers and content creators.
Despite ongoing legal battles, OpenAI aims to position itself as a cooperative and ethical company through its Media Manager tool. However, some creators may view this as too little, too late, as their works have already been used to train AI without consent. OpenAI maintains that it doesn’t store copies of scraped data, merely using it to generate new content and insights.
The Media Manager tool could potentially offer a more efficient and user-friendly way for creators to control which of their works are used for AI training, compared to existing options. However, it remains unclear whether creators will trust the tool, and whether it will be able to effectively block training by rival AI models that may continue scraping without consent.
Conclusion
Media Manager presents an opportunity for OpenAI to balance the interests of AI developers and content creators. While Media Manager could offer creators more control over their content’s use in AI, its success depends on gaining their trust and effectively setting a standard across the industry.