OpenAI secures five-year license to News Corp content for large-model training and Q&A

ChatGPT expands again, this time to include news content from more than a dozen media outlets.

On May 22, local time, OpenAI announced a multi-year agreement with News Corp to gain access to current and archived content from major news and information publications, including The Wall Street Journal, Barron’s, The New York Post, The Times, The Sun, and more than a dozen other media outlets.

Under the agreement, OpenAI will be able to display content from News Corp. media outlets in ChatGPT and use it to answer users’ questions. Meanwhile, News Corp. will share journalistic expertise to help ensure that OpenAI’s products meet the highest journalistic standards.

According to foreign media reports citing people familiar with the matter, the deal runs for five years and may be worth more than $250 million (roughly 1.81 billion yuan), paid in a mix of cash and credits for the use of OpenAI’s technology.

The partnership does not include access to News Corp’s other businesses. OpenAI said the ultimate goal is to empower people to make informed choices based on reliable information and news sources.

OpenAI CEO Sam Altman said, “Our partnership with News Corp. is a proud moment for journalism and technology. We value News Corp’s history as a global leader in breaking news coverage and are excited to enhance user access to its high-quality reporting. Together, we will lay the groundwork for a future where AI deeply respects, enhances and upholds the standards of world-class journalism.”

OpenAI previously announced a partnership with US social platform Reddit that gives it access to real-time content through Reddit’s Data API (application programming interface) and brings that content into products such as ChatGPT. It has also reached agreements with a number of media outlets, including the Financial Times, the Associated Press, and Le Monde, to license their content archives for training AI (artificial intelligence) models.

However, according to foreign media reports, the terms of these deals differ from outlet to outlet. The Associated Press deal, for example, is worth only millions of dollars per year and focuses mainly on the use of its text archive for training. OpenAI’s deal with the Financial Times is worth $5 million to $10 million per year and also covers the display of news content.

OpenAI’s path to copyright partnerships has not been smooth, though. Dozens of media outlets, including The New York Times, The Intercept, and the New York Daily News, have filed copyright infringement lawsuits accusing OpenAI of illegally using their news content to train AI models.

OpenAI says that training AI models on publicly available internet material is fair use, a position it says is supported by long-standing and widely accepted precedents. It regards this principle as fair to creators and necessary for innovators. That said, the company also offers a simple opt-out process that lets publishers such as The New York Times prevent the company’s tools from accessing their sites.

OpenAI says that because large models learn from a huge collection of human knowledge, any one sector makes up only a small part of the overall training data, and that no single data source, including The New York Times, is significant for a model’s intended learning.

The Wall Street Journal, which is owned by News Corp, notes that AI companies are hungry for publisher content that can help refine models and create new products, such as AI-powered search. Publishers are seeking to ensure that they can be paid handsomely for the use of their intellectual property, sparking complex and sometimes quite heated negotiations across the industry.

According to foreign media reports, the agreement between News Corp and OpenAI ensures that news content will not be made available on ChatGPT immediately after publication. This addresses a current concern among publishers: if AI provides complete answers based on news content, users no longer need to visit a news site or pay for access, and publishers lose traffic and advertising revenue.

Foreign media quoted people familiar with the matter as saying that OpenAI is looking to show relevant links beneath its answer summaries, so that users can see which publishing partner the content came from.