OpenAI suspects that models from the Chinese company DeepSeek, which are reportedly far cheaper to develop than their Western counterparts, may have been trained using OpenAI's data. The accusation follows a steep sell-off in Nvidia shares triggered by DeepSeek's rise. President Donald Trump called that rise a "wake-up call" for the U.S. tech industry.
DeepSeek's R1 model is built on the open-source DeepSeek-V3. It reportedly required far less money (about $6 million, by DeepSeek's own estimate) and far less computing power to train than Western models such as those behind ChatGPT. The figure is disputed, but it has fueled investor concern about the billions of dollars American tech giants are spending on AI. The result was a market sell-off that hit Nvidia, Microsoft, Meta, Alphabet, and Dell. Amid the controversy, DeepSeek's app also topped U.S. download charts.
OpenAI and Microsoft are investigating whether DeepSeek violated OpenAI's terms of service through "distillation," possibly by querying OpenAI's API. Distillation is a technique in which a smaller model is trained on the outputs of a larger one. OpenAI says Chinese companies frequently try to replicate leading U.S. AI models and that it is working with the U.S. government to protect its intellectual property.
David Sacks, President Trump's AI czar, said there is substantial evidence that DeepSeek distilled knowledge from OpenAI's models. He expects leading AI companies to take further steps to prevent the practice.
The situation carries an obvious irony. OpenAI, which has itself been accused of using copyrighted internet content to train ChatGPT, is now objecting to DeepSeek's alleged conduct, and critics on social media were quick to point this out. OpenAI previously told the UK's House of Lords that it would be impossible to train large language models without copyrighted material. That stance is now being tested by lawsuits from the New York Times and a group of 17 authors alleging copyright infringement. These lawsuits, along with a 2018 U.S. Copyright Office ruling denying copyright protection to AI-generated art, show how unsettled the law remains around both AI training data and AI-generated works.