The Chinese technology company DeepSeek, known for its work in artificial intelligence, is at the center of a new controversy: a number of independent experts suspect that the latest version of its language model, R1-0528, may have been trained on data obtained from Google's Gemini models.
Suspicions grew after a post by Australian developer Sam Paech, who specializes in analyzing the so-called "emotional intelligence" of AI models. On the social network X, he presented a comparative analysis of the models' vocabulary, syntax, and intermediate reasoning, finding a high degree of overlap between DeepSeek R1 and Google Gemini 2.5 Pro. According to him, the DeepSeek model not only uses similar phrasing but also demonstrates a similar output structure when solving problems.
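Paech has not published his methodology in detail, but one common way to quantify this kind of lexical overlap is Jaccard similarity over word n-grams of two models' responses to the same prompts. The sketch below is purely illustrative and is not his actual analysis; the function names and sample strings are assumptions.

```python
# Illustrative sketch only: Jaccard similarity over word n-grams, one common
# way to measure lexical overlap between two models' outputs. Not the actual
# methodology behind the comparison described in the article.
def ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of word n-grams in a text (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def jaccard_overlap(output_a: str, output_b: str, n: int = 3) -> float:
    """Jaccard similarity between the n-gram sets of two outputs: |A & B| / |A | B|."""
    a, b = ngrams(output_a, n), ngrams(output_b, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Placeholder outputs; a real analysis would compare many prompts per model.
sample_a = "Let us break the problem into smaller steps and verify each one"
sample_b = "Let us break the problem into parts and verify each step carefully"
print(f"3-gram Jaccard overlap: {jaccard_overlap(sample_a, sample_b):.2f}")
```

In practice, a single high score means little; a conclusion like Paech's would rest on systematic overlap across a large and varied set of prompts.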
Additional arguments came from the anonymous creator of the SpeechMap project, which analyzes "freedom of speech" in generative AI. He pointed to a suspicious overlap in how the two models construct their chains of reasoning, which may indicate that DeepSeek was trained on a competitor's reasoning traces.
This is not the first accusation against DeepSeek. In December 2024, users noticed that another of the company's models, DeepSeek V3, in some cases identified itself as ChatGPT, raising suspicions that OpenAI chat logs had been used in its training. In early 2025, OpenAI representatives said they had evidence of distillation, a method in which one model is trained on the outputs of another, more advanced system.
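In its classic form, distillation trains a smaller "student" model to reproduce the output distribution of a stronger "teacher". A minimal sketch of that idea, assuming PyTorch and purely illustrative variable names:

```python
# Minimal sketch of knowledge distillation as defined above. "teacher" stands
# for the stronger model whose outputs serve as training targets; "student"
# is the model being trained. Shapes and values are placeholders.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soften both distributions with a temperature, then pull the student's
    predictions toward the teacher's via KL divergence."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_probs, soft_targets, reduction="batchmean") * temperature ** 2

# Example: a batch of 4 samples over a 10-token vocabulary.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
print(f"distillation loss: {loss.item():.4f}")
```

In the scenario the accusations describe, the teacher signal would come not from local logits but from text generated through a competitor's API, which is precisely what providers' terms of service prohibit.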
Back at the end of 2024, Microsoft, an OpenAI partner, recorded large volumes of data being exfiltrated through OpenAI developer accounts that were presumed to be linked to DeepSeek.
Although distillation is widely used in AI development, major market players, including OpenAI and Google, prohibit using their models' outputs to build competing systems. However, with AI-generated content now spread across the Internet at scale, the line between legitimate use and intellectual property infringement is becoming less and less clear. Contamination of open sources, the result of mass generation of text, code, and images by bots, significantly complicates the filtering of training data.
Still, according to Nathan Lambert, a researcher at AI2 (the Allen Institute for AI), training DeepSeek on Gemini outputs remains a very plausible scenario. He noted that under current conditions, using the Gemini API could be faster and cheaper for DeepSeek's developers than building a completely original architecture.
Amid growing concerns, leading technology companies are stepping up protective measures. OpenAI, in particular, has required identity verification for access to its advanced models since April, effectively restricting access from a number of countries, including China. Google, in turn, has begun reducing the detail of the reasoning traces shown in its AI Studio, making it harder to reverse-engineer Gemini's behavior. Anthropic took similar steps in May of this year.
DeepSeek representatives had not commented on the situation at the time of publication.