On December 16, Beijing time, foreign media reported that ByteDance, which is lagging behind in the generative AI race, has been trying to "cut corners." The company has been secretly using OpenAI's technology to develop its own large language model, in violation of OpenAI's terms of service. OpenAI has since suspended ByteDance's account.
Foreign media said that in the AI field, what ByteDance did is generally regarded as "discourteous" and is a direct violation of OpenAI's terms of service. Those terms state that the company's model output cannot be used "to develop any AI models that compete with our products and services." ByteDance purchased its access to OpenAI through Microsoft, but Microsoft applies the same policy as OpenAI.
Internal ByteDance documents obtained by foreign media confirm that the company has relied on OpenAI's application programming interface (API) at almost every stage of developing its foundational large language model, code-named "Project Seed," including for training and evaluating the model. Employees involved in Project Seed are well aware of the negative consequences of this practice. According to chat records from Lark, the overseas version of ByteDance's internal communication platform Feishu, employees discussed how to whitewash the evidence through "data desensitization." Foreign media said that ByteDance used OpenAI's technology so extensively that Project Seed employees often hit the maximum access limit of the OpenAI API.
Internal documents show that ByteDance used OpenAI's technology more heavily in the early stages of Project Seed. A few months ago, the company ordered the team to stop using GPT-generated text "at any stage of model development." Around this time, the company received approval to release its own large AI model, "Doubao," thus bringing Project Seed online. However, ByteDance has continued to use the API in ways that violate OpenAI's and Microsoft's terms of service, including to evaluate the performance of the models behind Doubao. One person with first-hand knowledge of ByteDance's internal affairs noted, "They say they want to make sure everything is legal, but they really just don't want to get caught."
ByteDance spokesperson Jodi Seth responded that GPT-generated data was used to annotate the model in the early development of Project Seed and was removed from ByteDance's training data around the middle of this year. "ByteDance has obtained permission from Microsoft to use the GPT API. We use GPT to drive products and functions in non-Chinese markets, but use our self-developed model to drive Doubao. Doubao is only available in China," Seth said in the statement.
OpenAI spokesperson Niko Felix issued a statement confirming that ByteDance's account has been suspended. "All API customers must comply with our usage policies to ensure that our technology is used for good. Although ByteDance rarely uses our APIs, we have suspended their accounts during further investigation. If we find that their use is not in compliance with company policies, we will require them to make the necessary changes or terminate their accounts," Felix said.
"Microsoft AI solutions such as the Azure Open AI service are part of our limited access framework, which means all customers must apply for and receive approval from Microsoft for access," Microsoft spokesman Frank Shaw said in a statement. "We also set standards and provide resources to help our customers use these technologies responsibly and comply with our terms of service. We also have processes in place to detect abuse and stop access when businesses violate our Code of Conduct."