Meta said it used public posts on Facebook and Instagram to train parts of its new artificial intelligence virtual assistant. The social media giant emphasized that the training data did not include users' private posts or posts shared only with friends and family.
"We try to exclude data sets where personal information accounts for a large proportion of the data," Nick Clegg, the company's president of global affairs, told Reuters in an interview at Meta's Connect conference last week. The former British deputy prime minister added that the vast majority of the data Meta used for training was publicly available.
Last Wednesday, Meta announced a beta version of Meta AI, an advanced conversational assistant that is available on WhatsApp, Messenger and Instagram and will also come to Ray-Ban Meta smart glasses and Quest 3.
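Meta AI is built on a custom model based on Llama 2, the large language model Meta released in July this year, together with Emu, a new text-to-image model. According to Clegg, the public Facebook and Instagram posts used to train the assistant included both text and photos.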
Clegg cited LinkedIn as an example of a website whose content Meta deliberately chose not to use for training because of privacy concerns.
One of the most contentious issues around generative AI continues to be the copyright status of the content on which large language models are trained. This year, artists filed copyright lawsuits against Stability AI, the maker of Stable Diffusion, and Midjourney, while writers including John Grisham and George R.R. Martin sued OpenAI. Clegg said he expected there would be a "significant amount of litigation" over whether creative content is covered by existing fair use doctrine.
"We think it is, but I strongly suspect that's going to play out in litigation," Clegg said.
Meta isn't the only company using user content to train artificial intelligence. Elon Musk's xAI is doing the same with public tweets, and Google updated its privacy policy in July to say that it may use publicly available information to train its AI models.
Last Wednesday, Meta CEO Mark Zuckerberg also announced a number of AI chatbots modeled on celebrities and influential figures, including Tom Brady, MrBeast, Paris Hilton, Kendall Jenner and Snoop Dogg. Meta says it will launch 28 such bots, also powered by Llama 2. The event was not a complete success.