One of the many concerns about generative AI is that models can generate images after being trained on work scraped from the internet without the original creators' permission. A new tool aims to counter this by "poisoning" the data used to train the models.
MIT Technology Review highlights a new tool called Nightshade, created by researchers at the University of Chicago. It works by making very small changes to an image's pixels, invisible to the naked eye, before the image is uploaded. If the altered images are scraped, they poison the training data used by tools like DALL-E, Stable Diffusion, and Midjourney, causing the models to break in unpredictable ways.
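To make "very small changes to image pixels" concrete, here is a minimal sketch of a bounded perturbation. It is not Nightshade's method: the random noise here only stands in for a carefully crafted change, and the function names, the `epsilon` budget, and the tiny grayscale "image" are all assumptions made for illustration.

```python
import random

def perturb(pixels, epsilon=2, seed=0):
    """Return a copy of `pixels` with every value shifted by at most `epsilon`.

    Illustration only: random noise stands in for a crafted perturbation.
    """
    rng = random.Random(seed)
    out = []
    for row in pixels:
        new_row = []
        for value in row:
            delta = rng.randint(-epsilon, epsilon)
            # Clamp so the result stays a valid 8-bit pixel value.
            new_row.append(min(255, max(0, value + delta)))
        out.append(new_row)
    return out

def max_change(original, changed):
    """Largest absolute per-pixel difference between two images."""
    return max(
        abs(a - b)
        for row_a, row_b in zip(original, changed)
        for a, b in zip(row_a, row_b)
    )

# Hypothetical 2x3 grayscale image with values in 0..255.
image = [[0, 64, 128], [192, 255, 32]]
poisoned = perturb(image)
print(max_change(image, poisoned))  # never more than epsilon
```

A shift of a couple of intensity levels out of 255 is hard for a person to spot, which is the intuition behind changes that are "invisible to the naked eye."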
Examples of how generative AI can misinterpret images poisoned by Nightshade include turning dogs into cats, cars into cows, hats into cakes, and handbags into toasters. It also works on prompts for art styles: cubism becomes anime, cartoons become impressionism, and conceptual art becomes abstraction.
A recent paper the researchers published on arXiv describes Nightshade as a prompt-specific poisoning attack. Rather than needing to poison millions of images, Nightshade can corrupt a Stable Diffusion prompt with around 50 samples, as shown in the image below.
The researchers wrote that the tool can not only poison specific prompt terms like "dog," but can also "bleed through" to related concepts like "puppy," "hound," and "husky." It even affects indirectly related concepts; for example, poisoning "fantasy art" also changes what the model generates for the prompts "a dragon," "a castle from Lord of the Rings," and "a painting by Michael Whelan."
Ben Zhao, a professor at the University of Chicago who led the team that created Nightshade, said he hopes the tool will act as a deterrent to AI companies that don’t respect artists’ copyrights and intellectual property rights. He acknowledged the potential for malicious use, but said that to do real damage to larger, more powerful models, attackers would need to poison thousands of images, because these systems are trained on billions of data samples.
Those who train generative AI models can defend against this practice with techniques such as filtering high-loss data, frequency analysis, and other detection and removal methods, but Zhao said these defenses are not very robust.
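As a rough illustration of what "filtering high-loss data" can mean, the sketch below drops training samples whose loss is far above the average. The cutoff rule (mean plus `k` standard deviations), the function name, and the loss values are assumptions chosen for the example, not a description of any company's pipeline.

```python
from statistics import mean, pstdev

def filter_high_loss(losses, k=2.0):
    """Return indices of samples whose loss is at most mean + k * std.

    Assumed rule for illustration; real pipelines may use other thresholds.
    """
    cutoff = mean(losses) + k * pstdev(losses)
    return [i for i, loss in enumerate(losses) if loss <= cutoff]

# Hypothetical per-sample training losses; index 8 is an obvious outlier.
losses = [0.9, 1.1, 1.0, 0.95, 1.05, 1.0, 0.98, 1.02, 4.0, 1.0]
kept = filter_high_loss(losses)
print(kept)  # indices of the samples that survive the filter
```

A filter like this only catches samples that stand out. Poisoned images that produce ordinary-looking loss values would pass straight through, which fits Zhao's remark that such defenses are not very robust.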
Some large AI companies give artists the option of keeping their work out of AI training datasets, but opting out can be an arduous process, and it doesn't address work that may already have been scraped. Many believe artists should be able to opt in rather than having to opt out.