Altman said GPT-4o native image generation is now live in ChatGPT and in Sora, OpenAI’s AI video generation product, for subscribers to the company’s $200-per-month Pro plan. OpenAI said the feature will soon roll out to Plus and free users of ChatGPT, as well as to developers using the company’s API services.
GPT-4o with image output "thinks" longer than DALL-E 3, the image generation model it effectively replaces, producing what OpenAI describes as more accurate and detailed images. GPT-4o can also edit existing images, including images with people in them, by transforming them or "fixing" details such as foreground and background objects.
OpenAI did not disclose what image data it used to build the new image generation feature. Many generative AI vendors view training data as a competitive advantage and are therefore secretive about it and about any related details. Training data details could also trigger intellectual property litigation, another reason companies are reluctant to disclose too much.
OpenAI provides an opt-out form that lets creators request that their works be removed from the company’s training datasets. The company also said it respects requests to block its web-scraping bots from collecting training data, including images, from websites.
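Site owners typically make such blocking requests through a robots.txt file. The following is a minimal sketch of how that could look, not a description of OpenAI's own implementation. It assumes the crawler identifies itself as GPTBot, the user agent OpenAI documents for its crawler (the article itself does not name it). The example.com URLs and the is_allowed helper are hypothetical, and Python's standard-library robots.txt parser stands in for whatever a real crawler uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: block the crawler named
# "GPTBot" (assumed user agent) everywhere, allow all other crawlers.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    image_url = "https://example.com/images/artwork.png"  # placeholder URL
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", image_url))
    print("Other bot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", image_url))
```

A robots.txt rule is only a request. It works when the crawler chooses to honor it, which is what OpenAI says its bots do.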
ChatGPT’s upgraded image generation capabilities come on the heels of Google’s experimental native image output for one of its flagship models, Gemini 2.0 Flash. That feature has gone viral on social media, and not necessarily for good reasons. The image generation component of Gemini 2.0 Flash has few safeguards, allowing people to remove watermarks and create images depicting copyrighted characters.