OpenAI launches Flex processing API for cheaper, slower AI tasks

In an effort to compete more aggressively with rival AI companies like Google, OpenAI has introduced Flex processing, an API option that offers lower prices for using AI models in exchange for slower response times and "occasional resource unavailability."

Flex processing, which is available in beta for OpenAI's recently released o3 and o4-mini reasoning models, is aimed at lower-priority and "non-production" tasks such as model evaluations, data enrichment, and asynchronous workloads, OpenAI said.
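For a batch or evaluation job, opting into Flex is a per-request choice, and the "occasional resource unavailability" means callers should plan for failed attempts. The sketch below assumes, based on the flex-processing guide linked at the end of this article, that a Responses-API-style request body selects Flex with a `"service_tier": "flex"` field; check the guide for the exact field and error behavior. `build_request`, `run_low_priority`, `ResourceUnavailable`, and the injected `send` callable are hypothetical names for illustration, not OpenAI SDK APIs. The sketch retries Flex with exponential backoff, then falls back to the standard tier.

```python
import time
from typing import Callable


class ResourceUnavailable(Exception):
    """Hypothetical stand-in for whatever error signals that Flex capacity is unavailable."""


def build_request(model: str, prompt: str, flex: bool = True) -> dict:
    """Build a Responses-API-style body; 'service_tier' is assumed to select Flex."""
    body = {"model": model, "input": prompt}
    if flex:
        body["service_tier"] = "flex"
    return body


def run_low_priority(
    send: Callable[[dict], str],  # hypothetical: your HTTP client or SDK call
    model: str,
    prompt: str,
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try Flex with exponential backoff, then fall back to the standard tier."""
    request = build_request(model, prompt, flex=True)
    for attempt in range(retries):
        try:
            return send(request)
        except ResourceUnavailable:
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ... by default
    # Flex kept failing: pay the standard price rather than drop the job.
    return send(build_request(model, prompt, flex=False))
```

Falling back to the standard tier is a design choice: a job that can wait indefinitely could instead keep retrying Flex and never pay full price.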

It cuts API costs by exactly half. For o3, Flex processing costs $5 per million input tokens (~750,000 words) and $20 per million output tokens, compared with standard prices of $10 per million input tokens and $40 per million output tokens. For o4-mini, Flex lowers the price from $1.10 per million input tokens and $4.40 per million output tokens to $0.55 per million input tokens and $2.20 per million output tokens.
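The halving is easy to check with a little arithmetic. This sketch hard-codes the per-million-token prices quoted above in a hypothetical `PRICES` table (an illustration, not an OpenAI API) and prices a sample job of 2 million input tokens and 500,000 output tokens on each tier.

```python
# Per-million-token prices in USD, copied from the figures quoted above.
# Format: model -> tier -> (input price, output price). Illustrative only.
PRICES = {
    "o3": {"standard": (10.00, 40.00), "flex": (5.00, 20.00)},
    "o4-mini": {"standard": (1.10, 4.40), "flex": (0.55, 2.20)},
}


def job_cost(model: str, tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one job for the given model and pricing tier."""
    in_price, out_price = PRICES[model][tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Sample batch: 2M input tokens, 500k output tokens.
    for model in PRICES:
        std = job_cost(model, "standard", 2_000_000, 500_000)
        flex = job_cost(model, "flex", 2_000_000, 500_000)
        print(f"{model}: standard ${std:.2f}, flex ${flex:.2f}")
```

Running it prints `o3: standard $40.00, flex $20.00` and `o4-mini: standard $4.40, flex $2.20`, so the same workload costs half as much on Flex for both models.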

The launch of Flex processing comes as the price of cutting-edge AI continues to rise and competitors release cheaper, more efficient, budget-oriented models. On Thursday, Google launched Gemini 2.5 Flash, a reasoning model that matches or exceeds DeepSeek's R1 in performance at a lower input token cost.

In an email to customers announcing Flex pricing, OpenAI also noted that developers in tiers 1-3 of its usage tier hierarchy must complete a newly introduced ID verification process to access o3. (Tiers are determined by how much a customer has spent on OpenAI services.) Verification is also required for o3's reasoning summaries and streaming API support.

OpenAI has previously said that ID verification is intended to deter bad actors from violating its usage policies.

Learn more:

https://platform.openai.com/docs/guides/flex-processing