NVIDIA brings 1.5 billion to 32 billion parameter reasoning models to consumers

NVIDIA today released OpenReasoning-Nemotron, a collection of four streamlined reasoning models with 1.5 billion, 7 billion, 14 billion, and 32 billion parameters, all derived from the 671-billion-parameter DeepSeek R1 0528. By distilling this massive “teacher” model into four Qwen-2.5-based “student” models, NVIDIA makes advanced reasoning experiments possible even on standard gaming PCs, without the worry of high GPU costs or cloud fees.

The key is not a sophisticated technique but the data itself. NVIDIA used the NeMo Skills pipeline to generate 5 million math, science, and code solutions, then fine-tuned each model on them using purely supervised learning. The 32-billion-parameter model scores 89.2 on AIME24 and 73.8 on the HMMT February competition, while even the 1.5-billion-parameter version achieves solid scores of 55.5 and 31.5.

NVIDIA envisions these models as a powerful research toolkit. All four checkpoints are available for download on Hugging Face, providing a solid foundation for exploring reinforcement learning for reasoning or customizing models for specific tasks. With GenSelect mode, which generates multiple candidate solutions per question in parallel and selects the best one, the 32B model rivals or even exceeds OpenAI's o3-high on several math and coding benchmarks.
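
The generate-then-select idea behind GenSelect can be illustrated with a minimal sketch. This is not NVIDIA's implementation: the names `gen_select`, `majority_vote_select`, and `make_stub_generator` are hypothetical, the generator replays canned answers instead of calling a model, and a simple majority vote stands in for the selection step, which in a real setup would be performed by a model comparing the candidates.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical type aliases for this sketch only.
Generate = Callable[[str], str]
Select = Callable[[str, List[str]], int]


def gen_select(question: str, generate: Generate, select: Select, n: int = 8) -> str:
    """Sample n candidate solutions, then let a selector choose one of them."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate(question) for _ in range(n)]
    index = select(question, candidates)
    return candidates[index]


def majority_vote_select(question: str, candidates: List[str]) -> int:
    """Stand-in selector: index of the first candidate carrying the most common answer.

    Assumption: a real selector would ask a model to compare the candidates.
    """
    most_common_answer, _ = Counter(candidates).most_common(1)[0]
    return candidates.index(most_common_answer)


def make_stub_generator(answers: List[str]) -> Generate:
    """Hypothetical generator that replays canned answers instead of calling a model."""
    remaining = iter(answers)
    return lambda question: next(remaining)


if __name__ == "__main__":
    generate = make_stub_generator(["42", "41", "42", "42"])
    print(gen_select("What is 6 * 7?", generate, majority_vote_select, n=4))
```

Keeping generation and selection as separate callables means either one can be swapped for a real model call without touching the other.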

Because NVIDIA trained these models using only supervised fine-tuning, with no reinforcement learning, the community has a clean, strong starting point for future reinforcement learning experiments. For gamers and home enthusiasts with a sufficiently powerful gaming GPU, this means a fully local model that comes very close to the state of the art.
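
How powerful the GPU needs to be depends largely on how much memory the model weights occupy. The rough estimate below is a sketch under stated assumptions: it uses the nominal parameter counts from the model names, treats the bit widths (16, 8, 4) as hypothetical precision choices that the announcement does not specify, and ignores the KV cache, activations, and runtime overhead, so real requirements will be higher.

```python
def weight_memory_gib(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GiB.

    Assumption: every parameter is stored at the same bit width.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Nominal sizes from the model names; exact parameter counts may differ.
    for size in (1.5, 7, 14, 32):
        estimates = ", ".join(
            f"{bits}-bit: {weight_memory_gib(size, bits):.1f} GiB"
            for bits in (16, 8, 4)
        )
        print(f"{size}B -> {estimates}")
```

By this estimate the 32B model needs roughly 60 GiB for weights at 16 bits per parameter and roughly 15 GiB at 4 bits, which is why the smaller checkpoints or reduced precision matter for a single gaming GPU.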