Are the puzzles used to stump AI still getting harder? Sure enough, a new image-reasoning problem has appeared.

The problem sparked a heated discussion on Reddit: no current AI can truly solve complex reasoning problems.

The problem itself is very simple to state: how many small cubes need to be added to the structure in the picture to form a complete large cube?



Image-capable large models from China and abroad gave different answers to this problem.

Among them, o3 answered 45, while Gemini 2.5 Pro answered only 10.



The Chinese large models DeepSeek and Qwen3 answered 14 and 9 respectively.



Why do the answers differ so much? Read on.

Why are there different answers?

The core reason: the models disagree about the dimensions of the large cube to be formed.

o3 takes the final large cube to be 5x5x5, but it still gets the number of missing small cubes wrong. Counting by eye, a 5x5x5 large cube requires 125 small cubes, and 46 are already present in the picture, so the answer should be 125 - 46 = 79.

o3's error stems from how it analyzed the structure and counted the small cubes in the picture.
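The arithmetic behind the expected answer of 79 can be sketched in a few lines of Python. This is only an illustration: the function name `missing_cubes` and its parameters are hypothetical, and the figure of 46 existing cubes is the count reported in this article.

```python
def missing_cubes(side: int, present: int) -> int:
    """Return how many unit cubes must be added to fill a side x side x side cube.

    `side` is the assumed edge length of the target cube and `present` is the
    number of small cubes already in place (both supplied by the caller).
    """
    total = side ** 3
    if present > total:
        # The existing cubes cannot fit inside a cube of this size.
        raise ValueError(f"{present} cubes do not fit in a {side}x{side}x{side} cube")
    return total - present


# 5x5x5 target with the 46 cubes counted in the picture.
print(missing_cubes(5, 46))  # 79
```

The sketch makes the two sources of error explicit: a model can pick the wrong `side`, miscount `present`, or both.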


Gemini 2.5 Pro takes the final large cube to be 4x4x4.


Both DeepSeek and Qwen take the final large cube to be 3x3x3.



Since the models disagree about the size of the final large cube, they naturally give different answers.

However, with hints and repeated attempts, some large models can gradually find the right direction.

Netizens shared some ways of getting past these wrong answers:

For example, one test used o3 and gave it a few small hints during the first two attempts. Both attempts still produced wrong answers, but on the third attempt o3 reached the correct result even without hints.


Netizens attribute this to ChatGPT's long-term memory feature, which let it remember the hints from the first two attempts (such as considering how many cubes there are in the longest run, and counting strictly rather than estimating), take its earlier failures into account, and combine all of this.

In that sense, o3 can be said to learn through memory. This difficult problem will also become training data in the future.

Netizens: Humans get confused too

Some people say that this is not a question of reasoning at all, but a question of visual understanding.

The author believes that the wrong answers are caused by an unclear problem statement, which leads the AI's analysis astray.

Even humans run into similar confusion with this kind of problem. For example, must the answer preserve the original arrangement, or can the structure be taken apart and rearranged?



Moreover, when the content of the picture is explained to the AI more clearly (by telling it how the small cubes in the picture are arranged), the answer o3 gives is also correct.
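To illustrate what describing the arrangement explicitly might look like, here is a minimal sketch. It rests on several assumptions that are not from the article: the arrangement is given as a grid of column heights (a hypothetical format, since the description actually given to o3 is not shown here), the cubes are stacked without gaps or overhangs, the structure keeps its original arrangement, and the target cube's side equals the longest run in any direction. The height map below is made up and is not the arrangement in the picture.

```python
def cubes_to_add(heights: list[list[int]]) -> int:
    """Count the cubes needed to complete the smallest enclosing cube.

    `heights[r][c]` is the number of cubes stacked at grid cell (r, c).
    Assumes solid columns (no gaps or overhangs) and no rearrangement.
    """
    rows = len(heights)
    cols = max(len(row) for row in heights)
    tallest = max(max(row) for row in heights)
    # The target cube must be at least as long as the longest run in any direction.
    side = max(rows, cols, tallest)
    # Strict counting: sum every column instead of estimating.
    present = sum(sum(row) for row in heights)
    return side ** 3 - present


# Made-up example, NOT the arrangement from the picture.
example = [
    [3, 2, 1],
    [2, 1, 0],
    [1, 0, 0],
]
print(cubes_to_add(example))  # 27 - 10 = 17
```

Once the structure is written down this way, the question reduces to the longest-run check and a strict count, which matches the hints netizens gave o3.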


Whether the cube is 3x3x3, 4x4x4, 5x5x5, or NxNxN, is it asking too much of AI to answer a question that humans themselves cannot agree on?

AI: Maybe I need a more scientific training method!

Reference links:

https://www.reddit.com/r/singularity/comments/1kc2po7/not_a_single_model_out_there_can_currently_solve/?rdt=36638