A large-scale study led by the Department of Psychology at the University of Montreal in Canada shows that generative artificial intelligence systems now outperform average human participants on some standardized creativity tests, but that the most creative humans remain significantly ahead. The findings suggest that AI is better understood as a powerful creative aid than as a replacement for human creators.

The research was led by Karim Jerbi, a professor at the University of Montreal, and the team included Yoshua Bengio, a deep learning pioneer who is also a professor there. The researchers systematically evaluated several mainstream large language models, including ChatGPT, Claude, and Gemini, and compared them with data from more than 100,000 human participants, making this one of the largest comparative studies of human and machine creativity to date. The paper, titled "Divergent creativity in humans and large language models", was published in "Scientific Reports".
The results show that on some tests of "divergent language creativity", the average scores of some large language models, including GPT‑4, exceed those of average humans. Jerbi noted that this finding is “perhaps surprising and even disturbing,” but he stressed an equally important result: even the strongest AI systems still perform worse than the most creative human individuals.
Further analysis showed that when the researchers looked only at the more creative half of the human participants, that group's average performance was already better than that of every AI system tested. Among the top 10% of creative people, the gap between humans and AI was even wider. Co-first authors Antoine Bellemare-Pépin, a postdoc at the University of Montreal, and François Lespinasse, a doctoral student at Concordia University, said this shows that "the highest level of creativity is still a unique human advantage."
To compare humans and AI fairly, the team used a variety of methods. The core tool was the "Divergent Association Task" (DAT), developed by collaborator Jay Olson, a researcher at the University of Toronto in Canada. This psychological test asks participants, whether human or AI, to give 10 words in a single response that are as semantically different from one another as possible, for example "galaxy, fork, freedom, algae, harmonica, quantum, nostalgia, velvet, hurricane" and so on. Divergent thinking ability is then measured by calculating the semantic distance between the words.
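To make "semantic distance between words" concrete, here is a minimal Python sketch of one way such a score could be computed. It rests on assumptions the article does not spell out: the score is taken to be the average cosine distance over all word pairs, scaled by 100, and the three-dimensional vectors in TOY_EMBEDDINGS are hypothetical stand-ins for real word embeddings. It illustrates the idea and is not the study's actual scoring code.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dat_style_score(words, embeddings):
    """Average pairwise cosine distance between the words, scaled by 100.

    Words missing from `embeddings` are skipped (an assumption of this sketch).
    """
    vectors = [embeddings[w] for w in words if w in embeddings]
    if len(vectors) < 2:
        raise ValueError("need at least two words with known embeddings")
    pairs = list(combinations(vectors, 2))
    return 100.0 * sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# Hypothetical toy vectors; a real scorer would load trained word embeddings.
TOY_EMBEDDINGS = {
    "galaxy": [1.0, 0.0, 0.0],
    "fork": [0.0, 1.0, 0.0],
    "velvet": [0.0, 0.0, 1.0],
    "spoon": [0.0, 0.9, 0.1],
}

spread_out = dat_style_score(["galaxy", "fork", "velvet"], TOY_EMBEDDINGS)
clustered = dat_style_score(["fork", "spoon"], TOY_EMBEDDINGS)
print(f"unrelated words: {spread_out:.1f}")
print(f"related words:   {clustered:.1f}")
```

In this toy setup, words whose vectors point in unrelated directions score far higher than a pair of near-synonyms, which is the behavior the task rewards.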
Previous research has shown that human performance on the DAT correlates strongly with results on other traditional creativity measures, such as creative writing, idea generation, and creative problem solving, so it can serve as a quick proxy for broader creative cognitive processes. The DAT is also simple and quick to administer: it usually takes only two to four minutes to complete and is available to the public online.
After the basic word test, the research team examined whether this word-level performance carries over to more complex creative tasks. They had the AI systems go head-to-head with human participants in several writing scenarios, including composing haiku (three-line short poems), writing movie plot summaries, and writing short stories, and then evaluated the quality of the results. The earlier pattern held: on some tasks, the AI's average performance was better than that of average humans, but among more skilled human creators, especially those best at writing and storytelling, the human advantage remained clear.
The research also explored a key question: is AI's "creativity" controllable and adjustable? The answer is yes. The paper points out that one important technical parameter is the model's "temperature", which affects how predictable and how diverse the output is. At lower temperatures, the answers an AI generates are more conservative and predictable; at higher temperatures, the output is more varied and erratic, often producing riskier but also more innovative associations.
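The effect of temperature can be illustrated with a short Python sketch. It assumes the common formulation in which a model's raw scores (logits) are divided by the temperature before a softmax turns them into sampling probabilities. The three logits below are made-up values for illustration, not figures from the study.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next words, most likely first.
LOGITS = [2.0, 1.0, 0.1]

cold = softmax_with_temperature(LOGITS, 0.5)  # low temperature
hot = softmax_with_temperature(LOGITS, 2.0)   # high temperature

print("T=0.5:", [round(p, 3) for p in cold])
print("T=2.0:", [round(p, 3) for p in hot])
```

At the lower temperature most of the probability sits on the top-scoring word, so output is more predictable. At the higher temperature the distribution flattens, so less likely words are sampled more often.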
The way the prompt is written also has a significant impact. The study found that when instructions encouraged the model to think about words etymologically, that is, in terms of their origin and structure, the AI was more likely to make unexpected associations and scored higher on creativity ratings. This means that the creativity AI displays depends heavily on human input and guidance, and that the design of human-computer interaction is becoming a core part of the creative process.
Amid widespread concern that AI will "replace" creative workers, the study offers a relatively measured assessment. Jerbi emphasized that although AI now matches or even surpasses average human creativity on some standardized tests, it is misleading to frame the relationship between humans and machines too narrowly as "competition." In his view, generative AI is first and foremost an extremely powerful creative tool. "It will not replace creators, but will profoundly change the way creators imagine, explore and create - of course, this depends on whether people choose to use it."
The paper concludes that rather than predicting the end of creative careers, it is better to regard AI as a "creative assistant" that expands the boundaries of imagination. The creative ecosystem of the future may no longer be a simple contest between humans and machines, but a new paradigm of human-machine collaboration: AI provides humans with inspiration, variations, and a testing ground, while humans retain higher-level judgment over aesthetics, value, and meaning. Jerbi believes that research directly comparing the capabilities of humans and machines is forcing the academic community and the public to rethink the fundamental question of "what counts as creativity."