Researchers from the University of Essex, Humboldt University, Berlin and the Zuse Institute Berlin say generative models “may encourage mid-level novelty but rarely produce radically original ideas”.
And this may “reinforce combinatorial rather than conceptual creativity”.
The published research pitted AI programmes against each other.
Thinking outside the box
Dr Paul Hanel, from the University of Essex’s Department of Psychology, worked with Dr Jennifer Haase from the Weizenbaum Institute and Humboldt University, Berlin, and Prof Sebastian Pokutta, Zuse Institute Berlin and Technical University Berlin, on the project.
The researchers found the performance of ChatGPT-4 dropped by 43-49% over the past 18-24 months, with the decline coming after the introduction of ChatGPT-4o in May 2024.
The decline relates to divergent thinking tasks – or thinking outside of the box.
To measure this, the researchers looked at how AIs generate multiple, varied, and novel ideas in response to open-ended problems.
Computers struggle to match best humans
And the study suggests that computers struggle to match the best humans in generating original ideas – but still perform better than 60-80% of people.
Despite the fall in ChatGPT-4’s performance, the unreleased research preview GPT-4.5 came out on top.
Dr Hanel said this mix of results speaks to AI’s unpredictability.
“Some people believe that Large Language Models (LLMs) are superior to most humans whereas others believe that LLMs are only producing AI slop,” he said.
“Our research shows that LLMs can produce creative responses but also uncreative responses.
“They are fairly unpredictable.”
What's rare for humans is rarer for AI
Professor Pokutta added: “We often interact with LLMs in an anthropomorphized way and use them for creative work and as such it is natural to apply creative tests designed for humans to benchmark their creative potential.”
The research showed that LLMs performed better than the average person, but only 0.28% of AI responses reached the top 10% of human creativity benchmarks, leaving humans approximately 35.7 times more likely to produce standout ideas.
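The 35.7 figure appears to follow from comparing the two shares directly. The short Python sketch below reproduces it, assuming the comparison is simply the 10% of humans who by definition sit in the top 10% against the 0.28% of AI responses that reached that level; the variable names are illustrative and not taken from the study.

```python
# Shares quoted in the article. How the researchers combined them is an
# assumption here: we simply divide one share by the other.
human_share = 0.10    # 10% of humans fall in the top 10% by definition
ai_share = 0.0028     # 0.28% of AI responses reached that benchmark

ratio = human_share / ai_share
print(f"Humans are about {ratio:.1f} times more likely to reach the top 10%")
```

Under that assumption the ratio rounds to the 35.7 reported by the researchers.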
Dr Haase said: “What's rare for humans is even rarer for LLMs - highly unlikely, original ideas.
“But with easy access and endless motivation, LLMs are certainly great brainstorming partners for everyday creative tasks.”
The study looked at 14 different LLMs including China’s DeepSeek, Google’s Gemini, xAI’s Grok and ChatGPT.
True innovation demands human spark
The study used two well-established measures of divergent thinking. The Alternate Uses Task asks participants to think of as many different applications for a certain object as possible, while the Divergent Association Task asks participants to come up with 10 words that are as different as possible from each other.
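The article does not say how the word lists were scored. One common way to score a task like the Divergent Association Task is to represent each word as a numeric vector and average the distances between every pair of words. The Python sketch below illustrates that idea only: the three-number vectors are made up for the example, and the function names `cosine_distance` and `dat_style_score` are hypothetical, not the study's own code.

```python
from itertools import combinations
from math import sqrt


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dat_style_score(words, embeddings):
    """Average pairwise cosine distance between the words, scaled by 100."""
    vectors = [embeddings[word] for word in words]
    pairs = list(combinations(vectors, 2))
    return 100.0 * sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# Toy vectors invented for illustration; a real scorer would load
# pretrained word embeddings with hundreds of dimensions.
TOY_EMBEDDINGS = {
    "cat": (1.0, 0.0, 0.0),
    "dog": (0.9, 0.1, 0.0),
    "volcano": (0.0, 1.0, 0.0),
    "justice": (0.0, 0.0, 1.0),
}

similar = dat_style_score(["cat", "dog"], TOY_EMBEDDINGS)
varied = dat_style_score(["cat", "volcano", "justice"], TOY_EMBEDDINGS)
print(f"similar words: {similar:.1f}, varied words: {varied:.1f}")
```

In this toy setup closely related words such as "cat" and "dog" score near zero, while unrelated words score much higher, which is the behaviour a divergent thinking measure is meant to reward.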
The researchers say the study “speaks to the larger philosophical debate on artificial creativity” and suggest that, without thoughtful use, AI may unintentionally constrain creative diversity and reinforce existing patterns.
Dr Hanel added: “AI can be a tireless creative companion – but without human direction, it tends to circle around the familiar.
“True innovation still demands the human spark that asks not just ‘what’s next?’ but ‘what if?’”