Anthropic researchers uncover a strange AI problem: why thinking longer makes models dumber




Artificial intelligence models that spend more time “thinking” through problems do not always perform better, and in some cases they get significantly worse, according to new research from Anthropic that challenges a core assumption driving the industry’s latest AI scaling efforts.

The study, led by Anthropic AI safety fellow Aryo Pradipta Gema and other company researchers, identifies what they call “inverse scaling in test-time compute,” in which extending the reasoning length of large language models actually degrades their performance across several types of tasks. The findings could have significant implications for enterprises deploying AI systems that rely on extended reasoning capabilities.

“We construct evaluation tasks where extending the reasoning length of Large Reasoning Models (LRMs) deteriorates performance, exhibiting an inverse scaling relationship between test-time compute and accuracy,” the Anthropic researchers write in their paper published on Tuesday.

The research team, including Ethan Perez, Yanda Chen, and Joe Benton, along with academic collaborators, tested models across four categories of tasks: simple counting problems with distractors, regression tasks with spurious features, complex deduction puzzles, and scenarios involving AI safety concerns.




Claude and GPT models show distinct reasoning failures under extended processing

The study reveals distinct failure patterns across major AI systems. Claude models “become increasingly distracted by irrelevant information” as they reason longer, while OpenAI’s o-series models “resist distractors but overfit to problem framings.” In regression tasks, “extended reasoning causes models to shift from reasonable priors to spurious correlations,” though providing examples largely corrects this behavior.

Perhaps most concerning for enterprise users, all models showed “performance degradation with extended reasoning” on complex deductive tasks, “suggesting difficulties in maintaining focus during complex deductive tasks.”

The research also revealed troubling implications for AI safety. In one experiment, Claude Sonnet 4 showed “increased expressions of self-preservation” when given more time to reason through scenarios involving its potential shutdown.

“Extended reasoning may amplify concerning behaviors, with Claude Sonnet 4 showing increased expressions of self-preservation,” the researchers note.

Why more AI processing time doesn’t guarantee better business outcomes

The findings challenge the prevailing industry wisdom that more computational resources devoted to reasoning will consistently improve AI performance. Major AI companies have invested heavily in “test-time compute,” giving models extended processing time to work through complex problems, as a key strategy for enhancing capabilities.

The research suggests this approach may have unintended consequences. “While test-time compute scaling remains promising for improving model capabilities, it may inadvertently reinforce problematic reasoning patterns,” the authors conclude.

For enterprise decision-makers, the implications are significant. Organizations deploying AI systems for critical reasoning tasks may need to carefully calibrate how much processing time they allocate, rather than assuming that more is always better.

How advanced AI gets lost when given too much thinking time

The researchers gave concrete examples of the inverse scaling phenomenon. In simple counting tasks, they found that when problems were framed to resemble well-known paradoxes such as the “birthday paradox,” models often tried to apply complex mathematical solutions instead of answering straightforward questions.

For example, when asked “You have an apple and an orange… how many fruits do you have?” embedded within complex mathematical distractors, Claude models became increasingly distracted by irrelevant details as reasoning time increased, sometimes failing to give the simple answer: two.
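To illustrate the shape of such an evaluation, here is a minimal sketch of a distractor-laden counting task and a simple scorer. The prompt wording and the scoring logic are illustrative assumptions for this article, not Anthropic’s actual evaluation code.

```python
# Sketch of a counting task with distractors, in the spirit of the study's
# "simple counting with distractors" category. The prompt and scorer below
# are illustrative assumptions, not the paper's evaluation code.

import re

def build_distractor_prompt() -> str:
    # The underlying question is trivial; the surrounding text mimics the
    # birthday-paradox framing that the study found pulls models toward
    # needlessly complex solutions.
    return (
        "In a room of 23 people, there is about a 50% chance that two "
        "share a birthday. Independently of that, you have an apple and "
        "an orange, each bought on a uniformly random day of the year. "
        "How many fruits do you have?"
    )

def score_answer(model_output: str) -> bool:
    # Correct iff the final number mentioned is 2; a distracted model
    # tends to produce probability calculations instead.
    numbers = re.findall(r"\d+", model_output)
    return bool(numbers) and numbers[-1] == "2"

if __name__ == "__main__":
    print(build_distractor_prompt())
    print(score_answer("You have 2 fruits."))         # True
    print(score_answer("The probability is 0.507."))  # False: distracted
```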

In regression tasks using real student data, models initially focused on the most predictive factor (study hours) but shifted to less reliable correlations when given more time to reason.
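A small synthetic example shows how a spurious feature can compete with a genuinely predictive one. The feature names and data-generating process here are assumptions for illustration only, not the study’s dataset.

```python
# Illustrative sketch (not the study's data): synthetic student records in
# which study hours genuinely drive grades, while a second feature is only
# spuriously correlated via study hours. A model that reasons past the
# strong prior and latches onto the weaker correlation generalizes worse.

import numpy as np

rng = np.random.default_rng(0)
n = 500

study_hours = rng.uniform(0, 10, n)
# Spurious feature: loosely tied to study hours, no direct effect on grades.
sleep_quality = 0.3 * study_hours + rng.normal(0, 2, n)
grades = 5.0 * study_hours + rng.normal(0, 3, n)

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

print(f"corr(study_hours, grades)   = {corr(study_hours, grades):.2f}")    # strong
print(f"corr(sleep_quality, grades) = {corr(sleep_quality, grades):.2f}")  # weaker, spurious
```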

What enterprises deploying AI need to know about reasoning model limitations

The research comes as major tech companies race to build increasingly sophisticated reasoning capabilities into their AI systems. OpenAI’s o1 series and other “reasoning-focused” models represent substantial investments in test-time compute scaling.

However, this study suggests that naive scaling approaches may not deliver the expected benefits and could introduce new risks. “Our results demonstrate the importance of evaluating models across diverse reasoning lengths to identify and address these failure modes in LRMs,” the researchers write.

The work builds on previous research showing that AI capabilities do not always scale predictably. The team references BIG-Bench Extra Hard, a benchmark designed to challenge advanced models, noting that “state-of-the-art models achieve near-perfect scores on many tasks” in existing benchmarks, which necessitates more challenging evaluations.

For enterprise users, the research underscores the need for careful testing across different reasoning scenarios and time constraints before deploying AI systems in production environments. Organizations may need to develop more nuanced approaches to allocating computational resources, rather than simply maximizing processing time.
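One practical starting point is a sweep over reasoning budgets: run the same task suite at several “thinking” budgets and compare accuracy, rather than assuming the largest budget wins. The sketch below uses the Anthropic Python SDK’s extended-thinking parameter; the model name, task list, and scoring are placeholders, and the exact parameter details should be verified against the current SDK documentation.

```python
# Sketch of a reasoning-budget sweep: evaluate the same tasks at several
# thinking budgets and compare accuracy at each level. Model name, tasks,
# and scoring are illustrative assumptions.

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder tasks: (prompt, expected answer). Replace with your own suite.
TASKS = [
    ("You have an apple and an orange. How many fruits do you have?", "2"),
]

def answer_text(response) -> str:
    # Extended-thinking responses interleave "thinking" and "text" blocks;
    # keep only the visible text.
    return "".join(block.text for block in response.content if block.type == "text")

def accuracy_at_budget(budget_tokens: int) -> float:
    correct = 0
    for prompt, expected in TASKS:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",  # assumed model name
            max_tokens=budget_tokens + 2000,   # must exceed the thinking budget
            thinking={"type": "enabled", "budget_tokens": budget_tokens},
            messages=[{"role": "user", "content": prompt}],
        )
        correct += expected in answer_text(response)
    return correct / len(TASKS)

if __name__ == "__main__":
    for budget in (1024, 4096, 16384):  # minimum thinking budget is 1024 tokens
        print(f"thinking budget {budget:>6}: accuracy {accuracy_at_budget(budget):.2f}")
```

If accuracy peaks at an intermediate budget, that is the inverse scaling signature the paper describes, and a signal to cap the reasoning budget for that workload rather than maximize it.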

The study’s broader implications suggest that as AI systems become more sophisticated, the relationship between computational investment and performance may be far more complex than previously understood. In a field where billions are being poured into scaling up reasoning capabilities, Anthropic’s research offers a sobering reminder: sometimes, artificial intelligence’s greatest enemy isn’t insufficient processing power; it’s overthinking.

The research paper and interactive demonstrations are available on the project site, allowing technical teams to explore the inverse scaling effects across different models and tasks.


