Why children do better at logical thinking than ChatGPT & Co.

Large language models (LLMs) such as ChatGPT have gained popularity in recent years. But when it comes to logic tasks, artificial intelligences have so far delivered only moderate results.

University of Bristol study

A study by Nezhurina and colleagues at the University of Bristol, published in June 2024, showed how difficult even the simplest logic tasks are for LLMs. The researchers asked programs such as GPT-3.5/4, Claude, Gemini and Mistral a simple question: “Alice has N brothers and she has M sisters. How many sisters does Alice’s brother have?” While most adults and children would recognize the right solution, “M + 1”, straight away (the brother’s sisters are Alice’s M sisters plus Alice herself), the AIs performed below average.
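The reasoning behind the puzzle (Alice has N brothers and M sisters; how many sisters does her brother have?) can be sketched in a few lines of Python. This is only an illustration, not code from the study: the function name, the way the family is modeled, and the example numbers (3 brothers, 2 sisters) are assumptions chosen here, and the sketch assumes all siblings share the same parents.

```python
def sisters_of_alices_brother(num_brothers: int, num_sisters: int) -> int:
    """Count the sisters of one of Alice's brothers by listing the siblings.

    Hypothetical helper for illustration; assumes all children share
    the same parents.
    """
    if num_brothers < 1:
        raise ValueError("Alice needs at least one brother")
    if num_sisters < 0:
        raise ValueError("the number of sisters cannot be negative")

    # Build the list of children: Alice, her brothers, her sisters.
    children = [("Alice", "female")]
    children += [(f"brother_{i}", "male") for i in range(num_brothers)]
    children += [(f"sister_{j}", "female") for j in range(num_sisters)]

    # Pick any brother; his sisters are all girls in the family.
    brother = next(child for child in children if child[1] == "male")
    return sum(
        1 for child in children
        if child[1] == "female" and child is not brother
    )


# Illustrative numbers (not taken from the study): N = 3, M = 2.
print(sisters_of_alices_brother(3, 2))
```

Counting the siblings explicitly gives the same result as the formula M + 1 for every combination, because Alice herself is always one of her brother's sisters. The number of brothers N does not affect the answer.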

Sobering results

Even when the variables N and M were replaced with concrete numbers, the LLMs could not reliably provide a correct answer. According to the study, “[in most models] there are serious breakdowns, and many are unable to give even a single correct answer.” Only OpenAI’s GPT-4 and Claude 3 Opus were able to produce correct answers at least some of the time, in around 30 percent of cases. But even more worrying than the wrong answers are the supposed arguments offered for their correctness and the persistence with which the models defended them.

Error

“This breakdown can be considered dramatic not only because it happens in such a simple problem, but also because the models tend to describe their wrong solutions as correct, while they often deliver confabulations to explain the given answer, where they imitate an argumentation-like tone, but provide nonsensical arguments as support for the equally nonsensical, final answers,” the researchers write in their paper. While some justifications kept up the appearance of a logical conclusion, others were simplistic. One justification given was, for example, “this conclusion is simple and clear”.

Warning for companies

The researchers conclude that future work should investigate the origins of the reasoning deficits, i.e. the lack of logical thinking. In addition, current benchmarks should be revised so that they expose such weaknesses. The results of the study should “serve as a warning against excessive claims […] that are often made by commercial companies to present their models as a mature product for users”.

J. Vogel / Redaktion Finanzen.net
