The writer is professor of computer science at the Université de Montréal and founder of Mila, the Quebec Artificial Intelligence Institute
Lack of internal deliberation abilities — thinking, in other words — has long been considered one of the main weaknesses of artificial intelligence. The scale of a recent advance in this area by ChatGPT creator OpenAI is a point of debate within the scientific community. But it leads many of my expert colleagues and me to believe that there is a chance we are on the brink of bridging the gap to human-level reasoning.
Researchers have long argued that traditional neural networks — the leading approach to AI — align more with “system 1” cognition. This corresponds to direct or intuitive answers to questions (such as when automatically recognising a face). Human intelligence, on the other hand, also relies on “system 2” cognition. This involves internal deliberation and enables powerful forms of reasoning (like when solving a maths problem or planning something in detail). It allows us to combine pieces of knowledge in coherent but novel ways.
OpenAI’s advance, which has not yet been fully released to the public, is a step towards such internal deliberation, built around its o1 large language model (LLM).
Better reasoning would address two major weaknesses of current AI: the poor coherence of its answers and its inability to plan and achieve long-term goals. The former is important in scientific uses and the latter is essential to create autonomous agents. Both could enable important applications.
The principles behind reasoning were at the heart of AI research in the 20th century. An early example of success was DeepMind’s AlphaGo, the first computer system to beat a human professional player at the ancient Asian game of Go in 2015, and more recently AlphaProof, which tackles mathematical proofs. Here, neural networks learn to predict the usefulness of an action. Such “intuitions” are then used to plan by efficiently searching possible sequences of actions.
However, AlphaGo and AlphaProof involve very specialised knowledge (of the game of Go and specific mathematical domains respectively). What remains unclear is how to combine the breadth of knowledge of modern LLMs with powerful reasoning and planning abilities.
There has been some progress. LLMs already come up with better answers to complex questions when asked to produce a chain of thought leading to their answer.
OpenAI’s new “o” series pushes this idea further, though it requires far more computing resources, and therefore energy, to do so: the model is trained to produce a much longer chain of thought, and so to “think” better.
We thus see a new form of computational scaling appear. Not just more training data and larger models but more time spent “thinking” about answers. This leads to substantially improved capabilities in reasoning-heavy tasks such as mathematics, computer science and science more broadly.
For example, whereas OpenAI’s previous model GPT-4o scored only about 13 per cent on the qualifying exam for the 2024 United States Mathematical Olympiad (the AIME test), o1 reached an 83 per cent mark, placing it among the top 500 students in the country.
If this approach succeeds, there are major risks to consider. We don’t yet know how to align and control AI reliably. For example, the evaluation of o1 showed an increased ability to deceive humans — a natural consequence of improving goal-reaching skills. It is also concerning that o1’s ability to help create biological weapons has crossed OpenAI’s own risk threshold from low to medium. This is the highest level the company considers acceptable for release (and the company may have an interest in keeping concerns low).
Reasoning and agency are believed to be the main milestones on the road to human-level AI, also known as artificial general intelligence. There are therefore powerful economic incentives for the large companies racing towards this goal to cut corners on safety.
o1 is likely to be only a first step. Although it does well at many reasoning and mathematical tasks, it looks like long-term planning has still not been achieved. o1 struggles on more complex planning tasks, suggesting that there is still work to be done to achieve the kind of autonomous agency sought by AI companies.
But with improved programming and scientific abilities, these new models can be expected to accelerate research on AI itself. That could bring AI to human-level intelligence faster than anticipated. Advances in reasoning abilities make it all the more urgent to regulate AI models in order to protect the public.