How a big shift in training LLMs led to a capability explosion



In a recent article, Ars Technica’s Benj Edwards explored some of the limitations of reasoning models trained with reinforcement learning. For example, one study “revealed puzzling inconsistencies in how models fail. Claude 3.7 Sonnet could perform up to 100 correct moves in the Tower of Hanoi but failed after just five moves in a river crossing puzzle—despite the latter requiring fewer total moves.”

Conclusion: Reinforcement learning made agents possible

One of the most discussed applications for LLMs in 2023 was creating chatbots that understand a company’s internal documents. The conventional approach to this problem was called RAG, short for retrieval-augmented generation.

When the user asks a question, a RAG system performs a keyword- or vector-based search to retrieve the most relevant documents. It then inserts these documents into an LLM’s context window before generating a response. RAG systems can make for compelling demos. But they tend not to work very well in practice because a single search will often fail to surface the most relevant documents.
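To make the single-search limitation concrete, here is a minimal sketch of the retrieval step in Python. It is a toy under stated assumptions, not how any production RAG system is built: the three-document corpus, the word-overlap scoring, and the function names (`keyword_search`, `build_prompt`) are all hypothetical, and the call to the LLM itself is left out.

```python
from collections import Counter

# Toy corpus standing in for a company's internal documents (hypothetical data).
DOCS = {
    "vacation": "Employees accrue 1.5 vacation days per month.",
    "expenses": "Submit expense reports within 30 days of purchase.",
    "security": "Rotate passwords every 90 days and enable two-factor login.",
}

def tokenize(text):
    """Lowercase the text and strip basic punctuation from each word."""
    return [w.strip(".,?!").lower() for w in text.split()]

def keyword_search(query, docs, k=1):
    """Rank documents by how many words they share with the query."""
    query_words = Counter(tokenize(query))
    scored = []
    for doc_id, text in docs.items():
        overlap = sum((query_words & Counter(tokenize(text))).values())
        scored.append((overlap, doc_id))
    scored.sort(reverse=True)
    # Drop documents that share no words with the query at all.
    return [doc_id for overlap, doc_id in scored[:k] if overlap > 0]

def build_prompt(question, docs, k=1):
    """Insert the retrieved documents into the prompt, ahead of the question."""
    hits = keyword_search(question, docs, k)
    context = "\n".join(docs[d] for d in hits)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(keyword_search("How many vacation days do I get?", DOCS))
print(keyword_search("How much time off do I get?", DOCS))
```

The first query shares words with the vacation policy and retrieves it. The second asks for the same information in different words and retrieves nothing, so the model would have to answer with an empty context. A vector-based search would likely handle this particular paraphrase better, but the structural weakness is the same: the system gets one attempt.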

Today, it’s possible to develop much better information retrieval systems by allowing the model itself to choose search queries. If the first search doesn’t pull up the right documents, the model can revise the query and try again. A model might perform five, 20, or even 100 searches before providing an answer.
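The retry loop described above might be sketched as follows. This is an illustration under heavy assumptions: `scripted_model` is a hypothetical stand-in that walks through a fixed list of queries, whereas a real system would ask the LLM to propose the next query after looking at the search history. The toy `search` function and the corpus are also invented for the example.

```python
# Toy corpus standing in for a company's internal documents (hypothetical data).
DOCS = {
    "vacation": "Employees accrue 1.5 vacation days per month.",
    "expenses": "Submit expense reports within 30 days of purchase.",
    "security": "Rotate passwords every 90 days and enable two-factor login.",
}

def search(query, docs):
    """Return ids of documents that contain every word of the query."""
    words = query.lower().split()
    return [doc_id for doc_id, text in docs.items()
            if all(w in text.lower() for w in words)]

def scripted_model(question, history):
    """Hypothetical stand-in for an LLM: try a fixed list of queries in order."""
    candidates = ["time off", "leave policy", "vacation days"]
    tried = len(history)
    return candidates[tried] if tried < len(candidates) else None

def agentic_retrieve(question, docs, propose_query, max_searches=5):
    """Let the model revise its query until a search returns something."""
    history = []  # (query, results) pairs the model can inspect
    for _ in range(max_searches):
        query = propose_query(question, history)
        if query is None:  # the model has run out of ideas
            break
        results = search(query, docs)
        history.append((query, results))
        if results:  # stop at the first search that finds a document
            break
    return history

for query, results in agentic_retrieve("How much time off do I get?", DOCS, scripted_model):
    print(query, "->", results)
```

Here the first two queries come back empty and the third finds the vacation policy. The `max_searches` cap is a design choice for the sketch; it plays the role of the budget that lets a model run five, 20, or 100 searches before answering.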

But this approach only works if a model is “agentic”—if it can stay on task across multiple rounds of searching and analysis. LLMs were terrible at this prior to 2024, as the examples of AutoGPT and BabyAGI demonstrated. Today’s models are much better at it, which allows modern RAG-style systems to produce better results with less scaffolding. You can think of “deep research” tools from OpenAI and others as very powerful RAG systems made possible by long-context reasoning.

The same point applies to the other agentic applications I mentioned at the start of the article, such as coding agents and computer-use agents. What these systems have in common is a capacity for iterated reasoning. They think, take an action, think about the result, take another action, and so forth.

Timothy B. Lee was on staff at Ars Technica from 2017 to 2021. Today, he writes Understanding AI, a newsletter that explores how AI works and how it’s changing our world. You can subscribe here.


