In context: The constant improvements AI companies keep making to their models might lead you to think we’ve finally figured out how large language models (LLMs) work. Nope – LLMs remain one of the least understood mass-market technologies ever. Anthropic, however, is attempting to change that with a new technique called circuit tracing, which has helped the company map out some of the inner workings of its Claude 3.5 Haiku model.
Circuit tracing is a relatively new technique that lets researchers track how an AI model builds its answers step by step – like following the wiring in a brain. It works by chaining together the model’s internal components into circuits that show how information flows from prompt to response. Anthropic used it to spy on Claude’s inner workings, and it revealed some truly odd, sometimes inhuman ways of arriving at an answer that the bot wouldn’t even admit to using when asked.
In total, the team inspected 10 different behaviors in Claude. Three stood out.
One was pretty simple and involved answering the question “What’s the opposite of small?” in different languages. You’d think Claude might have separate components for English, French, and Chinese. But no – it first figures out the answer (something related to “bigness”) using language-neutral circuits, then picks the right words to match the language of the question.
This means Claude isn’t just regurgitating memorized translations – it’s applying abstract concepts across languages, almost like a human would.
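To make that two-step pattern concrete, here’s a toy Python sketch – a loose analogy only, with made-up dictionaries and a made-up function, not anything resembling Claude’s actual circuitry:

```python
# Toy analogy only: resolve the answer as a language-neutral concept first,
# then render that concept in the language of the question.

CONCEPT_ANTONYMS = {"SMALLNESS": "LARGENESS"}  # hypothetical abstract concepts

SURFACE_FORMS = {  # hypothetical per-language words for each concept
    "LARGENESS": {"en": "large", "fr": "grand", "zh": "大"},
}

def opposite_of_small(language: str) -> str:
    concept = CONCEPT_ANTONYMS["SMALLNESS"]   # step 1: language-neutral reasoning
    return SURFACE_FORMS[concept][language]   # step 2: match the question's language

print(opposite_of_small("en"))  # large
print(opposite_of_small("fr"))  # grand
```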
Then there’s math. Ask Claude to add 36 and 59, and instead of following the standard method (adding the ones digits, carrying the one, and so on), it does something way weirder. It starts approximating by adding “40ish and 60ish” or “57ish and 36ish” and eventually lands on “92ish.” Meanwhile, another part of the model focuses on the digits 6 and 9, realizing the answer must end in a 5. Combine those two weird steps, and it arrives at 95.
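Here’s a rough Python sketch of that two-path idea, purely for illustration – the functions below are invented stand-ins, not Anthropic’s account of how Claude actually computes:

```python
# Illustrative sketch: one path makes a fuzzy magnitude estimate, another
# pins down the exact last digit, and the two are combined at the end.

def last_digit_path(a: int, b: int) -> int:
    # Exact last-digit arithmetic: 6 + 9 ends in 5
    return (a + b) % 10

def combine(fuzzy_estimate: int, last_digit: int) -> int:
    # Snap the fuzzy estimate onto the nearest number with the right last digit
    base = fuzzy_estimate - fuzzy_estimate % 10
    candidates = (base - 10 + last_digit, base + last_digit, base + 10 + last_digit)
    return min(candidates, key=lambda c: abs(c - fuzzy_estimate))

# "40ish plus 60ish" lands somewhere around 92; the digit path says "ends in 5"
print(combine(fuzzy_estimate=92, last_digit=last_digit_path(36, 59)))  # -> 95
```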
However, if you ask Claude how it solved the problem, it’ll confidently describe the standard grade-school method, concealing its actual, bizarre reasoning process.
Poetry is even stranger. The researchers tasked Claude with writing a rhyming couplet, giving it the prompt “A rhyming couplet: He saw a carrot and had to grab it.” Here, the model settled on “rabbit” as the rhyme for “grab it” while it was still processing the first line. Then it appeared to construct the next line with that ending already decided, eventually spitting out “His hunger was like a starving rabbit.”
This suggests LLMs might have more foresight than we assumed and that they don’t always just predict one word after another to form a coherent answer.
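As a loose analogy, that “decide the ending first, then write toward it” behavior looks something like this toy Python sketch (the rhyme table and helper are entirely hypothetical):

```python
# Toy analogy only: pick the rhyming word before writing the line,
# then compose the rest of the line so it lands on that word.

RHYMES = {"grab it": ["rabbit", "habit"]}  # hypothetical rhyme lookup

def write_second_line(first_line_ending: str) -> str:
    target = RHYMES[first_line_ending][0]       # decide the ending word up front
    body = "His hunger was like a starving"     # then build the line toward it
    return f"{body} {target}."

print(write_second_line("grab it"))  # His hunger was like a starving rabbit.
```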
All in all, these findings are a big deal – they show that we can finally see how these models operate, at least in part.
Still, Joshua Batson, a research scientist at the company, told MIT Technology Review that this is just “tip-of-the-iceberg” stuff. Tracing even a single response takes hours, and there’s still a lot of figuring out left to do.