← All reports

80000HOURS

What the hell happened with AGI timelines in 2025?

Podcast · AI & Technology · 11 Feb 2026 · 25m · source

⚑ BOTTOM LINE

The rollercoaster of AGI timelines in 2025, from extreme optimism about reasoning models to renewed pessimism, was driven by the realisation that initial gains came more from expensive inference scaling (more thinking time) than from generalisable reasoning ability, with diminishing returns making sustained exponential progress economically unsustainable.


📝 THESIS

The emergence of reasoning models (OpenAI's o1/o3) in early 2025 created a surge of optimism about near-term AGI. Over the year it became clear that most gains came from expensive inference scaling rather than generalisable reasoning capabilities, leading forecasters to extend timelines by ~2.5 years.


💡 KEY INSIGHTS

  1. Reasoning models' shine wore off when their limitations became apparent: early excitement about models like o1/o3 stemmed from dramatic improvements in mathematics, coding, and logic, but these gains didn't generalise to "messier" real-world domains like booking flights or organising events. [1] [⚠]

  2. Inference scaling (thinking time) drove most improvements but isn't scalable: more than two-thirds of reasoning models' performance gains came from giving them more time to think, a one-time boost that can't be economically scaled further (costs would exceed those of human software engineers). [2] [⚠]

  3. Reinforcement learning for reasoning is computationally inefficient: training models through reinforcement learning on coding/maths problems may be up to 1,000,000× less compute-efficient than pre-training, as models must generate "vast numbers of garbage failed attempts." [3] [⚠]

  4. The AI 2027 scenario underestimated automation bottlenecks: even if software engineering becomes fully automated, AI research and development would bottleneck on other aspects, which led the AI 2027 scenario writers to push their timelines back 1-2 years. [4] [✓]

  5. Economic realities constrain exponential progress: frontier AI agents now cost hundreds of dollars per hour to run, making further exponential scaling economically irrational and suggesting slower progress in 2026-2027. [5] [⚠]

  6. Near-frontier capabilities show steep cost-performance improvements: while frontier AI is expensive, near-frontier models like Gemini 3 Flash achieve performance close to the Pro model at a quarter of the cost, and the cost of passing the ARC-AGI benchmark dropped 400× from $4,500 to $11 in one year. [6] [✓]


💬 QUOTABLE MOMENTS

"People were kind of primed to expect that this might work, because fine tuning models to follow instructions and be helpful to users really had generalized shockingly well across almost all the kinds of different things that users tend to ask AI models for."
- Unknown [7]

"As AI models keep getting more impressive at the rate the short timelines people predict, but more useful at the rate that long timelines people predict."
- Dwarkesh Patel [8]


🔍 FACT CHECK

✓ VERIFIED: Metaculus AGI timeline forecasts lengthened during 2025, from July 2031 to November 2033 as cited in the episode. That is a shift of 28 months, which this report rounds to 2.5 years. [9] The search result puts the Metaculus median (50% chance) for the first general AI announcement at July 2033, extended from ~July 2031 in mid-2024.
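The size of the shift depends on which end date is used, since the check above quotes both November 2033 and July 2033. A minimal sketch of the date arithmetic follows; the helper name `months_between` is hypothetical, and the day of the month is set to 1 only as a placeholder because the source gives months, not days.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end, ignoring the day of month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


# Dates copied from the fact check above; day=1 is a placeholder.
earlier = date(2031, 7, 1)
later_episode = date(2033, 11, 1)  # end date cited from the episode
later_search = date(2033, 7, 1)    # end date from the search result

shift_episode = months_between(earlier, later_episode)
shift_search = months_between(earlier, later_search)

print(shift_episode, round(shift_episode / 12, 2))  # 28 2.33
print(shift_search, round(shift_search / 12, 2))    # 24 2.0
```

On either reading the extension is between 2 and about 2.3 years, so "2.5 years" is a rounded-up figure.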

✓ VERIFIED: The Epoch Capabilities Index shows AI progress accelerated in 2024, with frontier model improvement nearly doubling from ~8 points/year before April 2024 to ~15 points/year thereafter. [10]

✓ VERIFIED: AI company revenues grew dramatically. OpenAI reached ~$13B annualised revenue by August 2025 (from $200M in 2023), Anthropic reached ~$7B (from $87M in 2024), and xAI reached ~$500M. [11] Bullish forecasters predicted $16B but the actual figure reached ~$30B.
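To put the revenue figures above on a common footing, the growth multiples can be worked out directly. This is a rough sketch using only the numbers quoted in the check; the variable names are illustrative. xAI is left out because no baseline is given for it, and the two baselines are from different years (2023 and 2024), so the multiples cover different time spans.

```python
# (baseline, latest) annualised revenue in millions of USD,
# copied from the fact check above.
revenue_musd = {
    "OpenAI": (200, 13_000),    # 2023 baseline
    "Anthropic": (87, 7_000),   # 2024 baseline
}

multiples = {
    name: latest / baseline
    for name, (baseline, latest) in revenue_musd.items()
}

for name, multiple in multiples.items():
    print(f"{name}: ~{multiple:.0f}x")
# OpenAI: ~65x
# Anthropic: ~80x
```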

✓ VERIFIED: The cost of passing the ARC-AGI benchmark dropped from $4,500 to $11 (approximately 400× reduction) between late 2024 and late 2025. [12] Calculation: 4500/11 ≈ 409× reduction.
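The calculation above can be reproduced in a few lines. This is a small sketch using the two dollar figures quoted in the check; the variable names are illustrative only.

```python
# Cost in USD of passing ARC-AGI, as quoted in the fact check above.
cost_late_2024 = 4500.0
cost_late_2025 = 11.0

reduction = cost_late_2024 / cost_late_2025
percent_lower = 1 - cost_late_2025 / cost_late_2024

print(f"{reduction:.0f}x cheaper")   # 409x cheaper
print(f"{percent_lower:.2%} lower")  # 99.76% lower
```

The "400×" headline figure is therefore a round-down of roughly 409×.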


🎯 STRATEGIC IMPLICATIONS

For AI researchers: Focus on efficient reasoning methods rather than brute-force inference scaling, and address the continual learning problem, where AI systems, unlike humans, plateau quickly.

For policymakers: Prepare for AGI within a 2028-2032 window. This period represents a potential "make or break" phase where AI will consume most available compute resources, requiring trillion-dollar investments.

For technology investors: Recognise that while frontier AI is expensive, near-frontier capabilities show dramatic cost reductions, creating opportunities in applications rather than core model development.

The 2025 timeline volatility reflects deeper uncertainty about whether AI companies have exhausted their current "tricks" or will discover new efficiency breakthroughs.


📊 EPISTEMIC STATUS

Source credibility: Medium. The speaker demonstrates deep knowledge of AI industry developments but is not named; the episode references credible sources such as Toby Ord and industry contacts.

Claim verifiability: 4 of 6 key claims verified/verifiable. Some technical claims about compute efficiency ratios are difficult to verify due to commercial sensitivity.

Potential biases: Possible alignment with AI safety/effective altruism perspective given 80,000 Hours reference; may overemphasise technical barriers relative to potential breakthroughs.

Quality flags: No timestamps available; single unidentified speaker; some claims about internal industry perspectives unverifiable.

Confidence in synthesis: High. The core narrative aligns with verified timeline shifts and technical realities, and the most significant empirical claims were verified.


🧠 MEMORY HOOKS

Card 1
Q: What were the two main factors driving reasoning models' performance improvements in 2025?
A: Actual reasoning ability improvement (~⅓) and inference scaling/giving models more thinking time (~⅔).

Card 2
Q: Why can't inference scaling continue at the same rate into 2026-2027?
A: Economic constraints: giving models 10-100× more thinking time would cost more than human engineers, making it irrational.

Card 3
Q: What was the magnitude of AGI timeline extension on Metaculus during 2025?
A: From July 2031 to November 2033, approximately 2.5 years.


📢 SHARING

Tweet-length: "AGI timeline rollercoaster: 2025 started with reasoning model hype, ended with realisation that most gains came from expensive 'thinking time' scaling, not generalisable reasoning. Forecasts extended 2.5 years."

LinkedIn hook: "The volatility in AGI timelines during 2025 reveals deeper truths about AI progress: economic constraints matter as much as technical breakthroughs..."


📚 REFERENCES



  1. Unknown, early in source: Explanation of reasoning models' failure to generalise beyond checkable domains

  2. Unknown, mid-source: Analysis that >⅔ of reasoning model improvements came from inference scaling

  3. Unknown, mid-source: Estimate that reinforcement learning for reasoning is ~1,000,000× less compute-efficient than pre-training

  4. [✓] Verified via search: AI 2027 scenario writers pushed timelines back 1-2 years after accounting for bottlenecks

  5. Unknown, mid-source: Frontier AI agents costing hundreds per hour, similar to human software engineers

  6. [✓] Verified via search: Gemini 3 Flash performance near Pro at quarter cost; ARC-AGI cost dropped 400×

  7. Unknown, early-mid source: Explanation of why generalisation was expected

  8. Unknown, late source: Quote from Dwarkesh Patel about the gap between impressiveness and usefulness

  9. [✓] Verified via Perplexity search: Metaculus forecast shifted from ~July 2031 to July 2033

  10. [✓] Verified via Tavily search: Epoch Capabilities Index shows near-doubling of progress rate after April 2024

  11. [✓] Verified via Tavily search: AI company revenue growth figures for OpenAI, Anthropic, and xAI

  12. [✓] Verified via Tavily + Calculator: ARC-AGI benchmark cost reduction from $4,500 to $11 (~400×)