80000HOURS
The rollercoaster of AGI timelines in 2025, from extreme optimism about reasoning models to renewed pessimism, was driven by the realisation that initial gains came more from expensive inference scaling (more thinking time) than from generalisable reasoning ability, with diminishing returns making sustained exponential progress economically unsustainable.
The emergence of reasoning models (OpenAI's o1/o3) in early 2025 created a surge of optimism about near-term AGI. Over the course of the year, however, it became clear that most gains came from expensive inference scaling rather than generalisable reasoning capabilities, leading forecasters to extend timelines by ~2.5 years.
Reasoning models' shine wore off when their limitations became apparent: Early excitement about models like o1/o3 stemmed from dramatic improvements in mathematics, coding, and logic, but these gains didn't generalise to "messier" real-world domains like booking flights or organising events.[1] [unverified]
Inference scaling (thinking time) drove most improvements but isn't scalable: More than two-thirds of reasoning models' performance gains came from giving them more time to think, which created a one-time boost but can't be economically scaled further (costs would exceed those of human software engineers).[2] [unverified]
Reinforcement learning for reasoning is computationally inefficient: Training models through reinforcement learning on coding/maths problems may be up to 1,000,000× less compute-efficient than pre-training, as models must generate "vast numbers of garbage failed attempts."[3] [unverified]
The AI 2027 scenario underestimated automation bottlenecks: Even if software engineering becomes fully automated, AI research and development would bottleneck on other aspects, causing the AI 2027 scenario writers to push their timelines back 1-2 years.[4] [verified]
Economic realities constrain exponential progress: Frontier AI agents now cost hundreds of dollars per hour to run, making further exponential scaling economically irrational and suggesting slower progress in 2026-2027.[5] [unverified]
Near-frontier capabilities show steep cost-performance improvements: While frontier AI is expensive, near-frontier models like Gemini 3 Flash achieve comparable performance at a quarter of the cost, and the cost of passing the ARC-AGI benchmark dropped roughly 400× from $4,500 to $11 in one year.[6] [verified]
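The economic argument in the inference-scaling and cost claims above can be illustrated with a small sketch. The dollar figures are placeholder assumptions, not numbers from the source, which says only that frontier agents cost "hundreds of dollars per hour", roughly on par with human software engineers. The linear relationship between thinking time and cost is also an assumption made for illustration.

```python
# Illustrative sketch only: how multiplying "thinking time" scales agent cost.
# ASSUMPTIONS (hypothetical, not from the source):
#   - AGENT_COST_PER_HOUR and HUMAN_COST_PER_HOUR are placeholder values,
#     set equal to reflect the source's "similar to human engineers" remark.
#   - Cost grows linearly with the amount of inference compute used.
AGENT_COST_PER_HOUR = 200.0  # hypothetical stand-in for "hundreds of dollars"
HUMAN_COST_PER_HOUR = 200.0  # hypothetical engineer cost, assumed at parity


def scaled_agent_cost(base_cost: float, thinking_multiplier: float) -> float:
    """Hourly agent cost if thinking time grows by `thinking_multiplier`."""
    return base_cost * thinking_multiplier


def cost_ratio_vs_human(thinking_multiplier: float) -> float:
    """How many times more expensive the scaled agent is than a human."""
    scaled = scaled_agent_cost(AGENT_COST_PER_HOUR, thinking_multiplier)
    return scaled / HUMAN_COST_PER_HOUR


if __name__ == "__main__":
    # The source discusses 10-100x more thinking time as the next step.
    for multiplier in (1, 10, 100):
        ratio = cost_ratio_vs_human(multiplier)
        print(f"{multiplier:>3}x thinking time -> {ratio:.0f}x human cost")
```

Under these assumptions, any further 10-100× increase in thinking time pushes the agent well past the cost of the human it would replace, which is the sense in which the source calls further inference scaling economically irrational.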
"People were kind of primed to expect that this might work, because fine tuning models to follow instructions and be helpful to users really had generalized shockingly well across almost all the kinds of different things that users tend to ask AI models for."
- Unknown [7]
"As AI models keep getting more impressive at the rate the short timelines people predict, but more useful at the rate that long timelines people predict."
- Dwarkesh Patel [8]
VERIFIED: AGI timeline forecasts on Metaculus extended from July 2031 to November 2033 during 2025, a roughly 2.5-year shift reflecting changing sentiment.[9] The Metaculus median forecast for the first general AI announcement is July 2033 (50% chance), extended from ~July 2031 in mid-2024.
VERIFIED: The Epoch Capabilities Index shows AI progress accelerated in 2024, with frontier model improvement nearly doubling from ~8 points/year before April 2024 to ~15 points/year thereafter.[10]
VERIFIED: AI company revenues grew dramatically: OpenAI reached ~$13B annualised revenue by August 2025 (from $200M in 2023), Anthropic reached ~$7B (from $87M in 2024), and xAI reached ~$500M.[11] Bullish forecasters predicted $16B, but the actual figure reached ~$30B.
VERIFIED: The cost of passing the ARC-AGI benchmark dropped from $4,500 to $11 (approximately a 400× reduction) between late 2024 and late 2025.[12] Calculation: 4500/11 ≈ 409× reduction.
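The arithmetic behind the verified claims above can be rechecked with a short script. This is a minimal sketch that uses only the figures quoted in this summary. The helper `months_between` is a hypothetical name introduced here, and both Metaculus end dates that appear in the summary (November 2033 and July 2033) are computed so the two can be compared.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from `start` to `end` (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


# ARC-AGI: cost of passing the benchmark, as quoted ($4,500 -> $11).
arc_cost_ratio = 4500 / 11  # about 409x

# Epoch Capabilities Index: ~8 points/year before April 2024, ~15 after.
eci_rate_ratio = 15 / 8  # 1.875, i.e. "nearly doubling"

# Metaculus: shift from July 2031 to each end date quoted in the summary.
start = date(2031, 7, 1)
shift_to_nov_2033 = months_between(start, date(2033, 11, 1)) / 12  # 28 months
shift_to_jul_2033 = months_between(start, date(2033, 7, 1)) / 12   # 24 months

if __name__ == "__main__":
    print(f"ARC-AGI cost reduction: {arc_cost_ratio:.0f}x")
    print(f"Epoch index rate ratio: {eci_rate_ratio:.3f}x")
    print(f"Shift to Nov 2033: {shift_to_nov_2033:.2f} years")
    print(f"Shift to Jul 2033: {shift_to_jul_2033:.2f} years")
```

On these inputs the shift is about 2.3 years with the November 2033 end date and 2.0 years with July 2033, so the "2.5 years" figure is a rounded value.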
For AI researchers: Focus on efficient reasoning methods rather than brute-force inference scaling, and address the continual learning problem, in which AI systems plateau quickly where humans keep improving.
For policymakers: Prepare for AGI within a 2028-2032 window. This period represents a potential "make or break" phase in which AI will consume most available compute resources, requiring trillion-dollar investments.
For technology investors: Recognise that while frontier AI is expensive, near-frontier capabilities show dramatic cost reductions, creating opportunities in applications rather than core model development.
The 2025 timeline volatility reflects deeper uncertainty about whether AI companies have exhausted their current "tricks" or will discover new efficiency breakthroughs.
Source credibility: Medium. The speaker demonstrates deep knowledge of AI industry developments but lacks named attribution; references credible sources like Toby Ord and industry contacts.
Claim verifiability: 4 of 6 key claims verified/verifiable. Some technical claims about compute efficiency ratios are difficult to verify due to commercial sensitivity.
Potential biases: Possible alignment with AI safety/effective altruism perspective given 80,000 Hours reference; may overemphasise technical barriers relative to potential breakthroughs.
Quality flags: No timestamps available; single unidentified speaker; some claims about internal industry perspectives unverifiable.
Confidence in synthesis: High. Core narrative aligns with verified timeline shifts and technical realities; most significant empirical claims verified.
Card 1
Q: What were the two main factors driving reasoning models' performance improvements in 2025?
A: Actual reasoning ability improvement (~1/3) and inference scaling/giving models more thinking time (~2/3).
Card 2
Q: Why can't inference scaling continue at the same rate into 2026-2027?
A: Economic constraints: giving models 10-100× more thinking time would cost more than human engineers, making it irrational.
Card 3
Q: What was the magnitude of AGI timeline extension on Metaculus during 2025?
A: From July 2031 to November 2033, approximately 2.5 years.
Tweet-length: "AGI timeline rollercoaster: 2025 started with reasoning model hype, ended with realisation that most gains came from expensive 'thinking time' scaling, not generalisable reasoning. Forecasts extended 2.5 years."
LinkedIn hook: "The volatility in AGI timelines during 2025 reveals deeper truths about AI progress: economic constraints matter as much as technical breakthroughs..."
1. Unknown, early in source: Explanation of reasoning models' failure to generalise beyond checkable domains
2. Unknown, mid-source: Analysis that more than two-thirds of reasoning model improvements came from inference scaling
3. Unknown, mid-source: Estimate that reinforcement learning for reasoning is ~1,000,000× less compute-efficient than pre-training
4. Verified via search: AI 2027 scenario writers pushed timelines back 1-2 years after accounting for bottlenecks
5. Unknown, mid-source: Frontier AI agents costing hundreds of dollars per hour, similar to human software engineers
6. Verified via search: Gemini 3 Flash performance near Pro at a quarter of the cost; ARC-AGI cost dropped ~400×
7. Unknown, early-mid source: Explanation of why generalisation was expected
8. Unknown, late source: Quote from Dwarkesh Patel about the gap between impressiveness and usefulness
9. Verified via Perplexity search: Metaculus forecast shifted from ~July 2031 to July 2033
10. Verified via Tavily search: Epoch Capabilities Index shows near-doubling of progress rate after April 2024
11. Verified via Tavily search: AI company revenue growth figures for OpenAI, Anthropic, and xAI
12. Verified via Tavily + Calculator: ARC-AGI benchmark cost reduction from $4,500 to $11 (~400×)