YOUTUBE

4 AI Labs Built the Same System Without Talking to Each Other (And Nobody's Discussing Why)

Video · AI & Technology · 12 Mar 2026 · 27m

⚡ BOTTOM LINE

AI's "jaggedness" (unpredictable performance across tasks) is disappearing in practical work contexts not because models are getting uniformly smarter, but because we've learned to organise agents using structures similar to human teams — and four major AI companies have converged on identical architectural patterns without coordinating.

📝 THESIS

The perception that AI capabilities are "jagged" (excellent at some tasks, terrible at others) was an artifact of primitive deployment methods (single-turn interactions) rather than inherent intelligence limitations.¹ As companies have developed multi-agent systems with organisational structures (harnesses), practical work capabilities have smoothed dramatically because these systems implement the same organisational intelligence principles that enable human teams to succeed.²

💡 KEY INSIGHTS

Jaggedness was a deployment artifact, not an intelligence property — Single-turn AI interactions forced models to solve complex problems instantly with no ability to iterate, adjust, or accumulate knowledge, much like asking a human to solve every problem in 30 seconds without notes or colleagues.³
Organisational intelligence transfers naturally to agents — Four major AI companies (Anthropic, Google DeepMind, OpenAI, and Cursor) independently converged on identical architectural patterns: decompose work, parallelise execution, verify outputs, and iterate toward completion.⁴ [⚠]
Practical work domains have become "smooth" — For everyday work tasks (PRDs, code, customer service tickets), AI capabilities are no longer jagged because most work operates well inside current model capabilities and can be handled by properly organised agent systems.⁵
The shift from "can AI do this task?" to "can this work be decomposed into verifiable subproblems?" — The key question for knowledge workers is no longer whether AI can handle specific tasks, but whether their work can be broken down into components that can be verified for correctness.⁶
Meta-skills (sniff-checking) become the core competency — As execution gets automated, the most valuable skills shift to evaluation: knowing whether outputs are correct, whether architecture is maintainable, and recognising fragile solutions.⁷
Cost is real but organisational strength is transformative — Multi-agent systems generate significantly more tokens (increasing cost) but provide structural diversity, parallel exploration, and error isolation that single-turn systems cannot achieve.⁸
The Cursor math breakthrough illustrates cross-domain generalisation — A system designed for coding solved an unpublished research mathematics problem using spectral graph theory, suggesting agentic harnesses can generalise across domains when work is verifiable.⁹ [⚠]
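
The four-step pattern named above (decompose, parallelise, verify, iterate) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any lab's actual implementation: `decompose`, `worker`, `verify`, and `run` are hypothetical names, the "work" is simply sorting slices of a list, and `worker` is a stand-in for a model call that is deliberately wrong on its first attempt for odd-numbered tasks so that the retry path is exercised.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def decompose(numbers, chunk_size):
    """Split the job into independent subproblems (here: slices of a list)."""
    return [numbers[i:i + chunk_size] for i in range(0, len(numbers), chunk_size)]


def worker(task_id, chunk, attempt):
    """Hypothetical stand-in for a model call.

    Simulates an unreliable agent: odd task ids return a wrong (descending)
    answer on their first attempt and a correct one afterwards.
    """
    if attempt == 0 and task_id % 2 == 1:
        return sorted(chunk, reverse=True)  # simulated faulty output
    return sorted(chunk)


def verify(chunk, output):
    """Check the output without redoing the work: ordered and same elements."""
    ordered = all(a <= b for a, b in zip(output, output[1:]))
    return ordered and Counter(output) == Counter(chunk)


def run(numbers, chunk_size=3, max_rounds=3):
    """Decompose, run workers in parallel, verify, and retry only the failures."""
    chunks = decompose(numbers, chunk_size)
    results = {}
    pending = list(range(len(chunks)))
    for attempt in range(max_rounds):
        if not pending:
            break
        with ThreadPoolExecutor() as pool:
            outputs = list(pool.map(lambda i: worker(i, chunks[i], attempt), pending))
        still_pending = []
        for i, out in zip(pending, outputs):
            if verify(chunks[i], out):
                results[i] = out          # accepted: error is isolated to other tasks
            else:
                still_pending.append(i)   # rejected: try again next round
        pending = still_pending
    if pending:
        raise RuntimeError(f"unverified subproblems: {pending}")
    return [results[i] for i in range(len(chunks))]
```

Calling `run([5, 3, 1, 9, 7, 8, 2, 6, 4])` returns `[[1, 3, 5], [7, 8, 9], [2, 4, 6]]`; the middle chunk fails verification once and is retried on its own. The checker is independent of the worker and cheaper than the work itself, which is the property that makes a subproblem "verifiable" in the sense used above.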

💬 QUOTABLE MOMENTS

"We have been asking a capable analyst to solve every problem in 30 seconds with no notes, no colleagues, no ability to try something, and no ability to retry."
— [Speaker, early in source]³

"The skill that survives this transition isn't 'I can do the work' — it's 'I can sniff check. I can tell if the work is correct or not.'"
— [Speaker, mid-source]⁷

🔍 FACT CHECK

⚠ UNVERIFIED — The specific Cursor AI math breakthrough on March 3 involving spectral graph theory and unpublished Stanford/MIT/Berkeley problems. While Cursor's autonomous coding systems are documented,¹⁰ this specific math achievement couldn't be verified via current searches.

✓ VERIFIED — Anthropic's 2026 Agentic Coding Trends Report exists and documents engineers delegating tasks where they can "sniff check" correctness, aligning with the speaker's claims about the shift to evaluation-focused roles.¹¹

⚠ UNVERIFIED — The claim that four AI labs independently built identical multi-agent architectures without coordination. While each company has agentic systems research, their architectural convergence couldn't be fully verified.

✓ VERIFIED — The concept of "AI jaggedness" is academically recognised, with research characterising models' uneven performance profiles across different tasks.¹²

📖 KEY REFERENCES

People & Experts

Michael Truell — Cursor CEO, mentioned for announcing the math breakthrough
Wilson Lin — Published Cursor blog post on scaling long-running autonomous coding in January 2026

Publications & Works

2026 Agentic Coding Trends Report (Anthropic) — Documents shift from code-writing to agent orchestration
"Scaling long-running autonomous coding" (Wilson Lin, Jan 2026) — Cursor's planner/worker/judge architecture

Institutions & Organisations

Cursor — AI coding company whose harness, according to the speaker, solved both coding and math problems
Anthropic — AI company that published the 2026 Agentic Coding Trends Report
Google DeepMind — AI research division developing agentic systems
OpenAI — AI company whose Codex takes a sandboxed-environment approach

Concepts & Frameworks

Harness — The organisational scaffolding around agents that enables meaningful work (memory, task files, state)
Planner/Worker/Judge architecture — Cursor's hierarchical approach where planners create tasks, workers execute, and judges evaluate progress
Organisational intelligence — The collective problem-solving capacity that emerges from proper team/organisation structures
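
To make the harness and planner/worker/judge definitions above concrete, here is a small Python sketch. It is an assumption-laden illustration, not Cursor's code: the `Task` and `Harness` classes, the JSON task file, and the `planner`, `worker`, `judge`, and `step` functions are all hypothetical, and the "work" is trivially upper-casing a word. What it shows is the role of persisted state: each step reloads the task file, advances one task, records a note, and saves, so progress survives between invocations.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Task:
    id: int
    description: str
    status: str = "todo"   # "todo" -> "done" -> "accepted" (or back to "todo")
    output: str = ""


@dataclass
class Harness:
    """Hypothetical harness state: goal, task list, and notes kept as memory."""
    goal: str
    tasks: list = field(default_factory=list)
    notes: list = field(default_factory=list)

    def save(self, path):
        with open(path, "w") as f:
            json.dump(asdict(self), f, indent=2)

    @classmethod
    def load(cls, path):
        with open(path) as f:
            raw = json.load(f)
        tasks = [Task(**t) for t in raw["tasks"]]
        return cls(goal=raw["goal"], tasks=tasks, notes=raw["notes"])


def planner(harness, items):
    """Planner: turn the goal into discrete tasks (one per item in this toy case)."""
    harness.tasks = [Task(id=i, description=item) for i, item in enumerate(items)]


def worker(task):
    """Worker: execute a single task. Stand-in for a model call."""
    task.output = task.description.upper()
    task.status = "done"


def judge(harness, task):
    """Judge: accept or reject the worker's output and record the decision."""
    ok = task.output.isupper() and task.output.lower() == task.description.lower()
    task.status = "accepted" if ok else "todo"
    harness.notes.append(f"task {task.id}: {'accepted' if ok else 'rejected'}")
    return ok


def step(path):
    """One resumable step: load state, advance the first open task, save state.

    Returns True once every task has been accepted.
    """
    harness = Harness.load(path)
    for task in harness.tasks:
        if task.status != "accepted":
            worker(task)
            judge(harness, task)
            break
    harness.save(path)
    return all(t.status == "accepted" for t in harness.tasks)
```

A caller would create a `Harness`, run `planner` once, `save` it to a file, and then call `step(path)` repeatedly until it returns `True`. Because all state lives in the file rather than in the worker, any step can be run by a fresh process with no memory of the previous ones.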

🎯 STRATEGIC IMPLICATIONS

For engineers: Your role shifts from writing code to orchestrating agents, sniff-checking correctness, and ensuring architectural maintainability.

For knowledge workers: Map your domain for verifiability — identify which tasks can be decomposed into checkable components and develop evaluation competencies.

For leaders: Invest in agent infrastructure and training on decomposition, verification, and organisational design principles rather than focusing on individual AI capabilities.

The convergence on identical organisational patterns suggests we've discovered fundamental principles of scalable intelligence that apply equally to humans and AI.

🧭 FURTHER EXPLORATION

If organisational intelligence principles apply equally to humans and AI, what uniquely human capabilities remain valuable in a world of well-organised agent teams?
How does the cost/benefit calculation change when we consider the organisational advantages (parallel exploration, error isolation) versus the token costs of multi-agent systems?
What governance and safety implications arise when AI systems independently implement organisational patterns that took human institutions centuries to develop?

📊 EPISTEMIC STATUS

Source credibility: Medium — Speaker demonstrates deep understanding of AI systems and organisational theory, but specific claims about proprietary company architectures require verification.
Claim verifiability: 3 of 7 key claims verified/partially verified — Documentation exists for some claims (Anthropic report, Cursor blog) but specific breakthrough details couldn't be confirmed.
Potential biases: Forward-looking claims about AI capabilities may be optimistic; selection bias toward recent success stories without examining failures.
Quality flags: No timestamps, speaker identity unknown, specific math breakthrough claim couldn't be verified despite searching.
Confidence in synthesis: Medium — Core concepts (jaggedness as deployment artifact, organisational intelligence transfer) are plausible and align with observable trends, but verification of specific claims is incomplete.

📚 REFERENCES

1. [Speaker, early] "The jagged frontier was never an inherent property of AI intelligence. I want to suggest it was an artifact of how we were asking the AI to work."
2. [Speaker, early] "We humans have figured out a form of organizational intelligence and now we are giving it to agents and it turns out it scales."
3. [Speaker, early] "We have been asking a capable analyst to solve every problem in 30 seconds with no notes, no colleagues, no ability to try something, and no ability to retry."
4. [Speaker, mid] "Four organizations, Anthropic, Google DeepMind, OpenAI, and Cursor have independently built very large multi-agent coordination systems... All four exhibit a similar structural pattern."
5. [Speaker, early] "In that world, the world of PRDs, the world of code, the world of customer service tickets, AI is not jagged anymore."
6. [Speaker, late] "The relevant question... is shifting very quickly from can AI do a specific task in my job family to can my work be decomposed into verifiable subproblems."
7. [Speaker, mid] "The skill that survives this transition isn't 'I can do the work' — it's 'I can sniff check. I can tell if the work is correct or not.'"
8. [Speaker, mid] "Multi-agent harnesses are extremely expensive... but multi-agent systems give you an organizational strength you can't get any other way for the hardest problems."
9. [Speaker, mid] "Cursor did not build to solve math problems... A system designed to write code looked at a problem in spectral graph theory and produced mathematics that the problem's own authors hadn't found."
10. [Verified] Cursor's "Scaling long-running autonomous coding" blog post documents their planner/worker/judge architecture and achievements including building a web browser from scratch.
11. [Verified] Anthropic's 2026 Agentic Coding Trends Report documents engineers delegating tasks where they can easily sniff check correctness.
12. [Verified] Academic research characterises "model jaggedness" as the normalised pattern of peaks and valleys in AI performance relative to human baselines.