Can The LLM Market Scale To Artificial General Intelligence?

Introduction

Scaling current large-language-model (LLM) infrastructure yields steady – but slowing – gains. Fundamental constraints in compute, data supply, energy, cost, and safety indicate that brute-force scaling is unlikely to cross the remaining gap to human-level general intelligence without substantial algorithmic advances and new system designs.

1. What Pure Scaling Has Achieved

| Year | Frontier model (public) | Train compute (FLOP) | Cost (USD, est.) | ARC-AGI-1 score | Notable capabilities |
|------|-------------------------|----------------------|------------------|-----------------|----------------------|
| 2020 | GPT-3 (175 B) | 3.1e23 | $2–4 M | 0% | few-shot text generation |
| 2023 | GPT-4 | ≈6e24 | $41–78 M | 5% | chain-of-thought, tool use |
| 2024 | Claude 3.5 | n/a | "few tens of millions" | 14% | improved coding & reasoning |
| 2025 | o3-medium | ≈1e25 | $30–40 M | 53% (≤3% on the harder ARC-AGI-2) | beats graduate-level STEM tests, 25% on Frontier-Math |

Raw scale has pushed LLMs from near-random performance to superhuman scores on many benchmarks, showing that power-law “scaling laws” hold over five orders of magnitude in compute. Yet even the most compute-hungry model still fails most ARC-AGI-2 tasks that ordinary humans solve easily.
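
To make the power-law claim concrete, the sketch below fits a straight line in log-log space to a few (compute, benchmark-error) pairs and then extrapolates one order of magnitude further. The data points, the fitted exponent, and the extrapolation target are illustrative placeholders, not measurements from the table above.

```python
# Minimal sketch: fit a power law, error ≈ a * C^k (k < 0), by ordinary
# least squares in log-log space. All data points are hypothetical.
import numpy as np

compute = np.array([3e23, 6e24, 3e25, 1e26, 5e26])  # training FLOP (hypothetical)
error   = np.array([0.95, 0.80, 0.55, 0.40, 0.30])  # benchmark error rate (hypothetical)

# log(error) = log(a) + k * log(C)  ->  slope k, intercept log(a)
k, log_a = np.polyfit(np.log(compute), np.log(error), 1)
print(f"fitted exponent k = {k:.3f}")  # negative: error falls as compute grows

# Naive extrapolation one order of magnitude out; the open question is
# whether this straight line keeps holding or flattens, as Section 2 argues.
pred = np.exp(log_a) * 5e27 ** k
print(f"extrapolated error at 5e27 FLOP ≈ {pred:.2f}")
```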

2. Why Scaling Laws Flatten

  1. Compute and Cost

    • Training cost for the largest runs has grown roughly 2.4× per year since 2016, which extrapolates to more than $1 billion per run by 2027 (a back-of-envelope sketch appears after this list).

    • Inference cost also rises with test-time “long-thinking” strategies that drive recent gains.

  2. Energy and Carbon

    • A single 65 B-parameter model can draw 0.3–1 kW per inference job at scale; training GPT-3 emitted approximately 550 t CO₂-eq.

    • Running 3.5 M H100 GPUs at 60% utilisation would consume approximately 13 TWh per year – more than many small countries (see the sketch after this list).

  3. Data Exhaustion

    • Human-generated high-quality text (about 300 T tokens) will be fully consumed between 2026 and 2032 if current trends continue.

    • Heavy reliance on synthetic data risks “model collapse” and degraded diversity.

  4. Networking & Memory Limits

    • Clusters above about 30 k GPUs suffer steep efficiency loss from interconnect and fault-tolerance bottlenecks.

    • Sparse mixture-of-experts helps but increases VRAM pressure and complexity.

  5. Safety & Governance Friction

    • Labs have adopted Responsible Scaling Policies that require pauses when dangerous capabilities emerge; ever-larger models hit these checkpoints sooner.
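
To sanity-check two of the figures quoted in this section (the > $1 billion training-cost extrapolation in item 1 and the roughly 13 TWh fleet draw in item 2), here is a back-of-envelope sketch. The 2023 baseline cost, the per-GPU power figure, and the neglect of cooling overhead are assumptions chosen to land inside the cited ranges, not authoritative data.

```python
# Back-of-envelope sketch of two figures cited above; all inputs are assumptions.

# (1) Training-cost growth: ~2.4x per year compounded from an assumed 2023 baseline.
base_cost_2023 = 60e6                      # USD, assumed mid-range frontier run in 2023
growth_per_year = 2.4
cost_2027 = base_cost_2023 * growth_per_year ** (2027 - 2023)
print(f"extrapolated 2027 frontier run: ${cost_2027 / 1e9:.1f} B")  # ≈ $2.0 B

# (2) Fleet energy: 3.5 M H100-class GPUs at 60% utilisation, ignoring cooling.
gpus = 3.5e6
watts_per_gpu = 700                        # assumed H100 SXM board power
utilisation = 0.60
hours_per_year = 8760
twh_per_year = gpus * watts_per_gpu * utilisation * hours_per_year / 1e12  # Wh -> TWh
print(f"annual fleet draw ≈ {twh_per_year:.1f} TWh")                       # ≈ 12.9 TWh
```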

3. Evidence That Scale Alone Is Insufficient

  • ARC-AGI-2 glass ceiling: o3’s ≤ 3% score – after about 50,000 × compute growth since 2019 – shows diminishing returns on tasks demanding systematic abstraction.

  • Diminishing log-log slopes: updated scaling fits reveal the exponents flattening as models reach the Chinchilla-optimal data/parameter ratio (summarised in the equations after this list).

  • From pattern learning to planning: Current LLMs remain brittle at multi-step novel reasoning, long-horizon planning, and grounding in the physical world.

  • Economic infeasibility: A $1 billion training run would need to recoup > $10 billion in revenue just to match cloud depreciation, excluding alignment research and liability risk.
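
For reference, the "Chinchilla-optimal" ratio mentioned in the second bullet comes from the compute-optimal allocation of Hoffmann et al. (2022). A minimal summary, with L the training loss, N the parameter count, D the number of training tokens, and C ≈ 6ND the training FLOP budget (the "about 20 tokens per parameter" figure is the commonly quoted approximation, not an exact constant):

```latex
% Chinchilla compute-optimal allocation (Hoffmann et al., 2022), summarised.
\[
  L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
  \qquad C \approx 6\,N D ,
\]
\[
  N_{\mathrm{opt}}(C) \propto C^{a}, \qquad
  D_{\mathrm{opt}}(C) \propto C^{b}, \qquad
  a \approx b \approx 0.5
  \;\Longrightarrow\; D_{\mathrm{opt}} \approx 20\, N_{\mathrm{opt}} .
\]
```

Past this ratio, adding parameters without proportionally more data (or vice versa) buys little, which is one reason the fitted log-log slopes flatten.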

4. Paths Beyond Brute Scaling

  1. Algorithmic Efficiency

    • Chinchilla showed that smarter allocation of tokens beats larger models at equal compute.

    • Retrieval-augmented generation, sparse routing, and neuromorphic techniques cut costs by 5–20×.

  2. Test-time Adaptation & Agents

    • Tree search, majority voting, and tool-use agents outperform naïve parameter scaling on maths and code (a minimal voting sketch appears after this list).

  3. Multimodal & Continual-Learning Systems

    • Grounding in images, actions, and feedback loops may supply richer gradients than extra text alone.

  4. Synthetic-Data Science

    • SynthLLM finds power-law scaling in generated curricula up to approximately 300 B tokens before plateauing.

    • Theory warns that mutual-information bottlenecks, not sheer volume, drive generalization.

  5. Architecture Innovation

    • New memory-augmented, modular or hybrid neuro-symbolic models aim to break the quadratic attention wall and enable compositional generality.
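
As a concrete illustration of the test-time strategies in item 2, here is a minimal self-consistency (majority-voting) sketch: sample several candidate answers from a stochastic model call and return the most common one. The `sample_answer` function is a hypothetical stand-in for whatever model API is actually used.

```python
# Minimal self-consistency sketch: spend extra inference compute by sampling
# several answers and returning the modal one.
from collections import Counter
import random

def sample_answer(question: str) -> str:
    """Hypothetical stand-in for a stochastic model call (temperature > 0)."""
    # Toy behaviour: the "model" answers correctly 60% of the time.
    return "42" if random.random() < 0.6 else random.choice(["41", "43", "44"])

def majority_vote(question: str, n_samples: int = 16) -> str:
    """Sample n_samples answers and return the most frequent one."""
    votes = Counter(sample_answer(question) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

print(majority_vote("What is 6 * 7?"))  # usually "42"
```

Because each sample is right more often than any single wrong answer appears, voting converts extra test-time compute into higher accuracy without touching the model's weights, which is the same lever behind the "long-thinking" inference costs noted in Section 2.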

5. Outlook: Toward AGI Requires More Than Bigger Clusters

Scaling current transformer-based LLM infrastructure will continue to deliver valuable, super-human skills – especially when paired with clever inference algorithms – yet multiple converging ceilings suggest it will not by itself close the remaining qualitative gap to general intelligence:

  • Compute, energy, and cost grow faster than capabilities.

  • High-quality data is finite; synthetic data helps but introduces new failure modes.

  • Benchmarks designed to detect genuine abstraction (ARC-AGI-2) still expose large deficits.

  • Safety regimes and public policy are already nudging labs to slow or pivot from raw scale.

The most plausible route to AGI therefore lies in hybrid progress: continued, but economically tempered, scaling combined with breakthroughs in architecture, efficient learning algorithms, richer data modalities, and robust alignment methods. Pure scale remains a crucial ingredient, yet it is neither all we need nor, on its own, a guaranteed path to human-level general intelligence.
