JOURNAL ARTICLE

How Well Can AI Do Strategy? Empirical Benchmarking Using Strategy Simulations.

Published In: Strategy Science (INFORMS), 2026, v. 11, n. 1, p. 93
Database: Business Source Ultimate
Authored By: Allen, Ryan T.; McDonald, Rory M.

Abstract

This article benchmarks the strategic decision-making capabilities of large language models (LLMs) using the Back Bay Battery (BBB) simulation, a widely used business strategy exercise that captures key elements of real-world strategy such as uncertainty, complexity, irreversible multiperiod decisions, and delayed feedback. The study evaluates 21 proprietary and 13 open-source LLMs on their ability to balance short-term profitability with long-term investment in emerging technologies, comparing their performance with historical data from 249 MBA students. Results show that LLMs have generally improved over time: models from late 2024 to early 2025 (e.g., OpenAI's o3-mini, Claude Sonnet 4, Gemini 2.0 Flash) outperformed both earlier models and the MBA average. Surprisingly, however, the most recent frontier models from mid-2025 (e.g., GPT-5, Gemini 2.5 Pro) underperformed, exhibiting a bias toward exploiting core businesses at the expense of future growth. The paper argues that strategy-specific benchmarks like BBB are important for accurately assessing and guiding AI development in strategic contexts, and it highlights current LLM limitations in managing strategic uncertainty despite advances in other domains.

Additional Information

Source: Strategy Science (INFORMS). 2026/03, Vol. 11, Issue 1, p. 93
Document Type: Article
Subject Area: History
Publication Date: 2026
ISSN: 2333-2050
DOI: 10.1287/stsc.2025.0444
Accession Number: 192698243
Copyright Statement: Copyright of Strategy Science (INFORMS) is the property of INFORMS: Institute for Operations Research & the Management Sciences, and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)