Understanding AI Cost Dynamics: A Crucial Leadership Skill in 2026

Written by Tim | Jul 3, 2026 8:31:10 AM

When leaders are selecting their first AI use cases, there is one principle that matters more than almost anything else. The cost of AI per token keeps dropping. So why are companies running out of budget early in their financial years? The thing that drives the bill, how agents consume compute, has become invisible to the people now spending the money. This is not a big-tech problem alone; for mid-sized and corporate organisations it may matter more. Here is what every leadership team needs to understand, and to teach its people, before designing the next AI workflow.

The Situation: AI Costs Are Getting Out of Hand

In December 2025, Uber gave around 5,000 engineers access to AI coding tools. By February, usage had nearly doubled. By April, the company had spent its entire 2026 AI budget, four months into a twelve-month year, and its CTO admitted the team was, in his words, back to the drawing board.

The detail that turns this from a cautionary tale into something more instructive is what Uber did to get there. It had actively encouraged the spend, running an internal leaderboard ranking teams by how much AI they used. The behaviour that blew the budget was the behaviour the company was rewarding. When the dust settled, its COO conceded that the link between all that consumption and genuinely better products for customers was, so far, hard to draw.

Uber is not an outlier. Microsoft reportedly cancelled most of its internal AI coding licences about six months after rolling them out. One unnamed company is said to have run up a roughly $500 million bill in a single month after failing to set any usage limits. By April and May, the executive director of the FinOps Foundation was hearing from companies that were already three times over their full-year 2026 budget, and the conversation across the industry had moved, almost overnight, from “use as much as possible” to “how on earth do we control this?”

The cause is a cost model that almost nobody designed for, and it is about to land squarely on marketing, sales and customer service.

It is tempting to file all of this under big-tech problems. Uber, Microsoft and a half-billion-dollar invoice are not representative of most organisations’ world. But that reading gets the risk precisely backwards. A frontier lab or a hyperscaler can absorb a blown AI budget, and has platform engineers and a dedicated cost-management function to catch the problem early. A mid-sized business or a corporate division has neither. The same dynamics (reasoning costs, agentic loops, opaque pricing) arrive through the everyday marketing and sales stack rather than a research lab. There, a runaway agent is a far larger share of a smaller budget, and there is no one whose job it is to notice the problem.

The organisations with the most to lose from getting this wrong are often exactly the ones least resourced to get it right on their own.

The Paradox: Cheaper Than Ever, and Somehow Unaffordable

Here is the part that confuses most leadership teams. The price of AI is falling. An analysis of 2.4 billion enterprise API calls found the blended cost of intelligence dropped around 67% year on year, from roughly $18.40 to $6.07 per million tokens. Gartner expects the cost of running a frontier-scale model to fall by nearly 90% by 2030.

And yet the bills keep climbing. The reason is simple arithmetic that most 2026 budgets quietly got wrong: total spend is price per unit multiplied by volume consumed. The price per unit is dropping. The volume is exploding far faster.

How much faster? Google alone went from processing 9.7 trillion tokens a month two years ago, to 480 trillion in May 2025, to 3.2 quadrillion in May 2026, a sevenfold jump in a single year. Goldman Sachs projects token consumption will rise twenty-four-fold to 120 quadrillion tokens a month by 2030. Gartner’s warning to product leaders is worth pinning to the wall: do not “confuse the deflation of commodity tokens with the democratisation of frontier reasoning.” Cheaper tokens do not mean cheaper AI, because the newest models consume so many more of them per task.
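The arithmetic can be sketched in a few lines. The example below combines the price drop quoted above with Google's year-on-year volume growth purely for illustration; the two figures come from different sources, so the result shows the shape of the effect, not any real organisation's bill. The function name is an assumption made for this sketch.

```python
# Illustrative only: the price figures and the volume figures come from
# different sources in the text, so this shows the shape of the effect,
# not any single organisation's bill.

def spend_multiplier(old_price, new_price, old_volume, new_volume):
    """Return the factor by which total spend changes when price and volume both move."""
    price_ratio = new_price / old_price      # below 1 means cheaper tokens
    volume_ratio = new_volume / old_volume   # above 1 means more tokens consumed
    return price_ratio * volume_ratio

# Blended price per million tokens: $18.40 -> $6.07 (about 67% cheaper).
# Monthly token volume: 480 trillion -> 3.2 quadrillion (about sevenfold).
multiplier = spend_multiplier(18.40, 6.07, 480e12, 3.2e15)
print(f"Total spend changes by a factor of {multiplier:.2f}")
```

Under these assumptions, a two-thirds price cut combined with a roughly sevenfold rise in volume still leaves total spend at a little over double.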

Which raises the obvious question: where are all those extra tokens going?

The Mechanics: How a Single Task Multiplies in the Background

A token is the basic unit AI bills for, roughly three-quarters of a word, counted on everything you send the model and everything it sends back. Understanding three combined effects explains almost the entire cost problem.

First, the thinking tax. The current generation of models can “reason” before they answer, working through a problem in internal steps you pay for but rarely see. That reasoning is expensive. Independent testing by Artificial Analysis found reasoning models use up to twenty times more tokens than non-reasoning models for the same task. A separate study put the average increase at around 1,950%, roughly twenty-fold, to reach the same answer. And more thinking does not reliably mean better output: research has shown cases where longer reasoning actively degraded the result. You are billed for every thinking token whether or not it changed the answer.

Second, the agentic multiplier. A chatbot answers once. An agent takes a task and works it autonomously, querying tools, re-reading its own context on every turn, looping until it is satisfied. Goldman Sachs describes this as taking a single request and blowing it up “10-fold, 20-fold, 50-fold.” At the extreme, one AI researcher estimates an agent can consume five hundred to a thousand times as many tokens as a simple query. EY puts a number on it for customer service specifically: a routine interaction that cost around $0.04 in 2023 costs roughly $1.20 in 2026 once you add tools, reasoning and iterative loops, about thirty times more.

Third, the quiet creep. Even the definition of a token changes. The newest Claude models, for instance, use a tokeniser (an algorithm that breaks down words into tokens) that can consume up to 35% more tokens for the same English text, so the effective cost per word rises even when the headline rate per token does not move.

Now stack them. A reasoning model, running inside an agent that loops and re-sends its context, on a newer tokeniser, can cost one to three orders of magnitude more than the same business outcome delivered by a right-sized, non-reasoning model on a disciplined prompt, for output a customer often cannot tell apart. That gap is a design choice, usually made by accident.
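As a rough sketch of the stacking, the snippet below multiplies the upper-end figures quoted above. Treating the three effects as independent multipliers is an assumption made for illustration; real workloads will often land well below these numbers, and the constant names are hypothetical.

```python
import math

# Upper-end multipliers quoted in the text. Combining them by simple
# multiplication is an assumption made for illustration.
REASONING = 20        # reasoning models: up to ~20x the tokens per task
AGENTIC_LOW = 10      # agent loops: "10-fold, 20-fold, 50-fold"
AGENTIC_HIGH = 50
TOKENISER = 1.35      # newer tokeniser: up to 35% more tokens per text

low = math.prod([REASONING, AGENTIC_LOW, TOKENISER])
high = math.prod([REASONING, AGENTIC_HIGH, TOKENISER])

print(f"Stacked token multiplier: about {low:.0f}x to {high:.0f}x")
```

With these inputs the stacked multiplier comes out at roughly 270 to 1,350 times the tokens of the lean design, which sits at the upper end of the one-to-three orders of magnitude range.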

The “Tokenmaxxing” Mistake

For most of 2025, the prevailing instinct was to spend. Chief executives pushed teams to use the best models and move fast, cost be damned. The behaviour even acquired a name, “tokenmaxxing”, and companies including Meta and Amazon reportedly ran internal leaderboards celebrating whoever burned the most. Heavy token use became a proxy for productivity.

The data has not been kind to that instinct. Analysis by Jellyfish found heavy users were about twice as productive but spent ten times the tokens. The best return came less from pushing power users harder than from moving average users to moderate usage. Usage, it turns out, is a terrible measure of value. The value of an output has nothing to do with how many tokens it took to produce, and rewarding consumption simply rewards waste.

It is worth being honest about why this happened, because the cause is a knowledge gap that sits at the top. The people who set AI strategy and approve the budgets (boards, executives, and increasingly the marketing and sales managers who now own these tools directly) have rarely been given a working understanding of what drives an AI bill. Few leadership teams could say with confidence what a token is, when a reasoning model earns its extra cost, or why one model can be twenty times more expensive than another for identical work. Without that fluency, “use the best model” sounds like healthy ambition rather than an open tap, and the judgement cannot simply be delegated downward, because the trade-offs between capability, speed and cost are strategic, not technical.

This is the part that no architecture diagram fixes. The organisations that keep AI costs under control are, almost without exception, the ones whose decision-makers understand the economics well enough to ask the right questions. It is exactly why we put token economics and model selection in front of leadership teams directly in our AI Workshop for Leaders: to give decision-makers enough fluency to decide well, not to turn executives into engineers. Educating the top of the organisation is not a soft accompaniment to cost control. It is where cost control starts.

This is why the most telling fact in the entire 2026 picture may be this one: only about a quarter of organisations have genuinely scaled AI. Nearly half are still piloting. Most of the eye-watering bills are not buying scaled capability. They are buying experiments.

What It Costs: The Numbers Across Providers

The single largest lever you control is which model does the work. The spread between the cheapest and most expensive options from the major labs runs to roughly twenty- to fifty-fold. Here is where the current generation sits, in US dollars per million tokens, input and output, at standard rates (verified June 2026):

Sources: Anthropic, OpenAI, Google. Note that output is billed far higher than input, typically five times, and that thinking tokens count as output, which is why agentic bills concentrate there.
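To see why thinking tokens concentrate the bill on the output side, here is a minimal sketch. The rates are hypothetical placeholders (not any provider's actual prices), with output set to five times input as noted above, and the function name is an assumption.

```python
# Hypothetical rates in US dollars per million tokens; real prices vary by
# provider and model. Output is set to five times input, as the text notes.
INPUT_RATE = 3.00
OUTPUT_RATE = 15.00

def request_cost(input_tokens, visible_output_tokens, thinking_tokens=0):
    """Cost of one call; thinking tokens are billed at the output rate."""
    billed_output = visible_output_tokens + thinking_tokens
    return (input_tokens * INPUT_RATE + billed_output * OUTPUT_RATE) / 1_000_000

# Same prompt and same visible answer, with and without hidden reasoning.
plain = request_cost(2_000, 500)
reasoned = request_cost(2_000, 500, thinking_tokens=10_000)

print(f"Without reasoning: ${plain:.4f}")
print(f"With reasoning:    ${reasoned:.4f}")
```

In this made-up case the visible answer is identical, but the reasoning call costs about twelve times as much because every thinking token is charged at the higher output rate.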

Two things matter more than the exact figures:

1. The analysis of those 2.4 billion calls found teams that route work intelligently (cheap models for simple tasks, expensive ones only where they earn it) ran at a median of $2.31 per million tokens, against $18.40 for those who sent everything to the flagship. The most common cause of waste is that the first engineer to build something chose the model they happened to be using, and nobody revisited the decision when it scaled.

2. Every major provider offers levers that can cut effective costs by 30–95%: batch processing (around half price), prompt caching (up to 90% off repeated context), tighter prompts, and caps on reasoning effort. They are only useful to teams who know they exist.
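Both points can be sketched numerically. The prices, traffic shares and cached share below are hypothetical placeholders chosen for illustration, and the function names are assumptions; only the 90% caching discount comes from the text.

```python
# All prices and shares here are hypothetical placeholders.

def blended_rate(routes):
    """routes: list of (share_of_tokens, price_per_million_tokens) pairs."""
    total_share = sum(share for share, _ in routes)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(share * price for share, price in routes)

def cached_input_cost(tokens, rate, cached_share, discount=0.90):
    """Input cost in dollars when part of the context is served from cache."""
    uncached = tokens * (1 - cached_share) * rate
    cached = tokens * cached_share * rate * (1 - discount)
    return (uncached + cached) / 1_000_000

# Point 1: route 90% of tokens to a cheap model, 10% to the flagship.
flagship_only = blended_rate([(1.0, 15.00)])
routed = blended_rate([(0.90, 1.00), (0.10, 15.00)])

# Point 2: 80% of a one-million-token input is repeated, cacheable context.
no_cache = cached_input_cost(1_000_000, 3.00, cached_share=0.0)
with_cache = cached_input_cost(1_000_000, 3.00, cached_share=0.8)

print(f"Blended rate: ${flagship_only:.2f} vs ${routed:.2f} per million tokens")
print(f"Input cost:   ${no_cache:.2f} vs ${with_cache:.2f}")
```

The point of the sketch is that neither lever needs a better model, only a decision about which work goes where and which context is repeated.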

The Blind Spot: When the Meter Disappears

Everything above assumes you can see the tokens. For a growing share of marketing, sales and service teams, you cannot, because the platforms they use deliberately hide them.

Salesforce Agentforce and HubSpot Breeze have both moved away from per-seat pricing toward consumption and outcome-based models, and, being the default CRM stack for mid-sized and corporate organisations, they are where most of these businesses will first meet agentic AI. On the surface, the change sounds reassuring. HubSpot’s framing is that AI should be priced on the value it delivers, not the compute it consumes. In practice, the compute has not gone away. It has been buried one layer down, where you can no longer manage it.

HubSpot's Breeze Credit Dashboard

Consider how each meters cost. HubSpot’s Breeze runs on credits at $10 per thousand; its Customer Agent costs 50 credits, fifty cents, per resolved conversation, and its Prospecting Agent costs 100 credits, a dollar, per recommended lead. The credits have not disappeared under the outcome pricing. They have moved down a layer, where they meter the compute behind each agent rather than the price the buyer sees.

Salesforce’s Agentforce charges either $2 per conversation or, via Flex Credits, about ten cents per “action.” However, each action covers a threshold of around 10,000 tokens, and a token-heavy action silently bills as several. A single interaction can quietly become two or three. Independent analysts note that Agentforce credit consumption is genuinely hard to predict until you have run agents for two or three months, and that the platform requires a substantial Data Cloud subscription underneath it before it does anything at all.
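The two metering schemes described above can be sketched as follows. The prices and the roughly 10,000-token threshold come from the text; the rule that usage rounds up to whole actions is an assumption made for illustration, not a confirmed description of how Salesforce bills, and all function names are hypothetical.

```python
import math

HUBSPOT_CREDIT_PRICE = 10 / 1000   # $10 per thousand credits
AGENTFORCE_ACTION_PRICE = 0.10     # about ten cents per action
TOKENS_PER_ACTION = 10_000         # approximate threshold quoted in the text

def hubspot_cost(credits):
    """Dollar cost of a given number of Breeze credits."""
    return credits * HUBSPOT_CREDIT_PRICE

def agentforce_billed_actions(tokens_used):
    # Assumption: usage rounds up to whole actions, with a minimum of one.
    return max(1, math.ceil(tokens_used / TOKENS_PER_ACTION))

def agentforce_cost(tokens_per_step):
    """Dollar cost of one interaction, given the tokens used by each step."""
    billed = sum(agentforce_billed_actions(t) for t in tokens_per_step)
    return billed * AGENTFORCE_ACTION_PRICE

print(f"Breeze Customer Agent resolution: ${hubspot_cost(50):.2f}")
print(f"One light action:                 ${agentforce_cost([3_000]):.2f}")
print(f"One token-heavy action:           ${agentforce_cost([24_000]):.2f}")
```

Under that rounding assumption, a single step that uses 24,000 tokens is billed as three actions even though the user saw one.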

The deeper problem is not the price. It is that the abstraction moves complexity rather than removing it. You stop arguing about tokens and start arguing about what counts as a “resolution” or a “qualified lead.” And you lose sight of the levers that control cost (model choice, prompt and context discipline, reasoning limits, removing redundant agent steps) because they are no longer visible to the marketer or RevOps lead who owns the tool. The opacity protects the vendor’s margin and removes your control in the same move.

It is worth sitting with the consequence. Salesforce’s own research found that 90% of CIOs say managing AI costs is limiting their ability to drive value. The people now being handed agentic tools (in marketing, in sales, in service) are the people least equipped to see what those tools are spending. You cannot optimise a cost you cannot see.

The Reframe: Compute Is an Architecture Decision

Pull these threads together and the conclusion is uncomfortable but clear. AI cost is not a billing surprise to be absorbed after the fact. It is an architecture decision made, well or badly, at the moment you design the solution.

The organisations bringing it under control share a pattern, and none of it is exotic:

Right-size the model to the task. Most work (classification, extraction, drafting, routing) does not need the smartest, most expensive model. Reserving the flagship for the genuinely hard 10% is the twenty-to-fifty-fold lever.
Cap reasoning and output. Pay for thinking only where it changes the answer; switch it off where it does not.
Engineer for token economy. Tight prompts, controlled context, caching and batching where the work allows.
Govern the agent itself. Step limits, scope boundaries and stop conditions, so a “simple” task cannot recurse into millions of tokens unnoticed.
Measure cost per completed task, not per token or per credit, and attribute it to the team and the use case that drove it.
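The last two practices can be sketched together: a guard that stops an agent once it passes a step or token limit, and a cost-per-completed-task measure. Everything here (class names, limits, the simulated token counts) is hypothetical; a real agent framework would call something like the guard from inside its own loop.

```python
from dataclasses import dataclass

class BudgetExceeded(Exception):
    """Raised when an agent run passes its step or token limit."""

@dataclass
class RunGuard:
    max_steps: int
    max_tokens: int
    steps: int = 0
    tokens: int = 0

    def record(self, tokens_used: int) -> None:
        """Count one agent step and stop the run if a limit is passed."""
        self.steps += 1
        self.tokens += tokens_used
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step limit of {self.max_steps} exceeded")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"token limit of {self.max_tokens} exceeded")

def run_agent(step_token_counts, guard):
    """Stand-in for an agent loop: each item is the token count of one step."""
    try:
        for used in step_token_counts:
            guard.record(used)
    except BudgetExceeded:
        return False   # run stopped by the guard, task not completed
    return True

def cost_per_completed_task(total_cost, completed_tasks):
    """Total spend divided by tasks that actually finished."""
    if completed_tasks == 0:
        return float("inf")
    return total_cost / completed_tasks

# A run whose third step pushes it past a 20,000-token budget.
guard = RunGuard(max_steps=5, max_tokens=20_000)
completed = run_agent([4_000, 6_000, 15_000], guard)
print(f"Completed: {completed}, tokens used: {guard.tokens}")
print(f"Cost per completed task: ${cost_per_completed_task(12.0, 40):.2f}")
```

Dividing by completed tasks rather than by tokens means that stopped or failed runs still count toward the cost of the tasks that succeeded, which is the figure a budget owner actually needs.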

And crucially, treat all of this as continuous rather than one-off. The ground moves underneath you constantly: tokenisers change, and Agentforce and Breeze each re-priced more than once inside a single year. This is not a setting you configure once and forget.

This Is The Gap Most Mid-Sized Organisations Cannot Staff Internally

This cost discipline lives at the design-and-run layer, not in procurement. It is now the single most sought-after skill in technology finance teams: the share of practitioners responsible for AI spend jumped from roughly a third to nearly all of them in twelve months. Wanting to control the cost and knowing how to are two very different things.

There is a deeper point the budget panic tends to obscure. The goal was never to spend less on AI. An organisation that throttles adoption to protect a budget has failed just as surely as one that overspends; it has simply chosen to fall behind instead of going broke.

The goal is deep, confident adoption at a cost that earns its return, both at once. Getting it right and keeping it right are different jobs. It needs someone whose standing remit is to hold both sides (adoption deep, economics honest) as the ground moves beneath them, whether that capability sits inside the business or alongside it as a trusted adviser.

The Quiet Conclusion

The lesson of 2026 is not that AI is too expensive. The models that produced these jaw-dropping bills were doing exactly what they were built to do. The lesson is that the cost of intelligence has to be designed in and governed continuously, understood at the build stage by people equipped to understand it, and watched on every run as the ground moves, rather than discovered when the invoice arrives.

For most mid-sized and corporate organisations, that calls for two things they rarely have in place:

Leaders fluent enough to make sound decisions about models, tokens and where reasoning is worth paying for.
Someone close to the detail who stays accountable for the economics as they change.

The first is a matter of education, bringing decision-makers up to speed before they set direction. The second is a matter of advice, a standing relationship rather than a one-off project.

Building Your Cost-Control Function

That is the thinking behind our leadership workshops, and behind Agents-as-a-Service: acting as a trusted AI adviser inside our clients’ businesses, building agents and then staying accountable for what they cost to run and the return they earn, optimising each one on a per-run basis as the models and prices move beneath it. The aim is AI adopted deeply and confidently, at a cost that pays for itself.

The question worth asking before you deploy anything is no longer “can the model do this?” It almost certainly can.

The questions are whether your people understand what it will cost every time it does, and whether someone you trust is watching that cost as the ground moves. Those are the questions that separate AI that earns its keep from AI that quietly drains the budget.

If you have not yet asked them of your own AI plan, that is the place to start.
