The AI Cost Trap: Why Falling Token Prices Won't Save Your Budget

Izzy A
CTO @PromptMetrics

Token prices dropped 92%, yet enterprise AI spend exploded 16x. Discover why the Jevons Paradox and agentic workflows are inflating your budget and how to fix it.

TL;DR: Why "Falling Token Prices" Won't Save You

  • The Trap: Unit costs are plummeting (1,000× drop), but total enterprise spend is exploding (16× rise).

  • The Gap: Cloud waste burns $44.5B/year; AI governance is even further behind.

  • The Multipliers: Agentic workflows and hidden reasoning tokens are reshaping usage patterns faster than prices can fall.

  • The Hedge: Whether prices drop (volume explodes) or rise (subsidies end), TCO increases. Visibility is your only hedge.

You've seen this movie before, and you know the ending costs $44.5 billion.

A decade ago, cloud infrastructure was supposed to save everyone money. Pay for what you use. Scale down when you don't. No more CapEx hardware rotting in a closet.

Then the bill arrived. And kept arriving.

Today, that bill includes $44.5 billion in cloud waste alone for 2025 (Harness, FinOps in Focus, February 2025). 91% of organizations experience at least some waste (HashiCorp/Forrester, 2024), and the average company takes 31 days to identify a spike.

In the world of agentic workflows that can spawn 50 inference loops per user request, 31 days is an eternity; your costs compound faster than your alerts fire.

While FinOps practices have reduced waste from 32% to 27%, and 59% of organizations now have FinOps teams, the absolute dollar amount keeps climbing because consumption is outpacing optimization.

Now, the same pattern is playing out with AI, but at warp speed. If you're a CTO watching GPT-4 input costs drop 92% and thinking "this will sort itself out," you're standing exactly where your predecessor stood in 2015, watching EC2 prices fall while the AWS bill quietly tripled.

The Unit Cost Fallacy: Why Cheaper Tokens Don't Mean Cheaper AI

Let's look at the numbers, because the disconnect is staggering.

OpenAI's GPT-4 family clearly demonstrates the trend. The original GPT-4 cost $30 per million input tokens at launch (March 2023). GPT-4 Turbo dropped that to $10. GPT-4o launched at $5 and adjusted to $2.50 by August 2024. That is a 92% reduction in 17 months.

Andreessen Horowitz documents this "LLMflation" as a 1,000× reduction in inference costs over three years. Epoch AI (March 2025) found that the price to match GPT-4's performance is falling by 40× to 200× per year.

Great news, right? Here's the problem. While unit prices cratered, total enterprise GenAI spending surged from $2.3 billion in 2023 to $37 billion in 2025, a 16× increase (Menlo Ventures, State of Generative AI, December 2025). OpenAI's own compute consumption followed the same trajectory, growing from 0.2 GW in 2023 to 1.9 GW in 2025.

Prices fell 1,000×. Spend grew 16×. The math doesn't lie: Usage is expanding far faster than costs are declining.
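To make that gap concrete, here is a minimal back-of-the-envelope sketch using only the figures quoted above. It assumes total spend is simply unit price times usage, and it applies the GPT-4 family price change to the whole market, which is a simplification (the price and spend figures also cover slightly different time windows). The function names are illustrative.

```python
def price_reduction_pct(old_price: float, new_price: float) -> float:
    """Percentage drop from old_price to new_price."""
    return (old_price - new_price) / old_price * 100


def implied_usage_multiplier(spend_multiplier: float, price_multiplier: float) -> float:
    """If spend = unit price x usage, usage growth = spend growth / price change."""
    return spend_multiplier / price_multiplier


# Figures quoted in the article.
gpt4_drop = price_reduction_pct(30.00, 2.50)   # GPT-4 launch price vs. GPT-4o (Aug 2024)
spend_growth = 37e9 / 2.3e9                    # enterprise GenAI spend, 2023 -> 2025

# Assumption: the GPT-4 family price change (2.50 / 30) stands in for the whole market.
usage_growth = implied_usage_multiplier(spend_growth, 2.50 / 30.00)

print(f"GPT-4 input price drop:  {gpt4_drop:.1f}%")
print(f"Enterprise spend growth: {spend_growth:.1f}x")
print(f"Implied usage growth:    {usage_growth:.0f}x")
```

Even with the conservative 92% price drop rather than the 1,000× figure, the implied growth in usage is on the order of 200×.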

Jevons Paradox: The 150-Year-Old Warning You're Ignoring

This isn't a new phenomenon. In 1865, economist William Stanley Jevons observed that as steam engines became more fuel-efficient, total coal consumption increased because cheaper energy made more uses economically viable.

Microsoft CEO Satya Nadella put it plainly in January 2025: "Jevons paradox strikes again! As AI gets more efficient and accessible, we will see its use skyrocket."

And we're seeing exactly that, but through mechanisms most teams don't yet understand. Three specific forces are driving this paradox right now:

1. The Agentic Multiplier

Cheaper inference doesn't just mean "cheaper chat." It changes how we build. We are moving from single-turn chatbots to agentic workflows. A single "fix this bug" request to a coding agent doesn't trigger one API call. It triggers a loop: plan, draft, test, error, refine, verify. One user interaction can easily spawn 5, 10, or 50 inference calls.

The impact is measurable: OpenAI's own data shows that average reasoning token consumption per organization increased by approximately 320× in the 12 months leading up to 2025, evidence that agentic architectures are fundamentally reshaping usage patterns.
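A rough sketch of how the loop count multiplies the cost of one user request. The token counts per call and the output price are hypothetical placeholders, and the sketch assumes every call in the loop is the same size; in a real agent the context usually grows with each step, so this likely understates the effect.

```python
from dataclasses import dataclass


@dataclass
class CallProfile:
    """Token footprint of a single inference call (hypothetical values)."""
    input_tokens: int
    output_tokens: int


def request_cost(calls: int, profile: CallProfile,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in USD of one user request that fans out into `calls` inference calls."""
    per_call = (profile.input_tokens * input_price_per_m
                + profile.output_tokens * output_price_per_m) / 1_000_000
    return calls * per_call


# Assumption: 2,000 input and 500 output tokens per call, $2.50 / $10.00 per million.
profile = CallProfile(input_tokens=2_000, output_tokens=500)
for calls in (1, 5, 10, 50):
    cost = request_cost(calls, profile, input_price_per_m=2.50, output_price_per_m=10.00)
    print(f"{calls:>2} calls per request -> ${cost:.2f}")
```

Under these assumptions the unit price never changes, yet the cost of a single "fix this bug" request scales linearly with the number of loop iterations.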

2. The "Reasoning Tax"

Newer models, such as OpenAI's o1 and o3, introduce a hidden cost driver: "Reasoning Tokens." These models generate thousands of internal tokens to "think" before producing a final answer.

You are billed for these hidden tokens at output token rates that are typically 3-5× more expensive than input rates. A simple prompt might trigger 100 input tokens and 50 visible output tokens, but 5,000 hidden reasoning tokens you never see. Cost is no longer tied to answer length; it's tied to problem complexity.
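The example above can be worked through in a few lines. This is a sketch, not any provider's billing formula: the prices are hypothetical (output set to 4× input, inside the 3-5× range mentioned above), and the only rule it encodes is the one stated in the text, that reasoning tokens are billed at the output rate.

```python
def call_cost(input_tokens: int, visible_output_tokens: int, reasoning_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in USD of one call; hidden reasoning tokens are billed as output tokens."""
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * input_price_per_m
            + billed_output * output_price_per_m) / 1_000_000


# Hypothetical prices in USD per million tokens.
INPUT_PRICE = 1.00
OUTPUT_PRICE = 4.00

without_reasoning = call_cost(100, 50, 0, INPUT_PRICE, OUTPUT_PRICE)
with_reasoning = call_cost(100, 50, 5_000, INPUT_PRICE, OUTPUT_PRICE)

print(f"Visible tokens only:   ${without_reasoning:.4f}")
print(f"With reasoning tokens: ${with_reasoning:.4f}")
print(f"Multiplier:            {with_reasoning / without_reasoning:.0f}x")
```

With these placeholder prices, the same 150 visible tokens cost well over 50× more once the hidden reasoning is counted.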

3. The Compliance Premium (For EU Markets)

If you operate in Europe, the "cheapest model" isn't the one with the lowest token price. It's the one that doesn't trigger regulatory risk. Under the EU AI Act, many enterprise AI use cases, such as HR screening, credit scoring, or critical infrastructure, qualify as "high-risk" systems. These impose compliance costs estimated at up to €400,000 per system (roughly 17% of total AI investment).

The cost of governance is becoming a structural part of your AI P&L, specifically because non-compliance carries fines up to €35 million or 7% of global turnover.

The 4 Problems CTOs Miss When Token Prices Fall

1. Visibility Fragments as Usage Scales

When AI was a single team running a single model, costs were trackable. Now you have prompt chains, RAG stacks, and fine-tuning jobs. Because these costs are often buried under a single API key, you lose the ability to attribute spend to specific features or teams.

2. Your FinOps Practice Doesn't Cover LLMs Yet

Most FinOps tools were built for compute and storage, not prompt-driven workloads. They can't see that the same feature costs 10× more with a poorly written prompt, or that swapping models degrades quality. This means you're flying blind, tracking total API spend but unable to answer the questions that matter: which features are burning budget, which prompts are inefficient, and where switching models would save money without degrading output.

3. "Cheap" Enables Waste at Scale

When tokens cost $30/million, teams optimized prompts and cached responses. At $0.15/million, that discipline vanishes. Why spend an hour optimizing a prompt that saves $5/month? But when that same unoptimized prompt runs 10 million times a month, it costs you $1,500/month instead of $150. As total volumes surge, the lack of hygiene compounds the problem.
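One way to reproduce the $1,500 versus $150 comparison, as a sketch. The article gives the price and the run count but not the prompt sizes, so the 1,000 and 100 tokens per run below are assumptions chosen to match those totals.

```python
def monthly_cost(runs_per_month: int, tokens_per_run: int, price_per_m: float) -> float:
    """Monthly cost in USD for a prompt executed runs_per_month times."""
    return runs_per_month * tokens_per_run * price_per_m / 1_000_000


PRICE_PER_M = 0.15          # USD per million tokens, from the article
RUNS_PER_MONTH = 10_000_000

# Assumption: the unoptimized prompt uses 1,000 tokens per run, the optimized one 100.
unoptimized = monthly_cost(RUNS_PER_MONTH, 1_000, PRICE_PER_M)
optimized = monthly_cost(RUNS_PER_MONTH, 100, PRICE_PER_M)
saving_per_run = (unoptimized - optimized) / RUNS_PER_MONTH

print(f"Unoptimized: ${unoptimized:,.0f}/month")
print(f"Optimized:   ${optimized:,.0f}/month")
print(f"Saving per run: ${saving_per_run:.6f}")
```

The per-run saving is a tiny fraction of a cent, which is exactly why nobody bothers; the monthly difference only shows up once volume is multiplied in.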

4. The Optimization Window Is Closing

Here's what most CTOs don't realize: the best time to instrument your AI spend is before it becomes a crisis. Once you've got dozens of AI features in production, multiple model providers, and engineering teams that have built habits around unoptimized prompts, unwinding that is painful and expensive. Cloud taught us this lesson. Organizations that adopted FinOps early saved significantly more than those that scrambled to cut costs during a budget crunch.

Why Governance Is a Hedge, Not a Bet

The question isn't whether this becomes expensive; it's when. And the answer depends on a variable entirely outside your control: where token prices go next.

  • Scenario A (Bull Case): Prices keep falling. Jevons Paradox kicks in, usage explodes, agents run wild, and total spend rises.

  • Scenario B (Bear Case): Subsidies end, energy constraints bite, and unit prices rise. Your unit economics collapse, and total spend rises.

The only winning move is to build cost visibility and governance now. This allows you to throttle volume in a low-price world and optimize unit costs in a high-price world. Waiting to see which scenario plays out means you will be unprepared for both.

When This Isn't Your Problem (Yet)

Let's be honest about who shouldn't worry about this today.

If you are pre-product-market-fit, optimizing AI costs is premature. However, even if your spending is under €1,000/month, you shouldn't ignore this entirely. Don't buy an enterprise FinOps platform yet. But do start tagging your API calls. Building the habit of tracking "cost per feature" now is free; retrofitting it later when you have millions of unlabelled logs is expensive and painful.

What You Can Do This Week

You don't need a six-month implementation project. Start with four things:

  1. Tag your API calls. Add metadata to every LLM call: which feature triggered it and which team owns it. This is the foundation of everything else.

  2. Calculate the cost per core action. What does it cost to generate one customer response or run one agent workflow? If you don't know this number, you can't make informed decisions about model selection or architecture trade-offs.

  3. Set a simple alert. Pick your biggest AI cost center and set a daily threshold. If the spending exceeds $X, someone gets notified. This alone would have prevented half the cloud cost horror stories of the past decade.

  4. Map your data sensitivity. If you operate under GDPR or the EU AI Act, identify which prompts contain personal or high-risk data. This determines which models and regions you can legally route to, and legal mistakes cost more than token optimization ever saves.
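The four steps above can be prototyped in one small script before you buy any tooling. This is a minimal in-memory sketch: the record fields, feature and team names, prices, and threshold are all hypothetical, and a real setup would attach these tags at call time and persist them in your logging or analytics store.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# Hypothetical prices in USD per million tokens; substitute your provider's rates.
INPUT_PRICE_PER_M = 2.50
OUTPUT_PRICE_PER_M = 10.00


@dataclass(frozen=True)
class LLMCallRecord:
    """One tagged LLM call (step 1: feature and team; step 4: sensitivity flag)."""
    day: date
    feature: str
    team: str
    input_tokens: int
    output_tokens: int
    contains_personal_data: bool = False


def record_cost(record: LLMCallRecord) -> float:
    return (record.input_tokens * INPUT_PRICE_PER_M
            + record.output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def cost_by(records, key: str) -> dict:
    """Total cost grouped by any tag, e.g. 'feature', 'team', or 'day'."""
    totals = defaultdict(float)
    for record in records:
        totals[getattr(record, key)] += record_cost(record)
    return dict(totals)


def cost_per_action(records, feature: str, action_count: int) -> float:
    """Step 2: cost of one core action, given how many actions the feature served."""
    return cost_by(records, "feature")[feature] / action_count


def days_over_threshold(records, threshold_usd: float) -> list:
    """Step 3: days whose total spend exceeded the alert threshold."""
    daily = cost_by(records, "day")
    return sorted(day for day, cost in daily.items() if cost > threshold_usd)


# Hypothetical sample data.
records = [
    LLMCallRecord(date(2025, 1, 1), "support_reply", "support", 1_000_000, 200_000, True),
    LLMCallRecord(date(2025, 1, 1), "code_agent", "platform", 4_000_000, 1_000_000),
    LLMCallRecord(date(2025, 1, 2), "support_reply", "support", 400_000, 100_000, True),
]

per_feature = cost_by(records, "feature")
per_reply = cost_per_action(records, "support_reply", action_count=1_300)
alert_days = days_over_threshold(records, threshold_usd=20.00)
sensitive_features = sorted({r.feature for r in records if r.contains_personal_data})

print(per_feature)
print(f"Cost per support reply: ${per_reply:.4f}")
print("Days over threshold:", alert_days)
print("Features touching personal data:", sensitive_features)
```

The design choice worth keeping even if you discard the rest: every call carries its tags from the start, so any later question (by feature, by team, by day, by sensitivity) is a grouping over the same records rather than a retrofit.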

The Pattern Is Clear

Cloud waste is a $44.5 billion problem despite a mature FinOps industry. AI is on the same trajectory, but faster.

CTOs who build the practice now, instrumenting their AI spend and governance before it's urgent, will be the ones who actually capture the value of falling token prices. The ones who wait will be writing the same "how we cut our AI costs by 40%" blog posts in 2027 that we've been reading about cloud for the past five years.

The only question is which character you want to play: the one who optimized early, or the one explaining to the board why last quarter's AI bill was 3× the forecast.

Want to see where your AI spend is actually going? PromptMetrics gives you cost-per-query visibility, prompt-level optimization insights, and budget alerts integrated in under 30 minutes. Start with the free tier and see your first cost breakdown today.
