
FinOps for AI: How to Track & Reduce LLM Costs Per Feature

Izzy A
CTO @PromptMetrics

Spending over €5K/month on LLMs? Learn why per-feature cost tracking is critical for AI FinOps, EU compliance, and cutting token waste by up to 50%.


You know your cloud bill by service. You know your headcount costs by department. But can you tell your board exactly which product feature is responsible for 40% of your token spend? More importantly, can you prove that this feature is actually profitable?

If you are spending €5K to €30K per month on LLM APIs and that number is growing fast, you aren't alone. Enterprise GenAI spending hit $37B in 2025 (Menlo Ventures). The money is flowing. The visibility is not.

Here is the reality: proper LLM cost attribution typically uncovers 20% to 50% in wasted spend within the first 30 days. For a company spending €15K/month, even a conservative estimate yields €3K in monthly budget savings, totaling €36K a year. That is a senior engineer, or three extra months of margin before your next funding milestone.

Let's break down the hidden costs, the compliance reality for EU teams, and the five dimensions you need to track.

You Are Not Just Paying for Tokens

When we talk about LLM costs, we usually mean the API bill from OpenAI, Anthropic, or Google. That is just the sticker price. The actual cost of running LLMs in production includes several hidden layers.

Here is what "cost" includes beyond the API invoice:

  • Token Spend: The raw per-token pricing from your provider.

  • Context Waste: This is often the biggest silent killer. A 2,000-token system prompt repeated across 100,000 daily requests costs you 200M tokens per day, or roughly 6B tokens per month, in static text alone. Without caching or optimization, you are paying to process the same text millions of times.

  • Reliability Overhead: As usage scales, timeouts increase. A single retry-heavy endpoint running at a 15% failure rate inflates your bill for that feature by roughly 15% or more, because you are paying for calls that never delivered value.

  • Engineering Overhead: The hours your team spends investigating cost spikes, debugging prompt regressions, and building internal dashboards that are outdated by next quarter.
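To make the context-waste and retry figures concrete, here is a minimal back-of-the-envelope sketch. It assumes a 30-day month and that every failed attempt is billed and retried until it succeeds; both are simplifications, and the function names are illustrative rather than part of any tool.

```python
# Figures from the bullets above; DAYS_PER_MONTH is an assumption.
SYSTEM_PROMPT_TOKENS = 2_000
DAILY_REQUESTS = 100_000
DAYS_PER_MONTH = 30


def static_prompt_tokens_per_month(prompt_tokens: int, daily_requests: int,
                                   days: int = DAYS_PER_MONTH) -> int:
    """Tokens spent re-sending the same system prompt when nothing is cached."""
    return prompt_tokens * daily_requests * days


def retry_cost_multiplier(failure_rate: float) -> float:
    """Expected billed attempts per successful call.

    Assumes each failed attempt is billed in full and retried until success,
    which is a simplification: some timeouts may be billed partially or not at all.
    """
    if not 0 <= failure_rate < 1:
        raise ValueError("failure_rate must be in [0, 1)")
    return 1 / (1 - failure_rate)


monthly_static = static_prompt_tokens_per_month(SYSTEM_PROMPT_TOKENS, DAILY_REQUESTS)
print(f"{monthly_static:,} static tokens per month")
print(f"Retry multiplier at 15% failures: {retry_cost_multiplier(0.15):.3f}")
```

Under these assumptions, a 15% failure rate costs closer to 18% extra than 15%, because retries can fail too.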

The uncomfortable truth? According to PwC's 29th Global CEO Survey (2026), 56% of CEOs report that AI has delivered neither increased revenue nor decreased costs. The problem is rarely the AI itself; more often it is a lack of visibility into what is working and what is waste.

The Five Dimensions of LLM Cost Attribution

Mature cost management requires tracking spend across five dimensions simultaneously. This is the exact framework we used to build PromptMetrics, but you can apply it regardless of your tool stack.

Note that infrastructure proxies (like LiteLLM) can track totals by provider, but business context such as feature or prompt version has to come from your application. Accurate attribution happens at the application layer.

  1. User: Which end-users generate the most tokens? Are your power users profitable, or just expensive?

  2. Team: Which internal team owns the spend? Engineering, Product, or Customer Success?

  3. Feature: This is the critical missing link. When you can see that Feature A costs €8K/month and generates €50K in revenue, while Feature B costs €6K/month and generates €2K, the optimization path becomes obvious.

  4. Model: Which model is being called, and is it the right one for the task?

  5. Prompt Version: Which version is deployed, and how does its cost compare to the previous one?
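One way to picture the five dimensions is as fields on a per-call record that your application writes alongside each LLM request. The sketch below is illustrative only: `LLMCallRecord`, `cost_by`, and the sample values are hypothetical names and numbers, not the PromptMetrics schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMCallRecord:
    """One LLM call, tagged with the five attribution dimensions."""
    user_id: str
    team: str
    feature: str
    model: str
    prompt_version: str
    input_tokens: int
    output_tokens: int
    cost_eur: float


def cost_by(records, dimension: str) -> dict:
    """Sum cost over any single dimension, e.g. 'feature' or 'team'."""
    totals = defaultdict(float)
    for record in records:
        totals[getattr(record, dimension)] += record.cost_eur
    return dict(totals)


# Made-up sample data with hypothetical model names.
records = [
    LLMCallRecord("u1", "support", "chatbot", "model-large", "v3", 1200, 300, 0.5),
    LLMCallRecord("u2", "support", "chatbot", "model-large", "v3", 900, 250, 0.25),
    LLMCallRecord("u1", "product", "summarizer", "model-small", "v1", 4000, 500, 0.125),
]

print(cost_by(records, "feature"))
print(cost_by(records, "user_id"))
```

Because every record carries all five tags, the same data answers "cost per feature" and "cost per user" without a second pipeline.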

The difference between "tracking" and "attribution" looks like this:

  • Before: "We spent €18K on OpenAI last month."

  • After: "Our support chatbot costs €7.2K/month on GPT-4o. 60% of that is one prompt template that could run on Haiku at a tenth of the cost. Switching saves us €3.9K/month."
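The "After" figure can be checked with a few lines of arithmetic. This is a sketch of the calculation using the numbers from the example; the function name and parameters are illustrative.

```python
def monthly_saving_from_switch(feature_cost: float, template_share_pct: float,
                               cheaper_cost_pct: float) -> float:
    """Monthly saving when one prompt template moves to a cheaper model.

    feature_cost:       current monthly cost of the feature
    template_share_pct: percent of that cost attributable to the template
    cheaper_cost_pct:   cost on the cheaper model, as a percent of the current cost
    """
    template_cost = feature_cost * template_share_pct / 100
    return template_cost * (100 - cheaper_cost_pct) / 100


# €7.2K feature, 60% in one template, cheaper model at a tenth of the cost.
saving = monthly_saving_from_switch(7_200, 60, 10)
print(f"€{saving:,.0f}/month")  # €3,888/month, i.e. about €3.9K
```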

Currently, 94% of enterprises report tracking AI costs, but only 34% have what researchers call mature cost management, meaning granular attribution, not just aggregate totals (Benchmarkit). That gap is where your margin is disappearing.

For EU Teams: The Compliance Reality

If you operate in the EU, cost attribution isn't just a financial exercise; it is a regulatory one.

The EU AI Act, with most obligations applying from August 2026, mandates transparency. Articles 12 and 13 set record-keeping (automatic event logging) and transparency requirements, initially for high-risk AI systems. Even so, these transparency norms are rapidly becoming the baseline expectation across all AI deployments.

The attribution infrastructure you build for cost governance doubles as your compliance backbone. With penalties for non-compliance reaching up to €35M or 7% of global annual turnover, relying on a monthly CSV invoice from OpenAI is no longer a viable strategy.

What Drives Your Bill (The Levers You Can Actually Pull)

Once you have visibility, you have control. This is FinOps for AI: the same discipline that turned cloud cost chaos into cloud cost governance, now applied to LLM spend.

Here are the levers that move the needle:

1. Prompt Efficiency (15–40% Savings)

Most teams have never audited their prompts for token efficiency. Verbose system prompts and redundant context injection are common culprits. Optimization here often delivers 15% to 40% savings within days of implementation.
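A prompt audit is easier to prioritize if you first rank templates by total monthly token volume (tokens per call times calls per month). The sketch below assumes you already have those two numbers per template, for example from the usage data your provider returns; the template names and figures are made up.

```python
def rank_templates_by_token_volume(templates: dict) -> list:
    """Rank prompt templates by monthly token volume, largest first.

    templates maps a template name to (prompt_tokens_per_call, monthly_calls).
    """
    volumes = [(name, tokens * calls) for name, (tokens, calls) in templates.items()]
    return sorted(volumes, key=lambda item: item[1], reverse=True)


# Hypothetical templates and traffic.
templates = {
    "support_system_prompt": (2_000, 3_000_000),
    "summarizer_prompt": (600, 500_000),
    "classifier_prompt": (150, 8_000_000),
}

ranking = rank_templates_by_token_volume(templates)
for name, volume in ranking:
    print(f"{name}: {volume:,} tokens/month")
```

Note that the short classifier prompt outranks the longer summarizer prompt here because of its call volume, which is why per-call length alone is a poor guide.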

2. Caching Strategy (40–60% Savings)

If your application serves similar queries (search, support, content generation), response caching is low-hanging fruit. For high-traffic endpoints, this can cut costs by 40% to 60%.
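As a rough illustration, the simplest form of response caching is an exact-match cache keyed on the model and a normalized prompt. The sketch below is an assumption-laden minimum: it is in-memory, has no expiry, and only catches prompts that are identical after whitespace and case normalization. `ResponseCache` and `fake_llm` are hypothetical names, and `call_llm` stands in for whatever client function you use.

```python
import hashlib


class ResponseCache:
    """In-memory exact-match cache for LLM responses (no TTL, no eviction)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # Normalize whitespace and case so trivially different prompts share a key.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(f"{model}\n{normalized}".encode("utf-8")).hexdigest()

    def get_or_call(self, model: str, prompt: str, call_llm):
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = call_llm(model, prompt)  # only paid for on a cache miss
        self._store[key] = response
        return response


# Stand-in for a real provider call, so the example runs offline.
calls_made = []

def fake_llm(model, prompt):
    calls_made.append(prompt)
    return f"answer to: {prompt.strip()}"


cache = ResponseCache()
first = cache.get_or_call("small-model", "What is your refund policy?", fake_llm)
second = cache.get_or_call("small-model", "  what is your   refund policy? ", fake_llm)
print(cache.hits, cache.misses, len(calls_made))
```

Lowercasing the prompt is a deliberate simplification; for case-sensitive tasks you would drop it. The hit rate you observe on real traffic, not the 40–60% headline, is what determines the saving.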

3. Model Selection & Routing

Not every call needs GPT-4 or Claude Opus. Intelligent routing (using smaller models for simpler tasks) allows you to balance cost against quality. The key is knowing which calls require the "smart" model and which do not.
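A routing rule can start out very simple. The sketch below routes by task type and input size; the model names, the set of "simple" tasks, and the token threshold are all placeholder assumptions that you would replace with values validated against your own quality metrics.

```python
def choose_model(task_type: str, input_tokens: int, *,
                 small: str = "small-model", large: str = "large-model",
                 simple_tasks: frozenset = frozenset({"classification", "extraction", "routing"}),
                 max_small_tokens: int = 2_000) -> str:
    """Pick the cheaper model only for short inputs on tasks known to be simple."""
    if task_type in simple_tasks and input_tokens <= max_small_tokens:
        return small
    return large


print(choose_model("classification", 500))     # small-model
print(choose_model("classification", 5_000))   # large-model
print(choose_model("legal_analysis", 100))     # large-model
```

Defaulting to the larger model for anything unrecognized keeps quality risk low while you gather per-feature data on where the smaller model is good enough.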

4. Prompt Version Control

80% of enterprises miss their AI infrastructure forecasts by more than 25% (Benchmarkit). A significant reason is prompt changes that silently increase token consumption. You need to know the cost impact of a new prompt before the end of the billing cycle.
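If every call is tagged with its prompt version, you can compare average token consumption between versions as soon as the new one receives traffic. A minimal sketch, assuming call logs are available as `(prompt_version, total_tokens)` pairs; the function name and the sample numbers are hypothetical.

```python
from statistics import mean


def version_token_delta(calls: list, old: str, new: str) -> float:
    """Relative change in mean tokens per call from version `old` to `new`.

    calls is a list of (prompt_version, total_tokens) tuples.
    Returns 0.4 for a 40% increase, -0.1 for a 10% decrease.
    """
    old_mean = mean(tokens for version, tokens in calls if version == old)
    new_mean = mean(tokens for version, tokens in calls if version == new)
    return (new_mean - old_mean) / old_mean


# Made-up log sample.
calls = [
    ("v1", 1_000), ("v1", 1_200),
    ("v2", 1_500), ("v2", 1_580),
]

delta = version_token_delta(calls, "v1", "v2")
print(f"v2 uses {delta:+.0%} tokens per call compared with v1")
```

A check like this can run in CI or shortly after deployment, so a regression shows up in days rather than on the invoice.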

Pricing Models: Build vs. Buy vs. Hybrid

You have three paths to LLM cost visibility. Here is the breakdown:

| Approach | Typical Cost | Best For |
| --- | --- | --- |
| Build Internally | €30K–€80K (Eng. time) + maintenance | Platform teams with €100K+ monthly LLM spend who need custom stack integration. |
| Dedicated Tool (e.g., PromptMetrics) | €200–€2,000 / month | Scale-ups spending €5K–€50K / month who need immediate answers and compliance support. |
| Hybrid (Generic Observability) | €500+ / month + Eng. time | Teams already deep in Datadog/New Relic who want basic tracking, not optimization. |

Cost vs. ROI: The Only Math That Matters

Let's run the numbers for a company spending €15K/month on LLM APIs.

Without cost attribution:

  • Monthly LLM spend: €15K

  • Estimated waste (conservative 25–35%): €3,750–€5,250/month

  • Annual waste: €45,000–€63,000

  • Result: You have no data to present to the board about AI unit economics.

With full optimization (using a ~€800/month tool):

  • Tool cost: ~€9,600/year

  • Identify & remove waste (20%): €3,000/month savings

  • Model routing improvements (15%): €2,250/month savings

  • Gross Annual Savings: Up to €63,000

  • Net Annual Savings: Up to €53,400

Note: This calculation is conservative and does not include the additional 40–60% savings potential from caching repetitive queries.
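The figures above can be reproduced with the short calculation below, which you can rerun with your own spend and tool price. It uses only the numbers stated in this section and integer arithmetic to avoid rounding noise; the variable names are illustrative.

```python
monthly_spend_eur = 15_000
tool_cost_per_month_eur = 800
waste_removed_pct = 20      # identify and remove waste
routing_savings_pct = 15    # model routing improvements

monthly_savings = monthly_spend_eur * (waste_removed_pct + routing_savings_pct) // 100
gross_annual_savings = monthly_savings * 12
annual_tool_cost = tool_cost_per_month_eur * 12
net_annual_savings = gross_annual_savings - annual_tool_cost

print(f"Monthly savings:      €{monthly_savings:,}")
print(f"Gross annual savings: €{gross_annual_savings:,}")
print(f"Annual tool cost:     €{annual_tool_cost:,}")
print(f"Net annual savings:   €{net_annual_savings:,}")
```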

Frequently Asked Questions

How long does integration typically take?

For SDK-based tools, expect 15 minutes to half a day, depending on your stack. If a vendor quotes weeks of integration work, that is a red flag.

Will cost tracking add latency to my LLM calls?

It shouldn't. Look for tools that use async collection (logging after the response, not intercepting it). The best implementations add zero latency to the hot path.
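As an illustration of async collection, the sketch below hands each cost event to an in-process queue and lets a background thread deliver it, so the request path only pays for a queue insert. It is a simplified assumption of how such a collector might work, not any vendor's implementation: `AsyncCostLogger` and `sink` are hypothetical names, the queue is unbounded, and there is no batching or retry.

```python
import queue
import threading


class AsyncCostLogger:
    """Records cost events off the hot path using a background worker thread."""

    def __init__(self, sink):
        self._queue = queue.Queue()  # unbounded, so put_nowait never blocks
        self._sink = sink            # callable that ships one event somewhere
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, event: dict) -> None:
        """Called after the LLM response has been returned to the user."""
        self._queue.put_nowait(event)

    def _drain(self) -> None:
        while True:
            event = self._queue.get()
            try:
                self._sink(event)
            except Exception:
                pass  # telemetry failures must never affect the application
            finally:
                self._queue.task_done()

    def flush(self) -> None:
        """Block until all queued events have been handed to the sink."""
        self._queue.join()


collected = []
logger = AsyncCostLogger(collected.append)
logger.record({"feature": "chatbot", "model": "small-model", "cost_eur": 0.002})
logger.flush()
print(collected)
```

In practice the added latency is small rather than literally zero, and a production collector would bound the queue and batch writes.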

How does this work with multiple LLM providers?

The best tools are provider-agnostic. You should be able to track costs across OpenAI, Anthropic, Google, and open-source models through a single interface. This is vital for avoiding vendor lock-in.
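Under the hood, provider-agnostic tracking mostly means normalizing each provider's usage fields into one shape before pricing them. The sketch below covers two providers; the field names reflect the OpenAI Chat Completions and Anthropic Messages usage objects as commonly documented, so verify them against current provider docs. The model names and prices are placeholders, not real list prices.

```python
# Provider -> (input token field, output token field) in the usage object.
USAGE_FIELDS = {
    "openai": ("prompt_tokens", "completion_tokens"),
    "anthropic": ("input_tokens", "output_tokens"),
}

# PLACEHOLDER prices in EUR per million tokens: (input, output). Not real prices.
PRICES_EUR_PER_MILLION = {
    ("openai", "example-large"): (2.0, 8.0),
    ("anthropic", "example-small"): (0.5, 2.0),
}


def normalize_usage(provider: str, usage: dict) -> tuple:
    """Return (input_tokens, output_tokens) regardless of provider naming."""
    input_field, output_field = USAGE_FIELDS[provider]
    return usage[input_field], usage[output_field]


def cost_eur(provider: str, model: str, usage: dict) -> float:
    """Price one call from its normalized token counts."""
    input_tokens, output_tokens = normalize_usage(provider, usage)
    input_price, output_price = PRICES_EUR_PER_MILLION[(provider, model)]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


openai_cost = cost_eur("openai", "example-large",
                       {"prompt_tokens": 1_000_000, "completion_tokens": 500_000})
anthropic_cost = cost_eur("anthropic", "example-small",
                          {"input_tokens": 2_000_000, "output_tokens": 1_000_000})
print(openai_cost, anthropic_cost)
```

Keeping the price table as data rather than code makes it easy to update when providers change pricing, and adding a provider is one new entry in each dictionary.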

What to Do Next

If you are spending more than €5K/month on LLMs and you cannot answer the question "which product feature costs the most per user," you have a visibility gap that is impacting your margins.

You built your product to solve a complex problem. Don't let invisible LLM costs undermine the business model that supports it.

Here is a simple first step: try PromptMetrics free for 14 days. Connect your LLM calls, tag them by feature, and see exactly where your budget is going. Most teams identify their first cost-saving opportunity within hours.

No credit card required. Integration takes less than 30 minutes. Start your free trial today.
