v0.10 shipped · MIT · Node 20+ · Self-hosted · SQLite · Postgres · S3

Your prompts in Git.
Your traces, runs & evals
in one registry.

Self-hosted prompt registry and agent telemetry. Zero vendor lock-in. Runs on a $5 VPS. Versions prompts in Git, collects traces via OTLP, scores evals over time.

npm install -g promptmetrics && promptmetrics-server
28 REST endpoints · 9 SQLite tables · 3 storage drivers · Zero vendor lock-in
~/checkout-api · zsh · promptmetrics · workspace=default
localhost:3000 · driver=github · sqlite=WAL · X-API-Key pm_*** ✓
The problem

Most teams version prompts in Google Docs and debug agents with console.log. PromptMetrics replaces both with Git and SQLite.

Before
  • Prompts copy-pasted across Notion, Slack, and PR descriptions
  • No idea which prompt version produced yesterday's bug
  • Agent traces written to stdout and lost on container restart
  • Eval scores tracked in a shared spreadsheet someone owns
After
  • Every prompt version is an immutable Git tag with a commit SHA
  • Labels such as production resolve to a version at runtime, so rollouts need no redeploy
  • Traces and spans persist in SQLite, queryable over REST
  • Eval suites score prompts over time, with full result history
See it live

Three surfaces. One process.

The dashboard ships with every self-hosted instance. No separate signup, no data egress — just open your local UI and inspect traces, evaluations, and operations in one place.

Agent trace detail
Trace detail page — inspect span waterfalls, status, and metadata for every agent step.
Evaluation trends
Evaluations page — track prompt quality scores, pass/fail thresholds, and result counts over any time window.
Live metrics
Overview page — monitor cost, latency, token usage, and system health at a glance.
Four primitives

We added two pillars in v0.10. Telemetry and evals are now first-class.

// before: prompts + logs
// now: prompts + logs + traces + spans + runs + labels + evals + audit

01

Prompt registry

Version, label, render. Git stores content; SQLite indexes metadata.

POST /v1/prompts
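To make "Git stores content; SQLite indexes metadata" concrete, here is a minimal in-memory sketch. The `prompts/{name}/{ver}.json` layout comes from the default filesystem driver listed further down; the `PromptStore` class, its write-once check, and the `{{variable}}` template syntax are assumptions for illustration, not the PromptMetrics API.

```typescript
// Hypothetical sketch: how a prompt version could map to a file in the
// Git-backed registry, and how variables might be rendered.
// Assumption: templates use {{var}} placeholders.

interface PromptVersion {
  name: string;
  version: string;
  template: string;
}

// Path the default Git/filesystem driver stores a version under.
function promptPath(name: string, version: string): string {
  return `prompts/${name}/${version}.json`;
}

// In-memory stand-in for the registry: each version can be written once.
class PromptStore {
  private readonly versions = new Map<string, PromptVersion>();

  publish(p: PromptVersion): string {
    const path = promptPath(p.name, p.version);
    if (this.versions.has(path)) {
      throw new Error(`version ${p.version} of ${p.name} already exists`);
    }
    this.versions.set(path, p);
    return path;
  }

  get(name: string, version: string): PromptVersion | undefined {
    return this.versions.get(promptPath(name, version));
  }
}

// Replace {{key}} placeholders; unknown keys are left as-is.
function render(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, key: string) => {
    const value = Object.prototype.hasOwnProperty.call(vars, key) ? vars[key] : undefined;
    return value ?? match;
  });
}
```

Refusing to overwrite an existing path mirrors the "immutable tag" promise: a fix ships as a new version, and a label is moved to point at it.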
02

Metadata log

One POST per LLM call. Tokens, latency, cost, custom metadata — fully nested.

POST /v1/logs
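A sketch of what one log call could look like on the wire, assuming a plain `fetch` client. The endpoint, the `provider` tag, and the `X-API-Key` / `X-Workspace-Id` headers appear elsewhere on this page; the other body field names (`latency_ms`, `cost_usd`, and so on) and the `buildLogRequest` helper are hypothetical.

```typescript
// Hypothetical request builder for POST /v1/logs.
// Assumption: the body field names below; only the endpoint and headers are documented.

interface LlmCallLog {
  prompt_name: string;
  provider: string;                    // free-form tag, e.g. 'openai'
  model: string;
  tokens_in: number;
  tokens_out: number;
  latency_ms: number;
  cost_usd: number;
  metadata?: Record<string, unknown>;  // arbitrary nested JSON
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildLogRequest(
  baseUrl: string,
  apiKey: string,
  workspaceId: string,
  log: LlmCallLog,
): PreparedRequest {
  return {
    url: new URL("/v1/logs", baseUrl).toString(),
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-API-Key": apiKey,
      "X-Workspace-Id": workspaceId,
    },
    body: JSON.stringify(log),
  };
}
```

Sending it is then `await fetch(req.url, req)`, since `fetch` reads `method`, `headers`, and `body` from its second argument and ignores the extra `url` property.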
03

Traces & spans

First-class telemetry for agent loops. No Jaeger. No collector. Just SQLite.

POST /v1/traces
04

Evaluations

Score prompts over time with structured eval suites and result history.

POST /v1/evaluations
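One way to turn result history into the score trend the dashboard charts (a daily average with a min-max band, plus a pass rate) is to bucket results by UTC day. This is a hypothetical client-side sketch; the `EvalResult` shape and the threshold parameter are assumptions rather than the `/v1/evaluations` schema.

```typescript
// Hypothetical aggregation of evaluation results into a daily trend.
// Assumption: each result has a 0..1 score and an ISO-8601 timestamp.

interface EvalResult {
  score: number;
  created_at: string;
}

interface DailyStats {
  day: string;       // YYYY-MM-DD (UTC)
  count: number;
  avg: number;
  min: number;
  max: number;
  passRate: number;  // share of results with score >= threshold
}

function dailyTrend(results: EvalResult[], threshold: number): DailyStats[] {
  // Group scores by calendar day in UTC.
  const buckets = new Map<string, number[]>();
  for (const r of results) {
    const day = new Date(r.created_at).toISOString().slice(0, 10);
    const scores = buckets.get(day) ?? [];
    scores.push(r.score);
    buckets.set(day, scores);
  }

  // Emit one row per day, oldest first.
  return Array.from(buckets.entries())
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([day, scores]) => ({
      day,
      count: scores.length,
      avg: scores.reduce((sum, s) => sum + s, 0) / scores.length,
      min: Math.min(...scores),
      max: Math.max(...scores),
      passRate: scores.filter((s) => s >= threshold).length / scores.length,
    }));
}
```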

Boring on purpose. Express in front. SQLite in the middle. Git underneath.

Optional Redis for caching and rate-limiting. Optional Postgres for multi-node. Optional S3 for object-storage-backed prompts. Optional OTel for export. Everything is optional except what you actually run.
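The "everything optional" rule can be pictured as a function from environment variables to active components. `DRIVER=s3` and `DATABASE_URL` appear in the storage list below; the `filesystem` and `github` driver values, `REDIS_URL`, and reading the standard `OTEL_EXPORTER_OTLP_ENDPOINT` variable are assumptions, so check the server's own configuration reference before relying on them.

```typescript
// Hypothetical sketch: derive which optional components are active.
// Assumptions: driver names other than "s3", REDIS_URL, OTEL_EXPORTER_OTLP_ENDPOINT.

type Driver = "filesystem" | "github" | "s3";
const DRIVERS: readonly string[] = ["filesystem", "github", "s3"];

function isDriver(value: string): value is Driver {
  return DRIVERS.indexOf(value) !== -1;
}

interface ResolvedConfig {
  driver: Driver;
  database: "sqlite" | "postgres";
  cache: "redis" | "none";
  rateLimitStore: "redis" | "sqlite";
  otlpExport: boolean;
}

function resolveConfig(env: Record<string, string | undefined>): ResolvedConfig {
  const driver = (env.DRIVER ?? "filesystem").toLowerCase();
  if (!isDriver(driver)) {
    throw new Error(`unknown DRIVER: ${driver}`);
  }
  const hasRedis = Boolean(env.REDIS_URL);
  return {
    driver,
    database: env.DATABASE_URL ? "postgres" : "sqlite",  // SQLite unless told otherwise
    cache: hasRedis ? "redis" : "none",
    rateLimitStore: hasRedis ? "redis" : "sqlite",       // "Redis or SQLite"
    otlpExport: Boolean(env.OTEL_EXPORTER_OTLP_ENDPOINT),
  };
}
```

At startup this would be called as `resolveConfig(process.env)`; with no variables set, everything falls back to SQLite and the local filesystem.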

Clients
Node SDK
TypeScript · typed
Python SDK
requests-based
CLI
promptmetrics
REST
curl, anything
Express App · :3000 · WAL
auth: HMAC-SHA256 · scopes · workspace
tenant: X-Workspace-Id middleware
rate-limit: sliding window · Redis or SQLite
audit: async batch → SQLite
circuit-breaker: Opossum · GitHub 429 backoff
SQLite tables
prompts
api_keys
logs
audit_logs
traces
spans
runs
prompt_labels
evaluations
Storage & Optionals
Git / Filesystem
default · prompts/{name}/{ver}.json
GitHub driver
bare clone + Contents API
S3 / MinIO
set DRIVER=s3
Redis
LRU + rate limit
Postgres
set DATABASE_URL
OpenTelemetry
OTLP · opt-in
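The rate limiter is described only as a sliding window backed by Redis or SQLite. This sketch shows the sliding-window-log variant with an in-memory `Map` standing in for either store; the class name, the limits, and the injectable clock are illustrative assumptions.

```typescript
// Minimal in-memory sliding-window-log rate limiter, keyed by API key.
// The Map is a stand-in for the Redis or SQLite store the server uses.

class SlidingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly hits = new Map<string, number[]>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the hit if under the limit; false otherwise.
  allow(key: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Keep only timestamps still inside the window (now - windowMs, now].
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}
```

Keeping raw timestamps makes the window exact at the cost of memory proportional to the limit. A Redis-backed version would typically keep a sorted set per key with a TTL so idle keys expire.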
New in v0.10 · Traces, spans & runs

Drill into agent loops without buying an APM.

Emit spans from your code. PromptMetrics writes them to SQLite and stitches them into a tree under a trace_id. Workflow runs link the high-level outcome to the low-level steps. Optionally export the same data to Tempo, Jaeger, or Datadog over OTLP.
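Stitching spans into a tree is a parent-pointer join. A minimal sketch, assuming each span carries the `span_id`, `parent_id`, `start_time`, and `end_time` fields shown in the span detail view below, and that a root span has a null `parent_id` (an assumption):

```typescript
// Hypothetical sketch: build a span tree for one trace_id from a flat list.

interface Span {
  span_id: string;
  parent_id: string | null;
  name: string;
  start_time: number; // ms offset from trace start
  end_time: number;
}

interface SpanNode extends Span {
  duration: number;
  children: SpanNode[];
}

function buildSpanTree(spans: Span[]): SpanNode[] {
  // First pass: index every span so parents can be found in any order.
  const nodes = new Map<string, SpanNode>();
  for (const s of spans) {
    nodes.set(s.span_id, { ...s, duration: s.end_time - s.start_time, children: [] });
  }

  // Second pass: attach each node to its parent; orphans become roots.
  const roots: SpanNode[] = [];
  nodes.forEach((node) => {
    const parent = node.parent_id ? nodes.get(node.parent_id) : undefined;
    if (parent) parent.children.push(node);
    else roots.push(node);
  });

  // Order siblings by start time so the tree reads like a waterfall.
  const sortByStart = (list: SpanNode[]): void => {
    list.sort((a, b) => a.start_time - b.start_time);
    list.forEach((n) => sortByStart(n.children));
  };
  sortByStart(roots);
  return roots;
}
```

Treating spans whose parent is missing as roots keeps partially ingested traces visible instead of dropping them.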

trace t_550e8400-e29b-41d4 · 8 spans · 1 error
duration 1.98s
Selected span: llm-call · gpt-4o
span_id     s_a91f0b3c
parent_id   s_root
status      ok
start_time  165 ms
end_time    1505 ms
duration    1340 ms
tokens_in   120
tokens_out  340
curl
curl http://localhost:3000/v1/traces/t_550e8400/spans/s_a91f \
  -H "X-API-Key: $PM_KEY" \
  -H "X-Workspace-Id: default"
Built-in surfaces

The pieces you stop hand-rolling.

prompts / welcome-onboarding · 5 labels
label | resolves to | set by | updated
production | v1.4.2 | @ci-runner | 2d ago
staging | v1.5.0-rc.1 | @isabel | 3h ago
canary-eu | v1.5.0-rc.1 | @isabel | 1h ago
shadow | v1.4.0 | @mikael | 12d ago
rollback | v1.3.5 | @isabel | 30d ago
Resolve at runtime

Stop hardcoding version strings. Apps fetch by label; you move labels with one POST. No re-deploys.

// app code — never changes
const p = await pm.prompts.get('welcome', {
  label: process.env.PM_LABEL  // 'production'
})

// ops — staged rollout
$ pm add-label welcome canary-eu --version 1.5.0-rc.1
$ pm add-label welcome production --version 1.5.0
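Conceptually, a label is a movable pointer from a (prompt, label) pair to an immutable version, which is why moving one needs no redeploy. This in-memory `LabelIndex` is a hypothetical stand-in for the `prompt_labels` table, not the server's implementation:

```typescript
// Hypothetical sketch of label resolution: (prompt, label) -> version.

class LabelIndex {
  private readonly pointers = new Map<string, string>();

  // JSON-encoding the pair avoids collisions between names containing separators.
  private static key(prompt: string, label: string): string {
    return JSON.stringify([prompt, label]);
  }

  // Moving a label is a single write; the next resolve sees the new version.
  move(prompt: string, label: string, version: string): void {
    this.pointers.set(LabelIndex.key(prompt, label), version);
  }

  resolve(prompt: string, label: string): string {
    const version = this.pointers.get(LabelIndex.key(prompt, label));
    if (version === undefined) {
      throw new Error(`label ${label} is not set for prompt ${prompt}`);
    }
    return version;
  }
}
```

The staged rollout above then amounts to two `move` calls, while application code keeps calling `resolve` with the same label.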
Multi-tenancy

One server. Many workspaces. One header.

Every row in SQLite is partitioned by workspace_id. API keys are scoped to a workspace. A master key sees all. The X-Workspace-Id middleware resolves the tenant before your route handler runs.

curl
curl http://localhost:3000/v1/prompts \
  -H "X-API-Key: pm_********7a3f" \
  -H "X-Workspace-Id: eu-prod" 
default · us-east · 142 prompts · 8 keys
eu-prod · eu-west · 96 prompts · 5 keys
staging · us-east · 204 prompts · 11 keys
red-team · eu-west · 33 prompts · 3 keys
edge-asia · ap-south · 18 prompts · 2 keys
* master · all workspaces · 1 key
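A sketch of the resolution step the X-Workspace-Id middleware performs, assuming API keys are stored as HMAC-SHA256 digests under a server secret, that a master key is recorded with workspace `"*"`, and that a missing header falls back to `default`. The record shape, error strings, and fallback are hypothetical; only the header names and the HMAC-SHA256 scheme come from this page.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical tenant resolution: authenticate the key, then check its scope.

interface ApiKeyRecord {
  digest: string;    // hex HMAC-SHA256 of the raw key
  workspace: string; // workspace id, or "*" for the master key (assumption)
}

function hashKey(rawKey: string, serverSecret: string): string {
  return createHmac("sha256", serverSecret).update(rawKey).digest("hex");
}

function resolveWorkspace(
  headers: Record<string, string | undefined>,
  keys: ApiKeyRecord[],
  serverSecret: string,
): string {
  const rawKey = headers["x-api-key"];
  if (!rawKey) throw new Error("401: missing X-API-Key");

  // Constant-time comparison against each stored digest.
  const digest = Buffer.from(hashKey(rawKey, serverSecret), "hex");
  const record = keys.find((k) => {
    const stored = Buffer.from(k.digest, "hex");
    return stored.length === digest.length && timingSafeEqual(stored, digest);
  });
  if (!record) throw new Error("401: unknown API key");

  const requested = headers["x-workspace-id"] ?? "default";
  if (record.workspace !== "*" && record.workspace !== requested) {
    throw new Error(`403: key is not scoped to workspace ${requested}`);
  }
  return requested; // later queries filter rows on this workspace_id
}
```

Comparing digests with `timingSafeEqual` avoids leaking how many leading bytes matched; Node lowercases incoming header names, hence the `x-api-key` and `x-workspace-id` lookups.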

Eight concerns. Eight honest answers.

The post-pivot scope is wider — telemetry, evals, multi-tenancy. The promise is the same: nothing leaves your infra.

Concern | Without PromptMetrics | With PromptMetrics
Prompt versioning | Google Docs · scattered PRs | Git-backed registry · immutable tags
Prompt rollouts | Hardcode versions · redeploy | Move a label → instant
LLM observability | console.log + stdout | POST /v1/logs · SQL queryable
Agent debugging | Black-box · re-run with prints | Traces + spans + runs
Evaluations | Spreadsheet of vibes | Eval suites · scored over time
Multi-tenancy | New deployment per tenant | X-Workspace-Id · one server
Storage backend | Locked into vendor DB | SQLite · Postgres · S3 · GitHub
Cost at scale | Per-seat + per-GB egress | $0 licensing · runs on a $5 VPS
SDKs · CLI · REST

The same primitives, in your language.

agent.ts
prompts → traces → evals
import OpenAI from 'openai'
import { PromptMetrics } from 'promptmetrics-sdk'

const openai = new OpenAI() // reads OPENAI_API_KEY from the environment

const pm = new PromptMetrics({
  baseUrl: 'http://localhost:3000',
  apiKey:  process.env.PM_KEY,
  workspaceId: 'eu-prod',
})

// 1. Resolve prompt by label
const p = await pm.prompts.get('welcome', {
  label: 'production',
  variables: { name: 'Alice' },
})

// 2. Open a trace + span around your LLM call
const trace = await pm.traces.create({ prompt_name: 'welcome' })
const span  = trace.span('llm-call')
const out   = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: p.messages,
})
// Record the provider's reported token counts instead of hardcoding them
span.end({
  tokens_in:  out.usage?.prompt_tokens ?? 0,
  tokens_out: out.usage?.completion_tokens ?? 0,
  status: 'ok',
})

// 3. Score it
await pm.evaluations.addResult('accuracy-check', {
  run_id: trace.run_id, score: 0.94,
})
Providers

Works with any provider — because provider is just a tag.

PromptMetrics doesn't wrap your LLM client. You pass provider: 'openai' in the log payload. The SDK validates, SQLite indexes. That's the whole integration.

openai · anthropic · mistral · ollama · vllm · cohere · together · groq · llama.cpp · aleph-alpha · gemini · bedrock · …your-model-here
Ship it

Four ways to install.
One process to run.

npm. Docker. Source. ghcr. They all end at a single promptmetrics-server process — running on your laptop, your VPS, or your cluster.

npm install -g promptmetrics && promptmetrics-server
docker compose up --build # from the cloned repo
docker run -p 3000:3000 ghcr.io/iiizzzyyy/promptmetrics
git clone … && npm i && npm run build && npm run db:init