OpenTelemetry Trace Ingestion
Overview
The kindlm trace command lets you test real agent executions by ingesting their OpenTelemetry traces. Instead of sending prompts to providers, KindLM listens for OTLP/HTTP trace data, extracts model outputs and tool calls from span attributes, and runs assertions against them.
This is useful when:
- Your agent is already instrumented with OpenTelemetry
- You want to test against real production-like executions
- You need to validate traces from staging environments
- Your agent framework (LangChain, CrewAI, etc.) exports OTel traces
Quick Start
# 1. Start trace listener and run your agent
kindlm trace --command "python my_agent.py"
# 2. Or start listener only, send traces from elsewhere
kindlm trace --port 4318 --timeout 60000
The trace command automatically sets OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:<port> when spawning a command.
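If an agent builds its exporter URL by hand, note that OTLP/HTTP exporters treat OTEL_EXPORTER_OTLP_ENDPOINT as a base URL and append /v1/traces to it. The following is a minimal TypeScript sketch of that resolution. resolveTracesUrl is a hypothetical helper for illustration, not a KindLM API, and the 4318 fallback is an assumption taken from the examples above.

```typescript
// Hypothetical helper (not part of KindLM): derive the trace export URL
// from the base endpoint variable, the way OTLP/HTTP exporters do.
function resolveTracesUrl(env: Record<string, string | undefined>): string {
  // Assumption: fall back to the port used in the Quick Start examples.
  const base = env["OTEL_EXPORTER_OTLP_ENDPOINT"] ?? "http://localhost:4318";
  // Strip trailing slashes so the joined path has exactly one separator.
  return base.replace(/\/+$/, "") + "/v1/traces";
}
```

Passing process.env in a Node.js agent would pick up the value that kindlm trace injects when it spawns the command.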
Configuration
kindlm.yaml
trace:
  port: 4318
  timeoutMs: 30000
  spanMapping:
    outputTextAttr: gen_ai.completion.0.content
    modelAttr: gen_ai.response.model
    systemAttr: gen_ai.system
    inputTokensAttr: gen_ai.usage.input_tokens
    outputTokensAttr: gen_ai.usage.output_tokens
  spanFilter:
    namePattern: "^chat\\."
    attributeMatch:
      gen_ai.system: openai
    minDurationMs: 100
Span Mapping
KindLM follows the OpenTelemetry GenAI Semantic Conventions by default. The spanMapping section lets you override which span attributes map to assertion context fields:
| Field | Default Attribute | Purpose |
|---|---|---|
| outputTextAttr | gen_ai.completion.0.content | Model output text |
| modelAttr | gen_ai.response.model | Model identifier |
| systemAttr | gen_ai.system | Provider system (openai, anthropic, etc.) |
| inputTokensAttr | gen_ai.usage.input_tokens | Input token count |
| outputTokensAttr | gen_ai.usage.output_tokens | Output token count |
Tool calls are extracted from spans with gen_ai.tool.name and gen_ai.tool.arguments attributes.
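As a rough illustration of how the mapping might be applied to one span's flat attribute map, here is a TypeScript sketch. It is not KindLM's mapSpansToResult: extractFields, extractToolCall, and the Attrs shape are hypothetical names, and defaulting missing tool arguments to an empty string is an assumption.

```typescript
// Flat attribute map for one span (shape assumed for illustration).
type Attrs = Record<string, string | number | boolean>;

interface SpanMapping {
  outputTextAttr: string;
  modelAttr: string;
  systemAttr: string;
  inputTokensAttr: string;
  outputTokensAttr: string;
}

// Defaults copied from the table above.
const DEFAULT_MAPPING: SpanMapping = {
  outputTextAttr: "gen_ai.completion.0.content",
  modelAttr: "gen_ai.response.model",
  systemAttr: "gen_ai.system",
  inputTokensAttr: "gen_ai.usage.input_tokens",
  outputTokensAttr: "gen_ai.usage.output_tokens",
};

function str(attrs: Attrs, key: string): string | undefined {
  const v = attrs[key];
  return typeof v === "string" ? v : undefined;
}

function num(attrs: Attrs, key: string): number | undefined {
  const v = attrs[key];
  if (typeof v === "number") return v;
  // OTLP JSON encodes 64-bit integers as strings, so accept numeric strings.
  if (typeof v === "string" && v.trim() !== "" && Number.isFinite(Number(v))) {
    return Number(v);
  }
  return undefined;
}

// Hypothetical helper: read the mapped fields, letting overrides win.
function extractFields(attrs: Attrs, overrides: Partial<SpanMapping> = {}) {
  const m: SpanMapping = { ...DEFAULT_MAPPING, ...overrides };
  return {
    outputText: str(attrs, m.outputTextAttr),
    model: str(attrs, m.modelAttr),
    system: str(attrs, m.systemAttr),
    inputTokens: num(attrs, m.inputTokensAttr),
    outputTokens: num(attrs, m.outputTokensAttr),
  };
}

// Hypothetical helper: a span counts as a tool call if it names a tool.
function extractToolCall(attrs: Attrs): { name: string; arguments: string } | undefined {
  const name = str(attrs, "gen_ai.tool.name");
  if (name === undefined) return undefined;
  // Assumption: missing arguments are treated as an empty string.
  return { name, arguments: str(attrs, "gen_ai.tool.arguments") ?? "" };
}
```

An override such as { modelAttr: "llm.model" } would redirect only that field and leave the other defaults in place.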
Span Filtering
Optional filters to select which spans are used for assertion evaluation:
| Field | Type | Description |
|---|---|---|
| namePattern | regex | Only include spans whose name matches |
| attributeMatch | map | Only include spans with these exact attribute values |
| minDurationMs | number | Only include spans lasting at least this long |
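The three filters can be pictured as predicates applied to each span. The TypeScript sketch below assumes that all configured filters must pass (a logical AND) and that an omitted filter matches everything. SpanLike, SpanFilter, and selectSpans are hypothetical names, not KindLM's actual filterSpans implementation.

```typescript
// Minimal span shape assumed for this sketch.
interface SpanLike {
  name: string;
  startTimeMs: number;
  endTimeMs: number;
  attributes: Record<string, string | number | boolean>;
}

interface SpanFilter {
  namePattern?: string;
  attributeMatch?: Record<string, string | number | boolean>;
  minDurationMs?: number;
}

function matchesFilter(span: SpanLike, filter: SpanFilter): boolean {
  // namePattern: the span name must match the regular expression.
  if (filter.namePattern !== undefined && !new RegExp(filter.namePattern).test(span.name)) {
    return false;
  }
  // attributeMatch: every listed attribute must be present with an equal value.
  if (filter.attributeMatch !== undefined) {
    for (const [key, expected] of Object.entries(filter.attributeMatch)) {
      if (span.attributes[key] !== expected) return false;
    }
  }
  // minDurationMs: the span must last at least this long.
  if (
    filter.minDurationMs !== undefined &&
    span.endTimeMs - span.startTimeMs < filter.minDurationMs
  ) {
    return false;
  }
  return true;
}

// Hypothetical helper: keep only spans that pass every configured filter.
function selectSpans(spans: SpanLike[], filter: SpanFilter = {}): SpanLike[] {
  return spans.filter((s) => matchesFilter(s, filter));
}
```

With the spanFilter from the configuration example, a 1500 ms chat.completions span tagged gen_ai.system: openai would be kept, while a 50 ms one, or one from a different provider, would be dropped.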
OTLP Protocol
KindLM accepts OTLP/HTTP JSON traces at POST /v1/traces. This is the standard OTLP/HTTP path for trace export, the same one an OpenTelemetry Collector's OTLP receiver exposes (on port 4318 by default).
Request format
{
  "resourceSpans": [
    {
      "resource": {
        "attributes": [
          { "key": "service.name", "value": { "stringValue": "my-agent" } }
        ]
      },
      "scopeSpans": [
        {
          "scope": { "name": "openai.instrumentation" },
          "spans": [
            {
              "traceId": "abc123...",
              "spanId": "span1",
              "name": "chat.completions",
              "kind": 3,
              "startTimeUnixNano": "1700000000000000000",
              "endTimeUnixNano": "1700000001500000000",
              "attributes": [
                { "key": "gen_ai.system", "value": { "stringValue": "openai" } },
                { "key": "gen_ai.response.model", "value": { "stringValue": "gpt-4o" } },
                { "key": "gen_ai.completion.0.content", "value": { "stringValue": "Here is..." } }
              ]
            }
          ]
        }
      ]
    }
  ]
}
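To make the nesting concrete, the TypeScript sketch below walks resourceSpans → scopeSpans → spans and produces flat span records with millisecond timestamps. It only illustrates the flattening idea: flattenPayload and FlatSpan are hypothetical names, validation is omitted, and resource attributes as well as array, key-value list, and bytes values are ignored.

```typescript
// Minimal OTLP/JSON wire shapes (only the fields used in this sketch).
interface AnyValue {
  stringValue?: string;
  intValue?: string | number;
  doubleValue?: number;
  boolValue?: boolean;
}
interface KeyValue { key: string; value: AnyValue }
interface OtlpSpan {
  traceId: string;
  spanId: string;
  parentSpanId?: string;
  name: string;
  startTimeUnixNano: string;
  endTimeUnixNano: string;
  attributes?: KeyValue[];
}
interface OtlpPayload {
  resourceSpans?: { scopeSpans?: { spans?: OtlpSpan[] }[] }[];
}

// Flattened span (hypothetical shape; the real ParsedSpan may differ).
interface FlatSpan {
  traceId: string;
  spanId: string;
  parentSpanId: string | undefined;
  name: string;
  startTimeMs: number;
  endTimeMs: number;
  attributes: Record<string, string | number | boolean>;
}

function nanosToMs(nanos: string): number {
  // Nanosecond timestamps exceed Number.MAX_SAFE_INTEGER, so drop the last
  // six digits as text (truncating to whole milliseconds) before converting.
  return Number(nanos.slice(0, -6) || "0");
}

function flattenValue(v: AnyValue): string | number | boolean | undefined {
  if (v.stringValue !== undefined) return v.stringValue;
  if (v.intValue !== undefined) return Number(v.intValue);
  if (v.doubleValue !== undefined) return v.doubleValue;
  if (v.boolValue !== undefined) return v.boolValue;
  return undefined; // other value kinds are skipped in this sketch
}

function flattenPayload(payload: OtlpPayload): FlatSpan[] {
  const out: FlatSpan[] = [];
  for (const rs of payload.resourceSpans ?? []) {
    for (const ss of rs.scopeSpans ?? []) {
      for (const span of ss.spans ?? []) {
        const attributes: FlatSpan["attributes"] = {};
        for (const kv of span.attributes ?? []) {
          const value = flattenValue(kv.value);
          if (value !== undefined) attributes[kv.key] = value;
        }
        out.push({
          traceId: span.traceId,
          spanId: span.spanId,
          // Treat a missing or empty parentSpanId as "no parent".
          parentSpanId: span.parentSpanId || undefined,
          name: span.name,
          startTimeMs: nanosToMs(span.startTimeUnixNano),
          endTimeMs: nanosToMs(span.endTimeUnixNano),
          attributes,
        });
      }
    }
  }
  return out;
}
```

Applied to the request above, this would yield a single chat.completions span whose start and end timestamps are 1500 ms apart.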
Response
- 200 with { "partialSuccess": {} } on success
- 400 on invalid JSON or malformed payload
- CORS headers are included for browser-based exporters
Architecture
Core (zero I/O)
packages/core/src/trace/
├── types.ts OTLP wire types, ParsedSpan, TraceConfigSchema
├── parser.ts parseOtlpPayload() — flattens resourceSpans → ParsedSpan[]
├── mapper.ts filterSpans(), mapSpansToResult(), buildContextFromTrace()
└── index.ts Barrel export
- parseOtlpPayload(raw): Validates and flattens the nested OTLP structure into normalized ParsedSpan objects with millisecond timestamps and flat attribute maps.
- filterSpans(spans, filter): Applies name pattern, attribute match, and duration filters.
- mapSpansToResult(spans, mapping): Extracts output text, tool calls, tokens, and latency from span attributes using the configured mapping.
- buildContextFromTrace(result, options): Converts a SpanMappingResult into an AssertionContext that assertion handlers can evaluate.
CLI
packages/cli/src/
├── utils/trace-server.ts OTLP HTTP server (node:http)
└── commands/trace.ts kindlm trace command registration
- createTraceServer(port): Lightweight HTTP server that accepts POST /v1/traces, parses payloads via core, and collects spans. Provides start(), stop(), getSpans(), and waitForSpans({timeoutMs}).
- registerTraceCommand(program): Commander command that orchestrates: parse config → start server → spawn command → wait → filter/map → evaluate assertions → report.
Integration Examples
Python with opentelemetry-instrumentation
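# Install OTel instrumentation
# (opentelemetry-distro pulls in opentelemetry-instrumentation, which provides the opentelemetry-instrument command)
pip install opentelemetry-api opentelemetry-sdk opentelemetry-distro opentelemetry-exporter-otlp-proto-http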
# Run with KindLM trace collection
kindlm trace --command "opentelemetry-instrument python my_agent.py"
Node.js with @opentelemetry/auto-instrumentations-node
kindlm trace --command "node --require @opentelemetry/auto-instrumentations-node/register my_agent.js"
Manual OTLP export
# Start listener
kindlm trace --timeout 120000 &
# Run your agent pointing at the listener
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 \
OTEL_EXPORTER_OTLP_PROTOCOL=http/json \
python my_agent.py
# Traces are collected and assertions are evaluated when the timeout expires
Latency Calculation
Latency is computed from root spans only, that is, spans without a parentSpanId. Child span durations are excluded because they fall within their parent's duration, so adding them would double-count. If multiple root spans exist, their durations are summed.
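A small TypeScript sketch of this rule, assuming spans carry millisecond start and end times. TimedSpan and totalLatencyMs are hypothetical names, and treating an empty parentSpanId string as "no parent" is an assumption.

```typescript
// Minimal span shape assumed for this sketch.
interface TimedSpan {
  spanId: string;
  parentSpanId?: string;
  startTimeMs: number;
  endTimeMs: number;
}

// Sum the durations of root spans only; child spans are ignored because
// their time is already covered by their parent.
function totalLatencyMs(spans: TimedSpan[]): number {
  return spans
    .filter((s) => !s.parentSpanId) // missing or empty parentSpanId = root
    .reduce((sum, s) => sum + (s.endTimeMs - s.startTimeMs), 0);
}

const exampleSpans: TimedSpan[] = [
  { spanId: "a", startTimeMs: 0, endTimeMs: 1500 },                     // root, 1500 ms
  { spanId: "b", parentSpanId: "a", startTimeMs: 100, endTimeMs: 900 }, // child, ignored
  { spanId: "c", startTimeMs: 2000, endTimeMs: 2400 },                  // root, 400 ms
];

const exampleLatency = totalLatencyMs(exampleSpans); // 1500 + 400 = 1900
```

Here the 800 ms child span contributes nothing, since it ran inside root span "a".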