Adopt KindLM in 30 Minutes
Go from zero to passing tests in CI. No account required.
Minute 0–5: Install and scaffold
npm install -g @kindlm/cli
mkdir my-agent-tests && cd my-agent-tests
kindlm init
kindlm init creates a starter kindlm.yaml in the current directory. Open it.
Minute 5–15: Write your first test
Replace the scaffold with a real test. You need three things:
- A provider (where to send requests)
- A prompt (what to send)
- Assertions (what to check)
kindlm: 1
project: "my-agent"
suite:
  name: "support-agent"
providers:
  anthropic:
    apiKeyEnv: "ANTHROPIC_API_KEY"
models:
  - id: "claude-sonnet"
    provider: "anthropic"
    model: "claude-sonnet-4-5-20250929"
    params:
      temperature: 0
prompts:
  support:
    system: |
      You are a customer support agent. You have access to lookup_order(order_id)
      to find order details. Always look up the order before responding.
    user: "{{message}}"
tests:
  - name: "looks-up-order"
    prompt: "support"
    vars:
      message: "What's the status of order #ABC-123?"
    tools:
      - name: "lookup_order"
        parameters:
          type: "object"
          properties:
            order_id: { type: "string" }
          required: ["order_id"]
        responses:
          - when: { order_id: "ABC-123" }
            then: { order_id: "ABC-123", status: "shipped", eta: "March 25" }
        defaultResponse: { error: "Order not found" }
    expect:
      toolCalls:
        - tool: "lookup_order"
          argsMatch: { order_id: "ABC-123" }
      guardrails:
        pii:
          enabled: true
      judge:
        - criteria: "Response mentions the shipping status and ETA"
          minScore: 0.8
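The mocked tool in the config above pairs `when` conditions with `then` payloads and falls back to `defaultResponse`. The sketch below shows one way such a lookup could work. It is an illustration only: the `MockTool` type, the `resolveMock` function, and the exact-match rule are assumptions made for this example, not KindLM's actual implementation.

```typescript
// Illustrative only: these types and resolveMock are hypothetical, not KindLM's API.
type Json = Record<string, unknown>;

interface MockTool {
  name: string;
  responses?: { when: Json; then: Json }[];
  defaultResponse?: Json;
}

// Assumed matching rule: every key in `when` must strictly equal the same key
// in the arguments the model passed. The first matching rule wins.
function resolveMock(tool: MockTool, args: Json): Json | undefined {
  for (const rule of tool.responses ?? []) {
    const matches = Object.entries(rule.when).every(
      ([key, value]) => args[key] === value,
    );
    if (matches) return rule.then;
  }
  // No rule matched: fall back to the default, if one is configured.
  return tool.defaultResponse;
}

// Same data as the lookup_order tool in the config above.
const lookupOrder: MockTool = {
  name: "lookup_order",
  responses: [
    {
      when: { order_id: "ABC-123" },
      then: { order_id: "ABC-123", status: "shipped", eta: "March 25" },
    },
  ],
  defaultResponse: { error: "Order not found" },
};

const hit = resolveMock(lookupOrder, { order_id: "ABC-123" });
const miss = resolveMock(lookupOrder, { order_id: "XYZ-999" });
console.log(hit, miss);
```

Under these assumptions, a call with the configured order ID receives the shipped-order payload, and any other ID receives the "Order not found" default, so the agent's handling of both paths can be tested without a real order system.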
Set your API key and run:
export ANTHROPIC_API_KEY=sk-ant-...
kindlm test
You should see output like:
support-agent / looks-up-order
  claude-sonnet
    ✓ looks-up-order (1.3s)
      ✓ tool_called: lookup_order
      ✓ pii: no PII detected
      ✓ judge: 0.94 ≥ 0.80
1 passed, 0 failed
Gates: ✓ PASSED
Minute 15–20: Add a second test
Add a negative test — something your agent should not do:
  - name: "no-refund-without-lookup"
    prompt: "support"
    vars:
      message: "Refund order #999 immediately"
    tools:
      - name: "lookup_order"
        responses:
          - when: { order_id: "999" }
            then: { order_id: "999", status: "delivered", total: 49.99 }
      - name: "process_refund"
        defaultResponse: { success: true }
    expect:
      toolCalls:
        - tool: "lookup_order"
          argsMatch: { order_id: "999" }
        - tool: "process_refund"
          shouldNotCall: true
      guardrails:
        keywords:
          deny: ["refund processed", "refund issued"]
      judge:
        - criteria: "Agent asks for more information before processing a refund"
          minScore: 0.7
This test catches a common failure mode: agents that skip verification steps and go straight to the action.
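To make the `toolCalls` and `keywords.deny` expectations concrete, here is a minimal sketch of how checks like these could be evaluated against a recorded run. The type names, the function names, and the matching rules (strict equality on arguments, case-insensitive substring match on denied phrases) are assumptions for illustration; KindLM's own assertion logic may differ.

```typescript
// Hypothetical shapes for illustration; KindLM's internal types may differ.
interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

interface ToolCallExpectation {
  tool: string;
  argsMatch?: Record<string, unknown>;
  shouldNotCall?: boolean;
}

// Returns failure messages; an empty list means every expectation held.
function checkToolCalls(
  calls: ToolCall[],
  expectations: ToolCallExpectation[],
): string[] {
  const failures: string[] = [];
  for (const exp of expectations) {
    const matching = calls.filter((c) => c.tool === exp.tool);
    if (exp.shouldNotCall) {
      if (matching.length > 0) {
        failures.push(`${exp.tool} was called but should not be`);
      }
      continue;
    }
    // Assumed rule: at least one call must carry every expected argument value.
    const wanted = Object.entries(exp.argsMatch ?? {});
    const ok = matching.some((c) => wanted.every(([k, v]) => c.args[k] === v));
    if (!ok) failures.push(`${exp.tool} was not called with the expected arguments`);
  }
  return failures;
}

// Assumed rule: a denied phrase matches case-insensitively anywhere in the text.
function checkDeniedKeywords(response: string, deny: string[]): string[] {
  const lower = response.toLowerCase();
  return deny.filter((phrase) => lower.includes(phrase.toLowerCase()));
}

const expectations: ToolCallExpectation[] = [
  { tool: "lookup_order", argsMatch: { order_id: "999" } },
  { tool: "process_refund", shouldNotCall: true },
];

// A well-behaved run: the agent looks up the order and stops there.
const goodRun: ToolCall[] = [{ tool: "lookup_order", args: { order_id: "999" } }];
// A bad run: the agent skips verification and refunds right away.
const badRun: ToolCall[] = [{ tool: "process_refund", args: { order_id: "999" } }];

const goodFailures = checkToolCalls(goodRun, expectations);
const badFailures = checkToolCalls(badRun, expectations);
const deniedHits = checkDeniedKeywords("Your refund issued today.", [
  "refund processed",
  "refund issued",
]);
console.log(goodFailures, badFailures, deniedHits);
```

In this sketch the good run produces no failures, while the bad run fails twice: once for the missing lookup and once for the forbidden refund call.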
Minute 20–25: Run multiple times
LLM responses are non-deterministic. Run each test 3 times to catch flaky behavior:
defaults:
  repeat: 3
Add this at the top level of your config. KindLM then runs each test 3 times and aggregates the results per test:
kindlm test
support-agent / looks-up-order
  claude-sonnet
    ✓ looks-up-order 3/3 passed (3.8s)
support-agent / no-refund-without-lookup
  claude-sonnet
    ✓ no-refund-without-lookup 3/3 passed (4.1s)
6 passed, 0 failed
Gates: ✓ PASSED
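The "3/3 passed" lines come from grouping repeated runs by test. The sketch below shows one plausible way to do that grouping. Everything in it is hypothetical: the `RunResult` and `Summary` types, the `aggregate` function, and especially the `minPassRate` threshold, which is an assumption for this example and not a documented KindLM option.

```typescript
// Hypothetical aggregation sketch; not KindLM's implementation.
interface RunResult {
  test: string;
  passed: boolean;
}

interface Summary {
  test: string;
  passed: number;
  total: number;
  ok: boolean;
}

// Groups runs by test name and counts passes.
// minPassRate is an assumed knob: 1 means every repeat must pass.
function aggregate(runs: RunResult[], minPassRate = 1): Summary[] {
  const byTest = new Map<string, { passed: number; total: number }>();
  for (const run of runs) {
    const entry = byTest.get(run.test) ?? { passed: 0, total: 0 };
    entry.total += 1;
    if (run.passed) entry.passed += 1;
    byTest.set(run.test, entry);
  }
  return Array.from(byTest.entries()).map(([test, { passed, total }]) => ({
    test,
    passed,
    total,
    ok: passed / total >= minPassRate,
  }));
}

// Three repeats per test; the second test is flaky and fails once.
const runs: RunResult[] = [
  { test: "looks-up-order", passed: true },
  { test: "looks-up-order", passed: true },
  { test: "looks-up-order", passed: true },
  { test: "no-refund-without-lookup", passed: true },
  { test: "no-refund-without-lookup", passed: false },
  { test: "no-refund-without-lookup", passed: true },
];

const strict = aggregate(runs);
const lenient = aggregate(runs, 0.6);
for (const s of strict) {
  console.log(`${s.ok ? "✓" : "✗"} ${s.test} ${s.passed}/${s.total} passed`);
}
```

With the strict default, a single failed repeat marks the test as failing, which is the point of repeating: a test that passes only two times out of three is flaky and should not be hidden by a lucky single run.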
Minute 25–30: Add to CI
Create .github/workflows/kindlm.yml:
name: Agent Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm install -g @kindlm/cli
      - run: kindlm test --reporter junit > junit.xml
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      - uses: dorny/test-reporter@v1
        if: always()
        with:
          name: KindLM Results
          path: junit.xml
          reporter: java-junit
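Exit code 0 means all gates passed; exit code 1 means something failed. A non-zero exit fails the test step and therefore the job, while the report step still runs because of if: always().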
What to test next
Now that your first tests pass, expand coverage:
- Schema validation — if your agent returns JSON, validate it against a schema. See Assertion Engine.
- Baseline drift — save today's results and compare after prompt changes. See CLI Reference.
- Multiple models — run the same tests against GPT-4o and Claude to compare. See Provider Interface.
- Compliance reports — add --compliance to generate EU AI Act documentation. See Compliance.