I DESIGN AI
SYSTEMS.
MODELS SHIP.
I PROVE IT.
The professional record. I build LLMOps systems and prove they work — eval harnesses, trace observability, cost routing, prompt maintenance. Twenty years in regulated software; every decision below is defended on its own page.
Selected Work
8 SYSTEMS / SHIPPED & DEFENDED
Reasonable UX
2W/1L — the cheaper config won, proven by eval. Playwright agent that walks a site, scores the UX, and emits a multi-page PDF. Tiered Haiku scout → Sonnet executor → Opus advisor, with a separate eval harness that quantified which variant actually wins — the cheaper one.
RAG Brain
<$0.01 per full reindex · 5 defended decisions. A working RAG pipeline over a personal Obsidian vault. Five LLMOps decisions made explicitly and defended, with incremental indexing via content hashing.
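A minimal sketch of incremental indexing via content hashing, assuming notes are available as a path-to-text mapping and the previous run left behind a path-to-hash manifest. The names `content_hash`, `plan_reindex`, and the manifest shape are hypothetical illustrations, not the pipeline's actual code. Only notes whose hash changed need re-embedding, which is what keeps a reindex cheap.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a note's content (SHA-256 hex digest)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(notes: dict[str, str], manifest: dict[str, str]):
    """Compare current notes against the last run's manifest.

    notes:    path -> current text (hypothetical input shape)
    manifest: path -> hash recorded on the previous run
    Returns (changed_paths, removed_paths, new_manifest).
    """
    new_manifest = {path: content_hash(text) for path, text in notes.items()}
    # New or edited notes: hash is missing from, or differs from, the old manifest.
    changed = sorted(p for p, h in new_manifest.items() if manifest.get(p) != h)
    # Deleted notes: present last run, absent now; their chunks should be dropped.
    removed = sorted(p for p in manifest if p not in new_manifest)
    return changed, removed, new_manifest
```

On a run where nothing was edited, `changed` and `removed` are both empty, so no embedding calls are needed at all.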
QA Agent
84% token reduction (~55k → ~8.8k per run). Automated UX quality-audit agent on Playwright + multi-model scoring. Cut running cost 84% by stripping images from conversation history and shipping screenshots at JPEG-40.
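The history-stripping idea can be sketched as follows, assuming messages are dicts whose `content` is either a string or a list of typed blocks (`{"type": "text"}` / `{"type": "image"}`). That message shape, the `keep_last` parameter, and the placeholder text are assumptions for illustration; the JPEG-40 re-encoding step is not shown. The second helper simply checks the headline arithmetic.

```python
def strip_old_images(messages: list[dict], keep_last: int = 1) -> list[dict]:
    """Replace image blocks with a short text placeholder in all but the
    last `keep_last` messages, so old screenshots stop costing tokens."""
    cutoff = len(messages) - keep_last
    stripped = []
    for i, msg in enumerate(messages):
        content = msg["content"]
        if i < cutoff and isinstance(content, list):
            content = [
                {"type": "text", "text": "[screenshot omitted]"}
                if block.get("type") == "image"
                else block
                for block in content
            ]
        # Copy the message so the caller's history is not mutated.
        stripped.append({**msg, "content": content})
    return stripped


def token_reduction(before: float, after: float) -> float:
    """Fractional reduction in tokens per run."""
    return 1 - after / before
```

With the figures quoted above, `token_reduction(55_000, 8_800)` comes out at about 0.84, which matches the stated 84%.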
Agent Panel
59 expert/reviewer agents + an LLM-as-judge eval. The panel of Claude subagents that runs my dev environment — domain experts paired with adversarial reviewers, plus skills, hooks, and scheduled automation. The system that reviewed this very redesign.
Reasonable Basis
Haiku 4.5 beat Sonnet 4.6 AND Opus 4.7 on false-positive refusal — at ~12× lower cost. Semantic tax research grounded in IRS publications — RAG with mandatory citations and graceful refusal. A 7-metric eval, all-pass, on a self-validating corpus.
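For readers unfamiliar with the metric, here is one common way to compute a false-positive refusal rate, under the assumption that it means the share of answerable questions the system refused anyway. The definition, the case fields `answerable` and `refused`, and the function name are assumptions for illustration; the actual 7-metric eval may define it differently.

```python
def false_positive_refusal_rate(cases: list[dict]) -> float:
    """Fraction of answerable questions that were refused.

    Each case is a dict with two booleans (hypothetical schema):
      answerable: the corpus contains enough to answer the question
      refused:    the system declined to answer
    """
    answerable = [c for c in cases if c["answerable"]]
    if not answerable:
        return 0.0  # no answerable cases, so no false-positive refusals
    return sum(1 for c in answerable if c["refused"]) / len(answerable)
```

Lower is better: a refusal on an unanswerable question is the desired graceful behavior and does not count against the model here.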
rag-lens
A generative-UI 'flipbook' that drops on top of an existing RAG system: spatial, visual browsing of retrieved chunks instead of a chat list. The engineering-differentiation flagship.
red-team-bench
Red-teams LLM-agent prompts and tool schemas against OWASP LLM01 (prompt injection) attacks, calibrated against AgentDojo and InjecAgent, with Opus-generated hardening recommendations.
Personal OS Dashboard
SQLite mirror via Litestream, on Fly.io behind Cloudflare Access. Private SvelteKit dashboard reading a SQLite mirror of an Obsidian vault. Modules for projects, daily notes, biometrics, and system health.