LLMOPS · CHARLOTTE NC · BY APPOINTMENT

I DESIGN AI
SYSTEMS.
MODELS SHIP.
I PROVE IT.

The professional record. I build LLMOps systems and prove they work — eval harnesses, trace observability, cost routing, prompt maintenance. Twenty years in regulated software; every decision below is defended on its own page.

WHAT I BUILD

ADVISOR PATTERNS EVAL HARNESSES RAG AGENT ORCHESTRATION TRACE OBSERVABILITY COST TELEMETRY

Selected Work

8 SYSTEMS / SHIPPED & DEFENDED

Flagship

Reasonable UX

eval harnessPlaywright + vision

2W/1L — the cheaper config won, proven by eval. Playwright agent that walks a site, scores the UX, and emits a multi-page PDF. Tiered Haiku scout → Sonnet executor → Opus advisor, with a separate eval harness that quantified which variant actually wins — the cheaper one.

Shipped

RAG Brain

RAGVoyage · ChromaDB

<$0.01 per full reindex · 5 defended decisions. A working RAG pipeline over a personal Obsidian vault. Five LLMOps decisions made explicitly and defended, with incremental indexing via content hashing.

Shipped

QA Agent

multi-modelcost telemetry

84% token reduction (~55k → ~8.8k per run). Automated UX quality-audit agent on Playwright + multi-model scoring. Cut running cost 84% by stripping images from conversation history and shipping screenshots at JPEG-40.

Shipped

Agent Panel

multi-agenteval harness

59 expert/reviewer agents + an LLM-as-judge eval. The panel of Claude subagents that runs my dev environment — domain experts paired with adversarial reviewers, plus skills, hooks, and scheduled automation. The system that reviewed this very redesign.

In progress

Reasonable Basis

tax RAGcitation-grounded

Haiku 4.5 beat Sonnet 4.6 AND Opus 4.7 on false-positive refusal — at ~12× lower cost. Semantic tax research grounded in IRS publications — RAG with mandatory citations and graceful refusal. A 7-metric eval, all-pass, on a self-validating corpus.

In progress

rag-lens

RAGspatial UI

A generative-UI 'flipbook' that drops on top of an existing RAG system — spatial visual browse of retrieved chunks instead of a chat list. The engineering-differentiation flagship.

In progress

red-team-bench

securityOWASP LLM

Red-teams LLM-agent prompts and tool schemas against OWASP LLM01 attacks, calibrated against AgentDojo / InjecAgent with Opus-generated hardening recs.

Shipped

Personal OS Dashboard

SvelteKit · Fly.ioLitestream

SQLite mirror via Litestream, on Fly.io behind Cloudflare Access. Private SvelteKit dashboard reading a SQLite mirror of an Obsidian vault. Modules for projects, daily notes, biometrics, and system health.

DECISIONS DEFENDED across the project pages

$0.52–1.94

COST PER UX AUDIT · reasonable-ux

<$0.01

COST PER RAG REINDEX · rag brain

I DESIGN AI SYSTEMS. MODELS SHIP. I PROVE IT.

Selected Work

Reasonable UX

RAG Brain

QA Agent

Agent Panel

Reasonable Basis

rag-lens

red-team-bench

Personal OS Dashboard

I DESIGN AI
SYSTEMS.
MODELS SHIP.
I PROVE IT.