Live Demo

See the agent work an invoice

Walk through the agent's architecture and decision trace on four sample invoices, each of which forces a different path.

Try the live demo →

The demo is on free hosting, so give it 30 to 60 seconds to wake on first load. The public demo runs in simulation mode: it replays a scripted decision path for each invoice, which keeps it free and safe to leave open. What you see is a walkthrough of the agent's architecture and decision trace, not live model reasoning. The real agent calls the live model.

Most "AI" features are retrieval pipelines: the steps are fixed by the designer. This is different. It's an agent. I gave a language model a set of accounting tools and ran it in a loop, and the model itself decides which tool to call next based on what it just learned, iterating toward a goal.

The task is accounts-payable invoice processing. Hand it an invoice and it works out its own path: parse the invoice, validate the vendor, pull and match the purchase order, flag discrepancies, code the lines to GL accounts, then stop at a human-approval checkpoint, because in finance a machine prepares and a person authorizes. Every decision is written to an inspectable audit trail.

I designed it and directed Claude Code to build it. It's the agent counterpart to my Construction-Accounting RAG. Together they show two tools for two different problems.

What makes it an agent

The difference between an agent and a fixed pipeline is who decides the order of operations. In a pipeline, the designer hard-codes the sequence. In an agent, the model is given tools and run in a loop, and it decides which tool to call next based on what the last step revealed, iterating toward a goal. Nobody scripts the path. A different invoice produces a different path. That autonomy is the point.
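A minimal sketch of that loop in Python, under loud assumptions: the tool names are hypothetical, and `choose_next_tool` is a rule-based stand-in for the model so the example runs without an API key. In the real agent that choice is made by the language model; the point here is only that the path comes out of the loop and is not written down anywhere in advance.

```python
from typing import Callable

# Hypothetical tools: each reads the working state and returns new facts.
def parse_invoice(state: dict) -> dict:
    return {"parsed": True}

def validate_vendor(state: dict) -> dict:
    return {"vendor_ok": state["invoice"]["vendor_status"] == "active"}

def match_po(state: dict) -> dict:
    return {"po_matched": True}

def code_gl(state: dict) -> dict:
    return {"gl_coded": True}

TOOLS: dict[str, Callable[[dict], dict]] = {
    "parse_invoice": parse_invoice,
    "validate_vendor": validate_vendor,
    "match_po": match_po,
    "code_gl": code_gl,
}

def choose_next_tool(state: dict) -> str:
    """Stand-in for the model: pick the next action from what is known so far."""
    if "parsed" not in state:
        return "parse_invoice"
    if "vendor_ok" not in state:
        return "validate_vendor"
    if not state["vendor_ok"]:
        return "escalate"  # stop early, nothing is prepared
    if state["invoice"]["po_number"] and "po_matched" not in state:
        return "match_po"
    if "gl_coded" not in state:
        return "code_gl"
    return "propose"

def run_agent(invoice: dict, max_steps: int = 10) -> list[str]:
    """Loop: choose a tool, run it, fold the result into state, repeat."""
    state: dict = {"invoice": invoice}
    path: list[str] = []
    for _ in range(max_steps):
        action = choose_next_tool(state)
        path.append(action)
        if action in ("propose", "escalate"):
            break  # both terminal actions hand off to a human
        state.update(TOOLS[action](state))
    return path

print(run_agent({"vendor_status": "active", "po_number": "PO-1001"}))
print(run_agent({"vendor_status": "hold", "po_number": None}))
```

The two calls take different paths through the same loop: the first runs the PO match before proposing, the second stops at vendor validation and escalates.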

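Given a vendor invoice, the agent chooses its tools in whatever order fits the situation: parsing the invoice, validating the vendor, pulling and matching the purchase order, flagging discrepancies, and coding the lines to GL accounts.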

It never posts anything itself. Its final move always hands off to a human-approval checkpoint, mirroring real AP controls. And every step it takes is written to a visible audit trail. That inspectable reasoning is the differentiator, and it comes straight from accounting practice: in finance, you don't trust a number you can't trace.
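One way the audit trail and the approval checkpoint could be modeled, as a sketch only. The class names, field names, and status strings below are hypothetical and not taken from the actual build; what they illustrate is that the agent can only create a pending proposal, and that the status changes only when a named person approves it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class AuditTrail:
    """Append-only record of each tool call and why it was made."""
    entries: list = field(default_factory=list)

    def record(self, tool: str, reason: str, result: str) -> None:
        self.entries.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "tool": tool,
            "reason": reason,
            "result": result,
        })

@dataclass
class Proposal:
    """What the agent hands off. It starts pending and stays that way
    until a human acts on it."""
    invoice_id: str
    status: str = "pending_approval"
    approved_by: Optional[str] = None

    def approve(self, approver: str) -> None:
        if not approver:
            raise ValueError("a named human approver is required")
        self.status = "approved"
        self.approved_by = approver

def propose_posting(trail: AuditTrail, invoice_id: str) -> Proposal:
    """The agent's final move: prepare a proposal, log it, and stop."""
    proposal = Proposal(invoice_id)
    trail.record("propose_posting", "all checks passed", proposal.status)
    return proposal

trail = AuditTrail()
trail.record("validate_vendor", "confirm vendor is payable", "active")
proposal = propose_posting(trail, "INV-A")
print(proposal.status)     # pending until a person approves
print(len(trail.entries))  # every step is traceable
```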

Four invoices, four paths

The demo includes four sample invoices. Each one forces the agent down a different path, which is how you can tell it's deciding rather than following a script.

Clean PO match (proposes). Invoice lines reconcile against the purchase order. The agent proposes a posting and routes it to approval.
Price overbill of about 10% (escalates). The invoice exceeds the PO. The agent catches the variance and escalates instead of proposing.
No PO (proposes). Nothing to match against, so the agent codes the lines to GL accounts and proposes a posting.
Vendor on hold / no W-9 (escalates). The vendor fails validation. The agent stops and escalates rather than preparing anything.
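The four outcomes above can be written down as plain rules, which is useful for checking what the agent should conclude on each sample. This is a sketch, not the agent itself: the real agent reaches these outcomes through tool calls it chooses, and the 2% price tolerance and the dollar amounts in the usage lines are assumptions made for illustration.

```python
from typing import Optional

PRICE_TOLERANCE = 0.02  # hypothetical threshold; the demo's actual value is not stated

def expected_outcome(vendor_ok: bool, invoice_total: float,
                     po_total: Optional[float]) -> str:
    """Return the outcome each sample invoice is described as producing.
    Assumes po_total is positive whenever a PO exists."""
    if not vendor_ok:
        return "escalate: vendor failed validation"
    if po_total is None:
        return "propose: no PO, lines coded to GL"
    variance = (invoice_total - po_total) / po_total
    if variance > PRICE_TOLERANCE:
        return f"escalate: invoice exceeds PO by {variance:.0%}"
    return "propose: clean PO match"

# Illustrative amounts only.
print(expected_outcome(True, 2475.00, 2475.00))   # clean match
print(expected_outcome(True, 1100.00, 1000.00))   # overbill
print(expected_outcome(True, 500.00, None))       # no PO
print(expected_outcome(False, 500.00, None))      # vendor on hold
```

Vendor validation is checked first because a failed vendor stops the work regardless of how well the numbers match.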

The decision trace

The agent shows its work. Each tool call and the reasoning behind it is laid out step by step, ending in either a proposed posting or an escalation, always pending a human's approval.

[Screenshot: decision trace for INV-A. The agent matches the invoice to PO-1001, then proposes a posting with two GL lines (5010) totaling $2,475.00, marked Pending approval with Approve and Reject buttons.]
A clean PO match: the agent reconciles the invoice, proposes a posting with the GL lines, and waits for a human to approve.

[Screenshot: decision trace for INV-D. The vendor lookup returns hold status and W-9 MISSING in red, so the agent escalates to a human for compliance review instead of proposing a posting.]
A vendor that fails validation: the agent flags the hold and the missing W-9, then escalates instead of preparing a posting.

RAG vs Agent

This sits next to my Construction-Accounting RAG demo, which answers plain-language questions grounded in a construction-accounting knowledge base. The two solve different problems, and the difference is worth being precise about.

RAG grounds the model's answers in retrieved knowledge. An agent lets the model choose and sequence its own actions in a loop. They compose (an agent can call RAG as one of its tools), but they are not the same thing.

             | RAG                                            | Agent
Control flow | Fixed by the designer: retrieve, then generate | The model chooses its own next action
Best for     | Grounded answers from a knowledge base         | Multi-step tasks with branching decisions
Human role   | Reads the answer                               | Approves the agent's proposed action
They compose | An agent can call RAG as one of its tools
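To make the composition concrete, here is a small sketch of a retrieval step registered as one tool an agent could call. Everything in it is an assumption for illustration: the two knowledge-base snippets, the word-overlap scoring (a stand-in for real embedding search), and the tool name `ask_knowledge_base`. The generation step is left out, so the tool just returns the retrieved context.

```python
# Hypothetical knowledge-base snippets, for illustration only.
KNOWLEDGE_BASE = [
    "Retainage is a portion of payment withheld until project completion.",
    "A W-9 must be on file before a vendor can be paid.",
]

def retrieve(question: str, k: int = 1) -> list[str]:
    """Fixed pipeline step 1: rank snippets by shared words with the question."""
    words = set(question.lower().split())
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def ask_knowledge_base(question: str) -> str:
    """Fixed pipeline step 2 would generate an answer from the context;
    here the context is returned directly."""
    context = retrieve(question)
    return f"Context: {context[0]}"

# The agent's tool registry: RAG is just one more callable the model may choose.
AGENT_TOOLS = {"ask_knowledge_base": ask_knowledge_base}

print(AGENT_TOOLS["ask_knowledge_base"]("is a w-9 required before a vendor is paid"))
```

Inside the tool, the order is fixed: retrieve, then answer. Whether and when to call it is the part the agent decides.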

Two demos, two tools, two different problems. Try them both:

Have a workflow like this?

Let's talk

If you've got a repetitive, rules-heavy process that still needs a human's judgment at the end, this is the shape of problem an agent fits. Reach out and I'll tell you whether it's a good candidate.

Get in touch →