I built an AI agent that processes vendor invoices. You hand it an invoice and it works out what to do on its own: pull the fields off the document, check the vendor, find the purchase order it references, match the two, flag anything that doesn't line up, and then either propose a posting or send it to a person. There's a working version on my site if you want to watch it run.
I went in expecting the hard part to be the reasoning, getting the model to make good calls on messy invoices. That was the easy part. The model reasons fine. The hard part was deciding what it was not allowed to do.
Here is the thing that changed how I think about it. One of the sample invoices comes from a vendor that is on hold and has no W-9 on file. I assumed I would have to write the logic for that case. Look up the vendor, and if it is on hold, stop. I never wrote it. The agent pulled the vendor record, saw the hold, and escalated on its own. It worked out from the tool result that it shouldn't go any further. Another invoice with a clean purchase order match ran all the way to a proposed posting without me steering it. Same agent, two different paths, because the inputs were different. Nobody scripted either one.
That autonomy is the part everyone talks about, and it is real. It is also the cheap part. Give a model a few tools and a loop and it will start making decisions. The work is everything around that. What tools it gets. What each tool is actually able to do. Where it has to stop and ask a person.
The agent can only be as safe as the tools you give it, so I built it so that it cannot post anything. It can propose. A person approves. That is not a guardrail I added at the end. It is the whole design. In accounts payable a machine prepares and a person authorizes, and the agent has to live inside that rule the same way a clerk would. I came at this from finance, so the first thing I worried about was what happens when it gets something wrong. Making it clever came second.
The other thing I would tell anyone building one of these is to make it show its work. Every step it took, every tool it called, the result that came back, all of it visible. If you cannot see why it did what it did, you cannot trust it with anything that touches money, and no accounting team is going to sign off on output they can't trace.
So the lesson landed almost backwards from where I started. The model's judgment was not the risk. The risk was handing it a tool that let it do something it shouldn't and not noticing. Most of building the agent was deciding, carefully, what it was not allowed to touch.
See it run
I wrote up how the agent works, with the actual decision traces, on the project page. The live demo is there too if you want to try it on the sample invoices.
See the AP Invoice Agent →