The Script That Outlived Its Assumptions
I’ve been talking to engineering and finance leaders at companies running AI in production. One question comes up constantly, in different forms, but always the same shape:
What did it actually cost to produce a successful outcome — for this workflow, this customer, this week versus last?
Not cost per token. Not total model spend. Cost per outcome.
Almost nobody can answer it cleanly. Here’s why.
At some point, someone wrote a script. It joins application traces with model inference logs and infrastructure cost reports. It produces a number. Good enough for the quarterly review.
Then the system grows. Tracing gets sampled to control cost. Workflows span multiple services and queues. A single business event spawns retries. Infrastructure gets shared across tenants and workflows.
The script still runs.
But now the numbers depend on assumptions — how retries are grouped, how sampled traces get extrapolated, how shared compute gets allocated. Small architecture changes quietly break the math. A meaningful slice of engineering time goes into maintaining joins, patching edge cases, recalculating reports when something upstream changes.
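To make those assumptions concrete, here is a minimal, hypothetical sketch of the math such a script typically encodes. Every name, rate, and allocation rule below is illustrative (the sampling rate, the `event_id` field, the token-proportional split), not from any real system — the point is that each assumption is a silent failure mode:

```python
# Hypothetical cost-per-outcome script. All constants and field names are
# illustrative assumptions, not from a real system.

SAMPLE_RATE = 0.10  # assume 10% head-based trace sampling

def extrapolate_sampled_cost(sampled_cost: float) -> float:
    # Assumption 1: sampled traces are representative, so divide by the rate.
    # Breaks silently if sampling is biased toward cheap (or expensive) calls.
    return sampled_cost / SAMPLE_RATE

def group_retries(inference_calls: list) -> dict:
    # Assumption 2: retries of one business event share an event_id.
    # Breaks when a retry path mints a fresh ID, splitting one outcome into two.
    grouped = {}
    for call in inference_calls:
        grouped.setdefault(call["event_id"], []).append(call)
    return grouped

def allocate_shared_compute(total_infra_cost: float,
                            workflow_tokens: float,
                            all_tokens: float) -> float:
    # Assumption 3: shared infrastructure cost splits proportionally by
    # token volume. Breaks when one workload's traffic pattern is what
    # actually drives the provisioning.
    return total_infra_cost * (workflow_tokens / all_tokens)

calls = [
    {"event_id": "evt-1", "cost": 0.02},
    {"event_id": "evt-1", "cost": 0.02},  # retry of the same outcome
    {"event_id": "evt-2", "cost": 0.03},
]
grouped = group_retries(calls)
model_cost = extrapolate_sampled_cost(sum(c["cost"] for c in calls))
infra_cost = allocate_shared_compute(500.0, workflow_tokens=2e6, all_tokens=10e6)
cost_per_outcome = (model_cost + infra_cost) / len(grouped)
print(f"outcomes: {len(grouped)}, cost per outcome: ${cost_per_outcome:.2f}")
```

Change the sampling strategy, the retry ID scheme, or the tenancy layout upstream, and each of these three functions quietly produces a different number with no error raised.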
This is where the build vs buy question usually comes up. And it’s almost always framed wrong.
It’s not about capability. The team can join those tables. They’re good engineers.
It’s about what they’re not building while they do.
Every week that script runs, the deeper cost isn't the maintenance time. It's that consequential decisions (which workflows to scale, whether a model migration actually saved money, whether a given automation is profitable) are getting made on numbers nobody fully trusts.
That’s the problem I’m working on. More soon.
If this resonates with something you're dealing with, I'd love to hear about it. Check out botanu.ai or reach me directly at deborah@botanu.ai.