An operations-layer vendor evaluation looks like a software evaluation. It is not. It is closer to a control-systems evaluation. The wrong vendor in this slot does not just underdeliver. The wrong vendor writes bad data to your ERP and leaves you with an audit trail you cannot reconstruct. Treat the first vendor call accordingly.
The shape of the call
The first call is usually 45 minutes. The vendor wants to show you their demo. You want to figure out whether they have built a real platform or wrapped an LLM around your existing pain. Spend ten minutes on the demo. Spend the remaining thirty-five on the twelve questions below. If the vendor cannot get through the twelve in the time you have, that is itself useful information.
The twelve questions
1. Show me an end-to-end write to a tier-one ERP in your demo environment.
Not a screenshot. A live write. If the demo environment cannot perform a real write to a real ERP instance, the vendor has not built the integration depth they are pitching. The dodge is usually "we have a connector for that but it is not in this demo environment." That is the wrong answer.
2. What happens when your model is wrong and the operator does not catch it?
Listen for two things: whether there is a structured override record after the fact, and whether there is a rollback mechanism for the bad write. The answer "our model is rarely wrong" is the red-flag answer. Every model is wrong some of the time. A vendor without a story for the bad-write case has not run a real deployment.
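To make the bad-write case concrete, here is a minimal sketch of what a rollback paired with a structured override record could look like. Everything in it is an assumption for illustration: the names `WriteRecord`, `OverrideRecord`, `apply_write`, and `rollback` are hypothetical, and a plain dict stands in for the ERP. No vendor's actual mechanism is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteRecord:
    """What the system wrote, plus the value it replaced (hypothetical shape)."""
    write_id: str
    key: str
    previous_value: object
    new_value: object


@dataclass(frozen=True)
class OverrideRecord:
    """Structured record of who reversed a write and why (hypothetical shape)."""
    write_id: str
    operator_id: str
    reason: str


def apply_write(erp, write_id, key, new_value):
    # Capture the prior value before overwriting so the write can be undone.
    record = WriteRecord(write_id, key, erp.get(key), new_value)
    erp[key] = new_value
    return record


def rollback(erp, record, operator_id, reason):
    # Refuse to roll back if someone else has changed the value since the write.
    if erp.get(record.key) != record.new_value:
        raise RuntimeError("value changed since the write; manual review needed")
    erp[record.key] = record.previous_value
    return OverrideRecord(record.write_id, operator_id, reason)
```

The point of the sketch is the pairing: the rollback restores the prior value, and the override record says who reversed the write and why. A vendor who can show you both has a story for the bad-write case.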
3. Walk me through your audit log schema.
You want to hear about per-event records with input data, model version, operator identity, timestamp, and downstream write reference. If they describe a free-text log, the audit log is decorative. If they describe a per-event record with structured fields, the audit log is queryable. The difference matters at quarter-end.
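As a rough illustration of the difference, a structured per-event record might look like the sketch below. The field names follow the list above. The class name `AuditEvent`, the extra `event_id` and `workflow` fields, and the query helper `writes_by_model_version` are assumptions for the example, not any vendor's schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditEvent:
    """One record per decision event (hypothetical shape)."""
    event_id: str
    workflow: str
    input_data: dict
    model_version: str
    operator_id: str
    timestamp: datetime
    downstream_write_ref: Optional[str]  # None when the event produced no write


def writes_by_model_version(events, model_version, since):
    """Return events that wrote downstream under a given model version."""
    return [
        e for e in events
        if e.model_version == model_version
        and e.timestamp >= since
        and e.downstream_write_ref is not None
    ]
```

With structured fields, a question like "which ERP writes came from model version X since the first of the month" is a filter. With a free-text log, the same question is a grep and a prayer.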
4. How does your system handle a stale-context decision?
A real platform has a freshness model and refuses to act on stale data. The vendor should be able to describe what their freshness window is, how it is configured per workflow, and what the operator sees when freshness is exceeded. If they have no concept of freshness, every decision the system makes implicitly assumes the data is current. That assumption breaks in production.
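A freshness model does not need to be elaborate. The sketch below assumes a per-workflow window and a hard refusal when the window is exceeded. The workflow names, the window values, and the names `FRESHNESS_WINDOWS`, `StaleContextError`, and `check_freshness` are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical per-workflow freshness windows; real values depend on the workflow.
FRESHNESS_WINDOWS = {
    "po_repricing": timedelta(minutes=15),
    "item_master_sync": timedelta(hours=24),
}


class StaleContextError(Exception):
    """Raised instead of acting on data older than the workflow allows."""


def check_freshness(workflow, data_as_of, now=None):
    """Return the data's age, or raise if it exceeds the workflow's window."""
    now = now or datetime.now(timezone.utc)
    window = FRESHNESS_WINDOWS[workflow]
    age = now - data_as_of
    if age > window:
        raise StaleContextError(
            f"{workflow}: data is {age} old, window is {window}"
        )
    return age
```

The error message is what the operator would see. Ask the vendor to show you their equivalent.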
5. Show me a real customer's deployment, not a sandbox.
Even an anonymized one. A vendor who has shipped will be able to walk you through a real configuration, a real workflow with real volume, and a real set of overrides accumulated over months. A vendor who has not shipped will have only the sandbox.
6. Who owns the data model when our schemas change?
You will add a custom field to your item master at some point in the next year. The vendor's system has to either pick up the new field automatically or have a clear process for incorporating it. If the answer is "file a ticket and we will look at it," the vendor has not built for schema evolution. You will hit that limit by month six.
7. What is your time-to-rollback on a bad deployment?
A platform should be able to disable a workflow inside minutes. If the answer is "we would deploy a config change, which takes a couple of hours," the rollback surface is the wrong granularity. The right answer is a per-workflow kill switch the operator can hit themselves.
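For a sense of the granularity in question, here is a minimal sketch of a per-workflow kill switch. The class `KillSwitch` and the function `guarded_write` are hypothetical names, and the in-memory dict stands in for whatever store a real platform would use.

```python
class KillSwitch:
    """Per-workflow disable flags that an operator can flip directly."""

    def __init__(self):
        self._disabled = {}

    def disable(self, workflow, operator_id, reason):
        # Record who disabled the workflow and why, for the audit trail.
        self._disabled[workflow] = {"operator_id": operator_id, "reason": reason}

    def enable(self, workflow):
        self._disabled.pop(workflow, None)

    def is_enabled(self, workflow):
        return workflow not in self._disabled


def guarded_write(switch, workflow, write_fn):
    """Run write_fn only if the workflow has not been disabled."""
    if not switch.is_enabled(workflow):
        return {"status": "blocked", "workflow": workflow}
    return {"status": "written", "result": write_fn()}
```

The check happens at write time, so disabling one workflow takes effect on the next write and leaves every other workflow running.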
8. How do you handle approvals across multiple operators?
A real platform supports multi-step approvals, role-based gates, and threshold-based escalations as first-class concepts. A wrapper around a CRUD app supports a single approver per action. The difference shows up the first time you need a price-threshold approval to escalate to a senior buyer.
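A threshold-based escalation can be sketched in a few lines. The thresholds, the role names, and the names `ApprovalRule`, `RULES`, `ROLE_RANK`, `required_role`, and `can_approve` are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApprovalRule:
    max_amount: float   # inclusive upper bound for this tier
    required_role: str


# Hypothetical tiers, ordered from lowest threshold to highest.
RULES = [
    ApprovalRule(10_000, "buyer"),
    ApprovalRule(100_000, "senior_buyer"),
    ApprovalRule(float("inf"), "procurement_director"),
]

# Higher rank can approve anything a lower rank can.
ROLE_RANK = {"buyer": 1, "senior_buyer": 2, "procurement_director": 3}


def required_role(amount):
    """Return the lowest role allowed to approve this amount."""
    for rule in RULES:
        if amount <= rule.max_amount:
            return rule.required_role
    raise ValueError("no rule covers this amount")


def can_approve(operator_role, amount):
    return ROLE_RANK[operator_role] >= ROLE_RANK[required_role(amount)]
```

Under these made-up tiers, a buyer can approve a 5,000 order, but a 25,000 order escalates to a senior buyer. A single-approver system has nowhere to express that rule.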
9. What happens to the system when the LLM provider has an outage?
A real platform degrades gracefully. The operator can still see proposals, still write manually, still query the audit log. A wrapper crashes or hangs. Ask specifically what the fallback path is. If the vendor cannot describe one, you will discover it the next time OpenAI or Anthropic has a regional incident.
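A fallback path might be as simple as the sketch below, which drops to manual mode when the model call fails. `ProviderUnavailable`, `propose`, and the `call_model` callable are hypothetical stand-ins and do not correspond to any provider's SDK.

```python
class ProviderUnavailable(Exception):
    """Hypothetical error raised when the model provider cannot be reached."""


def propose(request, call_model):
    """Return a model proposal, or fall back to manual mode on an outage."""
    try:
        return {"mode": "assisted", "proposal": call_model(request)}
    except (ProviderUnavailable, TimeoutError) as exc:
        # The operator keeps working: no proposal, but nothing crashes or hangs.
        return {"mode": "manual", "proposal": None, "reason": str(exc)}
```

The manual branch is the part to ask about. It is the code path the operator lives in for the duration of the incident.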
10. How is provenance carried on the values your system produces?
This is the one question that most reliably separates the real platforms from the wrappers. A real platform carries provenance at the cell level, with click-through to the source artifact. A wrapper carries provenance in a footer panel or a free-text log. Ask the vendor to click a value in their UI and show you where it came from. If the click-through opens a side panel with the source region highlighted, the platform is real. If the click-through opens a long list of artifacts, you are looking at a wrapper.
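Cell-level provenance means every value carries its own pointer back to the source. A minimal sketch, assuming a document-extraction setting with hypothetical names `SourceRegion`, `Cell`, and `trace`:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SourceRegion:
    """Where in a source artifact a value was read from (hypothetical shape)."""
    artifact_id: str
    page: int
    bbox: Tuple[float, float, float, float]  # x0, y0, x1, y1 on the page


@dataclass(frozen=True)
class Cell:
    """A single value together with the region it came from."""
    value: object
    source: SourceRegion


def trace(row, field):
    """Resolve one field of a row to the exact region that produced it."""
    return row[field].source
```

The click-through test in the demo is the UI version of `trace`: one value in, one highlighted region out. If the best the system can return is a list of artifacts for the whole row, the provenance is attached at the wrong level.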
11. What is your pricing model and where does it scale?
Per-seat pricing is fine for a small team and brutal when you roll out to a plant. Per-transaction pricing aligns incentives but punishes high-volume workflows. Per-workflow pricing is rare and tends to be the most predictable. The honest answer is usually a hybrid of these. Watch for vendors who cannot articulate where their pricing breaks down at scale.
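It helps to run the arithmetic yourself before the call. The sketch below compares the three models at two scales. Every price and volume in it is made up for illustration, and amounts are in integer cents to avoid floating-point noise. Substitute the vendor's real quote.

```python
# All prices are hypothetical, in cents per month.
SEAT_PRICE = 10_000        # per seat
TXN_PRICE = 5              # per transaction
WORKFLOW_PRICE = 200_000   # per workflow


def monthly_costs(seats, transactions, workflows):
    """Return the monthly cost in cents under each pricing model."""
    return {
        "per_seat": seats * SEAT_PRICE,
        "per_transaction": transactions * TXN_PRICE,
        "per_workflow": workflows * WORKFLOW_PRICE,
    }


# Two made-up scenarios: a small pilot team and a full plant rollout.
small_team = monthly_costs(seats=5, transactions=10_000, workflows=2)
plant = monthly_costs(seats=200, transactions=400_000, workflows=6)
```

With these invented numbers, per-seat and per-transaction costs both grow fortyfold from pilot to plant, while per-workflow cost triples. The exact ratios mean nothing. What matters is which variable your rollout multiplies.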
12. What happens if we cancel after six months?
You want to hear about a data export, a clean exit, and an unwind of any writes the system made. The dodge is "our customers do not cancel." The right answer is "we will hand you a CSV of everything in the system plus the audit log, and we will leave your ERP exactly as the audit log shows." If they cannot describe an exit, you are locked in by default.
The four red-flag answers
- "Our model is rarely wrong." Every model is wrong sometimes. The vendor without a story for that case has not deployed at scale.
- "We do not have integration to your ERP yet but could build it." Integration depth is the hard part. Vendors who say this often mean six months of consulting work.
- "Our customers do not cancel." Either it is untrue or the customers are locked in. Both are red flags.
- "The audit log is a free-text record." Not queryable means not auditable, and the next compliance review will be a fire drill.
The vendor who failed our second evaluation was the one who had the prettiest demo. The vendor who shipped was the one who knew where their own system broke and could explain how to deal with it.
