Polymr
Engineering · May 12, 2026 · 18 min read · Polymr engineering

Human-in-the-loop as a product primitive, not a config flag

Approval cannot be a step in a workflow graph. It has to be a property of every action object the runtime can emit. Treating it any other way is one config flag away from a bad write to a production system.

The first version got the shape wrong

The first version of the workflow engine treated approval as a step in the workflow graph. The graph had nodes for parsing, classification, enrichment, validation, approval, and emit. Each node was the same kind of thing: a function the runtime invoked, with a configurable predicate for when it ran.

On paper this looked clean. In production it had the wrong shape. The problem was structural: if approval is a step, then skipping it is a configuration decision. Someone with edit permission on a workflow graph could remove an approval node, or set its predicate to false, and the runtime would happily emit external writes to a production ERP without a signed human approval in the audit trail.

We caught two instances in pre-production of an approval node being skipped because the workflow author copy-pasted a template that did not include the node. Both were caught by review. Neither would have been caught by the runtime.

Skipping the approval should have been a type error, not a review error.
Internal post-mortem, runtime v0.4

Two failure modes of bad HITL

Before we get to the right shape, it is worth naming the two failure modes the wrong shape produces. They look different on the surface. They are the same structural error from opposite directions.

Failure mode 1 · too much friction

Every action gates, the operator drowns

The cautious shape: route every effectful action through a full review queue. Every write to the ERP, every email to a vendor, every status flip on a WO record. Within a week, the queue has hundreds of items and operators are bulk-clicking through them at the rate of a few seconds per item. The signature is technically present on every action. The signature carries no information about whether anyone read the row.

This is the failure mode that looks safe and is not. An audit report shows 100% of writes were approved. An operations review shows the operator approved 312 rows in 14 minutes on a Friday. The system is structurally compliant and operationally hollow.

Failure mode 2 · too little trust

The LLM writes silently, the human finds out at quarter-end

The optimistic shape: trust the model, let writes flow, treat approvals as exception handling. This shape works for read-only enrichment and proposals. It is catastrophic for effectful writes. The first time an autonomous write hits the wrong vendor, or sends a PO at the wrong price tier, or flips a WO status that the floor was actually relying on, the operations VP will demand every autonomous write be unwound, and the unwind itself is the kind of operation nobody has tooling for.

The two failure modes converge on the same underlying error: treating approval as a runtime policy rather than a property of the action itself. Policy you can tune. Properties you cannot strip.

Heuristic: If your approvals system can be turned off by changing a number in a config file, it is a policy. Policies leak. Properties of a type do not.

The second version moved approval into the type system

The second version of the runtime made a single change with large consequences: every effectful operation (every write to a vendor portal, every PO emit, every supplier email) became an instance of a single Action object. The runtime is structurally incapable of producing side effects any other way.

The Action type has one mandatory field: approval. The field holds a signed approval record, with a user id, a timestamp, and a content hash over the exact payload that is about to be emitted. Omitting the approval is not a runtime error. It is a type error: the code that constructs the Action will not compile. A mismatched approval is caught one step later, at emit time, where the runtime refuses any Action whose approval hash does not match the payload.

Skipping approvals is no longer a config flag, and approval is no longer a node in the workflow graph. Approval is a property of every effectful operation the runtime can perform.
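A minimal Python sketch of this shape, not the runtime's actual code. The names Approval, Action, approve, emit, and content_hash are hypothetical, and the "signature" is reduced to a plain SHA-256 content hash; a real approval record would also be cryptographically signed. Python enforces the required field when the constructor is called rather than at compile time, so a static checker such as mypy stands in for the compiler here.

```python
import hashlib
import time
from dataclasses import dataclass


def content_hash(payload: bytes) -> str:
    """SHA-256 over the exact bytes that would be emitted."""
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class Approval:
    user_id: str
    timestamp: float
    payload_hash: str  # hash of the payload the operator approved


@dataclass(frozen=True)
class Action:
    payload: bytes
    approval: Approval  # required: no default, so it cannot be omitted

    def __post_init__(self) -> None:
        # Reject an approval that was recorded over different bytes.
        if self.approval.payload_hash != content_hash(self.payload):
            raise ValueError("approval hash does not match payload")


def approve(payload: bytes, user_id: str) -> Approval:
    """Hypothetical helper: record an operator's approval of these exact bytes."""
    return Approval(user_id=user_id, timestamp=time.time(),
                    payload_hash=content_hash(payload))


def emit(action: Action) -> str:
    """Stand-in for the single emit path; it accepts nothing but an Action."""
    if not isinstance(action, Action):
        raise TypeError("only Action objects can be emitted")
    return f"emitted {len(action.payload)} bytes approved by {action.approval.user_id}"
```

In this sketch, calling Action(payload) with no approval raises a TypeError before any emit code runs, and pairing a payload with an approval recorded over different bytes raises a ValueError at construction.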

  1. Source
    Sources
    inbound to internal state
    • Supplier inbox
    • Vendor portals
    • CAD pipeline
    • ERP read mirror
    • Spreadsheet uploads
    • EDI 850 / 856
  2. Stage
    Processing
    parse · enrich · propose
    • Parsers per artifact
    • Reconcilers
    • Rule packs
    • Proposal drafts
    • Provenance index
  3. Human gate
    Approval gate
    human signature required
    • Bands · threshold rules
    • Per-vendor history check
    • Revision-mismatch guard
    • Bulk approve w/ scope
    • Signed audit record
  4. Sink
    Actions · external writes
    every emit is an Action object
    • PO emit · ERP
    • Vendor email send
    • Portal post (RFQ / award)
    • Inventory write
    • EDI 850 emit

Figures illustrative · drawn from observed pattern, not a named deployment

The runtime architecture, with the approval gate as a first-class layer. Sources land as proposals. Proposals never become Actions until they cross the gate. The gate is the only path from internal state to any external system.

A taxonomy: what gates, what does not

Once approval is a primitive, the design question becomes: which operations are Actions and which are not. We have a simple taxonomy. Anything that produces an effect on a system the operator does not own (vendor inboxes, ERP writes, portal posts, EDI emits) is an Action and must be approved. Anything internal (proposals, enrichments, log writes, lookups) is not an Action and does not gate.

Gating taxonomy
  • Read-only lookups · no gate
    Pulling current item-master record, vendor history, last-quoted price. Pure read against systems we have permission to read. No signature required, no audit row beyond the access log.
  • Internal proposals · no gate
    Drafted PO records, drafted BOM revisions, drafted RFQ packets. They live in Polymr-internal queues. They do not become Actions until an operator promotes them.
  • Internal log writes · no gate
    Every parse, every reconciliation decision, every proposal draft writes to the provenance index. These are audit substrate, not effectful operations. No signature; full provenance.
  • External write to systems we own · single-sig gate
    Writes back to the customer ERP, even ones the customer technically authorised in the integration scope, gate through a one-click approve lane when in-band. Out-of-band writes escalate to full review.
  • External communication to third parties · single-sig gate
    Vendor emails, supplier-portal posts, EDI 850/856 emits, customer-facing acknowledgements. Each is an Action object with a signed approval and a content hash.
  • Cross-tenant or destructive actions · two-sig gate
    Bulk cancels of in-flight POs, vendor blacklist changes, item-master deletions, anything that cannot be undone with a single counter-action. Two signatures are required, and they must come from two different operators.
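The taxonomy above can be read as a lookup from operation kind to gate level. The sketch below is illustrative only: the kind strings, the Gate enum, and both function names are hypothetical, and treating an unknown kind as two-sig (failing closed) is an assumption of this example, not something the taxonomy states.

```python
from enum import Enum


class Gate(Enum):
    NONE = 0        # internal work: never an Action
    SINGLE_SIG = 1  # one operator signature
    TWO_SIG = 2     # two distinct operator signatures


# Hypothetical operation kinds, one per row of the taxonomy.
GATE_FOR_KIND = {
    "read_lookup": Gate.NONE,
    "internal_proposal": Gate.NONE,
    "internal_log_write": Gate.NONE,
    "erp_write": Gate.SINGLE_SIG,
    "third_party_comm": Gate.SINGLE_SIG,
    "destructive_or_cross_tenant": Gate.TWO_SIG,
}


def required_gate(kind: str) -> Gate:
    # Assumption: an unrecognised kind gets the strictest gate.
    return GATE_FOR_KIND.get(kind, Gate.TWO_SIG)


def signatures_satisfy(gate: Gate, signer_ids: list[str]) -> bool:
    """True when the number of distinct signers meets the gate's requirement."""
    return len(set(signer_ids)) >= gate.value
```

Counting distinct signer ids, rather than signatures, is what makes the same operator signing twice fail a two-sig gate.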

Scope of authority: the role-scoped console

Approval as a primitive answers "can this action emit at all?" A second question is harder: who is allowed to sign for what? The buyer who can approve a $50k coil PO should not be able to approve a $500k capital purchase. The intake engineer who can approve a BOM revision should not be approving vendor onboarding.

The runtime models this as a role-scoped console. Each operator is associated with one or more roles. Each role carries a scope: a set of Action types, value bands, vendor subsets, and plant subsets. The approval surface for a given operator only shows the Actions inside their scope. Out-of-scope Actions sit in someone else's queue.[1]

The console is intentionally opinionated. It does not allow an operator to grant themselves additional scope. Scope changes are themselves Actions, and they require a two-sig gate from the role-admin role.
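One way to picture the scope filter is as a predicate over pending Actions. This is a sketch under simplifying assumptions: Scope, PendingAction, in_scope, and visible_queue are hypothetical names, and the value band is reduced to a single upper limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    action_types: frozenset[str]
    max_value: float          # upper edge of the value band, in dollars
    vendors: frozenset[str]
    plants: frozenset[str]


@dataclass(frozen=True)
class PendingAction:
    action_type: str
    value: float
    vendor: str
    plant: str


def in_scope(scope: Scope, a: PendingAction) -> bool:
    """An Action is in scope only if every dimension of the scope covers it."""
    return (a.action_type in scope.action_types
            and a.value <= scope.max_value
            and a.vendor in scope.vendors
            and a.plant in scope.plants)


def visible_queue(scopes: list[Scope],
                  pending: list[PendingAction]) -> list[PendingAction]:
    """An operator sees an Action if any one of their roles' scopes covers it."""
    return [a for a in pending if any(in_scope(s, a) for s in scopes)]
```

With a buyer scope capped at $50k, a $14,820 PO appears in the buyer's queue while a $500k purchase does not; it would have to be covered by some other role's scope.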

Lanes: Runtime (proposal source) · Console (role-scoped view) · Operator (in-role signer) · Gate (verify + emit) · Audit log (immutable) · External (ERP / vendor)
  1. Proposal · t0
    in-band · banded ok
  2. Open · review · t1
    reads provenance inline
  3. Override price · t2
    reason: vendor concession
  4. Sign · hash(payload) · t3
    role: buyer-l2
  5. Append · t4
    who · when · what · why
  6. Emit Action · t5
    payload + sig
  7. Ack · t6
    ext-id · timestamp

Figures illustrative · drawn from observed pattern, not a named deployment

The approval lifecycle for a single Action, from proposal landing in the operator's queue, through optional override and reason capture, to the signed approval and the immutable audit row. The audit log receives a full content hash, the proposal pointer, the source artifacts, and the override reason if present.

Approval discipline, post-primitive
  • Median time to approve · 42 sec
    Time from proposal landing in queue to signed approval, in-band actions · reading-shaped
  • Override rate · 6%
    Proposals where the operator changed at least one field before signing · reasons captured
  • Audit completeness · 100%
    Actions where every field traces back to a source artifact in the audit row · primitive property

Approval-surface metrics on a representative deployment, measured against a four-month baseline. The point of the metrics is to show approval is reading work, not clicking work.

Figures illustrative · drawn from observed pattern, not a named deployment

What the gate actually evaluates

Making approval a primitive only matters if the approval surface is usable. A queue that floods with every routine write is functionally the same as a config flag: operators will start bulk-clicking through it and the signature stops carrying meaning.

The gate evaluates three kinds of conditions to decide what shows up:

  • Bands. Most routine actions sit inside pre-agreed bands: supplier within historical price range, quantity within rolling forecast, vendor on the approved list. Within band, the action shows up in a one-click approve lane. Out of band, it escalates to a full review.
  • Vendor / item history. A new supplier, a new item, or a revision that has not appeared on a PO in the last twelve months always escalates. The gate treats absence of history as a signal.
  • Bulk-vs-unit scope. Bulk approve is available, but it operates on an explicit scope (e.g. "all in-band POs to V-218 this morning") with a single signed approval that covers the scope. The individual Action objects each carry the scope reference, so the audit trail can reconstruct which scope authorized each write.
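The first two conditions above amount to a routing function from proposal to lane. A minimal sketch, assuming a hypothetical Band record and approximating "the last twelve months" as 365 days; the field names and the lane labels are illustrative, not the gate's real vocabulary.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Proposal:
    vendor: str
    unit_price: float
    quantity: int
    last_po_date: Optional[date]   # None when there is no PO history at all


@dataclass(frozen=True)
class Band:
    price_low: float
    price_high: float
    max_quantity: int              # e.g. derived from the rolling forecast
    approved_vendors: frozenset[str]


def lane(p: Proposal, band: Band, today: date) -> str:
    """Return 'one_click' for in-band proposals with recent history, else 'full_review'."""
    # Absence of history in the last twelve months always escalates.
    if p.last_po_date is None or today - p.last_po_date > timedelta(days=365):
        return "full_review"
    in_band = (band.price_low <= p.unit_price <= band.price_high
               and p.quantity <= band.max_quantity
               and p.vendor in band.approved_vendors)
    return "one_click" if in_band else "full_review"
```

The history check runs first so that a new supplier or item escalates even when its price and quantity happen to fall inside the band.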
Procurement · Approvals · env: sandbox

Approvals · procurement

2 selected · Consolidate RFQs · Approve all valid
Tabs: All 7 · RFQ sends 2 · Quote review 2 · Awards 1 · PO sends 2

Task | Entity | Vendor | Amount | Δ | Action
RFQ send | PMR-4124 · race (aged 12m) | V-201, V-218, V-244 | - | | Review
Quote review | PMR-4031 · brake hub (aged 38m) | V-218 | $14,820 | −4.2% | Review
Quote review | PMR-4218 · shaft (aged 52m) | V-201 | $6,140 | −1.1% | Review
Award | PMR-4031 · brake hub (aged 1h) | V-218 | $14,820 | within band | Review
PO send | PO 84231 (aged 1h) | V-218 · 280 ea | $14,820 | | Review
RFQ send | PMR-4304 · cap M6 (aged 2h) | V-244, V-301 | - | | Review
PO send | PO 84228 (aged 3h) | V-244 · 1280 ea | $4,420 | | Review

Every external write (vendor email, PO send, ERP update) waits here for an operator. Bulk approve respects per-vendor consolidation rules.

Figures illustrative · drawn from observed pattern, not a named deployment

The procurement approvals surface. Each row is an Action waiting on signature. Bands, vendor history, and consolidation rules drive what is shown as routine vs escalated. Bulk approve preserves per-Action provenance.

Where this is hardest · multi-plant industrial
Operations lead, multi-site industrial components manufacturer (four plants)
Situation
Four plants procured overlapping SKUs from partially overlapping vendor lists. Each site ran its own planner and its own approved-vendor file. Central operations saw a roll-up only at month-end.
What was breaking
PMR-4124 bearing race was bought by three plants from three vendors at unit costs spanning 18%. Plant-level POs were sent before central could batch them. Vendor consolidation was a slide deck, never an action.
  • Planning + purchasing
  • Quote-to-procure
  • Margin and bottleneck analysis
Outcome · 14 weeks
$2.6M/yr
Top-two-category landed cost
was $3.42M/yr · −$840K/yr
Illustrative, reflects this specific deployment. Outcomes vary by plant, stack, and scope.

The audit log is a first-class output

Treating approval as a primitive gives you something almost as valuable as the safety guarantee: a complete, source-linked audit trail. Every external write the runtime ever performed can be traced backward through the Action it was emitted as, the approval that authorized it, the proposal it was drafted from, and the source artifacts the proposal was reconciled from.

This is the line that auditors and IT security teams care about most. "Show me every PO Polymr has emitted in the last quarter, who approved each, against what input evidence" is a single query against the audit index. We designed the index that way because once approval is a primitive, the audit shape comes nearly for free.
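To make the "single query" claim concrete, here is a sketch against an in-memory SQLite database. The three-table schema, the column names, and the sample rows are all invented for illustration; the real audit index is not described at this level of detail.

```python
import sqlite3


def build_demo_index() -> sqlite3.Connection:
    """Create a tiny, made-up audit index: actions link to approvals and proposals."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE proposals (id TEXT PRIMARY KEY, source_artifact TEXT);
        CREATE TABLE approvals (id TEXT PRIMARY KEY, user_id TEXT,
                                signed_at TEXT, payload_hash TEXT);
        CREATE TABLE actions   (id TEXT PRIMARY KEY, kind TEXT, emitted_at TEXT,
                                approval_id TEXT REFERENCES approvals(id),
                                proposal_id TEXT REFERENCES proposals(id));
        INSERT INTO proposals VALUES ('prop-1', 'quote-V218.pdf'),
                                     ('prop-2', 'quote-V244.pdf');
        INSERT INTO approvals VALUES ('appr-1', 'buyer-17', '2026-02-10T09:14:00', 'ab12'),
                                     ('appr-2', 'buyer-04', '2025-11-20T16:02:00', 'cd34');
        INSERT INTO actions VALUES
            ('act-1', 'po_emit', '2026-02-10T09:14:05', 'appr-1', 'prop-1'),
            ('act-2', 'po_emit', '2025-11-20T16:02:09', 'appr-2', 'prop-2');
    """)
    return conn


def po_emits_between(conn: sqlite3.Connection, start: str, end: str) -> list:
    """Every PO emit in [start, end): who approved it, when, against what evidence."""
    return conn.execute(
        """
        SELECT a.id, ap.user_id, ap.signed_at, p.source_artifact
        FROM actions a
        JOIN approvals ap ON ap.id = a.approval_id
        JOIN proposals p  ON p.id  = a.proposal_id
        WHERE a.kind = 'po_emit' AND a.emitted_at >= ? AND a.emitted_at < ?
        ORDER BY a.emitted_at
        """,
        (start, end),
    ).fetchall()
```

The query needs no outer joins because, in this sketch as in the design described above, an emitted Action without an approval and a proposal cannot exist. Timestamps are stored as ISO 8601 strings so that plain string comparison orders them correctly.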

The cost of getting the primitive wrong

The cost of the first version was not abstract. We watched the team start to write defensive code around the runtime: wrapping emits in their own check layers, building per-call assertion harnesses, gating CI on snapshot tests of serialized workflow graphs. Every one of those was a workaround for a missing primitive.

The second version made all of that defensive code obsolete in a single change. The wrapper checks were deleted. The assertion harnesses are now type-level. The snapshot tests got replaced with a much smaller set of tests against the Action constructor.

The general lesson is one we keep relearning: when the right shape is a type, do not implement it as a graph node. The type catches the case at construction time, in every code path, forever. The graph catches it on the paths you remembered to add a node to.

Building the surface around the primitive

The primitive is one piece. The surface around it is the other. A primitive that is structurally safe but operationally painful gets routed around within a quarter: operators will find ways to batch-approve everything, or to push work to a colleague with a different permission scope, or to do the write manually outside the system. The surface has to be good enough that the safe path is also the fast path.

Three principles drove the approvals UI. First, every Action in the queue shows its provenance inline: the source PDF, the parsed proposal, the band the action sits inside or outside. The operator does not click through to a second screen to understand what they are approving. Second, the one-click lane and the full-review lane are visually distinct. The one-click lane is for in-band actions where the operator's role is to confirm the runtime did not err; the full-review lane is for escalations where the operator's role is to apply judgment. Mixing the two breaks the operator's intuition for how much attention a row needs. Third, bulk approve is scoped, not unbounded: a scope query, an explicit count, a signed approval that carries the scope reference forward.
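The third principle, scoped bulk approve, can be sketched as a function that refuses to proceed unless the operator's confirmed count matches what the scope query actually resolves to. Pending, ScopedApproval, and bulk_approve are hypothetical names, and the random scope reference is a placeholder for whatever identifier a real audit trail would use.

```python
import uuid
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Pending:
    action_id: str
    vendor: str
    in_band: bool


@dataclass(frozen=True)
class ScopedApproval:
    action_id: str
    approver: str
    scope_ref: str   # ties this approval back to the bulk scope that covered it


def bulk_approve(queue: list[Pending],
                 scope_query: Callable[[Pending], bool],
                 confirmed_count: int,
                 approver: str) -> list[ScopedApproval]:
    matched = [p for p in queue if scope_query(p)]
    # The operator must confirm the exact count the scope resolves to.
    if len(matched) != confirmed_count:
        raise ValueError(
            f"scope matched {len(matched)} actions, operator confirmed {confirmed_count}")
    scope_ref = f"scope-{uuid.uuid4().hex[:8]}"
    # One approval per Action, each carrying the same scope reference.
    return [ScopedApproval(p.action_id, approver, scope_ref) for p in matched]
```

A call such as bulk_approve(queue, lambda p: p.vendor == "V-218" and p.in_band, 2, "buyer-17") yields one approval per matched Action, all sharing a scope reference; if a third matching row lands in the queue between the count and the click, the call fails instead of silently widening the approval.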

We have iterated this surface twice. The first iteration had a generic queue with row-level approve buttons; in practice it produced approval fatigue within a week. The current iteration separates the lanes, surfaces provenance inline, and constrains bulk approve to explicit scopes. The per-operator approval throughput went up while the rate of retrospective "I approved that without looking properly" flags went down.

The rules we follow for our own approval gates

A short, opinionated checklist. We apply this to every new Action type before it ships to a customer deployment.

Polymr approval-gate rules

Seven rules we apply before any new Action ships

  1. Every Action carries a content hash, not a payload pointer
    The signature is over the bytes that will be emitted, not a reference to a record that might be edited between signature and emit.
  2. The proposal screen shows provenance without a click
    Source artifacts are inline. The operator never has to navigate to understand what they are approving.
  3. In-band and out-of-band lanes are visually distinct
    Same surface, different lane. Mixing the two breaks operator intuition about how much attention to apply.
  4. Bulk approve is scoped, never unbounded
    Bulk operates on an explicit query with a count. The operator confirms the count. Every Action in the bulk carries the scope reference forward into its audit row.
  5. Override fields require a reason string
    If the operator changes a field the runtime drafted, a free-text reason is mandatory. The reason is part of the audit row, not a separate log.
  6. Approval scope cannot be self-granted
    Scope changes are two-sig Actions from the role-admin role. The operator who needs the scope cannot be one of the two signatures.
  7. The audit row is immutable from inside the runtime
    Once written, the audit row is append-only. Corrections are themselves new rows. The runtime has no API to rewrite history.
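Rules 5 and 7 can be illustrated together with a small append-only log. This is a sketch, not the runtime's audit store: AuditRow, AuditLog, and their fields are hypothetical, and Python only hides the underlying list by convention, whereas a real store would enforce immutability at the storage layer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditRow:
    action_id: str
    who: str
    what: str                       # content hash of the emitted payload
    why: Optional[str] = None       # override reason, when a field was changed
    corrects: Optional[int] = None  # index of an earlier row this one corrects


class AuditLog:
    """Exposes append and read only; there is no update or delete method."""

    def __init__(self) -> None:
        self._rows: list[AuditRow] = []

    def append(self, row: AuditRow, overridden_fields: tuple[str, ...] = ()) -> int:
        # Rule 5: changing a drafted field requires a non-empty reason.
        if overridden_fields and not (row.why and row.why.strip()):
            raise ValueError("override requires a reason string")
        # Rule 7: a correction is a new row pointing at an existing one.
        if row.corrects is not None and not 0 <= row.corrects < len(self._rows):
            raise ValueError("correction must point at an existing row")
        self._rows.append(row)
        return len(self._rows) - 1

    def rows(self) -> tuple[AuditRow, ...]:
        return tuple(self._rows)  # immutable snapshot of frozen rows
```

Because the reason lives on the row itself, a reader of the audit trail never has to join against a separate log to learn why a field was changed.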

What this means for buyers evaluating an operations layer

If you are evaluating an operations layer that writes to your ERP, the question to ask is not "does it support approvals?" The honest answer from every vendor will be yes. The right question is "is the approval gate a property of every external write, or is it a step in the workflow that someone can remove?" The two have radically different safety properties and the difference will show up the first time someone bulk-edits a workflow on a Friday afternoon.

A second question worth asking: "Show me the audit record for a single PO emit, end to end." If the answer is a log file with the emit timestamp and the user that ran the workflow, the approval is bolted on. If the answer is a structured record that shows the Action, the signed approval, the proposal it was drafted from, and the source artifacts, all linked and all queryable, the approval is structural. Only one of those answers survives the first time a vendor disputes a PO line.

Footnotes

  [1] The role-scope model is intentionally coarser than a full RBAC matrix. We have seen too many deployments drift into per-user exception scopes that nobody can reason about. A small number of named roles with explicit scopes beats a sprawling permission matrix every time we have measured.