Polymr
Engineering · Apr 20, 2026 · 16 min read · Polymr engineering

Provenance as a product feature, not a backend log

Source-linked cells are the difference between a system the operator trusts and a system they audit twice. Provenance has to be the same primitive as the cell itself, not a query you run after the fact.

A system without provenance is a system the operator audits twice

Walk into a plant where the operations layer has shipped and watch a buyer review a drafted PO for the first time. One buyer approves in three seconds. Another flips to a separate window to verify. The difference between them is where the provenance lives. The first buyer can click a cell, see the source PDF region the value was extracted from, and move on. The second buyer cannot, so they go to the inbox, find the email, open the PDF, scroll to the right page, and read the line themselves.

The two buyers are reviewing the same drafted PO. The system gave them the same data. What is different is whether the provenance is at the cell or at the bottom of the screen. The buyer with cell-level provenance trusts the system. The buyer without it audits it. Audit-twice is the failure mode that kills adoption of automated drafting, and it is structural to where the provenance lives in the data model.

This piece argues that provenance has to be a primitive of the cell, not a query you run later. The shape of the argument is structural: where provenance lives in the data model determines whether the operator trusts the surface or not. Get this wrong and the prettiest UI in the world will not save the deployment.

Source artifact (PDF · email · CAD) → Parser (extract + region) → Cell write (value + source ref) → UI render (click-through) → Operator (verify)

  1. Quote PDF arrives (14:08): page 3, region x=240,y=440
  2. Extract + anchor (14:08): unit price 52.93 + region bounds
  3. Cell render (14:08): value · provenance dot
  4. Click cell (14:09): open source pane
  5. Show region (14:09): PDF page 3, highlighted
  6. Verify or override (14:09): two-second decision

Figures illustrative · drawn from observed pattern, not a named deployment

The provenance lookup loop, end to end. A source artifact lands. The parser extracts a value and anchors the region. The cell write carries the value and the source reference together. The UI renders the cell with a click-through to the highlighted source. The operator verifies in two seconds.

What cell-level provenance unlocks

  • Approval time: 9s (from 47s). Median seconds a buyer spends reviewing a drafted PO with cell-level provenance.
  • Drop to inbox: 6% (from 68%). Proposals where the buyer opened the source email to verify before approving.
  • Auto-approve rate: 71% (from 12%). Proposals the buyer chose to enable for automatic approval within their tolerance band.

Observed in two mid-market deployments where the same drafted-PO surface shipped first with footer-level provenance, then was retrofitted to cell-level provenance. Adoption numbers are within-deployment before and after the retrofit, not cross-customer.

Figures illustrative · drawn from observed pattern, not a named deployment

What footer-level provenance gets wrong

The most common shape we see when a team adds "audit trail" to their product is a footer panel. The cell shows the value. At the bottom of the form, there is a collapsible section called "Provenance" or "Sources" that lists the source artifacts that contributed to the form. The list shows the email subject line, the PDF filename, the parser run timestamp, maybe a link to the raw artifact.

The footer panel is technically correct. The provenance is there. The data model knows it. The auditor, if they ever asked, could find it. What the footer panel does not do is make the provenance useful in the moment the operator is deciding whether to trust the cell.

The structural reason is that the operator has to make a decision per cell and the footer panel is per form. The operator looks at the unit price, wonders if that is the right unit price for the right SKU, and has to do a manual join: scroll to the footer, find the source PDF, open it, navigate to the right page, find the right row, verify. That join is roughly forty seconds per cell, and there are eleven cells on a typical drafted PO. That is more than seven minutes per PO. With forty POs in a buyer's morning queue, it comes to nearly five hours of footer-walking per day. The buyer stops doing it. They start either approving without checking or escalating to the senior buyer for spot checks. Both are failure modes.

I can see where the data came from. I have to leave the screen I am on to check it. So I do not check it. I approve the ones that look right and flag the ones that do not. The flagging is the same thing I was doing before the layer.
Buyer, week two of a footer-level provenance deployment

What cell-level provenance changes

The shape that works is cell-level provenance with two-click verification. Every cell on the surface carries a small indicator (a dot, a chip, or a similar mark, subtle enough to be unobtrusive when the operator is reading and visible enough to be findable when they want it). Clicking the indicator opens a side panel that shows the exact source region the value was extracted from, highlighted in context. The buyer reads the line on the source PDF and either accepts the value with a click on the cell or overrides it with a typed value.

The change of shape is small in implementation and large in operator experience. The footer panel is replaced with cell decorations. The collapsible "Sources" section goes away. The data that backs the cell decorations is the same data that backed the footer panel. What changes is where the operator interacts with it.

The two-click time per cell is the load-bearing metric. If the operator can verify a cell in two clicks and two seconds, they will do it. If verification costs forty seconds, they will not. The cell-level architecture is the one that fits inside two seconds.

Footer-level provenance

What the form shows

  • PO line 1
    PMR-2810-1500-S · 280 · $52.93
  • PO line 2
    PMR-3104-CL-A · 560 · $11.40
  • Vendor
    V-218
  • Req date
    2026-06-12
  • Footer panel · Sources
    3 artifacts · click to expand
  • Verify cost
    ~40s per cell

Cell-level provenance

What the cell carries

  • Item PMR-2810-1500-S
    src · quote.pdf p2 r4
  • Qty 280
    src · email body line 4
  • Unit price 52.93
    src · quote.pdf p3 col B
  • Vendor V-218
    src · email sender domain
  • Req date 2026-06-12
    src · purchase req PR-9928
  • Click target
    highlighted region · 2s
  • Override target
    inline typed value
  • Audit trail
    same artifacts, surfaced inline

The data model is identical in both cases. The architectural change is that the provenance reference is part of the cell write, not a separate document on the side.

Figures illustrative · drawn from observed pattern, not a named deployment

Footer-level vs. cell-level provenance on the same drafted PO. The data model knows the same things in both cases. What changes is whether the operator can act on the provenance in the moment of decision or has to do a manual join to a separate section of the screen.

Where this hurts most · make-to-order
Estimating lead, custom industrial fabricator
Situation: Inbound customer RFQs arrived as a one-line email with a PDF drawing attached. Estimating, engineering, and purchasing each opened the drawing separately, often two business days apart.
What was breaking: The average quote-to-PO loop was four business days. One-off BOMs were rebuilt from scratch every time because nothing in the ERP keyed off a customer drawing. Half of quotes landed outside the +/-8% margin band the operations VP enforced.
  • BOM extraction + RFQ
  • Quote-to-procure
  • Engineering revisions
Outcome · 5 weeks: quote-to-PO cycle 0.6 days (was 4.1 days, −85%)
Illustrative, reflects this specific deployment. Outcomes vary by plant, stack, and scope.

The five places provenance has to live for it to work

Cell-level provenance is not free. The data model has to carry the provenance reference in five specific places. Skip any one of them and the operator finds the gap on day three of the rollout.

Place 1 · the cell write itself

Provenance is a column on the cell, not a join to a side table

Every cell write carries the source reference inline. The provenance is a structured object: source artifact id, source region (page, bounding box, line number, whatever the artifact format supports), parser version, confidence score. The provenance is written at the same time as the value. Looking up the provenance is a column read on the same row, not a join to a separate audit table.

The reason this matters: a join to a separate audit table is one more place that can be out of sync with the cell. If the value updates and the join misses, the provenance is wrong, and the operator notices because the highlighted region does not match the displayed value. Inlining the provenance with the value makes them atomic. They cannot drift.
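A minimal sketch of what "value and provenance in one write" could look like, using the four fields the paragraph above lists. The class names, the artifact id, the bounding-box extents, and the confidence value are all hypothetical, chosen for illustration rather than taken from any actual schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SourceRef:
    """Where a value came from inside a stored artifact (hypothetical shape)."""
    artifact_id: str                      # id of the stored source artifact
    page: Optional[int] = None            # page number, for paginated artifacts
    bbox: Optional[Tuple[float, float, float, float]] = None  # x0, y0, x1, y1
    line: Optional[int] = None            # line number, for email bodies


@dataclass(frozen=True)
class CellWrite:
    """A value and its provenance, constructed together as one record."""
    field_name: str
    value: str
    source: SourceRef
    parser_version: str
    confidence: float                     # 0.0 to 1.0

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


# Illustrative values only: the box extents and confidence are made up.
unit_price = CellWrite(
    field_name="unit_price",
    value="52.93",
    source=SourceRef(artifact_id="quote-pdf-001", page=3,
                     bbox=(240.0, 440.0, 310.0, 456.0)),
    parser_version="1.4",
    confidence=0.97,
)
```

Because the record is frozen and has no default for `source`, a cell cannot be constructed without its source reference, and the reference cannot be changed independently of the value afterwards.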

Place 2 · the artifact store

Source artifacts are retained verbatim and content-addressable

The cell's source reference points to a stored copy of the source artifact, content-addressed by hash so the same artifact is the same row even if it was re-uploaded. The stored copy is the verbatim original (the actual PDF bytes, the actual email .eml). Not a re-rendered preview. Not a parsed extract. The operator clicking through has to land on the exact thing the parser was looking at, so the verification is meaningful.

Re-rendered previews introduce their own bug class. The renderer might display the PDF differently from how the parser parsed it. The operator looks at the preview, sees the expected text, and approves. Six weeks later the auditor opens the original PDF and finds different text. Verbatim retention is the only shape that prevents this.
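Content addressing can be sketched in a few lines. The example below assumes SHA-256 as the hash and an in-memory dict as a stand-in for object storage; the `ArtifactStore` class and its method names are hypothetical.

```python
import hashlib


class ArtifactStore:
    """In-memory stand-in for a content-addressed artifact store."""

    def __init__(self):
        self._blobs = {}

    def put(self, raw_bytes: bytes) -> str:
        """Store the verbatim bytes and return their SHA-256 hex digest as the id."""
        digest = hashlib.sha256(raw_bytes).hexdigest()
        # setdefault keeps the first copy, so a re-upload maps to the same entry.
        self._blobs.setdefault(digest, raw_bytes)
        return digest

    def get(self, artifact_id: str) -> bytes:
        """Return exactly the bytes that were stored, never a re-rendered copy."""
        return self._blobs[artifact_id]

    def __len__(self):
        return len(self._blobs)


store = ArtifactStore()
first_id = store.put(b"%PDF-1.7 example quote bytes")
second_id = store.put(b"%PDF-1.7 example quote bytes")  # same file uploaded again
```

Both uploads resolve to the same id and the store holds one object. Any preview shown to the operator would be rendered from `store.get(...)` on demand rather than stored in its place.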

Place 3 · the parser version

The parser version is recorded with the cell, not assumed from runtime

When parser version 1.4 extracted the value, the cell carries "parser 1.4." If the parser is upgraded to 1.5, old cells still reference 1.4. The operator investigating an old discrepancy can reproduce the extraction with the exact parser that produced it. This turns out to matter more than expected, because parser upgrades change extraction behaviour on edge cases, and the discrepancy walk needs to know which extraction algorithm produced the cell in question.[1]
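One way to make the recorded version useful is to keep every released parser callable by version, so an old cell can be re-extracted with the code that produced it. The sketch below is an assumption about how that might be wired: the registry, the two toy parsers, and their differing behaviour are invented to show an edge case, not taken from a real parser.

```python
import re
from typing import Callable, Dict

# Hypothetical registry: each released parser version stays callable.
PARSERS: Dict[str, Callable[[str], str]] = {}


def register(version: str):
    def decorator(fn):
        PARSERS[version] = fn
        return fn
    return decorator


@register("1.4")
def parse_price_v1_4(text: str) -> str:
    # Toy behaviour: takes the first decimal number on the line.
    return re.search(r"\d+\.\d+", text).group(0)


@register("1.5")
def parse_price_v1_5(text: str) -> str:
    # Toy behaviour: takes the decimal number that follows a dollar sign.
    return re.search(r"\$\s*(\d+\.\d+)", text).group(1)


def reproduce(cell: dict, source_text: str) -> str:
    """Re-run the extraction with the parser version recorded on the cell."""
    return PARSERS[cell["parser_version"]](source_text)


line = "Rev 2.0  PMR-2810-1500-S  280  $52.93"
old_cell = {"value": "2.0", "parser_version": "1.4"}
reproduced = reproduce(old_cell, line)
```

Here `reproduced` matches the value stored on the old cell, while running the current parser on the same line gives a different answer. Without the version on the cell, the investigator could not tell which of the two behaviours they were looking at.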

Place 4 · the confidence score

Confidence is a first-class field, not a hidden tunable

The parser knows when it is uncertain. The cell carries that uncertainty as a numeric confidence score (0 to 1) and the UI renders cells with low confidence differently from cells with high confidence. The buyer's eye is drawn to the low-confidence cells. The high-confidence cells recede into a backdrop the buyer trusts by default. The two-second decision is a fast read of the low-confidence subset.
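A sketch of how a UI layer might split cells into the subset that draws the eye and the subset that recedes. The 0.90 cutoff, the function name, and the confidence values are assumptions for illustration; a real threshold would be tuned per deployment.

```python
from typing import Dict, List, Tuple

REVIEW_THRESHOLD = 0.90  # assumed cutoff, not a recommended value


def split_by_confidence(
    cells: List[Dict], threshold: float = REVIEW_THRESHOLD
) -> Tuple[List[Dict], List[Dict]]:
    """Return (needs_attention, background); needs_attention is lowest-confidence first."""
    needs_attention = sorted(
        (c for c in cells if c["confidence"] < threshold),
        key=lambda c: c["confidence"],
    )
    background = [c for c in cells if c["confidence"] >= threshold]
    return needs_attention, background


cells = [
    {"field": "item", "value": "PMR-2810-1500-S", "confidence": 0.99},
    {"field": "qty", "value": "280", "confidence": 0.72},
    {"field": "unit_price", "value": "52.93", "confidence": 0.85},
    {"field": "vendor", "value": "V-218", "confidence": 0.95},
]
needs_attention, background = split_by_confidence(cells)
```

With these made-up scores, the quantity and unit price cells would be rendered prominently, least certain first, and the item and vendor cells would stay in the background.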

Place 5 · the override trail

Operator overrides leave a permanent record alongside the original value

When the buyer overrides a parsed value with a typed value, the original parsed value is not deleted. The cell becomes a small two-row history: the parsed value with its source and confidence, plus the override with the operator identity, the timestamp, and an optional reason. The override is the current cell value. The parsed value is there for the auditor to see what the parser thought and what the operator changed it to.
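The two-row history can be sketched as an append-only list on the cell. The class and field names, the operator id, and the override reason are hypothetical; source reference and confidence are left off the parsed entry to keep the sketch short.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    value: str
    kind: str                      # "parsed" or "override"
    author: str                    # parser version or operator identity
    at: datetime
    reason: Optional[str] = None


@dataclass
class Cell:
    history: List[Entry] = field(default_factory=list)

    @property
    def current(self) -> str:
        """The most recent entry is the cell's displayed value."""
        return self.history[-1].value

    @property
    def parsed(self) -> Optional[Entry]:
        """What the parser originally extracted, if anything."""
        return next((e for e in self.history if e.kind == "parsed"), None)

    def override(self, value: str, operator: str, reason: Optional[str] = None) -> None:
        # Append only: the parsed entry is never modified or removed.
        self.history.append(
            Entry(value, "override", operator, datetime.now(timezone.utc), reason)
        )


cell = Cell([Entry("52.93", "parsed", "parser 1.4", datetime.now(timezone.utc))])
cell.override("52.39", operator="buyer-17", reason="digits transposed in scan")
```

After the override, `cell.current` is the typed value and `cell.parsed` still returns the original extraction, so an auditor can see both without going to logs.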

The data-model gut check

If your cell write has fewer than four fields, you do not yet have cell-level provenance. The minimum is value, source reference, parser version, confidence. The override history makes it five. Anything less and the audit trail lives outside the cell, which is the failure mode.
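The gut check is simple enough to run as a lint over existing cell writes. A minimal sketch, assuming cell writes are available as dicts and using hypothetical key names for the four minimum fields:

```python
REQUIRED_FIELDS = ("value", "source_ref", "parser_version", "confidence")


def missing_provenance_fields(cell_write: dict) -> list:
    """Return the required fields that are absent or None in a cell write."""
    return [name for name in REQUIRED_FIELDS if cell_write.get(name) is None]


footer_style = {"value": "52.93"}
cell_style = {
    "value": "52.93",
    "source_ref": {"artifact": "quote.pdf", "page": 3},
    "parser_version": "1.4",
    "confidence": 0.97,  # illustrative value
}
```

An empty result means the write meets the four-field minimum. A non-empty result names exactly which parts of the audit trail still live outside the cell.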

What the click-through experience looks like

The mechanic of the click-through is mundane but worth describing precisely because most teams get it wrong on the first try. The cell has a small visual indicator. The buyer clicks it. A side panel slides in from the right (or a modal opens, depending on screen real estate) showing the source artifact at the relevant region, with the extracted value highlighted in place. The buyer reads the highlighted region. They close the panel. They are back on the form.

Three details matter. First, the click target is the indicator, not the cell itself, because clicking the cell has to remain reserved for editing the value. Second, the panel opens on the same artifact view the buyer expects (a PDF renders as a PDF; an email renders as an email), because translation between formats introduces its own confusion. Third, the panel closes with a single click or keystroke, because anything slower breaks the two-second rule.

The implementation is not exotic. Most modern PDF renderers support region highlighting. Email rendering is a solved problem. The work is in stitching the cell's source reference to the renderer in a way that consistently lands on the right region.
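The stitching step can be sketched as a function that turns a cell's source reference into an instruction for whichever viewer matches the artifact type. Everything here is an assumption for illustration: the `kind` field, the shape of the returned dict, the artifact ids, and the box extents do not correspond to any particular renderer's API.

```python
def highlight_target(source_ref: dict) -> dict:
    """Translate a cell's source reference into an instruction for a viewer."""
    kind = source_ref["kind"]
    if kind == "pdf":
        # PDFs are shown as PDFs: jump to the page and highlight the box.
        return {
            "viewer": "pdf",
            "artifact_id": source_ref["artifact_id"],
            "page": source_ref["page"],
            "highlight_box": source_ref["bbox"],
        }
    if kind == "email":
        # Emails are shown as emails: highlight the line in the body.
        return {
            "viewer": "email",
            "artifact_id": source_ref["artifact_id"],
            "highlight_line": source_ref["line"],
        }
    raise ValueError(f"no viewer registered for artifact kind {kind!r}")


price_ref = {"kind": "pdf", "artifact_id": "quote-pdf-001", "page": 3,
             "bbox": (240.0, 440.0, 310.0, 456.0)}
qty_ref = {"kind": "email", "artifact_id": "rfq-email-001", "line": 4}
price_target = highlight_target(price_ref)
qty_target = highlight_target(qty_ref)
```

Failing loudly on an unknown artifact kind is a deliberate choice in this sketch: a click that lands on the wrong region, or on no region, costs more operator trust than a visible error.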

The four specific changes Polymr makes

When a deployment moves from footer-level to cell-level provenance, four concrete things change in the data model and the surface. None of them are large engineering projects. All of them are missing from the typical first version of an operations layer.

  1. Every cell write is a value + source reference + parser version + confidence
    The data model treats provenance as a property of the cell, not a separate audit object. The cell cannot exist without its source reference. A cell with no provenance is a typed override, which has its own provenance (the operator who typed it).
  2. Source artifacts are stored verbatim and content-addressed
    The artifact store keeps the bytes the parser parsed, addressable by content hash. Re-uploads of the same artifact resolve to the same stored object. Renders for the UI are produced from the stored bytes on demand and never substituted for the original.
  3. The UI renders provenance inline at the cell, with two-click verification
    Each cell carries a small click target. Clicking opens the source artifact at the highlighted region. The operator can verify or override in two clicks total. The footer panel is removed because it is no longer needed.
  4. Overrides leave a permanent two-row history at the cell
    The parsed value persists alongside the override. The cell's current value is the override. The parsed value is available to the auditor and to the parser-training feedback loop without reconstructing it from logs.

extract · source ref · verify or override
  • Source artifact
    PDF · email · CAD
  • Artifact store
    content-addressed
  • Parser
    extract + region
  • Cell write
    value + source
  • Operator UI
    click-through
  • Override
    appended row
  • Audit query
    cell-level

Figures illustrative · drawn from observed pattern, not a named deployment

The provenance pipeline. Source artifacts land and are stored verbatim. The parser extracts values and writes them as cells that carry their source reference inline. The UI renders cells with click-through. Operator overrides append to the cell, not replace it. The audit trail is the same data the operator interacts with.

What this looks like in week one

The first deployment is narrow. Pick one workflow where the operator currently double-checks every drafted record (most plants will hand you a list). Ship the cell-level provenance surface for that workflow with the verbatim artifact store behind it. Run the workflow alongside the existing footer-level surface for a week. Measure the operator's verify time per cell, the drop-to-inbox rate, and the approval throughput. The numbers usually move enough in the first week to justify the rollout to the rest of the workflows.
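The three week-one measurements can be computed from a simple review log. The sketch below assumes one record per reviewed proposal with hypothetical field names, and the sample data is made up to show the calculation, not drawn from a deployment.

```python
from statistics import median


def week_one_metrics(reviews: list, hours_observed: float) -> dict:
    """Summarise verify time per cell, drop-to-inbox rate, and approval throughput."""
    verify_times = [t for r in reviews for t in r["verify_seconds"]]
    approved = [r for r in reviews if r["approved"]]
    return {
        "median_verify_seconds_per_cell": median(verify_times),
        "drop_to_inbox_rate": sum(r["opened_inbox"] for r in reviews) / len(reviews),
        "approvals_per_hour": len(approved) / hours_observed,
    }


# Made-up sample: per-cell verify times, whether the buyer left for the inbox,
# and whether the proposal was approved.
reviews = [
    {"verify_seconds": [2.0, 3.0, 2.5], "opened_inbox": False, "approved": True},
    {"verify_seconds": [4.0, 2.0], "opened_inbox": True, "approved": True},
    {"verify_seconds": [1.5], "opened_inbox": False, "approved": False},
    {"verify_seconds": [2.0, 2.0], "opened_inbox": False, "approved": True},
]
metrics = week_one_metrics(reviews, hours_observed=2.0)
```

Running the same function over logs from the footer-level surface and the cell-level surface gives a like-for-like comparison for the week the two run side by side.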

By week three, the operator has stopped opening the source email to verify routine cells. By week five, the operator has enabled auto-approval for the cells they trust without looking. The auto-approval is the leading indicator that the provenance surface earned its trust. Until that happens, the surface is approved-by-default-but-checked. That is the failure shape to monitor for.

The shape of the bet

Provenance is not a backend feature. It is the load-bearing surface that decides whether the operator trusts the drafting. Put it at the cell. Make it two clicks. Store the source verbatim. Everything else in the operations layer stacks on top of that.

The objections we hear most often

Three objections come up when we walk a team through this architecture. None of them are wrong. All of them have specific answers.

The first objection is the storage-cost argument. "If you store every source artifact verbatim, the storage bill is going to be enormous." The storage volume for buyer-side artifacts (quote PDFs, email bodies, drawing PDFs) is typically in the tens of gigabytes per year for a mid-market plant, which is cheap to hold. The bill becomes meaningful only at the hundreds-of-thousands-of-records scale, and even then it is comfortably absorbed by object storage at today's prices. The compliance value of verbatim retention far outweighs the storage line item.

The second objection is the UI-complexity argument. "If every cell carries a click target, the screen is going to look noisy." The click target is intentionally small (a dot or a similarly subtle mark) and recedes when the operator is not looking for it. The pattern that works is provenance as a subtle decoration on every cell, with low-confidence cells drawn more prominently. The operator's eye lands on the cells that need attention. The high-confidence cells are part of the background.

The third objection is the parser-trust argument. "If we make provenance this prominent, the operator will not trust the parser." This is the inversion of the failure mode and it does not happen in practice. Operators who can verify in two seconds trust the parser more, not less, because they have a fast feedback loop on its accuracy. Operators who cannot verify trust the parser less, because they assume the worst about anything they cannot check. The visible provenance is the trust-builder, not the trust-destroyer.

Footnotes

  [1] The parser-version field comes up most often during discrepancy investigations triggered by a vendor dispute. The dispute lands months after the original PO. The parser has been upgraded twice in the interim. Without the parser-version field on the cell, reproducing the original extraction requires either pinning the old parser binary or guessing. With the parser-version field, the cell carries exactly which extraction code produced it.