Bounded Inference

operator_bound: sits between Verifiable Inference and Execution in the proof hierarchy.

What it is

Bounded Inference is the proof level for HuggingFace transformers and other ML classifiers. When your governance check runs a DistilBERT-class PII detector, toxicity classifier, or prompt injection model, the SDK automatically instruments the PyTorch graph, commits the per-operator execution trace as a Merkle tree, and issues a VPEC with proof_level_floor: operator_bound.

The proof level is called "Bounded Inference" because it proves that the model's outputs were within the calibrated bounds of what that model class produces on that hardware — without mathematically proving every arithmetic operation the way a ZK circuit would.

Why it's stronger than Execution

This is the key question for compliance buyers and auditors.

primust verify on an Execution VPEC checks three things: the Ed25519 signature is valid, the RFC 3161 timestamp is valid, and the schema is valid. It cannot verify that the output came from the declared model. You can commit a model hash and produce any output; the verifier has no way to check consistency.

primust verify on a Bounded Inference VPEC does all of that, plus: resolves the profile_id to a Primust-signed drift profile, confirms the gpu_class is covered by that profile, and checks that the committed merkle_root is consistent with the operator count and profile bounds for the declared model class. This is an additional offline-verifiable claim that Execution cannot make.

In plain English: an Execution VPEC proves the model was declared. A Bounded Inference VPEC provides evidence the model actually ran and produced outputs consistent with running that model on that hardware.

Trust the measurement
The trust assumption for Bounded Inference is "Trust the measurement" — meaning trust that Primust correctly calibrated the drift thresholds for that model class across hardware. The calibration profiles are signed by Primust's GCP KMS key and verifiable offline. A tampered profile is detectable.
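Why a tampered profile is detectable can be sketched in a few lines. This is an illustrative sketch, not the SDK's implementation: it assumes the KMS signature covers a canonical digest of the profile JSON, and it shows only the digest comparison (the actual signature verification over that digest is omitted). The field names are assumptions.

```python
import hashlib
import json

def profile_digest(profile: dict) -> str:
    # Canonicalize the profile JSON (sorted keys, no whitespace) so the
    # digest is stable across serializations, then hash it.
    canonical = json.dumps(profile, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

# Hypothetical profile fields, for illustration only.
profile = {
    "profile_id": "primust/distilbert-class/v1.2.0",
    "gpu_classes": ["a10g", "t4"],
    "operator_count": 312,
}
signed_digest = profile_digest(profile)  # the digest the KMS signature covers

# Any tampering (e.g. widening the allowed GPU classes) changes the digest,
# so a signature check bound to the original digest fails.
tampered = dict(profile, gpu_classes=["a10g", "t4", "h100"])
assert profile_digest(tampered) != signed_digest
```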

How it works

Based on the NAO/TAO methodology (arXiv:2510.16028, Princeton/UIUC/HKUST). Primust's implementation has no blockchain dependency and no dispute protocol.

Per-inference process

  1. SDK attaches forward hooks to all leaf modules in the PyTorch/ONNX graph
  2. On each inference, the mean output of each operator is recorded locally
  3. A Merkle tree is built over all operator outputs in execution order
  4. merkle_root = sha256(merkle_tree(operator_outputs))
  5. Only the merkle_root transits Primust — per-operator outputs never leave your environment (System Invariant 1 holds)
  6. VPEC issues immediately — no async proof generation, no GPU proving job
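The Merkle commitment in steps 2–4 can be sketched as follows. This is an illustrative reconstruction, not the SDK's code: it assumes each operator's mean output is serialized to 8-byte floats before hashing, and the exact leaf encoding and tree layout the SDK uses are not specified here.

```python
import hashlib
import struct

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(operator_means: list[float]) -> str:
    # Leaves: hash of each operator's mean output, in execution order.
    level = [_h(struct.pack(">d", m)) for m in operator_means]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()

# Re-running the same trace reproduces the root; any operator drift changes it.
trace = [0.1321, -0.0075, 0.9912, 0.4410]
root = merkle_root(trace)
assert merkle_root(trace) == root
assert merkle_root([0.1321, -0.0075, 0.9913, 0.4410]) != root
```

Only `root` would transit Primust; the per-operator values in `trace` stay local.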

What primust verify checks

  1. Ed25519 signature valid (same as all VPECs)
  2. RFC 3161 timestamp valid (same as all VPECs)
  3. profile_id resolves to a valid Primust-signed profile (offline — no registry call if cached)
  4. gpu_class is within the profile's calibrated GPU classes
  5. merkle_root is consistent with operator_count and the declared profile bounds
# Bounded Inference VPEC verification output:
✓ Signature valid
✓ Chain intact
✓ ZK proofs valid
✓ Timestamp authentic
✓ No governance gaps
✓ Profile consistent  (primust/distilbert-class/v1.2.0 · A10G)

proof_level_floor: operator_bound
VPEC: vpec_abc123  Environment: production
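The five checks can be sketched as a verifier skeleton. This is illustrative only, not the primust_verify API: the function and field names are assumptions, and step 5 is simplified to an operator-count check (the real consistency check also covers the profile's per-operator bounds).

```python
# Each step mirrors one item of the checklist above; names are illustrative.
def verify_bounded_inference(vpec: dict, profiles: dict) -> list[str]:
    failures = []
    if not vpec.get("signature_valid"):          # 1. Ed25519 signature
        failures.append("signature")
    if not vpec.get("timestamp_valid"):          # 2. RFC 3161 timestamp
        failures.append("timestamp")
    profile = profiles.get(vpec.get("profile_id"))
    if profile is None:                          # 3. profile resolves
        failures.append("profile_id")
    elif vpec.get("gpu_class") not in profile["gpu_classes"]:
        failures.append("gpu_class")             # 4. GPU class covered
    elif vpec.get("operator_count") != profile["operator_count"]:
        failures.append("merkle_root")           # 5. trace consistency (simplified)
    return failures

profiles = {"primust/distilbert-class/v1.2.0":
            {"gpu_classes": ["a10g"], "operator_count": 312}}
vpec = {"signature_valid": True, "timestamp_valid": True,
        "profile_id": "primust/distilbert-class/v1.2.0",
        "gpu_class": "a10g", "operator_count": 312}
assert verify_bounded_inference(vpec, profiles) == []
```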

Runtime overhead: 0.3% additional latency. No Modal invocation per VPEC. COGS impact: negligible.

Setup

Zero configuration for supported models. The SDK infers the stage type automatically from the model object.

from transformers import pipeline
import primust
from primust import CheckResult

p = primust.Pipeline(api_key="pk_sb_xxx", policy="ai_agent_general_v1")

classifier = pipeline("text-classification", model="unitary/toxic-bert")

@p.record_check("toxicity_check")
def run_toxicity(text):
    result = classifier(text)
    return CheckResult(
        passed=result[0]["label"] == "non-toxic",
        evidence={"score": result[0]["score"]}
    )

# SDK flow:
# 1. Detects transformers.Pipeline object with DistilBERT-class model
# 2. Calls GET /api/v1/registry/lookup?hash={model_hash}
# 3. Finds primust/distilbert-class/v1.2.0
# 4. Sets stage_type: bound_committed_inference
# 5. Attaches OperatorHook before inference
# 6. Computes merkle_root after inference
# 7. Issues VPEC with proof_level_floor: operator_bound

vpec = p.close()
print(vpec.proof_level_floor)           # operator_bound
print(vpec.provable_surface)            # 0.87
print(vpec.provable_surface_breakdown)
# {"mathematical": 0.67, "bounded_inference": 0.33, ...}
# (includes auto boundary_rule decomposition)

If your model isn't in the registry, you get a model_profile_missing advisory gap and the check falls back to Execution level. No configuration needed to handle the fallback — it's automatic.

Boundary rule decomposition (automatic)

The SDK automatically wraps bound_committed_inference checks in deterministic pre/post conditions. This is called boundary rule decomposition. It gives you Mathematical provable_surface on the deterministic portions — zero configuration.

# What happens internally for a @p.record_check() on a HuggingFace model:
#
# boundary_rule (Mathematical wrapper)
#   ├── pre_conditions (Mathematical)
#   │     ├── tokenization — deterministic, any verifier can re-run
#   │     ├── input_schema_validation
#   │     └── truncation_check
#   │
#   ├── bound_committed_inference (Bounded Inference)
#   │     └── transformer forward pass → confidence score
#   │
#   └── post_conditions (Mathematical)
#         ├── threshold_check — score > 0.85 → PASS
#         └── output_schema_validation

# Resulting VPEC breakdown:
# mathematical:         0.67  ← pre + post conditions
# bounded_inference:    0.33  ← inference core
# Result: "67% of this governance check is mathematically proven"

Tokenization is deterministic — any verifier can re-run it and confirm the token sequence matches the committed input hash. This narrows the trust assumption to raw text → token hash mapping only.
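The re-runnable tokenization check can be sketched as follows. This is illustrative: `toy_tokenize` is a deterministic stand-in for demonstration, whereas the real check re-runs the model's own tokenizer (e.g. DistilBERT's WordPiece vocabulary), and the commitment encoding shown here is an assumption.

```python
import hashlib

def token_commitment(text: str, tokenize) -> str:
    # Hash the token id sequence; any verifier with the same tokenizer can
    # re-run this and confirm it matches the committed hash.
    ids = tokenize(text)
    return hashlib.sha256(" ".join(map(str, ids)).encode()).hexdigest()

# Stand-in deterministic tokenizer, for illustration only.
def toy_tokenize(s: str) -> list[int]:
    return [int.from_bytes(hashlib.md5(w.encode()).digest()[:2], "big")
            for w in s.lower().split()]

committed = token_commitment("You are toxic", toy_tokenize)
assert token_commitment("You are toxic", toy_tokenize) == committed
assert token_commitment("You are kind", toy_tokenize) != committed
```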

Model Profile Registry

The Model Profile Registry is Primust's signed registry of empirical per-operator drift profiles. Profiles are keyed by onnx_model_hash and signed by Primust's GCP KMS key.

Supported models (initial registry)

Category             Models covered by DistilBERT-class profile
PII detection        distilbert-base-uncased, bert-base-NER, dbmdz/bert-large-cased-finetuned-conll03
Toxicity             unitary/toxic-bert, martin-ha/toxic-comment-model, s-nlp/roberta_toxicity_classifier
Prompt injection     deepset/deberta-v3-base-injection, protectai/deberta-v3-base-prompt-injection
Bias detection       d4data/bias-detection-model, valurank/distilroberta-bias
Content moderation   facebook/roberta-hate-speech-dynabench-r4-target, cardiffnlp/twitter-roberta-base-offensive

Primust does NOT host these models. You download from HuggingFace normally. The profile lookup is a registry check at VPEC issuance; the profile is cached locally after the first fetch.
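The fetch-once, serve-locally behavior can be sketched with a minimal cache. This is illustrative only: the SDK's actual cache (likely on disk) and its API are not shown here, and `fetch_profile` stands in for the registry call.

```python
# Sketch of profile caching: first lookup hits the registry, later lookups
# are served locally (and therefore work offline).
class ProfileCache:
    def __init__(self, fetch_profile):
        self._fetch = fetch_profile
        self._cache: dict[str, dict] = {}

    def get(self, model_hash: str) -> dict:
        if model_hash not in self._cache:   # first lookup: registry call
            self._cache[model_hash] = self._fetch(model_hash)
        return self._cache[model_hash]      # subsequent lookups: local

calls = []
def fake_fetch(h):
    calls.append(h)
    return {"profile_id": "primust/distilbert-class/v1.2.0"}

cache = ProfileCache(fake_fetch)
cache.get("sha256:abc")
cache.get("sha256:abc")
assert len(calls) == 1  # registry hit once; the second lookup was cached
```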

Request calibration for your model

If your model isn't in the registry, request calibration at app.primust.com/policy/registry. The model_profile_missing gap in your Gap Inbox links directly to this page with the model hash pre-filled.

Profile API

GET /api/v1/registry/lookup?hash={onnx_model_hash}
# 200: { "found": true, "profile_id": "primust/distilbert-class/v1.2.0", ... }
# 404: model not yet calibrated

POST /api/v1/registry/calibration-requests
{ "onnx_model_hash": "sha256:...", "huggingface_model_id": "unitary/toxic-bert" }
# Returns: { "request_id": "...", "status": "queued", "estimated_completion": "7d" }

Verifying a Bounded Inference VPEC

primust verify vpec.json

# Output includes:
# ✓ Profile consistent  (primust/distilbert-class/v1.2.0 · A10G)
# proof_level_floor: operator_bound

# Profile MISMATCH (integrity concern):
# ✗ Profile inconsistent: INVALID — PROFILE MISMATCH
#   merkle_root inconsistent with declared model class on gpu_class: a10g

# Python (embedded verification)
from primust_verify import verify

result = verify(vpec_json)
print(result.proof_level_floor)          # operator_bound
print(result.bounded_inference_valid)    # True | False | None (not applicable)

Audit guidance

Cite Bounded Inference as: "bounded-inference proven — per-operator execution trace verified against Primust-signed model profile."

Check for:

A PROFILE MISMATCH result. This means the committed trace is inconsistent with the declared model running on the declared hardware. It is a serious integrity concern; escalate it.

FAQ

Does this replace Verifiable Inference?

No. Verifiable Inference (EZKL ZK circuit) is mathematically stronger — it proves the computation was correct, not just within calibrated bounds. But EZKL only works on small MLP heads (<263K parameters). Full transformers (DistilBERT, BERT, RoBERTa) fail EZKL's fixed-point quantization constraints. Bounded Inference exists specifically for full transformers where ZK circuit compilation isn't feasible.

What if I'm running on a GPU class not in the profile?

The SDK reports this as a model_profile_missing gap and falls back to Execution level. Request calibration for your GPU class at app.primust.com/policy/registry.

Can an attacker fabricate a consistent Merkle root?

The threat model for Bounded Inference is governance compliance — customers proving their own governance to auditors. Customers have no incentive to fake. The Merkle commitment's value is evidentiary depth: more granular proof of what happened. For the adversarial cloud provider threat model (untrusted compute), the NAO/TAO paper provides additional protocol guarantees.

Is this different from EZKL's Verifiable Inference?

Yes, fundamentally. EZKL proves mathematical correctness via ZK circuit. Bounded Inference proves that per-operator outputs are within calibrated hardware physics bounds. EZKL requires no trust beyond the model weights; Bounded Inference requires trusting Primust's calibration profiles. Bounded Inference is cheaper (0.3% overhead vs minutes of GPU time), works on any model size, and issues synchronously.

How does provable_surface_breakdown work with boundary_rule decomposition?

Pre/post conditions (tokenization, threshold check, schema validation) earn Mathematical proof level. Only the inference core is Bounded Inference. A typical governance check achieves 60–75% Mathematical + 25–40% Bounded Inference in the breakdown.
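The arithmetic behind a breakdown like 0.67 / 0.33 can be sketched by weighting sub-checks. This is illustrative only: equal weights per sub-check group are an assumption, and the actual weighting scheme is Primust-internal.

```python
# Aggregate per-sub-check (proof_level, weight) pairs into a normalized
# provable_surface breakdown. Weights here are illustrative.
def breakdown(sub_checks: list[tuple[str, float]]) -> dict[str, float]:
    total = sum(w for _, w in sub_checks)
    out: dict[str, float] = {}
    for level, w in sub_checks:
        out[level] = out.get(level, 0.0) + w / total
    return {k: round(v, 2) for k, v in out.items()}

checks = [("mathematical", 1.0),       # pre_conditions
          ("bounded_inference", 1.0),  # inference core
          ("mathematical", 1.0)]       # post_conditions
assert breakdown(checks) == {"mathematical": 0.67, "bounded_inference": 0.33}
```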