Bounded Inference
operator_bound Sits between Verifiable Inference and Execution in the proof hierarchy.
What it is
Bounded Inference is the proof level for HuggingFace transformers and other ML classifiers. When your governance check runs a DistilBERT-class PII detector, toxicity classifier, or prompt injection model, the SDK automatically instruments the PyTorch graph, commits the per-operator execution trace as a Merkle tree, and issues a VPEC with proof_level_floor: operator_bound.
The proof level is called "Bounded Inference" because it proves that the model's outputs were within the calibrated bounds of what that model class produces on that hardware — without mathematically proving every arithmetic operation the way a ZK circuit would.
Why it's stronger than Execution
This is the key question for compliance buyers and auditors.
primust verify on an Execution VPEC checks: Ed25519 signature valid, RFC 3161 timestamp valid, schema valid. It cannot verify the output came from the declared model. You can commit a model hash and produce any output — the verifier has no way to check consistency.
primust verify on a Bounded Inference VPEC does all of that, plus: resolves the profile_id to a Primust-signed drift profile, confirms the gpu_class is covered by that profile, and checks that the committed merkle_root is consistent with the operator count and profile bounds for the declared model class. This is an additional offline-verifiable claim that Execution cannot make.
In plain English: an Execution VPEC proves the model was declared. A Bounded Inference VPEC provides evidence the model actually ran and produced outputs consistent with running that model on that hardware.
How it works
Based on the NAO/TAO methodology (arxiv 2510.16028, Princeton/UIUC/HKUST). Primust's implementation has no blockchain dependency and no dispute protocol.
Per-inference process
- SDK attaches forward hooks to all leaf modules in the PyTorch/ONNX graph
- On each inference, the mean output of each operator is recorded locally
- A Merkle tree is built over all operator outputs in execution order
merkle_root = sha256(merkle_tree(operator_outputs))- Only the
merkle_roottransits Primust — per-operator outputs never leave your environment (System Invariant 1 holds) - VPEC issues immediately — no async proof generation, no GPU proving job
What primust verify checks
- Ed25519 signature valid (same as all VPECs)
- RFC 3161 timestamp valid (same as all VPECs)
profile_idresolves to a valid Primust-signed profile (offline — no registry call if cached)gpu_classis within the profile's calibrated GPU classesmerkle_rootis consistent withoperator_countand the declared profile bounds
# Bounded Inference VPEC verification output:
✓ Signature valid
✓ Chain intact
✓ ZK proofs valid
✓ Timestamp authentic
✓ No governance gaps
✓ Profile consistent (primust/distilbert-class/v1.2.0 · A10G)
proof_level_floor: operator_bound
VPEC: vpec_abc123 Environment: production
Runtime overhead: 0.3% additional latency. No Modal invocation per VPEC. COGS impact: negligible.
Setup
Zero configuration for supported models. The SDK infers the stage type automatically from the model object.
from transformers import pipeline
import primust
p = primust.Pipeline(api_key="pk_sb_xxx", policy="ai_agent_general_v1")
classifier = pipeline("text-classification", model="unitary/toxic-bert")
@p.record_check("toxicity_check")
def run_toxicity(text):
result = classifier(text)
return CheckResult(
passed=result[0]["label"] == "non-toxic",
evidence={"score": result[0]["score"]}
)
# SDK flow:
# 1. Detects transformers.Pipeline object with DistilBERT-class model
# 2. Calls GET /api/v1/registry/lookup?hash={model_hash}
# 3. Finds primust/distilbert-class/v1.2.0
# 4. Sets stage_type: bound_committed_inference
# 5. Attaches OperatorHook before inference
# 6. Computes merkle_root after inference
# 7. Issues VPEC with proof_level_floor: operator_bound
vpec = p.close()
print(vpec.proof_level_floor) # operator_bound
print(vpec.provable_surface) # 0.87
print(vpec.provable_surface_breakdown)
# {"mathematical": 0.67, "bounded_inference": 0.33, ...}
# (includes auto boundary_rule decomposition)
If your model isn't in the registry, you get a model_profile_missing advisory gap and the check falls back to Execution level. No configuration needed to handle the fallback — it's automatic.
Boundary rule decomposition (automatic)
The SDK automatically wraps bound_committed_inference checks in deterministic pre/post conditions. This is called boundary rule decomposition. It gives you Mathematical provable_surface on the deterministic portions — zero configuration.
# What happens internally for a @p.record_check() on a HuggingFace model:
#
# boundary_rule (Mathematical wrapper)
# ├── pre_conditions (Mathematical)
# │ ├── tokenization — deterministic, any verifier can re-run
# │ ├── input_schema_validation
# │ └── truncation_check
# │
# ├── bound_committed_inference (Bounded Inference)
# │ └── transformer forward pass → confidence score
# │
# └── post_conditions (Mathematical)
# ├── threshold_check — score > 0.85 → PASS
# └── output_schema_validation
# Resulting VPEC breakdown:
# mathematical: 0.67 ← pre + post conditions
# bounded_inference: 0.33 ← inference core
# Result: "67% of this governance check is mathematically proven"
Tokenization is deterministic — any verifier can re-run it and confirm the token sequence matches the committed input hash. This narrows the trust assumption to raw text → token hash mapping only.
Model Profile Registry
The Model Profile Registry is Primust's signed registry of empirical per-operator drift profiles. Profiles are keyed by onnx_model_hash and signed by Primust's GCP KMS key.
Supported models (initial registry)
| Category | Models covered by DistilBERT-class profile |
|---|---|
| PII detection | distilbert-base-uncased, bert-base-NER, dbmdz/bert-large-cased-finetuned-conll03 |
| Toxicity | unitary/toxic-bert, martin-ha/toxic-comment-model, s-nlp/roberta_toxicity_classifier |
| Prompt injection | deepset/deberta-v3-base-injection, protectai/deberta-v3-base-prompt-injection |
| Bias detection | d4data/bias-detection-model, valurank/distilroberta-bias |
| Content moderation | facebook/roberta-hate-speech-dynabench-r4-target, cardiffnlp/twitter-roberta-base-offensive |
Primust does NOT host these models. You download from HuggingFace normally. The profile lookup is a registry check at VPEC issuance; the profile is cached locally after the first fetch.
Request calibration for your model
If your model isn't in the registry, request calibration at app.primust.com/policy/registry. The model_profile_missing gap in your Gap Inbox links directly to this page with the model hash pre-filled.
Profile API
GET /api/v1/registry/lookup?hash={onnx_model_hash}
# 200: { "found": true, "profile_id": "primust/distilbert-class/v1.2.0", ... }
# 404: model not yet calibrated
POST /api/v1/registry/calibration-requests
{ "onnx_model_hash": "sha256:...", "huggingface_model_id": "unitary/toxic-bert" }
# Returns: { "request_id": "...", "status": "queued", "estimated_completion": "7d" }
Verifying a Bounded Inference VPEC
primust verify vpec.json
# Output includes:
# ✓ Profile consistent (primust/distilbert-class/v1.2.0 · A10G)
# proof_level_floor: operator_bound
# Profile MISMATCH (integrity concern):
# ✗ Profile inconsistent: INVALID — PROFILE MISMATCH
# merkle_root inconsistent with declared model class on gpu_class: a10g
# Python (embedded verification)
from primust_verify import verify
result = verify(vpec_json)
print(result.proof_level_floor) # operator_bound
print(result.bounded_inference_valid) # True | False | None (not applicable)
Audit guidance
Cite Bounded Inference as: "bounded-inference proven — per-operator execution trace verified against Primust-signed model profile."
Check for:
profile_idpresent in VPECgpu_classpresent and plausible for the organization's infrastructureoperator_countconsistent with the declared model architecture- Verification output shows
VALID (BOUNDED INFERENCE), notINVALID — PROFILE MISMATCH
A PROFILE MISMATCH result means the committed trace is inconsistent with the declared model running on the declared hardware. This is a serious integrity concern — escalate.
FAQ
Does this replace Verifiable Inference?
No. Verifiable Inference (EZKL ZK circuit) is mathematically stronger — it proves the computation was correct, not just within calibrated bounds. But EZKL only works on small MLP heads (<263K parameters). Full transformers (DistilBERT, BERT, RoBERTa) fail EZKL's fixed-point quantization constraints. Bounded Inference exists specifically for full transformers where ZK circuit compilation isn't feasible.
What if I'm running on a GPU class not in the profile?
The SDK reports this as a model_profile_missing gap and falls back to Execution level. Request calibration for your GPU class at app.primust.com/policy/registry.
Can an attacker fabricate a consistent Merkle root?
The threat model for Bounded Inference is governance compliance — customers proving their own governance to auditors. Customers have no incentive to fake. The Merkle commitment's value is evidentiary depth: more granular proof of what happened. For the adversarial cloud provider threat model (untrusted compute), the NAO/TAO paper provides additional protocol guarantees.
Is this different from EZKL's Verifiable Inference?
Yes, fundamentally. EZKL proves mathematical correctness via ZK circuit. Bounded Inference proves that per-operator outputs are within calibrated hardware physics bounds. EZKL requires no trust beyond the model weights; Bounded Inference requires trusting Primust's calibration profiles. Bounded Inference is cheaper (0.3% overhead vs minutes of GPU time), works on any model size, and issues synchronously.
How does provable_surface_breakdown work with boundary_rule decomposition?
Pre/post conditions (tokenization, threshold check, schema validation) earn Mathematical proof level. Only the inference core is Bounded Inference. A typical governance check achieves 60–75% Mathematical + 25–40% Bounded Inference in the breakdown.