Confidence Report (experimental)

SASY provides two policy-translation pipelines with complementary goals:

The primary translate() flow takes an English spec plus your agent’s codebase and produces a Datalog policy targeted at your code. Reach for it when you’re authoring a policy against real tool names, field names, and message-graph predicates.
The experimental write_policy() flow focuses on validating the translation itself. It generates a truth table and runs adversarial verifier checks so you can gain confidence that the English → Datalog mapping is faithful. It doesn’t read your codebase, so it’s a better fit for short self-contained policies where the assurance artifacts are the point.

This page describes how to read the output of the write_policy() flow. Its result object carries a print_summary() method that renders a verification report — here’s how to interpret each section.

The Summary

result.print_summary()

Policy Compilation Result
========================================
Status:       SUCCESS (336s)

Structured Spec
  10 clauses (C1-C10)

Verification
  Truth table:        25 scenarios
    11 ALLOW | 3 DENY | 11 GUARD_DENY
  Assertions:         25/25 passed
  Independent checks: 3/3 passed
  Round-trip audit:   PASS

What Each Number Means

Truth Table (25 scenarios)

Every combination of membership tier, insurance status, and cabin class was tested. Each row shows what the policy decides for that scenario.

ALLOW — the action is authorized
DENY — the action is explicitly blocked (with a reason)
GUARD_DENY — the policy can’t decide because required data hasn’t been looked up yet

Abridged view of the generated table for the airline demo policy:

Action	Membership	Insurance	Cabin	Decision
cancel	gold	no	—	ALLOW
cancel	gold	yes	—	ALLOW
cancel	silver	yes	—	ALLOW
cancel	silver	no	—	DENY
cancel	regular	yes	—	ALLOW
cancel	regular	no	—	DENY
cancel	unknown	no	—	GUARD_DENY
modify	any	—	economy	ALLOW
modify	silver	—	basic_economy	ALLOW
modify	gold	—	basic_economy	ALLOW
modify	regular	—	basic_economy	DENY
modify	unknown	—	unknown	GUARD_DENY

Assertions (25/25 passed)

The generated Datalog was uploaded to the live SASY policy engine and evaluated against every truth table row. 25/25 means the Datalog produces the exact same decisions as the specification.

If any assertion fails, it means the Datalog translation has a bug — a specific scenario where the policy behaves differently than intended.

Independent Checks (3/3 passed)

Three separate reviewers examined the policy specification for correctness:

Reviewer	What it checks
Omission hunter	Did we miss any requirement from the English?
Boundary adversary	Do edge cases (unknown values, missing data) behave correctly?
Guard auditor	Are prerequisite lookups properly enforced?

Round-Trip Audit (PASS)

The Datalog was read back without seeing the original English, and the reconstructed policy intent was compared against the specification.

PASS means nothing was lost in translation — every rule in the Datalog maps to a clause in the specification, and vice versa.

Prerequisite Guards

Some authorization decisions depend on data that only exists after a prior tool call. The demo’s cancellation rules reference insurance and membership — fields that only appear after get_reservation_details runs.

write_policy handles this automatically: when the English spec references fields from prior tool results, the translator emits a guard rule that denies the call while those fields aren’t yet visible. For the demo’s cancellation spec, that becomes C5 in the generated policy:

Unauthorized(idx) :-
    Actions(idx, a),
    IsTool(a, "cancel_reservation"),
    !ToolResultField("get_reservation_details",
        "insurance", _),
    !ToolResultField("get_reservation_details",
        "membership", "gold").

!ToolResultField(...) is Datalog negation — the rule fires exactly when the fields haven’t been populated. From the agent’s perspective, the behavior is a natural retry loop:

Agent: cancel_reservation(RKLA42)
SASY:  ✗ DENIED — look up reservation details first
Agent: get_reservation_details(RKLA42)
SASY:  ✓ AUTHORIZED
Agent: cancel_reservation(RKLA42)
SASY:  ✓ AUTHORIZED (insurance=yes)

In the truth table these scenarios surface as the GUARD_DENY rows shown above; the Guard Auditor independent check verifies the ordering is actually enforced.

Programmatic Access

All verification data is available on the result object:

v = result.verification

# Truth table shape
v.truth_table.rows       # 25
v.truth_table.allow      # 11
v.truth_table.deny       # 3
v.truth_table.guard_deny # 11

# Assertion results
v.assertions.total       # 25
v.assertions.passed      # 25
v.assertions.mismatches  # 0

# Checks
v.independent_checks_passed  # 3
v.independent_checks_total   # 3
v.round_trip_audit           # "PASS"

When Something Fails

If a translation produces assertion mismatches, the service automatically retries — fixing the Datalog and re-evaluating until all assertions pass or a timeout is reached.

If independent checks fail, the structured specification may need adjustment. Review the check results and update the English policy to resolve any ambiguities.

Running the experimental flow end-to-end

The demo repo ships Makefile targets for the write_policy path. They translate policy_english.md, save the artifacts under output/, and run the 9-scenario demo against the translated policy.

# English → Datalog (writes output/airline_policy_experimental.dl
# + output/truth_table.tsv)
make translate-experimental

# Upload + run the 9 scenarios against the experimental policy
make demo-translated-experimental

# Interactive stepwise variant
make demo-translated-experimental-step

# Upload the experimental policy only (no demo run)
make upload-translated-experimental

make demo-translated-experimental passes output/airline_policy_experimental.dl as the --policy-file argument — the tracked policy.dl is not modified. Use make demo to run against the hand-written reference policy that ships with the repo.