Skip to content

Confidence Report (experimental)

SASY provides two policy-translation pipelines with complementary goals:

  • The primary translate() flow takes an English spec plus your agent’s codebase and produces a Datalog policy targeted at your code. Reach for it when you’re authoring a policy against real tool names, field names, and message-graph predicates.
  • The experimental write_policy() flow focuses on validating the translation itself. It generates a truth table and runs adversarial verifier checks so you can gain confidence that the English → Datalog mapping is faithful. It doesn’t read your codebase, so it’s a better fit for short self-contained policies where the assurance artifacts are the point.

This page describes how to read the output of the write_policy() flow. Its result object carries a print_summary() method that renders a verification report — here’s how to interpret each section.

result.print_summary()
Policy Compilation Result
========================================
Status: SUCCESS (336s)
Structured Spec
10 clauses (C1-C10)
Verification
Truth table: 25 scenarios
11 ALLOW | 3 DENY | 11 GUARD_DENY
Assertions: 25/25 passed
Independent checks: 3/3 passed
Round-trip audit: PASS

Every combination of membership tier, insurance status, and cabin class was tested. Each row shows what the policy decides for that scenario.

  • ALLOW — the action is authorized
  • DENY — the action is explicitly blocked (with a reason)
  • GUARD_DENY — the policy can’t decide because required data hasn’t been looked up yet

Abridged view of the generated table for the airline demo policy:

ActionMembershipInsuranceCabinDecision
cancelgoldnoALLOW
cancelgoldyesALLOW
cancelsilveryesALLOW
cancelsilvernoDENY
cancelregularyesALLOW
cancelregularnoDENY
cancelunknownnoGUARD_DENY
modifyanyeconomyALLOW
modifysilverbasic_economyALLOW
modifygoldbasic_economyALLOW
modifyregularbasic_economyDENY
modifyunknownunknownGUARD_DENY

The generated Datalog was uploaded to the live SASY policy engine and evaluated against every truth table row. 25/25 means the Datalog produces the exact same decisions as the specification.

If any assertion fails, it means the Datalog translation has a bug — a specific scenario where the policy behaves differently than intended.

Three separate reviewers examined the policy specification for correctness:

ReviewerWhat it checks
Omission hunterDid we miss any requirement from the English?
Boundary adversaryDo edge cases (unknown values, missing data) behave correctly?
Guard auditorAre prerequisite lookups properly enforced?

The Datalog was read back without seeing the original English, and the reconstructed policy intent was compared against the specification.

PASS means nothing was lost in translation — every rule in the Datalog maps to a clause in the specification, and vice versa.

Some authorization decisions depend on data that only exists after a prior tool call. The demo’s cancellation rules reference insurance and membership — fields that only appear after get_reservation_details runs.

write_policy handles this automatically: when the English spec references fields from prior tool results, the translator emits a guard rule that denies the call while those fields aren’t yet visible. For the demo’s cancellation spec, that becomes C5 in the generated policy:

Unauthorized(idx) :-
Actions(idx, a),
IsTool(a, "cancel_reservation"),
!ToolResultField("get_reservation_details",
"insurance", _),
!ToolResultField("get_reservation_details",
"membership", "gold").

!ToolResultField(...) is Datalog negation — the rule fires exactly when the fields haven’t been populated. From the agent’s perspective, the behavior is a natural retry loop:

Agent: cancel_reservation(RKLA42)
SASY: ✗ DENIED — look up reservation details first
Agent: get_reservation_details(RKLA42)
SASY: ✓ AUTHORIZED
Agent: cancel_reservation(RKLA42)
SASY: ✓ AUTHORIZED (insurance=yes)

In the truth table these scenarios surface as the GUARD_DENY rows shown above; the Guard Auditor independent check verifies the ordering is actually enforced.

All verification data is available on the result object:

v = result.verification
# Truth table shape
v.truth_table.rows # 25
v.truth_table.allow # 11
v.truth_table.deny # 3
v.truth_table.guard_deny # 11
# Assertion results
v.assertions.total # 25
v.assertions.passed # 25
v.assertions.mismatches # 0
# Checks
v.independent_checks_passed # 3
v.independent_checks_total # 3
v.round_trip_audit # "PASS"

If a translation produces assertion mismatches, the service automatically retries — fixing the Datalog and re-evaluating until all assertions pass or a timeout is reached.

If independent checks fail, the structured specification may need adjustment. Review the check results and update the English policy to resolve any ambiguities.

The demo repo ships Makefile targets for the write_policy path. They translate policy_english.md, save the artifacts under output/, and run the 9-scenario demo against the translated policy.

Terminal window
# English → Datalog (writes output/airline_policy_experimental.dl
# + output/truth_table.tsv)
make translate-experimental
# Upload + run the 9 scenarios against the experimental policy
make demo-translated-experimental
# Interactive stepwise variant
make demo-translated-experimental-step
# Upload the experimental policy only (no demo run)
make upload-translated-experimental

make demo-translated-experimental passes output/airline_policy_experimental.dl as the --policy-file argument — the tracked policy.dl is not modified. Use make demo to run against the hand-written reference policy that ships with the repo.