AI Code Audit Agent v2.5.3

The instrument that doesn't average away your catastrophes.

408 quality atoms across 28 areas. 5 rubric archetypes. Evidence-tagged findings with a full reliability bundle — paradox-resistant, type-stratified, governance-gated.

408 · Quality Atoms
28 · Audit Areas
5 · Rubric Types
8 · Compliance Overlays
R* · Reliability Bundle
OBSERVED / INFERRED / ABSENT evidence · Binary gate — no averaging catastrophes · κw + AC1 + ppos/pneg reliability bundle · 408 atoms · 28 areas · 5 archetypes · WARN → ROUTE → BLOCK governance ladder · 8 compliance overlays, stackable · 4 deployment models: SaaS · BYOK · VPC · Analyst · Seven-phase ingestion engine · PPR feature attribution
Built for your world
Different orgs, different nightmares.

6 engineers. Demo on Tuesday.

"We need to know if this thing is going to embarrass us on launch day."
  • Advisory mode: scores without gates. No blocked PRs, no broken sprints.
  • CI/CD in 4 lines: drop in the GitHub Action. Trends visible in week one.
  • Binary gate: secrets, hardcoded creds, and unprotected admin routes fire even in advisory mode.
  • 5-page executive report: hand it to your investor. It looks like you have a 40-person eng team.
What your CTO tells the board: "We have automated quality governance from day one. Our technical debt is measured, not guessed."

12 clients. 12 codebases. 1 reputation.

"Half our projects are inherited code we didn't write."
  • Multi-repo batch auditing: sweep your entire portfolio.
  • Operational readiness overlay: race conditions, failover, load capacity.
  • Delta tracking: sprint-over-sprint comparison. Measurable improvement.
  • Coding standards compliance: "61% conformance, 0% automation."
What your delivery head tells the client: "Every project comes with an independently scored quality audit. Here's the evidence."

The CISO needs the report by Thursday.

"200 microservices, 4 compliance frameworks, a vendor who swears everything is fine."
  • 8 compliance overlays: HIPAA + SOC 2 + PCI + GDPR + ISO 27001 + NIST + OWASP + SOX.
  • VPC / on-prem: code never leaves your boundary. Cryptographic erasure.
  • Full reliability bundle: κw, AC1, ppos/pneg, R* gating on the 95% CI lower bound.
  • Enforcement mode: one secret = area F. No override without a logged reason.
What your CIO tells the auditor: "Every codebase has a scored, evidence-tagged, compliance-mapped audit with full provenance. The confidence tier is T4."
Deploy your way
One instrument. Four deployment modes.
☁️
Fastest to start

SaaS Platform

Upload your repo. Scored report in minutes. Zero infra, zero config. Code purged after delivery.

→ app.codesentinel.dev/audit/new → Drag & drop or connect GitHub
🔁
Best for teams

CI/CD Pipeline

Gate every merge. GitHub Actions, GitLab CI, or Jenkins. Binary gates fail the build. Scores trend over sprints.

- uses: sentinel/audit@v2
  with:
    strictness: enforcement
    overlay: soc2
▸_
Max control

CLI Tool

Terminal audits. JSON output to your dashboards. Offline with local models or BYOK cloud keys.

$ sentinel audit ./src \
    --strictness standard \
    --output technical \
    --overlay hipaa,soc2
🔒
Enterprise / Air-gapped

VPC / On-Prem

Deploy inside your network. Code never leaves your boundary. Your GPU infra. SOC 2 & HIPAA-ready.

$ helm install sentinel sentinel/agent \
    --values custom-values.yaml
How it works
From repo to governance action.
Every stage of the measurement instrument — the typing, the evidence, the gates, the reliability, the governance.
Step 1

The Big Picture

The core insight: Code audit is a measurement instrument, not an opinion. Every finding declares its type, its evidence, and its confidence.
Step 2

Classify Every Atom by Type

B
Binary
Yes/No. One fail = area collapse.
C
Coverage
What % meets the standard?
A
Architectural
Pattern judgment. 4-level rubric.
T
Tooling
Automation maturity level.
F
Config
Correct vs. misconfigured.
Why this matters: Without typing, audits average a security breach (B:0.0) with "nice architecture" (A:0.85), hiding catastrophes behind a decent number.
Step 3

Every Finding Declares Its Evidence

OBSERVED

"I saw it."

Direct evidence: exact file path, line number, verbatim code snippet.

INFERRED

"I deduced it."

Indirect evidence: logical deduction from other findings. Higher false-positive risk.

ABSENT

"It's not there."

Confirmed missing. Not "I didn't look" — "I looked and it's gone."

Why this matters: INFERRED findings have a higher false-positive rate. The framework tracks this separately so you know which findings to verify.
Step 4

The Binary Gate

Scenario A — No Binary failures

B:1 · B:1 · C:.8 · A:.7 → 0.82
Normal weighted average applies.

Scenario B — One Binary failure

B:0 · B:1 · C:.8 · A:.7 → 0.00
Binary gate triggers — entire area fails. One API key in code = security score zero.
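A minimal sketch of the gate logic behind the two scenarios, assuming uniform atom weights (the 0.82 in Scenario A implies a weighting scheme this page doesn't specify):

```python
# Hypothetical sketch of the binary gate. Uniform weights are an assumption.

def area_score(atoms):
    """atoms: list of (atom_type, score). Any Binary zero collapses the area."""
    if any(t == "B" and s == 0.0 for t, s in atoms):
        return 0.0  # gate fires: one leaked API key = security score zero
    return sum(s for _, s in atoms) / len(atoms)

scenario_a = [("B", 1.0), ("B", 1.0), ("C", 0.8), ("A", 0.7)]
scenario_b = [("B", 0.0), ("B", 1.0), ("C", 0.8), ("A", 0.7)]
print(round(area_score(scenario_a), 3))  # 0.875 under uniform weights
print(area_score(scenario_b))            # 0.0 -- gate triggered
```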
Step 4b

Strictness Levels

Advisory

Discovery Mode

Gate OFF. All findings are recommendations. Nothing blocks.

Use: First engagement, onboarding.
Standard

Dual-Lens

Gate fires as a flag but doesn't override the composite. Both views shown side by side.

Use: Sprint reviews, health checks.
Enforcement

Full Governance

Gate auto-Fs the area. R*-gated. ABSENT enforced. Can BLOCK pipelines.

Use: Release readiness, compliance.
Step 5

The Reliability Bundle

v2.5.3 moves beyond simple κ. Three independent models audit, then a full reliability bundle measures how trustworthy the measurements are — type by type.

κw — Weighted Kappa

Quadratic weighted for ordinal atoms (C, A, F). Large disagreements penalised more than small ones.

AC1 — Gwet's AC1

Paradox-resistant fallback. Standard κ collapses under high prevalence. AC1 fires when po ≥ 0.90 AND κ ≤ 0.60.

ppos / pneg

Agreement decomposition for binary atoms. "Agree on failures" ≠ "agree on passes." ppos ≥ 0.70 required for BLOCK.

R* = min(Rord, Rbin)

Composite gate. Both ordinal AND binary must be reliable. Gates on 95% CI lower bound.

The κ paradox: High agreement can produce low κ when prevalence is skewed. The bundle detects this and engages appropriate fallbacks. The audit measures how trustworthy its own measurements are.
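For the binary-atom pieces, the standard two-rater formulas can be sketched as follows. How the product folds these into R* alongside the ordinal side is described only at a high level, so the composite step is omitted; the item counts below are made-up inputs.

```python
# Standard two-rater agreement decomposition plus Gwet's AC1.

def binary_agreement(pairs):
    """pairs: (rater1, rater2) booleans for one binary atom across items."""
    a = sum(1 for x, y in pairs if x and y)          # both flag a failure
    d = sum(1 for x, y in pairs if not x and not y)  # both pass
    b_c = len(pairs) - a - d                         # disagreements
    p_pos = 2 * a / (2 * a + b_c) if (2 * a + b_c) else 1.0
    p_neg = 2 * d / (2 * d + b_c) if (2 * d + b_c) else 1.0
    p_o = (a + d) / len(pairs)                       # raw agreement
    pi = (2 * a + b_c) / (2 * len(pairs))            # mean failure prevalence
    p_e = 2 * pi * (1 - pi)                          # AC1 chance agreement
    ac1 = (p_o - p_e) / (1 - p_e) if p_e < 1 else 0.0
    return p_pos, p_neg, p_o, ac1

# 20 items: 1 joint failure, 18 joint passes, 1 disagreement. The raters
# "agree" 95% of the time, yet agree on failures only 67% of the time.
pairs = [(True, True)] + [(False, False)] * 18 + [(True, False)]
p_pos, p_neg, p_o, ac1 = binary_agreement(pairs)
print(f"ppos={p_pos:.2f} pneg={p_neg:.2f} po={p_o:.2f} AC1={ac1:.2f}")
# ppos=0.67 pneg=0.97 po=0.95 AC1=0.94
```

This is exactly the "agree on failures ≠ agree on passes" point: ppos here falls below the 0.70 threshold required for BLOCK even though raw agreement is 95%.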
Step 6

R*-Gated Governance

R* ≤ 0.60 — WARN ONLY

Informational. No action forced.

0.60–0.80 — WARN + ROUTE

Flagged to decision-makers. Ticket requiring human review.

R* > 0.80 — FULL AUTHORITY

Block deployments, auto-create remediation tickets.
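The ladder reduces to a small mapping. Whether the 0.60 and 0.80 boundaries fall into the lower or upper band is an assumption; the page doesn't say.

```python
# R*-gated governance ladder, as described above.

def governance_action(r_star: float) -> str:
    if r_star <= 0.60:
        return "WARN"            # informational only, no action forced
    if r_star <= 0.80:
        return "WARN+ROUTE"      # ticket requiring human review
    return "FULL_AUTHORITY"      # may block deployments, open tickets

for r in (0.55, 0.72, 0.83):
    print(r, governance_action(r))
```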

Step 7

Observer Declaration & Confidence

68%
Coverage — files examined
87%
Resolution — deps traced
R* 0.83
Reliability — cross-validation

Audit Confidence Score

T1 · 0.40–0.54
T2 · 0.55–0.69
T3 · 0.70–0.84
T4 · 0.85–1.00
Source code · +0.40
Multi-model (R* ≥ 0.80) · +0.08
Enriched mode · +0.05
Infrastructure artifacts · +0.10
Team context · +0.07
Coding standards · +0.05
Design documentation · +0.06
Load test results · +0.05
Monitoring data · +0.05
SBOM / dependency data · +0.04
Previous audit · +0.05
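Under the assumption that the Audit Confidence Score is simply the sum of the contributions present (capped at 1.0), the tier lookup can be sketched as:

```python
# Assumed model: ACS = capped sum of the listed contributions, then a
# lookup against the T1-T4 bands. The cap at 1.0 is an assumption.

CONTRIBUTIONS = {
    "source_code": 0.40, "multi_model": 0.08, "enriched_mode": 0.05,
    "infrastructure_artifacts": 0.10, "team_context": 0.07,
    "coding_standards": 0.05, "design_documentation": 0.06,
    "load_test_results": 0.05, "monitoring_data": 0.05,
    "sbom": 0.04, "previous_audit": 0.05,
}

def acs(inputs_present: set[str]) -> tuple[float, str]:
    score = round(min(1.0, sum(CONTRIBUTIONS[k] for k in inputs_present)), 2)
    for tier, low in (("T4", 0.85), ("T3", 0.70), ("T2", 0.55), ("T1", 0.40)):
        if score >= low:
            return score, tier
    return score, "below T1"

# Source code plus infra artifacts and a reliable multi-model run:
print(acs({"source_code", "infrastructure_artifacts", "multi_model"}))
# (0.58, 'T2')
```

Note that all eleven contributions sum to exactly 1.00, so T4 requires nearly every input class to be present.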
Compliance Overlays
Not a checkbox. A lens.
HIPAA touches 15+ of 28 areas. When active, it amplifies what matters, adds framework-specific questions, and requires deeper evidence.
🏥

HIPAA

45 CFR 160, 162, 164
13 mandatory · 43 overrides · 10 Qs
ePHI identification, column-level encryption, audit log separation.
🔒

SOC 2 Type II

Trust Services Criteria
13 mandatory · 26 overrides · 5 Qs
Change management, IaC enforcement, trust service scoping.
💳

PCI DSS 4.0

Requirements 1–12
13 mandatory · 27 overrides · 6 Qs
PAN tokenisation, CDE segmentation, 3.0× audit trail.
🇪🇺

GDPR

EU 2016/679
6 mandatory · 19 overrides · 7 Qs
Erasure cascading, portability, consent management.
🛡️

ISO 27001

Annex A (2022)
14 mandatory · 38 overrides · 8 Qs
ISMS alignment, risk traceability, access control.
🏛️

NIST 800-53

Rev 5 Moderate
16 mandatory · 52 overrides · 12 Qs
FedRAMP controls, continuous monitoring, FIPS 140.
🕸️

OWASP ASVS

Level 2
11 mandatory · 34 overrides · 9 Qs
Auth verification, session management, API security.
📈

SOX / J-SOX

§302, §404
9 mandatory · 22 overrides · 6 Qs
Financial integrity, audit trail immutability.
1

Elevate Areas

Force mandatory areas

2

Amplify Weights

1.5×–3.0× multipliers

3

Raise Evidence

Deeper proof required

4

Add Questions

Framework-specific atoms

5

Score Maturity

Ad Hoc → Optimised

Output Styles

Same data. Right format.

📋

Executive

2–5 pages

Traffic-light scorecard, narrative, top 5 findings, risk heatmap.

CTO · Board · Client Sponsor
🔬

Technical

20–40 pages

Full Q-by-Q with code evidence, NC log, remediation roadmap.

Dev Team · Tech Lead · Architect
📜

Compliance

40+ pages

Full evidence chain, risk register, instrument certificate.

Compliance Officer · Auditor
Complete Pipeline

The audit isn't linear. It's a system.

Ingestion feeds classification. Classification feeds measurement. Measurement feeds cross-validation. Cross-validation feeds governance. Governance feeds the report. The report feeds the organisational loop.

1
Ingest
Codebase + context + overlays
2
Classify
408 atoms × B/C/A/T/F
3
Measure
OBS / INF / ABS evidence
4
Gate
Binary collapse per strictness
8
Purge
AES-256 key destroyed
7
Report
EXEC · TECH · COMPLIANCE
6
Observe
Coverage + overlays + ACS
5
Validate
κw + AC1 + ppos/pneg → R*
v3
Forward Architecture
What the instrument becomes when its outputs are consumed by downstream analysis.
Everything below is designed and specified. The three-layer separation, the ingestion engine, the graph schema, the feature attribution engine, and the modules that compose on top of them.
01

Three-Layer Separation

Architecture

LLM layer (non-deterministic, runs once) → Math layer (deterministic, replayable, milliseconds) → Presentation layer (interactive). CTO adjusts thresholds; report updates instantly. No re-running the audit.

Layer 1: expensive, stochastic · Layer 2: cheap, deterministic · Layer 3: interactive, zero-cost
02

Seven-Phase Ingestion

Engine

tree-sitter AST parsing → import resolution → call graph → heritage chains → Leiden community detection → BFS process tracing. Single ingestion feeds all downstream modules.

Structure → Parse → Imports → Calls → Heritage → Communities → Processes
03

Graph Schema

Data Model

15 node types (Repo through Artifact), 12 edge types with epistemic typing. Confidence-scored relationships flow through to governance. Postgres hybrid: JSONB events + adjacency + pgvector.

15 nodes · 12 edges · CALLS · IMPLEMENTS · SATISFIES · EVIDENCES (OBS/INF/ABS)
04

Confidence Scoring

Epistemic

Every edge carries a confidence band. Static imports: 85–95%. Heuristic calls: 70–85%. Confidence propagates through evidence chains — uncertainty compounds multiplicatively.

≥80% → OBSERVED downstream · 60–79% → INFERRED · <60% → UNRESOLVED → review
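The multiplicative compounding and band mapping described above, as a sketch; the band boundaries are taken from the text, while the two-hop chain is a made-up example.

```python
# Confidence propagation along an evidence chain.

def chain_confidence(edge_confidences):
    conf = 1.0
    for c in edge_confidences:
        conf *= c  # uncertainty compounds: two 0.9 hops -> 0.81
    return conf

def epistemic_band(conf: float) -> str:
    if conf >= 0.80:
        return "OBSERVED"
    if conf >= 0.60:
        return "INFERRED"
    return "UNRESOLVED"  # routed to human review

# A static import (0.92) followed by a heuristic call edge (0.78):
c = chain_confidence([0.92, 0.78])
print(round(c, 4), epistemic_band(c))  # 0.7176 INFERRED
```

Two individually decent edges land the chain in INFERRED territory, which is exactly why compounding matters.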
05

Feature Attribution

PPR Engine

Personalised PageRank from each entrypoint with centrality-based infrastructure dampening. Two-layer scoring: feature-local findings vs Epic 0 platform health. Epistemic typing on every linkage.

weight = PPR × (1 − dampener)
dampener = min(1, BC / threshold)
threshold = 95th-percentile BC
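The dampening formula written out, with made-up PPR and betweenness-centrality inputs:

```python
# weight = PPR x (1 - dampener), dampener = min(1, BC / threshold).
# The PPR, BC, and threshold values below are illustrative only.

def feature_weight(ppr: float, bc: float, bc_threshold: float) -> float:
    dampener = min(1.0, bc / bc_threshold)  # 95th-percentile BC as threshold
    return ppr * (1.0 - dampener)

# A hot utility module: high PPR, but centrality says "infrastructure".
print(round(feature_weight(ppr=0.30, bc=0.95, bc_threshold=1.0), 3))  # 0.015
# A genuine feature file: moderate PPR, low centrality.
print(round(feature_weight(ppr=0.20, bc=0.10, bc_threshold=1.0), 3))  # 0.18
```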
06

Epic 0 Isolation

Infrastructure

Shared infrastructure (auth middleware, logging, DB pools) stripped from feature epics into auto-generated Epic 0. Features scored accurately. Systemic issues visible separately.

"Epics 1–15 avg 0.82. Epic 0 (platform): 0.41. Your features are fine. Your infrastructure isn't."
07

Algorithm Selection

Self-Tuning

The codebase tells the system which math to use. Graph topology, distribution shape, and data availability select the solver. Every decision is an immutable event. Self-calibrating across engagements.

High entrypoint → backward slice
Low entrypoint → PPR + seeds
Power-law dist → degree + BC
Flat dist → eigenvector
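The qualitative selection rules can be sketched as a dispatch table. The 0.5 entrypoint-ratio cutoff and the distribution labels are assumptions; the page states only the heuristics themselves.

```python
# Topology-driven solver selection, per the rules listed above.

def pick_solvers(entrypoint_ratio: float, dist_shape: str) -> dict:
    attribution = ("backward_slice" if entrypoint_ratio >= 0.5
                   else "ppr_with_seeds")
    centrality = ("degree_plus_betweenness" if dist_shape == "power_law"
                  else "eigenvector")
    return {"attribution": attribution, "centrality": centrality}

print(pick_solvers(0.7, "power_law"))
# {'attribution': 'backward_slice', 'centrality': 'degree_plus_betweenness'}
```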
08

Epic Mapping Reverse

Submodule

Second extraction pass over the same ingested graph. Produces a structured Epic Map: hierarchical numbering, epic grouping, role assignment, per-feature quality scores — all evidence-tagged.

OBSERVED: /api/checkout → Stripe
INFERRED: subscriptions table
ABSENT: vs forward map only
09

Delta Engine

v2 Future

Forward map (specified) vs reverse map (built). Entity resolution: embeddings → LLM scoring → ILP global optimisation (not Hungarian — handles 1:many). CONFIRMED / DISPUTED / ABSENT / ORPHAN.

Step 1: embed candidates
Step 2: LLM pairwise score
Step 3: ILP global optimise
Step 4: threshold governance
10

Graph RAG + MCP

Integration

Query the audit graph in natural language via semantic + graph traversal. Exposed as MCP server: your AI coding agent can ask "what's the quality score of this file?" in real time.

get_file_quality · get_call_chain_risk · get_feature_findings · search_findings (NL)
11

Temporal Diff

v3 Future

Same audit, different time. Detects: regression/improvement, hidden scope creep, false completion (mock APIs, zero-assertion tests), remediation verification, velocity validation.

Event store makes diff trivial: query by audit_run_id, compare. Hidden scope creep = files changed with no sprint item.
12

Full-Stack Cross-Ref

v3 Future

Mobile + web + APIs + microservices in one graph. Detects: dead APIs, broken integrations, auth inconsistency, feature parity gaps, data model divergence. INFERRED → OBSERVED end-to-end.

Single-repo: INFERRED
Multi-repo: OBSERVED e2e
Observation cone: repo → system
13

Interactive Recomputation

Layer 2

Every parameter is a slider. Infrastructure threshold, alignment confidence, κ vs AC1 toggle, governance strictness. All Layer 2 — millisecond recomputation. Implicit sensitivity analysis.

PPR 10K nodes: ms
Centrality: ms
ILP 200 stories: seconds
Fragile scores auto-flagged
Architecture

Three layers. One invariant.

The LLM layer runs once — expensive, stochastic, unrepeatable. Its outputs freeze as immutable events. The Math layer takes those frozen outputs and computes everything else: scores, attribution, feature boundaries, reliability, alignment. All deterministic. All replayable. All parameterisable.

The Presentation layer reads the math and renders. When a parameter changes, it asks the Math layer to recompute. Milliseconds. No LLM calls. No waiting. No cost.

The LLM outputs are frozen. The math is live.

Layer Stack
Layer 1 · LLM: Non-deterministic · Runs once · Expensive
Layer 2 · Math: Deterministic · Replayable · Milliseconds
Layer 3 · Presentation: Interactive · Zero-cost · Parameterisable
Seven-Phase Pipeline
1 · Structure: dir tree + file classification
2 · Parse: tree-sitter ASTs (multi-lang)
3 · Import Resolution: TS aliases, Rust, Java, Go
4 · Call Graph: 3-tier symbol resolution
5 · Heritage: inheritance + interfaces + mixins
6 · Communities: Leiden clusters → feature candidates
7 · Process Tracing: BFS from entrypoints
Ingestion Engine

Single ingestion. Multiple consumers.

The audit engine, the reverse Epic Mapper, and the documentation generator all consume the same graph. Two graphs means two realities means untrustworthy traceability.

Context enrichment injects dependency signatures into each chunk — method names, type hints, public interfaces. Without it, architectural assessment degenerates into file-level pattern matching.

Precompute at index time, serve complete context in one query. The graph answers "what depends on UserService?" in a single structured response.
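A minimal adjacency-table sketch of that one-query dependency lookup, using in-memory SQLite in place of the product's Postgres setup; the table and column names are illustrative assumptions.

```python
import sqlite3

# Illustrative adjacency table; the product stores adjacency in Postgres
# alongside JSONB events and pgvector embeddings.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE edges (src TEXT, dst TEXT, kind TEXT, confidence REAL);
    INSERT INTO edges VALUES
        ('OrderController', 'UserService',  'CALLS',   0.92),
        ('AuthMiddleware',  'UserService',  'IMPORTS', 0.95),
        ('ReportJob',       'EmailService', 'CALLS',   0.81);
""")

# "What depends on UserService?" -- answered in one structured query.
dependents = db.execute(
    "SELECT src, kind, confidence FROM edges WHERE dst = ? ORDER BY src",
    ("UserService",),
).fetchall()
for src, kind, conf in dependents:
    print(f"{src} --{kind}--> UserService ({conf:.2f})")
```

Because the edges are precomputed at index time, the lookup is a single indexed scan rather than a re-parse of the codebase.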

Data Model

15 node types. 12 edge types.

The graph represents the full relational structure of audited codebases and the measurement outputs attached to them. Stored as adjacency tables in Postgres, projectable into a graph database when traversal becomes the bottleneck.

Edges carry epistemic metadata: IMPLEMENTS has weight and epistemic type. SATISFIES has alignment score. EVIDENCES carries OBSERVED/INFERRED/ABSENT. This isn't a code graph — it's a measurement graph.

Graph Schema
Repo · Commit · AuditRun · File · Symbol · Route · Feature · Story · Atom · Area · Finding · Requirement · Claim · Override · Tool · Report · Artifact
CALLS · DEFINES · DECLARES_ROUTE · IMPLEMENTS · SATISFIES · EVIDENCES · MEASURES · CHANGED_IN · AFFECTS · READS · WRITES · IMPORTS
Area 28

Business Logic Hygiene

Business logic correctness requires domain knowledge. Business logic hygiene — whether code is structured so a domain expert could verify correctness — is measurable from static analysis. 7 new atoms.

28.1 · Domain Layer Separation A
28.2 · Business Rule Centralisation C
28.3 · Named Constants + Rationale C
28.4 · Explicit State Machines A
28.5 · Business Expectation Tests C
28.6 · Calculation Isolation A
28.7 · Edge Case Explicitness C
Trust & Security
How your code is handled.
Where does the code go? Who sees it? How is it purged?
Model A

Cloud SaaS

Upload → ZDR API transit → findings in client tenant → cryptographic erasure after configurable window.

AES-256-GCM · TLS 1.3 · ZDR API
Model B

BYOK

Your API key. Platform becomes stateless orchestrator. No code stored beyond active session.

Client key · Stateless · No storage
Model C

VPC / On-Prem

Zero egress. Container in your infra. Bedrock, Vertex, or self-hosted LLM. Code never leaves your boundary.

Air-gap · FedRAMP · ITAR
Model D

Analyst-Assisted

Human analysts on your infrastructure. NDA'd, background-checked. No external software deployed.

Human-mediated · NDA'd

Data Lifecycle — Cryptographic Erasure

INGEST → CHUNK → PROCESS → ASSEMBLE → DELIVER → PURGE 🔥
Why this matters: Model C satisfies Fortune 50 requirements. Model B leverages your existing LLM relationship. The trust architecture is the sales enabler — methodology only matters after security says yes.
The bigger picture
Code Audit as Organizational Sensing
In the Fractal Org framework, code audit isn't a standalone tool. It's a sensing function of a self-similar control loop that runs at every level of the organization.
👁 Sensing
🧠 Decisioning
⚡ Execution
✓ Verification
📚 Learning

Code Audit = The Sensing Function

It produces the telemetry that everything else acts on. If the sensing function produces unreliable data — no type distinction, no evidence tags, no reliability measurement — every downstream decision is corrupted.

What the instrument sees

Your codebase as a knowledge graph.

Every dependency, call chain, cluster, and execution flow — mapped, measured, and queryable. Hover nodes to inspect. Watch data flow through the system.

47 nodes · 68 edges · 6 clusters
6 findings · R* 0.83 · coverage 68%
Route / Entrypoint
Service / Controller
Model / Repository
Shared Infra (Epic 0)
External Integration
Finding (NC)
CALLS
IMPORTS
READS/WRITES
EVIDENCES