408 quality atoms across 28 areas. 5 rubric archetypes. Evidence-tagged findings with a full reliability bundle — paradox-resistant, type-stratified, governance-gated.
"We need to know if this thing is going to embarrass us on launch day."
"Half our projects are inherited code we didn't write."
"200 microservices, 4 compliance frameworks, a vendor who swears everything is fine."
Upload your repo. Scored report in minutes. Zero infra, zero config. Code purged after delivery.
Gate every merge. GitHub Actions, GitLab CI, or Jenkins. Binary gates fail the build. Scores trend over sprints.
Terminal audits. JSON output to your dashboards. Offline with local models or BYOK cloud keys.
Deploy inside your network. Code never leaves your boundary. Your GPU infra. SOC 2 & HIPAA-ready.
Direct evidence: exact file path, line number, verbatim code snippet.
Indirect evidence: logical deduction from other findings. Higher false-positive risk.
Confirmed missing. Not "I didn't look" — "I looked and it's gone."
Gate OFF. All findings are recommendations. Nothing blocks.
Gate fires as a flag but doesn't override the composite. Both views shown side by side.
Gate auto-Fs the area. R*-gated. ABSENT enforced. Can BLOCK pipelines.
v2.5.3 moves beyond simple κ. Three independent models audit, then a full reliability bundle measures how trustworthy the measurements are — type by type.
Quadratic-weighted κ for ordinal atoms (C, A, F). Large disagreements penalised more than small ones.
Paradox-resistant fallback. Standard κ collapses under high prevalence. AC1 fires when po ≥ 0.90 AND κ ≤ 0.60.
Agreement decomposition for binary atoms. "Agree on failures" ≠ "agree on passes." ppos ≥ 0.70 required for BLOCK.
Composite gate. Both ordinal AND binary must be reliable. Gates on 95% CI lower bound.
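The fallback rule and the agreement decomposition can be sketched for a two-rater binary atom (the product uses three models and per-type stratification; data, labels, and this pairwise framing are illustrative only):

```python
def reliability_bundle(a, b, positive="fail"):
    """Pairwise reliability for two raters over binary atom labels.

    Returns raw agreement (po), Cohen's kappa, Gwet's AC1, and
    positive-class agreement (p_pos). A sketch of the statistics
    named above, not the product's exact implementation.
    """
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n

    # Cohen's kappa: chance agreement from each rater's marginals.
    pa = sum(x == positive for x in a) / n
    pb = sum(x == positive for x in b) / n
    pe_k = pa * pb + (1 - pa) * (1 - pb)
    kappa = (po - pe_k) / (1 - pe_k)

    # Gwet's AC1: chance agreement from mean prevalence -- this is
    # what keeps it stable when one class dominates.
    pi = (pa + pb) / 2
    pe_g = 2 * pi * (1 - pi)
    ac1 = (po - pe_g) / (1 - pe_g)

    # Positive-class agreement: "agree on failures" specifically.
    both = sum(x == positive and y == positive for x, y in zip(a, b))
    either = sum(x == positive or y == positive for x, y in zip(a, b))
    p_pos = 2 * both / (both + either) if (both + either) else 1.0

    # Paradox-resistant fallback: AC1 only when raw agreement is
    # high but kappa has collapsed.
    stat = "AC1" if (po >= 0.90 and kappa <= 0.60) else "kappa"
    return po, kappa, ac1, p_pos, stat

# 18 agreed passes plus 2 discordant items: the kappa paradox.
rater_a = ["pass"] * 18 + ["pass", "fail"]
rater_b = ["pass"] * 18 + ["fail", "pass"]
po, kappa, ac1, p_pos, stat = reliability_bundle(rater_a, rater_b)
# po = 0.90 yet kappa goes negative; AC1 stays high; p_pos = 0.0,
# so despite 90% raw agreement the raters never agree on a failure.
```

This is the decomposition doing its job: headline agreement looks fine, but p_pos < 0.70 means no BLOCK can be justified from these raters.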
Informational. No action forced.
Flagged to decision-makers. Ticket requiring human review.
Block deployments, auto-create remediation tickets.
Traffic-light scorecard, narrative, top 5 findings, risk heatmap.
Full Q-by-Q with code evidence, NC log, remediation roadmap.
Full evidence chain, risk register, instrument certificate.
Ingestion feeds classification. Classification feeds measurement. Measurement feeds cross-validation. Cross-validation feeds governance. Governance feeds the report. The report feeds the organisational loop.
LLM layer (non-deterministic, runs once) → Math layer (deterministic, replayable, milliseconds) → Presentation layer (interactive). CTO adjusts thresholds; report updates instantly. No re-running the audit.
tree-sitter AST parsing → import resolution → call graph → heritage chains → Leiden community detection → BFS process tracing. Single ingestion feeds all downstream modules.
15 node types (Repo through Artifact), 12 edge types with epistemic typing. Confidence-scored relationships flow through to governance. Postgres hybrid: JSONB events + adjacency + pgvector.
Every edge carries a confidence band. Static imports: 85–95%. Heuristic calls: 70–85%. Confidence propagates through evidence chains — uncertainty compounds multiplicatively.
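That compounding can be sketched in a few lines (bands illustrative, not the product's calibrated values):

```python
from math import prod

def chain_confidence(edges):
    """Confidence of an evidence chain: each hop's band compounds
    multiplicatively, so long inferred chains decay fast."""
    return prod(conf for _, conf in edges)

# Hypothetical chain: two static imports, then one heuristic call edge.
chain = [("imports", 0.95), ("imports", 0.90), ("calls?", 0.75)]
chain_confidence(chain)  # 0.95 * 0.90 * 0.75 = 0.64125
```

One weak hop drags the whole chain down, which is the intended behaviour: inferred links should never launder themselves into certainty.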
Personalised PageRank from each entrypoint with centrality-based infrastructure dampening. Two-layer scoring: feature-local findings vs Epic 0 platform health. Epistemic typing on every linkage.
Shared infrastructure (auth middleware, logging, DB pools) stripped from feature epics into auto-generated Epic 0. Features scored accurately. Systemic issues visible separately.
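A minimal power-iteration sketch of that attribution, over a toy call graph with hypothetical names. Shared nodes like db_pool accumulate mass from every entrypoint's walk, which is what the infrastructure dampening and the Epic 0 split correct for:

```python
def personalised_pagerank(edges, entrypoint, damp=0.85, iters=100):
    """Power-iteration PageRank restarted at a single entrypoint --
    a sketch of attributing files to the feature that entrypoint
    serves. Flat damping and the toy graph are illustrative only."""
    nodes = sorted({n for e in edges for n in e})
    out = {n: [d for s, d in edges if s == n] for n in nodes}
    rank = {n: float(n == entrypoint) for n in nodes}
    for _ in range(iters):
        nxt = dict.fromkeys(nodes, 0.0)
        nxt[entrypoint] += 1 - damp            # teleport to the entrypoint
        for n in nodes:
            share = damp * rank[n]
            targets = out[n] or [entrypoint]   # dangling mass restarts too
            for t in targets:
                nxt[t] += share / len(targets)
        rank = nxt
    return rank

edges = [("checkout", "cart"), ("checkout", "auth_mw"), ("cart", "db_pool")]
rank = personalised_pagerank(edges, "checkout")
# The entrypoint ranks highest; auth_mw and db_pool pick up mass they
# would also pick up from every other feature's entrypoint.
```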
The codebase tells the system which math to use. Graph topology, distribution shape, and data availability select the solver. Every decision is an immutable event. Self-calibrating across engagements.
Second extraction pass over the same ingested graph. Produces a structured Epic Map: hierarchical numbering, epic grouping, role assignment, per-feature quality scores — all evidence-tagged.
Forward map (specified) vs reverse map (built). Entity resolution: embeddings → LLM scoring → ILP global optimisation (not Hungarian assignment, which is one-to-one only; ILP handles 1:many). CONFIRMED / DISPUTED / ABSENT / ORPHAN.
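The four verdicts can be sketched as threshold rules over match scores (thresholds and names hypothetical; the real pipeline selects the matches themselves via ILP global optimisation):

```python
def classify_alignment(matches, spec_ids, built_ids,
                       confirm=0.80, dispute=0.50):
    """Label spec features against built features from alignment
    scores. matches = (spec_id, built_id, score) triples already
    chosen by the optimiser; thresholds are illustrative."""
    status, matched_built = {}, set()
    for spec, built, score in matches:
        if score >= confirm:
            status[spec] = "CONFIRMED"
            matched_built.add(built)
        elif score >= dispute:
            status[spec] = "DISPUTED"
            matched_built.add(built)
    for spec in spec_ids:
        status.setdefault(spec, "ABSENT")      # specified, never built
    for built in built_ids:
        if built not in matched_built:
            status[built] = "ORPHAN"           # built, never specified
    return status

matches = [("F1", "svc_auth", 0.93), ("F2", "svc_export", 0.61)]
classify_alignment(matches, ["F1", "F2", "F3"],
                   ["svc_auth", "svc_export", "svc_legacy"])
# F1 CONFIRMED, F2 DISPUTED, F3 ABSENT, svc_legacy ORPHAN
```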
Query the audit graph in natural language via semantic + graph traversal. Exposed as MCP server: your AI coding agent can ask "what's the quality score of this file?" in real time.
Same audit, different time. Detects: regression/improvement, hidden scope creep, false completion (mock APIs, zero-assertion tests), remediation verification, velocity validation.
Mobile + web + APIs + microservices in one graph. Detects: dead APIs, broken integrations, auth inconsistency, feature parity gaps, data model divergence. INFERRED → OBSERVED end-to-end.
Every parameter is a slider. Infrastructure threshold, alignment confidence, κ vs AC1 toggle, governance strictness. All Layer 2 — millisecond recomputation. Implicit sensitivity analysis.
The LLM layer runs once — expensive, stochastic, unrepeatable. Its outputs freeze as immutable events. The Math layer takes those frozen outputs and computes everything else: scores, attribution, feature boundaries, reliability, alignment. All deterministic. All replayable. All parameterisable.
The Presentation layer reads the math and renders. When a parameter changes, it asks the Math layer to recompute. Milliseconds. No LLM calls. No waiting. No cost.
The LLM outputs are frozen. The math is live.
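A minimal sketch of that split, assuming a hypothetical findings schema:

```python
# Layer-1 events: written once by the LLM pass, then immutable.
FROZEN_FINDINGS = (
    {"area": "security", "severity": 4, "evidence": "direct"},
    {"area": "security", "severity": 2, "evidence": "indirect"},
    {"area": "testing",  "severity": 5, "evidence": "direct"},
)

def recompute(findings, block_at=4, direct_only=False):
    """Layer-2 recompute: a pure function of the frozen events plus
    the slider values. Moving a threshold re-runs this function,
    never the LLM. Parameter names are illustrative."""
    considered = [f for f in findings
                  if not direct_only or f["evidence"] == "direct"]
    blocking = [f for f in considered if f["severity"] >= block_at]
    return {"blocking": len(blocking), "total": len(considered)}

# Dragging the severity slider: same frozen events, new answer, no LLM call.
recompute(FROZEN_FINDINGS, block_at=4)  # {'blocking': 2, 'total': 3}
recompute(FROZEN_FINDINGS, block_at=2)  # {'blocking': 3, 'total': 3}
```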
The audit engine, the reverse Epic Mapper, and the documentation generator all consume the same graph. Two graphs means two realities means untrustworthy traceability.
Context enrichment injects dependency signatures into each chunk — method names, type hints, public interfaces. Without it, architectural assessment degenerates into file-level pattern matching.
Precompute at index time, serve complete context in one query. The graph answers "what depends on UserService?" in a single structured response.
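A sketch of that precompute, with hypothetical module names:

```python
from collections import defaultdict

# Forward edges discovered at ingestion time (illustrative).
IMPORTS = [
    ("OrderController", "UserService"),
    ("BillingJob", "UserService"),
    ("UserService", "DbPool"),
]

# Build the reverse index once, at index time...
dependents = defaultdict(list)
for src, dst in IMPORTS:
    dependents[dst].append(src)

# ...so "what depends on UserService?" is a single lookup, not a scan.
dependents["UserService"]  # ['OrderController', 'BillingJob']
```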
The graph represents the full relational structure of audited codebases and the measurement outputs attached to them. Stored as adjacency tables in Postgres, projectable into a graph database when traversal becomes the bottleneck.
Edges carry epistemic metadata: IMPLEMENTS has weight and epistemic type. SATISFIES has alignment score. EVIDENCES carries OBSERVED/INFERRED/ABSENT. This isn't a code graph — it's a measurement graph.
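A sketch of such an edge record, with the fields the description names (values illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    src: str           # e.g. a test file, a feature, a requirement
    dst: str
    kind: str          # IMPLEMENTS | SATISFIES | EVIDENCES | ...
    confidence: float  # 0..1 band assigned at ingestion
    epistemic: str     # OBSERVED | INFERRED | ABSENT

# A measurement edge, not just a code edge:
e = Edge("tests/test_auth.py", "REQ-112", "EVIDENCES", 0.88, "OBSERVED")
```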
Business logic correctness requires domain knowledge. Business logic hygiene — whether code is structured so a domain expert could verify correctness — is measurable from static analysis. 7 new atoms.
Upload → ZDR API transit → findings in client tenant → cryptographic erasure after a configurable retention window.
Your API key. Platform becomes stateless orchestrator. No code stored beyond active session.
Zero egress. Container in your infra. Bedrock, Vertex, or self-hosted LLM. Code never leaves your boundary.
Human analysts on your infrastructure. NDA'd, background-checked. No external software deployed.
It produces the telemetry that everything else acts on. If the sensing function produces unreliable data — no type distinction, no evidence tags, no reliability measurement — every downstream decision is corrupted.
Every dependency, call chain, cluster, and execution flow — mapped, measured, and queryable. Hover nodes to inspect. Watch data flow through the system.