Sutra — Hierarchy-aware verification copilot

The bottleneck

Hierarchy is the memory of VLSI work. Most tools flatten it away.

The bottleneck looks different in every phase of a silicon engagement. The identifier in your query exists in five blocks. The interface you are about to change has consumers no full-text search can enumerate. The spec clauses without assertions never appear in any positive match. None of these are content questions. The answers are relationships the agent has to walk, not text it can match.

Four moments from a single engagement, drawn from a fabless team bringing up cortex_v3 (a derivative SoC built on cortex_v2 with a new accelerator). The cast carries from one panel to the next.

engagement cortex_v3 · 14 blocks · 3 derivative SoCs

Phase · IP ingest · IP integrator

Same name, different things.

Three versions of alu_top live in the project tree, plus a derivative fork. A bare query for overflow_chk matches all of them. Which one is bound into cortex_apu in cortex_v3?

semantic search · embedding similarity

alu_top_v1 / overflow_chk · 0.91

alu_top_v2 / overflow_chk · 0.89

alu_top_v3 / overflow_chk · 0.88

cortex_lite_alu / overflow_chk · 0.87

near-tied scores · cannot disambiguate. The candidates are near-identical code. The right answer requires knowing which one is bound to cortex_v3, and that is not in the embedding.

graph · ancestry resolution

▾ cortex_v3

  ▾ cortex_apu

    ▾ alu_top · v3 instance

      ● overflow_chk

○ alu_top_v1 · not bound to cortex_v3

○ alu_top_v2 · not bound to cortex_v3

○ cortex_lite_alu · derivative fork

resolved by structure · the ancestry path is unique; the other matches drop away because they have no edge to cortex_v3.
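The ancestry resolution in this panel can be sketched in a few lines. This is a minimal illustration, not Sutra's implementation: the `BINDS` dictionary, the `resolve` function and the `module/leaf` naming scheme are all assumptions made for the example, populated only with the names shown above.

```python
# Hypothetical toy graph: each key is a node, each value lists the nodes
# bound directly beneath it. Only cortex_v3 has a path to alu_top_v3.
BINDS = {
    "cortex_v3": ["cortex_apu"],
    "cortex_apu": ["alu_top_v3"],
    "alu_top_v3": ["alu_top_v3/overflow_chk"],
    "alu_top_v1": ["alu_top_v1/overflow_chk"],
    "alu_top_v2": ["alu_top_v2/overflow_chk"],
    "cortex_lite_alu": ["cortex_lite_alu/overflow_chk"],
}


def resolve(root, leaf_name, binds):
    """Return every path from root down to a node named leaf_name."""
    paths = []
    stack = [[root]]
    while stack:
        path = stack.pop()
        node = path[-1]
        # The leaf name is the last segment of the node identifier.
        if node.split("/")[-1] == leaf_name:
            paths.append(path)
        for child in binds.get(node, []):
            stack.append(path + [child])
    return paths


paths = resolve("cortex_v3", "overflow_chk", BINDS)
for p in paths:
    print(" > ".join(p))
```

The three other copies of overflow_chk never appear in the result because no walk starting at cortex_v3 reaches them, whatever their similarity score.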

Phase · Integration change · Integration engineer

What breaks if I change this?

The response handshake on bus_if.sv is about to change shape. Before merging, the integrator needs the closed list of consumers — not literature about other AXI handshakes.

semantic search · related content

other AXI variants in tree · 0.78

prior handshake-change commits · 0.74

similar interface code in IP-X · 0.71

spec passages about handshakes · 0.69

debug notes mentioning bus_if · 0.63

this is a literature review · none of these tell the integrator what actually breaks in cortex_v3. Risk: over-changes downstream, or misses a real consumer.

graph · enumerated edges of bus_if.sv

bindings · 12 RTL instances · cortex_apu, dma_engine, axi_xbar, …

tb envs · 3 wrappers · tb_cortex_v3 · tb_axi_smoke · tb_perf

assertions · 7 properties · p_resp_2cycle, p_no_outoforder, …

derivatives · 2 SoCs · cortex_v3 (lead) · cortex_v3_lite

scenarios · 14 tests · authored against current protocol

closed, named, exact · every binding edge in the graph is enumerated. The integrator now knows the blast radius before merging.
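As a rough sketch of what "enumerated edges" means, the consumer list can be read straight off a typed edge table. The `EDGES` layout, the edge-kind labels and the `consumers` function are assumptions for illustration, and `hypothetical_block` / `other_if.sv` are invented to show filtering. The table holds only the consumers the panel names, so its counts are smaller than the panel's.

```python
from collections import defaultdict

# Hypothetical edge table: (consumer, edge kind, interface consumed).
EDGES = [
    ("cortex_apu", "binding", "bus_if.sv"),
    ("dma_engine", "binding", "bus_if.sv"),
    ("axi_xbar", "binding", "bus_if.sv"),
    ("tb_cortex_v3", "tb_env", "bus_if.sv"),
    ("tb_axi_smoke", "tb_env", "bus_if.sv"),
    ("tb_perf", "tb_env", "bus_if.sv"),
    ("p_resp_2cycle", "assertion", "bus_if.sv"),
    ("p_no_outoforder", "assertion", "bus_if.sv"),
    ("cortex_v3_lite", "derivative", "bus_if.sv"),
    # Invented edge to a different interface; it must not be reported.
    ("hypothetical_block", "binding", "other_if.sv"),
]


def consumers(target, edges):
    """Group every consumer of target by the kind of edge that reaches it."""
    grouped = defaultdict(list)
    for source, kind, dst in edges:
        if dst == target:
            grouped[kind].append(source)
    return dict(grouped)


blast_radius = consumers("bus_if.sv", EDGES)
for kind, names in blast_radius.items():
    print(f"{kind}: {len(names)} · {', '.join(names)}")
```

The result is a closed list: anything without an edge to bus_if.sv is excluded by construction rather than ranked lower.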

Phase · Pre-sign-off audit · Verification lead

What is missing?

Sign-off review on arch §4.1–§4.3. The verification lead asks which clauses have no corresponding assertion and which covergroups have no scenario hitting them. Semantic search cannot answer; it has no concept of absence.

semantic search · positive matches

§4.1.1 · assertion p_grant_excl · linked

§4.1.2 · assertion p_resp_2c · linked

§4.2.1 · assertion p_burst_ok · linked

§4.3.4 · scenario s_axi_qos · linked

cannot enumerate absence · semantic search returns documents that resemble a query. It has no way to compute "spec clauses that have no matching assertion".

graph · linked vs unlinked

§4.1.1 · grant exclusion · ✓ asserted

§4.1.2 · response in 2 cycles · ✓ asserted

§4.1.3 · retry behaviour · UNCOVERED · no assertion

§4.2.1 · burst alignment · ✓ asserted

§4.2.7 · qos arbitration · UNCOVERED · no assertion

§4.3.1 · error response · UNCOVERED · no assertion

cg_axi_burst · burst coverage · no scenario

3 unlinked clauses · 1 unscenarised covergroup · by structure, not by similarity. The lead gets the audit list in one query.
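Computing absence is a set difference once clauses and assertions are nodes joined by edges. The sketch below assumes a hypothetical layout (`CLAUSES`, `ASSERTS`, `COVERGROUPS`, `SCENARIO_HITS`) filled with the identifiers from the panel. It illustrates the idea and is not Sutra's query interface.

```python
# Hypothetical node and edge tables built from the panel above.
CLAUSES = {
    "§4.1.1": "grant exclusion",
    "§4.1.2": "response in 2 cycles",
    "§4.1.3": "retry behaviour",
    "§4.2.1": "burst alignment",
    "§4.2.7": "qos arbitration",
    "§4.3.1": "error response",
}
# (assertion, clause it checks)
ASSERTS = [
    ("p_grant_excl", "§4.1.1"),
    ("p_resp_2c", "§4.1.2"),
    ("p_burst_ok", "§4.2.1"),
]
COVERGROUPS = ["cg_axi_burst"]
# (scenario, covergroup it hits); empty here, as in the panel.
SCENARIO_HITS = []


def unlinked(nodes, edges):
    """Return the nodes that no edge points at, in sorted order."""
    linked = {target for _, target in edges}
    return sorted(n for n in nodes if n not in linked)


uncovered_clauses = unlinked(CLAUSES, ASSERTS)
unhit_covergroups = unlinked(COVERGROUPS, SCENARIO_HITS)
for clause in uncovered_clauses:
    print(f"{clause} · {CLAUSES[clause]} · no assertion")
for cg in unhit_covergroups:
    print(f"{cg} · no scenario")
```

A similarity search has nothing to match against for these rows, because the missing assertion is exactly the text that does not exist.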

Phase · Post-silicon debug · Debug engineer

Glitch in wave. Which spec clause?

On the bench, pe_done de-asserts a cycle early at t=14080ns. The debug engineer needs the governing spec clause and the RTL driver in one breath. Semantic search returns text mentions across documents with no causal chain.

semantic search · scattered hits

rtl/dma_engine.sv · pe_done · 0.82

spec p.183 · pe_done mentioned · 0.76

debug_log_2026-02.txt · pe_done · 0.71

tb_perf.sv · pe_done coverage · 0.68

no causal chain · every hit mentions pe_done. None tell the engineer why the de-assertion happened a cycle early or which clause governs it.

graph · wave → rtl → spec walk

#1 wave window @ t=14080ns · pe_done falls one cycle early

↓

#2 rtl driver · dma_engine.sv L312 · pe_done <= done_pending && !stall;

↓

#3 governing clause · arch §4.1.7 · de-assertion held until handshake completes

causal walk across three modalities · wave → rtl → spec are explicit edges in the graph, not text similarity.
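The three-step walk can be pictured as following named edges from one modality to the next. In this sketch the node identifiers, the edge kinds `driven_by` and `governed_by`, and the `walk` function are all hypothetical, chosen only to mirror the panel.

```python
# Hypothetical typed edges: (node, edge kind) -> next node.
EDGES = {
    ("wave:pe_done@14080ns", "driven_by"): "rtl:dma_engine.sv:312",
    ("rtl:dma_engine.sv:312", "governed_by"): "spec:arch §4.1.7",
}


def walk(start, edge_kinds, edges):
    """Follow one edge of each kind in order; stop early if a link is missing."""
    chain = [start]
    node = start
    for kind in edge_kinds:
        node = edges.get((node, kind))
        if node is None:
            break
        chain.append(node)
    return chain


chain = walk("wave:pe_done@14080ns", ["driven_by", "governed_by"], EDGES)
print(" → ".join(chain))
```

A chain shorter than three nodes would itself be a finding: it means a driver or a governing clause has not been linked yet.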

Cost of running an agent

Up to 10× lower token cost than flat-context agents.

Across the industry, teams that adopted general-purpose coding agents over the last year have begun to report that their token spend grew exponentially with engagement size, and that an annual agent budget could be consumed in a single quarter. The cause is structural rather than incidental. A flat-context agent reloads, re-flattens and re-explains the codebase to itself on every session, so the same hierarchy is paid for again and again in tokens. Sutra holds the hierarchy and its cross-references as a persistent knowledge graph, retrieved by structure rather than by string. On multi-block engagements the difference compounds across sessions and can reach an order of magnitude.

≤ 10×

lower token cost on multi-block engagements vs flat-context coding agents

measured · on real multi-block workloads

basis · persisted knowledge graph

Hierarchy as substrate

The design hierarchy is the first-class object. Spec, RTL, waveform and log hang off the same nodes and are looked up by structure rather than re-flattened into a context window.

Persistent knowledge graph

Context is held across sessions and across engineers. The model does not pay tokens to rediscover what the team already established yesterday.

Structural retrieval

Each query pulls the minimum slice of the graph that is relevant to the question, rather than the largest window the model can accept. Sharper context, fewer tokens, fewer wasted iterations.

In practice · Engagements stay within budget across the full pre-silicon to post-silicon cycle. Verification can run continuously rather than being rationed against an exhausted token quota.

query · is overflow_chk in alu_top (cortex_v3) handling signed values correctly?

flat · semantic retrieval

~24k tok · 2 / 13 rel

○ overflow_chk @ alu_top_v1 · 1.6k

○ overflow_chk @ alu_top_v2 · 1.6k

● overflow_chk @ alu_top_v3 · 1.7k

○ overflow_chk @ cortex_lite_alu · 1.4k

○ AXI overflow snippets · 2.1k

○ spec §3.5.1 · unsigned arith · 1.9k

● spec §3.5.2 · signed-overflow rule · 2.0k

+ 6 more noise chunks · debug logs, prior tickets, retired forks…

⚠ context poisoning likely · 11 irrelevant chunks compete for the model's attention

sutra · graph retrieval

~1.8k tok · 3 / 3 rel

● alu_top.sv (v3 instance) · 0.9k

● spec §3.5.2 · signed-overflow rule · 0.6k

● cortex_apu.signed_mode_cfg · 0.3k

— window otherwise empty · budget free for reasoning —

✓ clean signal · selected by binding / clause / config edges
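The two windows can be tallied directly from the figures in the panels. The sketch below is plain arithmetic over those figures. The `(chunk, tokens, relevant)` tuples and the `summarize` function are an assumed layout, and the flat window's six further noise chunks are not itemised in the panel, so only the seven listed chunks are counted.

```python
# (chunk, tokens, relevant) as shown in the two panels; token counts
# are the panel figures expressed as integers (1.6k -> 1600).
FLAT_LISTED = [
    ("overflow_chk @ alu_top_v1", 1600, False),
    ("overflow_chk @ alu_top_v2", 1600, False),
    ("overflow_chk @ alu_top_v3", 1700, True),
    ("overflow_chk @ cortex_lite_alu", 1400, False),
    ("AXI overflow snippets", 2100, False),
    ("spec §3.5.1 · unsigned arith", 1900, False),
    ("spec §3.5.2 · signed-overflow rule", 2000, True),
]
SUTRA = [
    ("alu_top.sv (v3 instance)", 900, True),
    ("spec §3.5.2 · signed-overflow rule", 600, True),
    ("cortex_apu.signed_mode_cfg", 300, True),
]


def summarize(chunks):
    """Return (total tokens, relevant chunk count, chunk count)."""
    tokens = sum(t for _, t, _ in chunks)
    relevant = sum(1 for _, _, r in chunks if r)
    return tokens, relevant, len(chunks)


flat_summary = summarize(FLAT_LISTED)
sutra_summary = summarize(SUTRA)
print("flat (listed chunks only):", flat_summary)
print("sutra:", sutra_summary)
```

Even before the unlisted noise chunks are added, the seven itemised flat chunks cost 12.3k tokens for two relevant items, against 1.8k tokens for three relevant items in the graph-selected window.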

also · Across a working session, a smart handoff carries the graph state forward, so follow-up queries reuse the same context without rereading.

The other half

Hierarchy is the context. Discipline is what does the work.

A hierarchy-aware model that still prompts itself like a junior engineer will still answer like one. AI software-engineering agents have learned over the last year that raw model capability matters less than the curated workflow you wrap it in. The same lesson applies, with greater force, to silicon verification, where conventions are denser and "almost right" is indistinguishable from "wrong".

Sutra ships a library of verification skills: prompt scaffolds, generation harnesses and reviewer-passes, authored by experienced verification engineers and invoked at the exact stage of the flow they belong to.

a skill at every stage of the flow

Spec ingest

verification lead

Pulls test intent, corner cases and coverage targets out of the document, the way a lead reads a spec rather than the way a search engine does.

UVM scaffolding

uvm methodologist

Methodology-correct env, agents, sequencers and scoreboards. Bus-handshake templates per protocol, not boilerplate copied from another project.

Stimulus generation

verification lead

Constraints and sequences targeted at the block under test, derived from the corners the spec actually called out.

Coverage closure

coverage engineer

Missed-bin sampling heuristics and a closure-loop iteration drawn from real coverage-closure engagements.

→ verification + sign-off

Formal drafting

formal practitioner

SVA and assertions drafted from spec clauses, with vacuity and reachability checks built into the pass.

Debug

post-silicon debugger

Walks waveform → log → RTL with a senior debugger's heuristics. Lands on a line, cites the spec clause that anchors the expected behaviour.

Reviewer-pass

senior ic engineer

Lint, style and coverage review applied to every output before it ever reaches you, the way a senior signs off a junior's patch.

Each skill is owned by a domain expert and refined from real engagements.

Capabilities

Four windows. One context.

An engineer normally chases information across four open windows: spec PDF, RTL editor, waveform viewer, and simulation log. Sutra holds the four together as one queryable context. Ask a question in plain English and receive answers grounded in your actual design.

Doc-aware testcase generation

Reads your spec alongside your RTL. Writes testcases that exercise the corner the document called out, rather than generic stimulus.

↗ Stim · UVM · constraints

UVM scaffolding, hierarchy-correct

Generates env, agent, sequencer and scoreboards that match your block's boundary. No copy-paste from another project's testbench.

↗ SystemVerilog · UVM 1.2

Debug with log + wave + RTL

Ask "why does carry_out fail when op_a is signed?" and Sutra walks the waveform back, lands on the RTL line, and cites the spec clause that defines the expected behaviour.

↗ Root-cause · cited

Post-silicon correlation

The same hierarchy carries through to silicon bring-up. Lab logs land on the same nodes your testbench did, so issues are traced back to the right RTL block, not re-investigated from scratch.

↗ Pre & post-silicon

Natural-language design queries

"Where else does this signal toggle in the next 200 cycles?" is answered against the actual waveform, scoped to the right module, in seconds.

↗ Cross-window query

Reviewer-style suggestions

Inline lint and review for testbench and constraints. Flags untested branches, unused covergroups and missing assertion targets.

↗ Coverage · assertions

On-prem & air-gapped

Your design never leaves your perimeter. Self-hosted from day one. No outbound calls to anyone's API. Audited for fabless and IDM workflows.

↗ Local-first · audited

Flow-agnostic plugin

No rip-and-replace. Sutra sits next to your existing simulator, waveform viewer and bug tracker. Use what you already use.

↗ Make / Tcl / CI hooks

An end-to-end verification copilot that doesn't forget your hierarchy.

Sutra sits underneath your existing flow, not in place of it.

Spec

RTL · TB

Sim · Wave

Sutra · the shared context-graph

Built for lead SoCs and every derivative that follows.

30–60% productivity gains, measurably.

Built by people who've shipped silicon.

Try Sutra on a block, IP or full SoC.

Start a conversation

The three questions we get most.