Graph Select¶

graph_select is a dbt-inspired node selector DSL for querying graph lineage — ancestors, descendants, depth-limited traversal, transitive closures, and set operations — in a compact text grammar. Built for dependency analysis, build-system lineage, data-pipeline impact, and dead-code detection.

If you know dbt's node selection syntax, you already know most of this.

Signature¶

Unusually for a muninn graph TVF, graph_select uses positional arguments (not WHERE-constraint syntax):

graph_select(
    edge_table TEXT,      -- source edge table
    src_col TEXT,         -- source (parent) column
    dst_col TEXT,         -- destination (child) column
    selector TEXT         -- selector DSL expression
) -> (node TEXT, depth INTEGER, direction TEXT)

Output columns:

Column	Type	Description
`node`	TEXT	A selected node ID
`depth`	INTEGER	Hop distance from the anchor node (0 for self)
`direction`	TEXT	`'self'`, `'ancestor'`, or `'descendant'`

Example graph¶

Every example below uses this dependency graph:

A   Y
 \ / \
  B   E
  |   |
  C   F
 / \
D   E

edges: A→B, Y→E, B→C, C→D, C→E, E→F

.load ./muninn

CREATE TABLE deps (src TEXT, dst TEXT);
INSERT INTO deps VALUES
  ('A','B'), ('Y','E'),
  ('B','C'),
  ('C','D'), ('C','E'),
  ('E','F');

Grammar¶

expression  := term ( SPACE term )*              -- union
term        := "not" atom                         -- complement
             | atom ( "," atom )*                 -- intersection
atom        := [ "@" ] depth_spec                 -- @ = transitive build closure
depth_spec  := [ INT "+" ] identifier [ "+" [ INT ] ]

Precedence, highest → lowest:

@ (closure) and depth prefixes/suffixes (part of an atom)
not (unary complement)
, (binary intersection)
whitespace (n-ary union)

So +A B+ is (+A) ∪ (B+), and +A,+B not C is ((+A) ∩ (+B)) ∪ (not C).

Operator table¶

Syntax	Meaning	On the example graph
`node`	Just the node	`C` →
`+node`	Node + all ancestors	`+C` →
`node+`	Node + all descendants	`C+` →
`N+node`	Depth-limited ancestors (N hops up, inclusive)	`1+C` →
`node+N`	Depth-limited descendants	`C+1` →
`N+node+M`	Both directions, depth-limited	`1+C+1` →
`+node+`	Unlimited both directions	`+C+` →
`@node`	Transitive build closure (descendants + all their ancestors)	`@C` →
`A B`	Union	`D B` →
`A,B`	Intersection	`+D,+E` → {C} (common ancestors)
`not A`	Complement (everything not in A)	`not C+` → all nodes except

Recipes¶

Ancestors — "what does X depend on?"¶

SELECT node, depth FROM graph_select('deps', 'src', 'dst', '+C');

node  depth
----  -----
C     0
B     1
A     2

Descendants — "what depends on X?"¶

SELECT node, depth FROM graph_select('deps', 'src', 'dst', 'C+');

node  depth
----  -----
C     0
D     1
E     1
F     2

Transitive build closure — "what must rebuild if X changes?"¶

SELECT node FROM graph_select('deps', 'src', 'dst', '@C');

Returns descendants of C plus all their ancestors — the full dependency set needed for a clean rebuild. Equivalent to dbt's @ selector.

node
----
A
B
C
D
E
F
Y        -- ancestor of E, which is a descendant of C

Depth-limited impact radius¶

-- Everything within 1 hop of C in both directions
SELECT node, depth, direction FROM graph_select('deps', 'src', 'dst', '1+C+1');

node  depth  direction
----  -----  ---------
C     0      self
B     1      ancestor
D     1      descendant
E     1      descendant

Common ancestors (intersection)¶

-- What is common upstream of D and E?
SELECT node FROM graph_select('deps', 'src', 'dst', '+D,+E');

node
----
A
B
C

Union¶

-- D's subgraph OR B's subgraph
SELECT DISTINCT node FROM graph_select('deps', 'src', 'dst', 'D+ B+');

Complement — "everything unrelated to X"¶

-- Nodes that are not in C's subgraph (useful for dead-code analysis)
SELECT node FROM graph_select('deps', 'src', 'dst', 'not C+');

node
----
A
B
Y

Use-case patterns¶

dbt-style data-lineage queries¶

When your deps table models a DAG of data models, graph_select answers the standard dbt questions:

Question	Selector
"Run this model and everything downstream of it"	`model_name+`
"Test upstream dependencies of this model"	`+model_name`
"What's the full closure around this set of models?"	`@model_a @model_b`
"What depends on both A and B?"	`A+,B+` (intersection)
"Everything except the staging layer"	`not staging+`

Build-system impact analysis¶

Given an edge table representing source-file #include or import relations, graph_select tells you exactly which tests to re-run when a file changes: changed_file+.

Knowledge-graph sub-graph extraction¶

For retrieval-augmented generation, you often want "the subgraph within K hops of a seed node." N+seed+M gives you that directly.

Dead-code detection¶

Union all entry points, complement the result: not (entry_point_1+ entry_point_2+ ...). Nodes in the complement are unreachable from any entry point.

Performance¶

graph_select loads the graph into memory on each call (like all scan-on-query graph TVFs). For repeated selectors against the same graph, create a graph_adjacency virtual table first — future versions may teach graph_select to read from it directly.