Text Embeddings¶

How to go from raw text to a populated hnsw_index. muninn ships its own GGUF embedder (muninn_embed) that runs inside SQLite with optional Metal GPU acceleration — this is the preferred path. Python-side and remote-API embedders are also supported for workflows that already have embedding code.

The three paths¶

Path	Embedder lives	Best for
Native `muninn_embed`	In muninn, via llama.cpp	New projects, SQL-native pipelines, macOS with Metal
Python batch (sentence-transformers)	Your application	Existing ML code, very large bulk ingestion
Remote API (sqlite-rembed)	OpenAI / Nomic / Cohere / Ollama	API-driven pipelines, models muninn doesn't run locally

All three produce the same float32 blob — once vectors are in the HNSW index, the downstream graph, centrality, and retrieval code is identical regardless of how they got there.

Path 1 — Native `muninn_embed` (preferred)¶

Loading a GGUF model¶

muninn stores models in a session-scoped virtual table called temp.muninn_models. Register a model once per connection:

.load ./muninn

INSERT INTO temp.muninn_models(name, model)
  SELECT 'MiniLM', muninn_embed_model('models/all-MiniLM-L6-v2.Q8_0.gguf');

SELECT name, dim FROM temp.muninn_models;

name    dim
------  ---
MiniLM  384

On macOS, all model layers are offloaded to the Metal GPU by default. Override with MUNINN_GPU_LAYERS=0 for CPU-only. See Getting Started.

Downloading a model¶

Pick one based on your language coverage, quality ceiling, and file-size budget.

mkdir -p models

# English, tiny & fast
curl -L -o models/all-MiniLM-L6-v2.Q8_0.gguf \
  https://huggingface.co/leliuga/all-MiniLM-L6-v2-GGUF/resolve/main/all-MiniLM-L6-v2.Q8_0.gguf

Model	Dims	Quant	File	Strengths
all-MiniLM-L6-v2	384	Q8_0	36 MB	Smallest, fast, English only
nomic-embed-text-v1.5	768	Q4_K_M	84 MB	Long context (8192 tok), Matryoshka
BGE-small-en-v1.5	384	Q8_0	37 MB	Strong English retrieval
BGE-M3	1024	Q4_K_M	438 MB	100+ languages
Qwen3-Embedding-8B	4096	Q4_K_M	4.7 GB	State-of-the-art retrieval quality

Find more GGUF embedding models on HuggingFace or the curated collection.

Model pooling (MEAN for BERT-family, LAST for Qwen3, etc.) is read from the GGUF metadata — muninn never hardcodes it.
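
To make the two pooling modes concrete, here is a minimal pure-Python illustration of what MEAN and LAST pooling do to a sequence of per-token vectors. This is not muninn's code: pooling happens inside the embedder according to the GGUF metadata. The helper names `mean_pool` and `last_pool` are hypothetical, and the token vectors are made up for the example.

```python
def mean_pool(token_vectors):
    """Average the per-token vectors component-wise (BERT-family style)."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]


def last_pool(token_vectors):
    """Use the final token's vector as the sentence embedding."""
    return list(token_vectors[-1])


# Three made-up 2-dimensional token vectors for one sentence.
tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

print(mean_pool(tokens))  # [1.0, 1.0]
print(last_pool(tokens))  # [2.0, 2.0]
```

The two modes give different vectors for the same tokens, which is why the pooling mode has to match the one the model was trained with.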

Embed + index in a single statement¶

CREATE TABLE documents (id INTEGER PRIMARY KEY, content TEXT);

INSERT INTO documents(content) VALUES
  ('The quick brown fox jumps over the lazy dog'),
  ('A fast runner sprints across the field'),
  ('SQLite is a lightweight embedded database'),
  ('Vector search finds similar items by distance'),
  ('Neural networks learn patterns from data');

CREATE VIRTUAL TABLE docs_vec USING hnsw_index(
  dimensions=384, metric='cosine'
);

INSERT INTO docs_vec(rowid, vector)
  SELECT id, muninn_embed('MiniLM', content) FROM documents;

Semantic search — embed the query inline¶

SELECT d.content, round(v.distance, 4) AS dist
  FROM docs_vec v JOIN documents d ON d.id = v.rowid
  WHERE v.vector MATCH muninn_embed('MiniLM', 'fast animal')
    AND k = 3
  ORDER BY v.distance;

content                                              dist
---------------------------------------------------  ------
A fast runner sprints across the field               0.3881
The quick brown fox jumps over the lazy dog          0.5217
Neural networks learn patterns from data             0.7402

Auto-embed new rows with a trigger¶

CREATE TEMP TRIGGER docs_auto_embed AFTER INSERT ON documents
BEGIN
  INSERT INTO docs_vec(rowid, vector)
    VALUES (NEW.id, muninn_embed('MiniLM', NEW.content));
END;

INSERT INTO documents(content) VALUES ('Graph databases store relationships');
-- The trigger embedded and indexed the new row automatically.

Use TEMP triggers for model-backed embedding

Persistent triggers (CREATE TRIGGER without TEMP) are written into the database schema. Opening the database later without muninn loaded (or without that model registered) causes schema compilation to fail. TEMP triggers live only for the session and avoid this trap.

Re-embed on update¶

CREATE TEMP TRIGGER docs_auto_reembed AFTER UPDATE OF content ON documents
BEGIN
  DELETE FROM docs_vec WHERE rowid = NEW.id;
  INSERT INTO docs_vec(rowid, vector)
    VALUES (NEW.id, muninn_embed('MiniLM', NEW.content));
END;

Unloading a model¶

DELETE FROM temp.muninn_models WHERE name = 'MiniLM';

Performance notes¶

Single-text embedding throughput on an M1 Pro (Metal, MiniLM, 384 dimensions) is roughly 5,000 embeds/sec.
muninn_embed does not batch internally: each call embeds one row. For bulk ingestion of 100k+ documents, Path 2 (Python batch) can be 3–5× faster.
The CPU fallback (MUNINN_GPU_LAYERS=0) is about 3× slower on Apple Silicon but otherwise works identically.
Multiple SQLite connections can call muninn_embed concurrently. The model registry is thread-safe, and each connection gets its own compute context.
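
As a rough back-of-envelope check, the figures above can be turned into ingestion-time estimates. The sketch below only does arithmetic on the approximate numbers quoted in this section (5,000 embeds/sec on Metal, about 3× slower on CPU, 3–5× faster with Python batching); `ingest_seconds` is a hypothetical helper, and real timings will depend on hardware, model, and text length.

```python
def ingest_seconds(n_docs, embeds_per_sec):
    """Estimated wall-clock seconds to embed n_docs at a steady rate."""
    return n_docs / embeds_per_sec


N_DOCS = 100_000
METAL_RATE = 5_000  # embeds/sec, the approximate M1 Pro + MiniLM figure above

metal = ingest_seconds(N_DOCS, METAL_RATE)
cpu = metal * 3                      # CPU fallback is about 3x slower
batch_fast, batch_slow = metal / 5, metal / 3  # Python batch: 3-5x faster

print(f"Metal: {metal:.0f}s, CPU: {cpu:.0f}s, "
      f"Python batch: {batch_fast:.0f}-{batch_slow:.0f}s")
# Metal: 20s, CPU: 60s, Python batch: 4-7s
```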

Path 2 — Python batch via sentence-transformers¶

Best for bulk ingestion where Python can batch thousands of texts into a single GPU forward pass.

import sqlite3, struct
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
dim = model.get_sentence_embedding_dimension()   # 384

db = sqlite3.connect("mydata.db")
db.enable_load_extension(True)
db.load_extension("./muninn")
db.enable_load_extension(False)

db.execute(f"""
    CREATE VIRTUAL TABLE IF NOT EXISTS docs_vec
    USING hnsw_index(dimensions={dim}, metric='cosine')
""")

documents = [
    (1, "The quick brown fox jumps over the lazy dog"),
    (2, "A fast runner sprints across the field"),
    (3, "SQLite is a lightweight embedded database"),
]

texts = [t for _, t in documents]
vectors = model.encode(texts, normalize_embeddings=True)

db.executemany(
    "INSERT INTO docs_vec(rowid, vector) VALUES (?, ?)",
    [(i, struct.pack(f"{dim}f", *vec.tolist()))
     for (i, _), vec in zip(documents, vectors)],
)
db.commit()

# Query-time: embed the query in Python, pass the blob
query_vec = model.encode("fast animal", normalize_embeddings=True)
query_blob = struct.pack(f"{dim}f", *query_vec.tolist())

for rowid, distance in db.execute(
    "SELECT rowid, distance FROM docs_vec "
    "WHERE vector MATCH ? AND k = 3", (query_blob,)
):
    print(rowid, f"{distance:.4f}")

Batch encoding is the win

model.encode(texts) processes many strings in a single forward pass, so the speedup over per-row embedding grows with batch size until the hardware is saturated. For large ingestion jobs, batch on the Python side even if you use muninn_embed at query time.
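
For corpora too large to encode in one call, the same idea can be applied in fixed-size chunks. The following is a minimal sketch, not part of muninn: `batched` and `ingest` are hypothetical helpers, `encode` is any callable that maps a list of strings to a list of equal-length float sequences, and the batch size of 256 is an arbitrary starting point.

```python
import sqlite3
import struct
from itertools import islice


def batched(rows, size):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def ingest(db, rows, encode, batch_size=256):
    """Embed (id, text) rows in batches and insert them as float32 blobs.

    Returns the number of rows inserted.
    """
    total = 0
    for chunk in batched(rows, batch_size):
        # One encode call per batch instead of one per row.
        vectors = encode([text for _, text in chunk])
        payload = [
            (row_id, struct.pack(f"<{len(vec)}f", *vec))  # little-endian float32
            for (row_id, _), vec in zip(chunk, vectors)
        ]
        with db:  # one transaction per batch
            db.executemany(
                "INSERT INTO docs_vec(rowid, vector) VALUES (?, ?)", payload
            )
        total += len(payload)
    return total
```

With the sentence-transformers model from the example above, a call could look like `ingest(db, documents, lambda texts: model.encode(texts, normalize_embeddings=True).tolist())`.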

Path 3 — Remote API via sqlite-rembed¶

Use this path for external providers such as OpenAI, Nomic, Cohere, Jina, or Ollama. sqlite-rembed adds a rembed() scalar function that makes one HTTP call per row, which is useful for API-only models that muninn can't run locally (text-embedding-3-large, Cohere Embed v3, etc.).

.load rembed0
.load ./muninn

INSERT INTO temp.rembed_clients(name, options) VALUES ('openai', 'openai');
-- Reads OPENAI_API_KEY from the environment

CREATE VIRTUAL TABLE docs_vec USING hnsw_index(dimensions=1536, metric='cosine');

INSERT INTO docs_vec(rowid, vector)
  SELECT id, rembed('openai', content) FROM documents;

One HTTP call per row

rembed() does not batch: each row is one round trip to the provider. For thousands of rows, call the provider's batch API from Python instead, then insert the resulting blobs as in Path 2.

Vector format reference¶

Every embedding path — muninn, Python, remote — must produce the same blob format:

Property	Value
Encoding	Raw little-endian IEEE 754 `float32` array
Size	`4 × dimensions` bytes
Header	none
Normalization	Recommended for `metric='cosine'`; required if you want distances in `[0, 2]`
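
The normalization row can be illustrated with a small pure-Python sketch. It assumes cosine distance is defined as 1 minus cosine similarity, which for unit-length vectors reduces to 1 minus the dot product; how muninn computes the distance internally is not shown here, and `l2_normalize` and `cosine_distance` are hypothetical helper names.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_distance(a, b):
    """1 - dot(a, b); equals 1 - cosine similarity only for unit vectors."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


a = l2_normalize([3.0, 4.0])    # [0.6, 0.8]
b = l2_normalize([-3.0, -4.0])  # same line, opposite direction

print(cosine_distance(a, a))  # approximately 0.0 (identical direction)
print(cosine_distance(a, b))  # approximately 2.0 (opposite direction)
```

Without the normalization step, the dot product is unbounded, so the shortcut above would no longer stay within `[0, 2]`.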

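Python (struct)

import struct
blob = struct.pack(f"<{dim}f", *values)  # "<" forces little-endian float32

Python (NumPy)

import numpy as np
blob = np.asarray(values, dtype="<f4").tobytes()  # "<f4" = little-endian float32

Node.js

// Float32Array uses the platform byte order (little-endian on x86 and Apple Silicon)
const blob = Buffer.from(new Float32Array(values).buffer);

C

float vec[384];  /* fill with the embedding before binding */
sqlite3_bind_blob(stmt, 1, vec, sizeof(vec), SQLITE_STATIC);

Rust

let bytes: Vec<u8> = values.iter()
    .flat_map(|v: &f32| v.to_le_bytes())
    .collect();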
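
The format described above can be checked mechanically before a blob reaches the index. This is a minimal sketch using only the standard library; `pack_vector` and `unpack_vector` are hypothetical helper names, not part of muninn.

```python
import struct


def pack_vector(values):
    """Encode floats as a headerless little-endian float32 blob."""
    return struct.pack(f"<{len(values)}f", *values)


def unpack_vector(blob, dimensions):
    """Decode a blob, rejecting any size other than 4 * dimensions bytes."""
    expected = 4 * dimensions
    if len(blob) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(blob)}")
    return list(struct.unpack(f"<{dimensions}f", blob))


blob = pack_vector([0.5, -1.0, 2.0])
print(len(blob))               # 12
print(unpack_vector(blob, 3))  # [0.5, -1.0, 2.0]
```

A size check like this catches the most common mismatch, where vectors from a model with one dimension count are inserted into an index created with another.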

Choosing a model¶

Priority	Model	Dim	Why
Smallest / fastest	all-MiniLM-L6-v2	384	22M params, sub-ms on Metal
Best general English	nomic-embed-text-v1.5	768	Long context, Matryoshka-truncatable
Multilingual	BGE-M3	1024	100+ languages
Retrieval quality ceiling	Qwen3-Embedding-8B	4096	Top MTEB, 4.7 GB file

Matryoshka embeddings

nomic-embed-text-v1.5 and some BGE models support Matryoshka Representation Learning — you can truncate the output to a shorter dimension (e.g. 128 from 768) for faster search with minimal quality loss. Truncate, then re-normalize.
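
The truncate-then-re-normalize step can be sketched in a few lines of pure Python. `truncate_embedding` is a hypothetical helper and the input vector is made up; some models document extra steps before truncation, so check the model card before relying on this as-is.

```python
import math


def truncate_embedding(vec, target_dim):
    """Keep the first target_dim components, then rescale to unit length."""
    if not 0 < target_dim <= len(vec):
        raise ValueError("target_dim must be between 1 and len(vec)")
    head = vec[:target_dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("truncated vector has zero length")
    return [x / norm for x in head]


full = [3.0, 4.0, 12.0, 0.5]           # stand-in for a 768-dim embedding
short = truncate_embedding(full, 2)     # stand-in for truncating to 128 dims
print(short)  # [0.6, 0.8]
```

The index has to be created with the truncated size (for example `dimensions=128`), and queries must be truncated the same way as the stored vectors.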

Combining with graph retrieval¶

Once embeddings are in the index, the full retrieval pipeline is path-agnostic:

-- Phase 1: vector seed
SELECT rowid FROM docs_vec
  WHERE vector MATCH muninn_embed('MiniLM', 'find close matches') AND k = 5;

-- Phase 2: graph expansion from seeds
SELECT node FROM graph_bfs
  WHERE edge_table = 'relationships' AND src_col = 'src' AND dst_col = 'dst'
    AND start_node = ?seed AND max_depth = 2;

-- Phase 3: centrality ranking
SELECT node, centrality FROM graph_node_betweenness
  WHERE edge_table = 'relationships' AND src_col = 'src' AND dst_col = 'dst'
    AND direction = 'both';
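
One possible way to glue the three phases together in application code is sketched below. It is an illustration only, not the cookbook's pipeline: `rank_candidates` is a hypothetical helper, `expand` stands in for running the Phase 2 query once per seed, and `centrality` stands in for the Phase 3 result loaded into a dict.

```python
def rank_candidates(seeds, expand, centrality):
    """Union the seeds with their graph neighbourhoods, then rank by centrality.

    seeds      -- node ids returned by the vector search (Phase 1)
    expand     -- callable: seed id -> iterable of reachable node ids (Phase 2)
    centrality -- dict: node id -> centrality score (Phase 3)
    """
    candidates = set(seeds)
    for seed in seeds:
        candidates.update(expand(seed))
    # Highest centrality first; ties broken by node id for stable output.
    return sorted(candidates, key=lambda node: (-centrality.get(node, 0.0), node))


# Made-up data standing in for the three query results.
neighbours = {1: [2, 3], 4: [3, 5]}
scores = {1: 0.1, 2: 0.7, 3: 0.9, 5: 0.2}

ranked = rank_candidates([1, 4], lambda s: neighbours.get(s, []), scores)
print(ranked)  # [3, 2, 5, 1, 4]
```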

See GraphRAG Cookbook for the full pipeline.