Semantic Search

When you save a note, Engram doesn’t just store the text. It also generates a vector embedding (a high-dimensional numeric representation of the note’s meaning) and indexes that vector in Qdrant. At search time, your query is embedded too, and the query vector is compared against the indexed ones to find semantically related notes.

The pipeline

note save  →  chunk  →  embed  →  Qdrant index
                          ↑
                       Voyage AI (cloud)
                       or Ollama (self-host)

Chunking — long notes are split at heading boundaries first, then large sections are word-split. Chunk size is ~2048 characters (≈512 tokens at 4 chars/token). No overlap — the chunker relies on heading hierarchy and contextualization (prepending folder/heading path to each chunk) to preserve surrounding meaning rather than text overlap.
Embedding — each chunk goes to the embedding provider, which returns a vector (1024-dim with Voyage’s voyage-4-large).
Indexing — vectors land in Qdrant with binary quantization for memory efficiency. Original full-precision vectors are kept for rescore.
Asynchronous — the embed/index step runs via Oban (background job queue). The note is saved immediately; embedding catches up within seconds.
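
The chunking step described above can be sketched roughly as follows. This is a minimal illustration in Python, not Engram's actual chunker: the function names (split_sections, word_split, chunk_note), the " > " path separator, and the choice to apply the 2048-character limit to the body only (excluding the prepended context) are all assumptions made for the example. It also ignores edge cases such as "#" lines inside fenced code.

```python
import re

MAX_CHARS = 2048  # ~512 tokens at 4 chars/token, as stated above
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_sections(markdown):
    """Yield (heading_path, body) pairs, cutting the note at Markdown headings."""
    stack, lines = [], []  # stack holds (level, title) for the current ancestors
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match is None:
            lines.append(line)
            continue
        body = "\n".join(lines).strip()
        if body:
            yield [title for _, title in stack], body
        level, title = len(match.group(1)), match.group(2).strip()
        # Drop siblings and deeper headings, then push the new heading.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, title))
        lines = []
    body = "\n".join(lines).strip()
    if body:
        yield [title for _, title in stack], body


def word_split(text, limit):
    """Greedily pack words into pieces of at most `limit` characters.

    Whitespace is collapsed to single spaces; a single word longer than
    `limit` is kept whole rather than cut.
    """
    pieces, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if current and len(candidate) > limit:
            pieces.append(current)
            current = word
        else:
            current = candidate
    if current:
        pieces.append(current)
    return pieces


def chunk_note(folder, markdown, limit=MAX_CHARS):
    """Return non-overlapping chunks, each prefixed with its folder/heading path."""
    chunks = []
    for path, body in split_sections(markdown):
        context = " > ".join([folder, *path])  # hypothetical context format
        pieces = [body] if len(body) <= limit else word_split(body, limit)
        chunks.extend(f"{context}\n\n{piece}" for piece in pieces)
    return chunks
```

Calling chunk_note("Projects", text) on a note with a "# Plan" heading and a long "## Budget" subsection would produce one chunk prefixed "Projects > Plan" and several prefixed "Projects > Plan > Budget". The prefix is what carries surrounding context in place of text overlap.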

At query time

Embed the query with the same provider. Queries use a different model than documents (asymmetric retrieval), which is intended to improve retrieval accuracy
Vector search in Qdrant — fast, binary-quantized similarity
Rescore the top candidates against full-precision vectors
Rerank (optional): when both RERANKER_BACKEND=jina and JINA_URL are set, candidates are reordered via the Jina cross-encoder adapter, which blends the scores as 40% vector and 60% reranker. By default no reranker runs and the raw vector ranking is returned.
Return top-K hits with highlighted excerpts
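
To make the search, rescore, and rerank stages concrete, here is a toy sketch in pure Python. It is not Qdrant's or Engram's implementation: Qdrant performs quantized search and rescoring internally, and the names used here (binarize, search, blend, oversample) are hypothetical. The sketch also assumes that the vector and reranker scores are on comparable scales before blending, which the text above does not specify.

```python
import math


def binarize(vector):
    """Binary quantization: keep one bit per dimension (1 if the value is positive)."""
    bits = 0
    for value in vector:
        bits = (bits << 1) | (1 if value > 0 else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")


def cosine(a, b):
    """Cosine similarity on the original full-precision vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query, index, top_k=3, oversample=4):
    """index maps chunk id -> full-precision vector; returns [(id, score), ...]."""
    # A real index stores these codes at indexing time; computed here for brevity.
    codes = {cid: binarize(vec) for cid, vec in index.items()}
    q_code = binarize(query)
    # Stage 1: cheap candidate pass over the binary codes.
    shortlist = sorted(codes, key=lambda cid: hamming(q_code, codes[cid]))
    shortlist = shortlist[: top_k * oversample]
    # Stage 2: rescore only the shortlist against full-precision vectors.
    rescored = [(cid, cosine(query, index[cid])) for cid in shortlist]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_k]


def blend(hits, reranker_scores, vector_weight=0.4, reranker_weight=0.6):
    """Optional rerank: combine 40% vector score with 60% reranker score."""
    blended = [
        (cid, vector_weight * score + reranker_weight * reranker_scores[cid])
        for cid, score in hits
    ]
    blended.sort(key=lambda pair: pair[1], reverse=True)
    return blended
```

The oversample factor controls how many binary-code candidates are passed to the rescoring stage. A larger value recovers more of the accuracy lost to quantization, at the cost of more full-precision comparisons.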

Why this works for vaults

Semantic search excels when your query and the relevant notes share meaning but not necessarily words. Examples:

Query: “what did I write about onboarding new hires” → finds notes titled “Team intake process” even though they don’t contain “onboarding”
Query: “why we picked Postgres over Mongo” → finds an old decision log titled “Database choice — DD-2024-03”

Vault-wide exact match (substring/regex over raw note text) isn’t something Engram offers — note content is encrypted at rest, so the backend can’t run lexical scans over plaintext without breaking the encryption guarantee. For literal-text search, fall back to Obsidian’s local Search pane, which runs over your on-disk .md files. The editor’s Ctrl/Cmd+F still works for find-within-an-open-note.

What you can do to improve search

Write headings. Chunking respects heading boundaries, so a clean hierarchy produces better chunks.
Use frontmatter for tags and metadata. Tags appear in embeddings and as filterable facets.
Don’t paste massive code blocks. Code embeds poorly. Link to it from a markdown note instead.