Introduction

mq-db treats Markdown documents as structured, hierarchical databases rather than plain text.

It parses Markdown into a flat block list annotated with an interval index (Nested Set / Pre-Post Order), which turns heading-hierarchy questions such as “is this paragraph inside that section?” into a constant-time check on two integer pairs instead of a tree walk. Documents can be queried with SQL or mq, and persisted to a compact custom page-file format with no SQLite dependency.

This project is under active development and the API may change.

Why Markdown-as-database?

Markdown files already have implicit structure: headings nest sections, code blocks carry a language, and front matter carries metadata. mq-db makes that structure directly queryable, in SQL:

SELECT block_type, count(*) FROM blocks GROUP BY block_type;

or in mq:

.h1

Both engines run against the same underlying block store, so you can pick whichever query language fits the task: SQL for joins, aggregates, and ad-hoc analysis; mq for Markdown-shaped transformations and selectors.

How it fits together

Markdown File(s)
      │  CST Parser (mq-markdown)
      ▼
Block Tree (heading · paragraph · code · list · …)
      │  Interval Index + Secondary Indexes
      ▼
Flat Block Vector (pre/post integers)
      │
      ├── BitmapIndex   (block_type)
      ├── BTreeIndex    (pre / post)
      ├── HashIndex     (content / lang / depth)
      ├── Zone Maps     (per-document stats)
      │
      ├── SQL Engine   (sqlparser — custom native evaluator)
      └── mq Engine    (mq-lang evaluator)

Features

Flat block storage — every Markdown element becomes a typed Block with row-polymorphic properties
O(1) hierarchy queries — interval index (pre/post) makes ancestor/descendant checks a constant-time pair of integer comparisons
Three secondary indexes — BitmapIndex (block type), BTreeIndex (pre/post), HashIndex (content/lang/depth) for fast SQL predicate pushdown
Zone Maps — per-document statistics skip irrelevant files before scanning any blocks
Dual query engines — SQL via a custom sqlparser-based evaluator, and mq via mq-lang
DDL support — CREATE TABLE, INSERT INTO, DROP TABLE for in-memory custom tables
Comprehensive SQL function library — string, numeric, date/time, null-handling, CASE, and aggregate functions
mq() scalar function — run an mq program against Markdown content inline in SQL
Custom page-file persistence — 8 KB fixed pages, checksums, atomic writes
CLI, interactive REPL, TUI, and HTTP server — query from the terminal or over HTTP

Keep reading in Getting Started, or jump straight to the SQL Reference if you already have a .mq-db store.

Getting Started

This section walks through installing mq-db, indexing Markdown into a store file, and querying it from the CLI, REPL, TUI, HTTP server, or the Rust library directly.

The typical workflow is:

1. Index one or more Markdown files into a .mq-db store file (see Install and CLI).
2. Query the store with SQL or mq: one-shot from the CLI, interactively in the REPL or TUI, or over HTTP via mq-db serve.
3. Optionally, embed mq-db directly through the Library API instead of shelling out to the CLI.

mq-db index docs/ --recursive --output store.mq-db
mq-db sql "SELECT block_type, count(*) FROM blocks GROUP BY block_type" --db store.mq-db

Install

Using the Installation Script (Recommended)

curl -fsSL https://raw.githubusercontent.com/harehare/mq-db/main/bin/install.sh | bash

The installer will:

Download the latest release for your platform
Verify the binary with a SHA256 checksum
Install to ~/.local/bin/
Update your shell profile (bash, zsh, or fish)

After installation, restart your terminal or run:

source ~/.bashrc  # or ~/.zshrc, or ~/.config/fish/config.fish

Using Cargo

cargo install mq-db

From Source

# Latest development version
cargo install --git https://github.com/harehare/mq-db.git

Supported Platforms

Linux: x86_64, aarch64
macOS: x86_64 (Intel), aarch64 (Apple Silicon)
Windows: x86_64

Verify

mq-db --version

CLI

Every subcommand operates on a .mq-db store file (--db / -d, default store.mq-db). Output-producing commands accept --format / -F: table (default), json, csv, tsv, markdown, html.

mq-db --help
mq-db <command> --help

`index`

Index Markdown files or directories into a store file.

mq-db index docs/ --recursive --output store.mq-db
mq-db index README.md DESIGN.md
mq-db index docs/ --no-spans   # omit source spans (saves 16 bytes/block on disk)

Flag	Description
`paths`	Markdown files or directories to index (required)
`-o, --output <PATH>`	Output store file (default `store.mq-db`)
`-r, --recursive`	Recursively walk directories
`--no-spans`	Do not store source line/column spans

  ✓ docs/DESIGN.md
  ✓ docs/API.md

Indexed 2 files → store.mq-db

`list`

List all indexed documents.

mq-db list --db store.mq-db
mq-db list --db store.mq-db --format json

┌──────┬────────────────────────────────────────────────────┬────────┬──────────┐
│   ID │ Path / Title                                       │ Blocks │ Tags     │
├──────┼────────────────────────────────────────────────────┼────────┼──────────┤
│    0 │ docs/DESIGN.md                                     │    142 │          │
│    1 │ docs/API.md                                        │     87 │ api, v2  │
└──────┴────────────────────────────────────────────────────┴────────┴──────────┘
2 documents

`sql`

Run a SQL query over the store. See the Reference for the virtual schema and function library.

mq-db sql "SELECT block_type, count(*) FROM blocks GROUP BY block_type" --db store.mq-db
mq-db sql --file query.sql --db store.mq-db
mq-db sql "SELECT ..." --db store.mq-db --format json

Flag	Description
`query`	SQL query string (omit when using `--file`)
`-f, --file <PATH>`	Read SQL from a file

`mq`

Run an mq query over the store.

mq-db mq ".h1" --db store.mq-db
mq-db mq 'select(.code_lang == "rust")' --db store.mq-db
mq-db mq ".h1" --db store.mq-db --format markdown

`repl`

Interactive REPL supporting both query modes; switch with .mode.

mq-db repl --db store.mq-db --mode sql

See REPL for the full command list.

`lint`

Run structural lint checks (currently: a heading at the given depth immediately followed by a list).

mq-db lint --db store.mq-db --depth 2

✗  1 violation  (H2 immediately followed by list)

  file                                      heading
  ────────────────────────────────────────  ──────────────────────────────
  docs/DESIGN.md                            "Quick Start"

`stats`

Show store-wide statistics: document/block counts, block-type distribution, code-language distribution.

mq-db stats --db store.mq-db

  Documents  5
  Blocks     632

  Block types
  ────────────────────────────────────────────────────────
   ¶  paragraph    ████████████████████░░░░   241  (38%)
   #  heading      ████████░░░░░░░░░░░░░░░░    89  (14%)
  {}  code         ███████░░░░░░░░░░░░░░░░░    73  (12%)
   •  list         ██████░░░░░░░░░░░░░░░░░░    58   (9%)

`show`

Show the full block structure of one document by ID (see list for IDs).

mq-db show 0 --db store.mq-db

  docs/DESIGN.md
  title   Design Document
  blocks  142

  pre   post  type               content
  ────  ────  ────────────────   ──────────────────────────────────────────
     0   141  heading H1         Design Document
     2    55  heading H2         Architecture
     4    21  paragraph          The system is built on…

`tui`

Launch the interactive TUI. See TUI.

mq-db tui --db store.mq-db

`serve`

Start an HTTP server exposing SQL/mq query endpoints. See HTTP Server.

mq-db serve --db store.mq-db --host 0.0.0.0 --port 8080

REPL

The interactive REPL supports both query modes in a single session.

mq-db repl --db store.mq-db --mode sql

mq-db  (.help for commands  .quit to exit)
mode: sql  (.mode mq | .mode sql)

sql> SELECT content FROM blocks WHERE block_type = 'heading' LIMIT 3;
┌──────────────────┐
│ content          │
├──────────────────┤
│ Overview         │
│ Architecture     │
│ Query Engine     │
└──────────────────┘
(3 rows)

sql> .mode mq
→ mq mode
mq> .h2
## Architecture
## Query Engine

Dot commands

Command	Description
`.help`	List available commands
`.mode sql`	Switch to SQL query mode
`.mode mq`	Switch to mq query mode
`.quit`	Exit the REPL

The initial mode can be set with --mode sql or --mode mq (default sql).

TUI

mq-db tui opens a full-screen terminal UI for browsing indexed documents and running SQL/mq queries side by side, built with ratatui.

mq-db tui --db store.mq-db

 mq-db  SQL  Tab:switch  i:input  j/k:nav  d/u:scroll  q:quit
┌─ Documents ──────────┬─ SQL ────────────────────────────────────────────────┐
│ DESIGN.md            │ SELECT block_type, count(*) FROM blocks GROUP BY b_  │
│   142 blocks         ├─ Results ────────────────────────────────────────────┤
│ API.md               │ ┌─────────────┬──────────┐                           │
│   87 blocks  API     │ │ block_type  │ count(*) │                           │
│ README.md            │ ├─────────────┼──────────┤                           │
│   34 blocks          │ │ paragraph   │ 48       │                           │
└──────────────────────┴──────────────────────────────────────────────────────┘
 5 docs  632 blocks  3 rows

The left pane lists indexed documents; selecting one shows its full block breakdown (type, pre/post, content preview) in the results pane. The top-right pane accepts a query in the current mode (mq or SQL); running it replaces the results pane with the query output.

Keys

Key	Action
`i`	Focus query input
`Esc`	Blur input
`Enter`	Run query
`Tab`	Toggle mq / SQL mode
`j` / `k` (or `↓` / `↑`)	Navigate document list
`d` / `u` (or PageDown / PageUp)	Scroll results down / up
`g` / `G`	Jump results to top / bottom
`q` / `Ctrl+C`	Quit

Color scheme

Block types are color-coded for quick scanning (heading, paragraph, code, list, blockquote, table, frontmatter, html, math, …), using the same warm paper/ink/accent palette as the project site.

HTTP Server

mq-db serve starts an HTTP server (built on axum) exposing SQL and mq query endpoints over the indexed store.

mq-db serve --db store.mq-db              # listens on 127.0.0.1:7878
mq-db serve --db store.mq-db --port 8080  # custom port
mq-db serve --db store.mq-db --host 0.0.0.0 --port 8080

Endpoints

Method	Path	Body	Description
`GET`	`/health`	—	`{"status":"ok","documents":<n>}`
`POST`	`/sql`	`{"query":"SELECT …"}`	Execute a SQL query, returns JSON rows
`POST`	`/mq`	`{"code":".h1"}`	Evaluate an mq expression, returns `{"results":[…]}`

Examples

# Health check
curl http://127.0.0.1:7878/health

# SQL via HTTP
curl -s -X POST http://127.0.0.1:7878/sql \
  -H 'Content-Type: application/json' \
  -d '{"query":"SELECT block_type, count(*) FROM blocks GROUP BY block_type"}'

# mq via HTTP
curl -s -X POST http://127.0.0.1:7878/mq \
  -H 'Content-Type: application/json' \
  -d '{"code":".h1"}'
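For a Rust program that cannot shell out to curl, the same request can be sent with only the standard library. The sketch below is illustrative, not an official client: only the endpoint paths and the `query`/`code` body fields come from the table above, while `json_escape`, `build_post`, and `post` are hypothetical helpers and the JSON escaper is deliberately minimal.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Minimal JSON string escaper (quotes, backslashes, control characters).
/// Hypothetical helper for this sketch; a real client would use serde_json.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a raw HTTP/1.1 POST for /sql (field "query") or /mq (field "code").
fn build_post(host: &str, path: &str, field: &str, value: &str) -> String {
    let body = format!("{{\"{}\":\"{}\"}}", field, json_escape(value));
    format!(
        "POST {path} HTTP/1.1\r\nHost: {host}\r\nContent-Type: application/json\r\n\
         Content-Length: {len}\r\nConnection: close\r\n\r\n{body}",
        len = body.len()
    )
}

/// Send the request and return the raw response (status line, headers, body).
/// Assumes a server started with `mq-db serve` is listening at `addr`.
fn post(addr: &str, path: &str, field: &str, value: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(build_post(addr, path, field, value).as_bytes())?;
    let mut response = String::new();
    // `Connection: close` makes the server end the stream after one response.
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

For example, `post("127.0.0.1:7878", "/sql", "query", "SELECT count(*) FROM blocks")` returns the whole HTTP response, headers included; a production client would more likely use an HTTP crate plus serde_json.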

Library API (Rust)

mq-db is usable directly as a Rust library, without shelling out to the CLI.

[dependencies]
mq-db = "0.1"

#![allow(unused)]
use mq_db::{block::BlockType, DocumentStore, MqEngine, SqlEngine};

// `?` requires a Result-returning main. This assumes mq-db's error types
// implement std::error::Error so they can be boxed.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // ── Build in memory ──────────────────────────────────────────────────────
    let mut store = DocumentStore::new();
    store.add_file("docs/DESIGN.md")?;
    store.add_str("# Hello\n\n## Architecture\n\nDetails\n")?;

    // Chainable query API: zone-map skip + interval scope + block predicates
    let chunks = store
        .query()
        .documents(|doc| doc.zone_maps.heading_contents.contains("Architecture"))
        .under_heading("Architecture", Some(2))
        .filter(|b| matches!(b.block_type, BlockType::Paragraph | BlockType::Code))
        .blocks();

    // SQL engine (custom sqlparser-based evaluator, no SQLite dependency)
    let engine = SqlEngine::new(&store)?;
    let out = engine.execute(
        "SELECT content FROM blocks WHERE block_type = 'heading' ORDER BY pre",
    )?;
    print!("{}", out.to_table());

    // mq engine
    let results = MqEngine::eval_store(".h1", &store)?;

    // Structural lint
    let violations = store.query().lint_heading_followed_by(2, &[BlockType::List]);

    // ── Persist / load ───────────────────────────────────────────────────────
    store.save("store.mq-db")?;

    // Full load: all blocks read into memory, indexes built on first SqlEngine use
    let store = DocumentStore::load("store.mq-db")?;

    // Lazy open: catalog only; call load_all_blocks() + load_all_indexes() before SQL
    let mut store = DocumentStore::open("store.mq-db")?;
    store.load_all_blocks()?;
    store.load_all_indexes()?;

    // Catalog-only: for metadata commands (list, stats) that don't need block data
    let store = DocumentStore::load_catalog_only("store.mq-db")?;

    Ok(())
}

Loading strategies

Function	Loads	Use for
`DocumentStore::new()`	Nothing (empty, in-memory)	Building a store from scratch
`DocumentStore::load()`	Catalog + all blocks + indexes	One-shot CLI queries
`DocumentStore::open()`	Catalog only, lazily	Long-lived processes that defer block/index loading
`DocumentStore::load_catalog_only()`	Catalog only	`list` / `stats`-style metadata commands

When using open(), call load_all_blocks() and load_all_indexes() before running any SqlEngine query.

Query builder

store.query() returns a chainable builder that narrows candidates in the same cheapest-first order the SQL engine uses:

.documents(|doc| ...) — zone-map predicate, skips whole documents
.under_heading(title, depth) / interval-scope helpers — narrows to a (pre, post) range
.filter(|block| ...) — per-block predicate over the remaining candidates
.blocks() — materializes the final Vec<&Block>

See Index Layers for how each layer works.
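To make the stages concrete, here is a std-only sketch that mimics what `.documents()`, `.under_heading()`, and `.filter()` do, using simplified stand-in types. `Kind`, `Block`, `Doc`, and the free function `query` are hypothetical, not mq-db's actual types; the point is the order of narrowing, not the library's implementation.

```rust
/// Simplified stand-ins for mq-db's types (hypothetical, for illustration only).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind { Heading, Paragraph, Code, List }

#[derive(Debug)]
struct Block { kind: Kind, content: String, depth: u8, pre: u32, post: u32 }

struct Doc { heading_contents: Vec<String>, blocks: Vec<Block> }

/// Three-stage filter mirroring `.documents()`, `.under_heading()`, `.filter()`.
fn query<'a>(
    docs: &'a [Doc],
    doc_pred: impl Fn(&Doc) -> bool,
    heading: &str,
    depth: Option<u8>,
    block_pred: impl Fn(&Block) -> bool,
) -> Vec<&'a Block> {
    let mut out = Vec::new();
    // Stage 1: a zone-map style predicate skips whole documents.
    for doc in docs.iter().filter(|d| doc_pred(*d)) {
        // Stage 2: find the scoping heading(s) by text and optional depth.
        let scopes = doc.blocks.iter().filter(|b| {
            b.kind == Kind::Heading
                && b.content == heading
                && depth.map_or(true, |d| b.depth == d)
        });
        for h in scopes {
            // Keep blocks strictly inside (h.pre, h.post), then apply
            // Stage 3: the per-block predicate.
            out.extend(
                doc.blocks
                    .iter()
                    .filter(|b| h.pre < b.pre && b.post < h.post && block_pred(*b)),
            );
        }
    }
    out
}
```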

Reference

Technical reference for the SQL surface and the on-disk/in-memory data model.

Virtual Schema — the documents and blocks tables
Built-in Functions — mq-db-specific, string, numeric, null-handling, and aggregate functions
DDL Statements — CREATE TABLE, INSERT INTO, DROP TABLE, and friends
Example Queries — hierarchy extraction, structural lint, and mixed mq/SQL queries
Block Model — the Block struct and per-type properties
Index Layers — zone maps, the interval index, and secondary indexes
Storage Format — the on-disk page-file layout

For mq language syntax itself (selectors, control flow, pattern matching, …), see the mq documentation.

Virtual Schema

The SQL engine exposes two virtual tables backed directly by the in-memory store — there is no separate schema to migrate.

SELECT id, path, title, tags FROM documents;

SELECT id, document_id, block_type, content, pre, post,
       depth, lang, properties FROM blocks;

`documents`

Column	Type	Description
`id`	integer	Document ID (matches `blocks.document_id`)
`path`	text	Source file path, or `NULL` for in-memory-only documents
`title`	text	Front-matter / first-heading title, if any
`tags`	text	Front-matter tags, comma-joined

`blocks`

Column	Type	Description
`id`	integer	Block ID
`document_id`	integer	Owning document ID
`block_type`	text	`'heading'`, `'paragraph'`, `'code'`, `'list'`, `'blockquote'`, `'table_cell'`, `'table_row'`, `'table_align'`, `'yaml'`, `'toml'`, `'html'`, `'horizontal_rule'`, `'math'`, `'definition'`, `'footnote'`
`content`	text	Raw block content
`pre`	integer	Interval-index pre-order boundary
`post`	integer	Interval-index post-order boundary
`depth`	integer	Heading depth (`1`–`6`); `NULL`/`0` for non-headings
`lang`	text	Code fence language, when `block_type = 'code'`
`properties`	text	Remaining block-type-specific properties as JSON

pre/post are the Nested-Set interval-index boundaries described in Index Layers — they encode heading hierarchy as a pure integer range, which is what the under() function operates on.

Built-in Functions

mq-db-specific

Function	Description
`under(pre, post, anc_pre, anc_post)`	`O(1)` interval ancestor check — see Index Layers
`mq(program, content)`	Run an mq program against Markdown content
`json_extract(json, path)`	Extract a value from a JSON string

-- Hierarchy query: everything nested under a heading
SELECT b.block_type, b.content
FROM blocks b
WHERE under(b.pre, b.post,
  (SELECT pre FROM blocks WHERE block_type = 'heading' AND content = 'Architecture'),
  (SELECT post FROM blocks WHERE block_type = 'heading' AND content = 'Architecture'));

-- Run an mq program inline against block content
SELECT mq('.h1 | to_text', content) AS title
FROM blocks WHERE block_type = 'code' AND lang = 'markdown';

String

Function	Description
`lower` / `upper`	Case conversion
`length` / `len` / `char_length` / `character_length`	Character count
`trim` / `ltrim` / `rtrim`	Strip whitespace, or the given characters
`concat` / `concat_ws`	Join strings (with optional separator)
`replace`	Replace all occurrences of a substring
`substring` / `substr`	Extract a substring (1-based, `FROM`/`FOR` or comma form)
`position` / `instr`	Find the 1-based index of a substring (`0` if absent)
`left` / `right`	First/last `n` characters
`lpad` / `rpad`	Pad to a fixed length
`reverse`	Reverse a string
`repeat`	Repeat a string `n` times
`initcap`	Capitalize each word
`ascii` / `chr`	Char ↔ code point
`split_part`	Extract the nth delimiter-separated field

Numeric

Function	Description
`abs`	Absolute value
`round` / `trunc` / `truncate`	Round / truncate, with optional decimal scale
`ceil` / `ceiling` / `floor`	Round up / down
`mod`	Remainder
`power` / `pow` / `sqrt`	Exponentiation / square root
`exp` / `ln`	`e^x` / natural log
`log` / `log10` / `log2`	Logarithm (1-arg = base 10, 2-arg = custom base)
`sign`	`-1` / `0` / `1`
`pi`	π
`greatest` / `least`	Max / min across arguments (ignoring `NULL`)

Date/Time

Function	Description
`now` / `current_timestamp`	Current UTC date and time
`current_date`	Current UTC date
`current_time`	Current UTC time

Null handling & control flow

Function	Description
`coalesce` / `ifnull`	First non-`NULL` argument
`nullif`	`NULL` if the two arguments are equal
`CASE WHEN … THEN … ELSE … END`	Conditional expressions
`typeof`	Runtime type of a value

Aggregates

Usable with GROUP BY:

Function	Description
`count(*)` / `count(DISTINCT col)`	Row / distinct-value count
`min` / `max` / `sum` / `avg`	Standard aggregates
`group_concat` / `string_agg(expr[, sep])`	Concatenate group values (default separator `,`)

DDL Statements

mq-db supports a small set of DDL statements for defining custom in-memory tables alongside the built-in documents/blocks virtual tables. Custom tables live only for the process lifetime — they are not persisted to the .mq-db store file.

Statement	Description
`CREATE TABLE name AS SELECT …`	Create a custom table from a query result
`CREATE TABLE name (col TYPE, …)`	Create an empty custom table with explicit schema
`INSERT INTO name VALUES (…)`	Insert a row into a custom table
`DROP TABLE name`	Drop a custom table
`SHOW TABLES`	List all custom tables
`DESC name`	Show the schema of a custom table

Examples

# Create from a SELECT result
mq-db sql "CREATE TABLE headings AS SELECT content, depth FROM blocks WHERE block_type = 'heading'" --db store.mq-db

# Create with explicit schema, then insert
mq-db sql "CREATE TABLE notes (id TEXT, body TEXT)" --db store.mq-db
mq-db sql "INSERT INTO notes VALUES ('1', 'Hello world')" --db store.mq-db

# Inspect
mq-db sql "SHOW TABLES" --db store.mq-db
mq-db sql "DESC notes"  --db store.mq-db

# Drop
mq-db sql "DROP TABLE notes" --db store.mq-db

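Because custom tables live only for the process lifetime, each separate mq-db sql call above starts without them; to run a dependent sequence such as create, insert, then query, enter the statements in a single mq-db repl session. There, custom tables can be queried and joined exactly like documents/blocks: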

SELECT h.content, n.body
FROM headings h
JOIN notes n ON n.id = h.content;

Example Queries

-- All text/code under a specific section (RAG extraction)
SELECT b.block_type, b.content
FROM blocks b
WHERE under(b.pre, b.post,
  (SELECT pre FROM blocks WHERE block_type = 'heading' AND content = 'Architecture'),
  (SELECT post FROM blocks WHERE block_type = 'heading' AND content = 'Architecture'))
  AND b.block_type IN ('paragraph', 'code')
ORDER BY b.pre;

-- Extract H1 title from code block content via the mq() scalar function
SELECT mq('.h1 | to_text', content) AS title
FROM blocks
WHERE block_type = 'code' AND lang = 'markdown';

-- H2 headings immediately followed by a list (structural lint)
SELECT d.path, h.content AS heading
FROM blocks h
JOIN blocks nxt ON nxt.document_id = h.document_id AND nxt.pre = h.pre + 1
JOIN documents d ON d.id = h.document_id
WHERE h.block_type = 'heading' AND h.depth = 2 AND nxt.block_type = 'list';

-- Documents containing Python code
SELECT DISTINCT d.path
FROM documents d JOIN blocks b ON b.document_id = d.id
WHERE b.block_type = 'code' AND lang = 'python';

-- Bucket headings by depth and summarize with string/numeric functions
SELECT
  CASE WHEN depth <= 1 THEN 'top-level' ELSE 'nested' END AS bucket,
  count(*),
  group_concat(initcap(trim(content)), ', ') AS headings
FROM blocks
WHERE block_type = 'heading'
GROUP BY CASE WHEN depth <= 1 THEN 'top-level' ELSE 'nested' END;
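The structural-lint query relies, as its join shows, on the block right after a heading having `pre = h.pre + 1`. A minimal Rust sketch of the same check over one document's blocks follows; the `Block` struct and this free `lint_heading_followed_by` function are simplified, hypothetical stand-ins (the library method of the same name appears in the Library API example and takes `BlockType` values instead of strings).

```rust
use std::collections::HashMap;

/// Minimal block record for one document (hypothetical stand-in).
struct Block<'a> { block_type: &'a str, depth: u8, pre: u32, content: &'a str }

/// Return the text of every heading at `depth` whose next block in pre-order
/// (pre + 1, as in the SQL join) has one of the `followers` types.
fn lint_heading_followed_by<'a>(blocks: &[Block<'a>], depth: u8, followers: &[&str]) -> Vec<&'a str> {
    // Index blocks by pre number so the "next block" lookup is O(1).
    let by_pre: HashMap<u32, &Block<'a>> = blocks.iter().map(|b| (b.pre, b)).collect();
    blocks
        .iter()
        .filter(|h| h.block_type == "heading" && h.depth == depth)
        .filter(|h| {
            by_pre
                .get(&(h.pre + 1))
                .map_or(false, |next| followers.contains(&next.block_type))
        })
        .map(|h| h.content)
        .collect()
}
```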

Mixing mq and SQL

The mq() scalar function lets a SQL query delegate per-row Markdown transformation to mq, which is convenient when a block’s content is itself Markdown (e.g. a fenced code block tagged markdown, as in the to_text example above).

From the CLI you can also move freely between the two engines: both mq-db sql and mq-db mq accept --format markdown and --format json, so the output of one can feed a pipeline built around the other.

Block Model

Every Markdown element becomes a Block:

struct Block {
    id: u32,
    document_id: u32,
    block_type: BlockType,  // Heading, Paragraph, Code, List, …
    content: String,
    span: Option<Span>,     // line/column for editor sync
    pre: u32,               // interval index pre-order
    post: u32,              // interval index post-order
    properties: Properties, // row-polymorphic extra attributes
}

properties is row-polymorphic: different block_types carry different keys.

Block type	Properties
`Heading`	`{ "depth": 2, "slug": "architecture" }`
`Code`	`{ "lang": "rust", "meta": "no_run" }`
`List`	`{ "ordered": false, "level": 1, "checked": null }`
`Yaml` / `Toml`	parsed front-matter keys (`"title"`, `"tags"`, …)

Block types

BlockType covers every CST node mq-markdown produces:

Heading, Paragraph, Code, List, TableCell, TableRow, TableAlign, Blockquote, HorizontalRule, Html, Yaml, Toml, Math, Definition, Footnote.

In SQL, block_type is exposed as the lowercase, snake_case string form (e.g. 'table_cell', 'horizontal_rule').

See Storage Format for the exact on-disk wire encoding of a Block and its properties.

Index Layers

mq-db applies three complementary index layers, cheapest-first:

SQL Query
   │
   ▼
Layer 1 — Zone Maps (document skip) ────skip───▶ ✗ irrelevant docs
   │ relevant docs
   ▼
Layer 2 — Interval Index (section scope)
   │ candidate blocks
   ▼
Layer 3 — Secondary Indexes (block lookup)
   │  BitmapIndex · BTreeIndex · HashIndex
   │  (no hint ──▶ Full Scan)
   ▼
Result Rows

Layer 1 — Zone Maps (document-level skip)

Built once per document and stored in the .mq-db file. Checked before any block is read.

Via SQL — SqlEngine derives a skip automatically from the WHERE clause, for a single, non-JOINed SELECT ... FROM blocks:

`WHERE` conjunct	Skips documents where…
`lang = 'X'`	`code_languages` doesn’t contain `X`
`depth = N` (`N > 0`)	`N` exceeds `max_heading_depth`
`block_type = 'heading' AND content = 'X'`	`heading_contents` has no case-insensitive match for `X`

Via the Rust API — store.query().documents(|doc| ...) lets you filter on any zone-map field yourself (heading_slugs, frontmatter_keys, title, tags, …), not just the patterns SqlEngine recognizes automatically.
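The three automatic skip rules can be sketched as a single predicate over a zone map. This is an illustration under stated assumptions: `Conjunct` and `can_skip` are hypothetical names, only the zone-map fields the rules need are modeled, and the case-insensitive heading match is approximated with ASCII case folding.

```rust
use std::collections::BTreeSet;

/// Per-document statistics (only the fields the skip rules read).
struct ZoneMap {
    max_heading_depth: u8,
    heading_contents: BTreeSet<String>,
    code_languages: BTreeSet<String>,
}

/// Simplified WHERE conjuncts (hypothetical enum, not mq-db's AST).
enum Conjunct<'a> { Lang(&'a str), Depth(u8), HeadingContent(&'a str) }

/// True when the zone map proves no block in the document can match,
/// so the whole document can be skipped without reading its blocks.
fn can_skip(zm: &ZoneMap, c: &Conjunct) -> bool {
    match c {
        Conjunct::Lang(x) => !zm.code_languages.contains(*x),
        Conjunct::Depth(n) => *n > 0 && *n > zm.max_heading_depth,
        // ASCII-only case folding in this sketch.
        Conjunct::HeadingContent(x) => !zm
            .heading_contents
            .iter()
            .any(|h| h.eq_ignore_ascii_case(x)),
    }
}
```

Because a skip must never drop a matching document, each rule only answers "definitely absent"; anything uncertain falls through to the block-level layers.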

Layer 2 — Interval Index (section hierarchy)

Heading hierarchy is encoded as (pre, post) pairs via Pre-Post Order (Nested Set) traversal:

# Doc                 pre=0  · post=12
├── ## Section A      pre=2  · post=7
│   ├── Paragraph     pre=3  · post=4
│   └── Code          pre=5  · post=6
└── ## Section B      pre=8  · post=11
    └── Paragraph     pre=9  · post=10

A is_under B ↔ B.pre < A.pre AND A.post < B.post — O(1), no tree traversal. This is exactly what the SQL under() function and the Rust .under_heading() query-builder method check.
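One way to produce such numbers is a depth-first walk with a single shared counter that stamps `pre` on entry and `post` on exit. The sketch below assumes a plain tree type (`Node`, `number`, and `is_under` are hypothetical names); its numbering starts Section A at 1 rather than 2, so the values differ from the diagram, but the containment test is the same.

```rust
/// A node in the heading hierarchy (simplified, hypothetical type).
struct Node { label: &'static str, children: Vec<Node> }

/// Nested Set numbering: one shared counter, `pre` on entry, `post` on exit.
/// Appends (label, pre, post) rows to `out` in pre-order.
fn number(node: &Node, counter: &mut u32, out: &mut Vec<(&'static str, u32, u32)>) {
    let pre = *counter;
    *counter += 1;
    let slot = out.len();
    out.push((node.label, pre, 0)); // post is filled in after the children
    for child in &node.children {
        number(child, counter, out);
    }
    out[slot].2 = *counter;
    *counter += 1;
}

/// A is under B  <=>  B.pre < A.pre && A.post < B.post  (constant time).
fn is_under(a: (u32, u32), b: (u32, u32)) -> bool {
    b.0 < a.0 && a.1 < b.1
}
```

Every descendant is numbered between its ancestor's entry and exit stamps, which is why two comparisons suffice and no parent pointers need to be followed.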

Layer 3 — Secondary Indexes (block-level fast lookup)

Index	Column(s)	Structure	Complexity
`BitmapIndex`	`block_type`	Inverted list per type	`O(1)` key + `O(k)` iterate
`BTreeIndex`	`pre`, `post`	`BTreeMap`	`O(log n)` point, `O(log n + k)` range
`HashIndex`	`content`, `lang`, `depth`	`HashMap`	`O(1)` average

SQL predicate pushdown picks an IndexHint based on the shape of the WHERE predicate:

`WHERE` predicate	Index used
`block_type = '...'`	`BitmapIndex`
`pre = N`	`BTreeIndex` (point lookup)
`pre BETWEEN N AND M`	`BTreeIndex` (range scan)
`content = '...'`	`HashIndex`
`lang = '...'`	`HashIndex`
`depth = N`	`HashIndex`
anything else	Full scan
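The predicate-to-index table reads naturally as a `match`. In this sketch the `Pred` enum, `choose_hint`, and the variants of `IndexHint` are invented for illustration (mq-db works on the sqlparser AST instead); only the column-to-index mapping follows the table.

```rust
/// Simplified predicate shapes (hypothetical; not the sqlparser AST).
enum Pred<'a> {
    EqStr(&'a str, &'a str),    // column = 'text'
    EqInt(&'a str, u32),        // column = N
    Between(&'a str, u32, u32), // column BETWEEN N AND M
    Other,
}

#[derive(Debug, PartialEq)]
enum IndexHint { Bitmap, BTreePoint(u32), BTreeRange(u32, u32), Hash, FullScan }

/// Map a predicate to an index, following the pushdown table.
fn choose_hint(p: &Pred) -> IndexHint {
    match p {
        Pred::EqStr("block_type", _) => IndexHint::Bitmap,
        Pred::EqInt("pre", n) => IndexHint::BTreePoint(*n),
        Pred::Between("pre", lo, hi) => IndexHint::BTreeRange(*lo, *hi),
        Pred::EqStr("content" | "lang", _) | Pred::EqInt("depth", _) => IndexHint::Hash,
        _ => IndexHint::FullScan,
    }
}
```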

MQDB Storage Format

Overview

mq-db persists documents in a fixed-size page file. Every file is split into 8192-byte pages. Page 0 is the file header, page 1 is the catalog root, and all remaining pages are used for document block data or overflow chains.

+------------+-----------+-----------+-----------+-------------+
| Page 0     | Page 1    | Page N    | Page N+1  | Page ...    |
| FileHeader | Catalog   | BlockData | Overflow  | Free/Future |
+------------+-----------+-----------+-----------+-------------+

Multi-page values are stored as singly linked page chains using the next_page field in the page header.

Page Layout

Each page is exactly 8192 bytes.

Offset	Size	Field	Type	Description
0	4	`page_type`	`u32 LE`	`0=Free`, `1=FileHeader`, `2=Catalog`, `3=BlockData`, `4=Overflow`
4	4	`checksum`	`u32 LE`	Wrapping sum of all page bytes except bytes `4..8`
8	4	`page_id`	`u32 LE`	Zero-based page index
12	4	`next_page`	`u32 LE`	`0` means end of chain; otherwise next page index

Page body (8176 bytes)

Offset	Size	Description
16	8176	Type-specific payload
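The fixed 16-byte header can be written and read with the standard `to_le_bytes`/`from_le_bytes` methods. A minimal sketch, assuming pages are handled as `[u8; 8192]` arrays (`PageHeader` and `body` are hypothetical names):

```rust
const PAGE_SIZE: usize = 8192;
const HEADER_SIZE: usize = 16;

/// The 16-byte page header: four little-endian u32 fields.
#[derive(Debug, PartialEq)]
struct PageHeader { page_type: u32, checksum: u32, page_id: u32, next_page: u32 }

impl PageHeader {
    /// Write the fields into bytes 0..16 of `page`.
    fn write(&self, page: &mut [u8; PAGE_SIZE]) {
        page[0..4].copy_from_slice(&self.page_type.to_le_bytes());
        page[4..8].copy_from_slice(&self.checksum.to_le_bytes());
        page[8..12].copy_from_slice(&self.page_id.to_le_bytes());
        page[12..16].copy_from_slice(&self.next_page.to_le_bytes());
    }

    /// Read the fields back from bytes 0..16 of `page`.
    fn read(page: &[u8; PAGE_SIZE]) -> PageHeader {
        let u32_at = |off: usize| u32::from_le_bytes(page[off..off + 4].try_into().unwrap());
        PageHeader { page_type: u32_at(0), checksum: u32_at(4), page_id: u32_at(8), next_page: u32_at(12) }
    }
}

/// The type-specific payload: bytes 16..8192 (8176 bytes).
fn body(page: &[u8; PAGE_SIZE]) -> &[u8] {
    &page[HEADER_SIZE..]
}
```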

File Header Page (page 0)

Page 0 always has page_type = 1 and page_id = 0.

File header body

Offset in body	Size	Field	Type	Value
0	4	`magic`	`u32 LE`	`0x4D514442` (`"MQDB"`)
4	4	`version`	`u32 LE`	`1`
8	4	`page_size`	`u32 LE`	`8192`
12	4	`num_pages`	`u32 LE`	Total pages currently in file
16	4	`catalog_start_page`	`u32 LE`	Always `1`
20	8156	`reserved`	`[u8; 8156]`	All zero bytes
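A sketch of building page 0 under the layout above; `file_header_page` is a hypothetical helper and the checksum is left at zero. Because `magic` is stored little-endian, the four bytes on disk are `42 44 51 4D`, which a hex dump shows as "BDQM" rather than "MQDB".

```rust
const PAGE_SIZE: usize = 8192;
const MAGIC: u32 = 0x4D51_4442; // "MQDB" when read as big-endian hex

/// Build page 0: page header, then the file header body at offset 16.
/// The checksum (bytes 4..8) is not computed here.
fn file_header_page(num_pages: u32) -> [u8; PAGE_SIZE] {
    let mut page = [0u8; PAGE_SIZE];
    page[0..4].copy_from_slice(&1u32.to_le_bytes()); // page_type = 1 (FileHeader)
    // page_id (8..12) = 0 and next_page (12..16) = 0 are already zero.
    // magic, version, page_size, num_pages, catalog_start_page
    let fields = [MAGIC, 1, PAGE_SIZE as u32, num_pages, 1];
    for (i, v) in fields.iter().enumerate() {
        let off = 16 + i * 4;
        page[off..off + 4].copy_from_slice(&v.to_le_bytes());
    }
    page // reserved bytes 36..8192 stay zero
}
```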

Catalog Pages

The catalog always starts at page 1. If the serialized catalog exceeds one page body, additional catalog pages are linked by next_page.

Page 1 (Catalog) --> Page 12 (Catalog) --> Page 18 (Catalog) --> 0

Catalog body format

Order	Field	Type	Notes
1	`num_entries`	`u32 LE`	Number of catalog entries
2	`document_id`	`u32 LE`	First field of each entry; fields 2–9 repeat `num_entries` times
3	`path_present`	`u8`	`0` absent, `1` present
4	`path_len`	`u16 LE`	Present only when `path_present = 1`
5	`path`	`UTF-8 bytes`	Not NUL-terminated
6	`first_block_page`	`u32 LE`	First page of block chain
7	`num_blocks`	`u32 LE`	Number of serialized blocks
8	`zone_map_len`	`u32 LE`	Byte length of encoded zone map
9	`zone_map`	`[u8 * zone_map_len]`	Encoded zone map bytes
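Serializing the catalog body is a straight walk over the table in order. In this sketch `CatalogEntry` and `encode_catalog` are hypothetical names, the zone map is passed in already encoded (it is opaque at this level), and paths are assumed to fit the `u16` length field.

```rust
/// One catalog entry (fields 2–9 of the table); hypothetical struct.
struct CatalogEntry {
    document_id: u32,
    path: Option<String>,
    first_block_page: u32,
    num_blocks: u32,
    zone_map: Vec<u8>, // already-encoded zone map bytes
}

/// Serialize the catalog body: entry count, then each entry in table order.
fn encode_catalog(entries: &[CatalogEntry]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(entries.len() as u32).to_le_bytes());
    for e in entries {
        out.extend_from_slice(&e.document_id.to_le_bytes());
        match &e.path {
            Some(p) => {
                out.push(1);
                // UTF-8 byte length; assumes the path is under 64 KiB.
                out.extend_from_slice(&(p.len() as u16).to_le_bytes());
                out.extend_from_slice(p.as_bytes()); // no NUL terminator
            }
            None => out.push(0),
        }
        out.extend_from_slice(&e.first_block_page.to_le_bytes());
        out.extend_from_slice(&e.num_blocks.to_le_bytes());
        out.extend_from_slice(&(e.zone_map.len() as u32).to_le_bytes());
        out.extend_from_slice(&e.zone_map);
    }
    out
}
```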

Block Data Pages

A document is serialized as concatenated encoded blocks. The byte stream is cut into 8176-byte chunks.

The first chunk is stored in a page with page_type = 3 (BlockData).
Continuation chunks are stored in pages with page_type = 4 (Overflow).
The last page in the chain has next_page = 0.

first_block_page
      |
      v
+-------------------+    +-------------------+    +-------------------+
| type=BlockData    | -> | type=Overflow     | -> | type=Overflow     |
| body bytes 0..8175|    | next chunk        |    | final chunk       |
+-------------------+    +-------------------+    +-------------------+

Unused bytes at the end of the final page body are zero-filled.
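Cutting a block stream into a page chain might look like the following sketch. `chunk_stream` is hypothetical; it assigns consecutive page IDs starting at `first_page_id` (a real writer may allocate pages non-contiguously), leaves checksums unset, and assumes an empty stream still gets one empty BlockData page.

```rust
const PAGE_SIZE: usize = 8192;
const BODY_SIZE: usize = PAGE_SIZE - 16; // 8176

/// Split a block byte stream into pages: BlockData (3) first, Overflow (4)
/// after that, linked by next_page; the last page has next_page = 0.
fn chunk_stream(stream: &[u8], first_page_id: u32) -> Vec<[u8; PAGE_SIZE]> {
    let chunks: Vec<&[u8]> = if stream.is_empty() {
        vec![stream]
    } else {
        stream.chunks(BODY_SIZE).collect()
    };
    let n = chunks.len();
    chunks
        .iter()
        .enumerate()
        .map(|(i, chunk)| {
            let mut page = [0u8; PAGE_SIZE]; // zero-fill covers the unused tail
            let page_type: u32 = if i == 0 { 3 } else { 4 };
            let page_id = first_page_id + i as u32;
            let next = if i + 1 < n { page_id + 1 } else { 0 };
            page[0..4].copy_from_slice(&page_type.to_le_bytes());
            page[8..12].copy_from_slice(&page_id.to_le_bytes());
            page[12..16].copy_from_slice(&next.to_le_bytes());
            page[16..16 + chunk.len()].copy_from_slice(chunk);
            page
        })
        .collect()
}
```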

Block Wire Format

Each block is encoded independently and concatenated without separators.

Order	Field	Type	Description
1	`id`	`u32 LE`	Block ID
2	`document_id`	`u32 LE`	Owning document ID
3	`block_type`	`u8`	See mapping below
4	`pre`	`u32 LE`	Interval-index left boundary
5	`post`	`u32 LE`	Interval-index right boundary
6	`span_present`	`u8`	`0` absent, `1` present
7	`start_line`	`u32 LE`	Present only when span exists
8	`start_col`	`u32 LE`	Present only when span exists
9	`end_line`	`u32 LE`	Present only when span exists
10	`end_col`	`u32 LE`	Present only when span exists
11	`content_len`	`u32 LE`	UTF-8 byte length
12	`content`	`[u8 * content_len]`	UTF-8 bytes
13	`num_props`	`u16 LE`	Number of properties
14	`key_len`	`u8`	Key byte length; fields 14–16 repeat `num_props` times
15	`key`	`[u8 * key_len]`	UTF-8 property name
16	`value`	`PropertyValue`	Encoded property value
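Encoding the fixed block fields in table order shows what the optional span costs. In this sketch `encode_block` and `Span` are hypothetical names, properties are left out, and `num_props` is written as 0; a block then takes 24 bytes plus its content without a span, and 16 more with one.

```rust
/// Line/column span, present unless indexed with --no-spans.
struct Span { start_line: u32, start_col: u32, end_line: u32, end_col: u32 }

/// Encode one block's fields 1–13 in wire order, followed by num_props = 0.
fn encode_block(id: u32, document_id: u32, block_type: u8, pre: u32, post: u32,
                span: Option<&Span>, content: &str) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&id.to_le_bytes());
    out.extend_from_slice(&document_id.to_le_bytes());
    out.push(block_type);
    out.extend_from_slice(&pre.to_le_bytes());
    out.extend_from_slice(&post.to_le_bytes());
    match span {
        Some(s) => {
            out.push(1); // span_present
            for v in [s.start_line, s.start_col, s.end_line, s.end_col] {
                out.extend_from_slice(&v.to_le_bytes());
            }
        }
        None => out.push(0),
    }
    out.extend_from_slice(&(content.len() as u32).to_le_bytes()); // UTF-8 byte length
    out.extend_from_slice(content.as_bytes());
    out.extend_from_slice(&0u16.to_le_bytes()); // num_props (none in this sketch)
    out
}
```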

Block type mapping

`u8`	BlockType
0	`Heading`
1	`Paragraph`
2	`Code`
3	`List`
4	`TableCell`
5	`TableRow`
6	`TableAlign`
7	`Blockquote`
8	`HorizontalRule`
9	`Html`
10	`Yaml`
11	`Toml`
12	`Math`
13	`Definition`
14	`Footnote`
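Because the tags run from 0 to 14 in declaration order, a fieldless Rust enum can carry the mapping through its discriminants. A sketch, where the method names `from_u8`, `to_u8`, and `sql_name` are hypothetical, that also yields the snake_case string used in SQL's `block_type` column:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum BlockType {
    Heading, Paragraph, Code, List, TableCell, TableRow, TableAlign, Blockquote,
    HorizontalRule, Html, Yaml, Toml, Math, Definition, Footnote,
}

impl BlockType {
    /// Variants in wire order: index = u8 tag.
    const ALL: [BlockType; 15] = [
        BlockType::Heading, BlockType::Paragraph, BlockType::Code, BlockType::List,
        BlockType::TableCell, BlockType::TableRow, BlockType::TableAlign, BlockType::Blockquote,
        BlockType::HorizontalRule, BlockType::Html, BlockType::Yaml, BlockType::Toml,
        BlockType::Math, BlockType::Definition, BlockType::Footnote,
    ];

    /// Decode a wire tag; unknown tags yield None instead of panicking.
    fn from_u8(tag: u8) -> Option<BlockType> {
        Self::ALL.get(tag as usize).copied()
    }

    /// Encode to the wire tag (declaration order matches the table).
    fn to_u8(self) -> u8 {
        self as u8
    }

    /// The lowercase snake_case name exposed in SQL.
    fn sql_name(self) -> &'static str {
        match self {
            BlockType::Heading => "heading",
            BlockType::Paragraph => "paragraph",
            BlockType::Code => "code",
            BlockType::List => "list",
            BlockType::TableCell => "table_cell",
            BlockType::TableRow => "table_row",
            BlockType::TableAlign => "table_align",
            BlockType::Blockquote => "blockquote",
            BlockType::HorizontalRule => "horizontal_rule",
            BlockType::Html => "html",
            BlockType::Yaml => "yaml",
            BlockType::Toml => "toml",
            BlockType::Math => "math",
            BlockType::Definition => "definition",
            BlockType::Footnote => "footnote",
        }
    }
}
```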

PropertyValue Encoding

Each property value starts with a one-byte type tag.

Tag	Variant	Payload
`0x00`	`Null`	none
`0x01`	`String`	`u32 LE byte_len` + UTF-8 bytes
`0x02`	`Int`	`i64 LE`
`0x03`	`Float`	`f64 LE`
`0x04`	`Bool`	`u8` (`0` or `1`)
`0x05`	`Array`	`u16 LE count` + encoded child values

Arrays are recursive: each element is another complete PropertyValue.
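A sketch of the tagged encoding as a recursive Rust enum, with a matching decoder that rejects truncated input and unknown tags. The enum mirrors the tag table; `encode` and `decode` are hypothetical names, not mq-db's API.

```rust
#[derive(Debug, PartialEq)]
enum PropertyValue { Null, String(String), Int(i64), Float(f64), Bool(bool), Array(Vec<PropertyValue>) }

/// Append the tagged encoding of `v`; arrays recurse into their elements.
fn encode(v: &PropertyValue, out: &mut Vec<u8>) {
    match v {
        PropertyValue::Null => out.push(0x00),
        PropertyValue::String(s) => {
            out.push(0x01);
            out.extend_from_slice(&(s.len() as u32).to_le_bytes());
            out.extend_from_slice(s.as_bytes());
        }
        PropertyValue::Int(i) => { out.push(0x02); out.extend_from_slice(&i.to_le_bytes()); }
        PropertyValue::Float(f) => { out.push(0x03); out.extend_from_slice(&f.to_le_bytes()); }
        PropertyValue::Bool(b) => { out.push(0x04); out.push(*b as u8); }
        PropertyValue::Array(items) => {
            out.push(0x05);
            out.extend_from_slice(&(items.len() as u16).to_le_bytes());
            for item in items { encode(item, out); }
        }
    }
}

/// Decode one value from the front of `buf`, returning it and the bytes used.
/// Returns None on truncated input, invalid UTF-8, or an unknown tag.
fn decode(buf: &[u8]) -> Option<(PropertyValue, usize)> {
    let (&tag, rest) = buf.split_first()?;
    let take = |n: usize| rest.get(..n);
    Some(match tag {
        0x00 => (PropertyValue::Null, 1),
        0x01 => {
            let len = u32::from_le_bytes(take(4)?.try_into().ok()?) as usize;
            let bytes = rest.get(4..4 + len)?;
            (PropertyValue::String(String::from_utf8(bytes.to_vec()).ok()?), 1 + 4 + len)
        }
        0x02 => (PropertyValue::Int(i64::from_le_bytes(take(8)?.try_into().ok()?)), 9),
        0x03 => (PropertyValue::Float(f64::from_le_bytes(take(8)?.try_into().ok()?)), 9),
        0x04 => (PropertyValue::Bool(*rest.first()? != 0), 2),
        0x05 => {
            let count = u16::from_le_bytes(take(2)?.try_into().ok()?) as usize;
            let mut used = 3; // tag + u16 count
            let mut items = Vec::with_capacity(count);
            for _ in 0..count {
                let (item, n) = decode(&buf[used..])?;
                items.push(item);
                used += n;
            }
            (PropertyValue::Array(items), used)
        }
        _ => return None,
    })
}
```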

ZoneMap Encoding

Zone maps are encoded independently and embedded as opaque bytes inside catalog entries.

Order	Field	Type
1	`max_heading_depth`	`u8`
2	`num_heading_slugs`	`u16 LE`
3	`heading_slug` items	`u16 LE len` + UTF-8 bytes
4	`num_heading_contents`	`u16 LE`
5	`heading_content` items	`u16 LE len` + UTF-8 bytes
6	`num_code_langs`	`u16 LE`
7	`code_lang` items	`u16 LE len` + UTF-8 bytes
8	`num_frontmatter_keys`	`u16 LE`
9	`frontmatter_key` items	`u16 LE len` + UTF-8 bytes
10	`title_present`	`u8`
11	`title`	`u16 LE len` + UTF-8 bytes when present
12	`num_tags`	`u16 LE`
13	`tag` items	`u16 LE len` + UTF-8 bytes

Sets are serialized as sorted UTF-8 strings for deterministic output.

Checksum Algorithm

The checksum is a simple wrapping sum over every byte in the 8192-byte page except the checksum field itself (page[4..8]).

Pseudo-code:

checksum = 0u32
for i in 0..8192:
    if i in [4, 5, 6, 7]:
        continue
    checksum = checksum.wrapping_add(page[i] as u32)

Verification recomputes the checksum and compares it to the stored checksum field.
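The pseudo-code translates directly to Rust. In this sketch `page_checksum`, `seal`, and `verify` are hypothetical names; the sum is stored little-endian in bytes 4..8, matching the `u32 LE` checksum field in the page layout. Since the 8188 covered bytes contribute at most 8188 × 255 = 2,087,940, the sum never actually wraps at this page size; `wrapping_add` simply matches the stated algorithm.

```rust
const PAGE_SIZE: usize = 8192;

/// Wrapping byte sum over the whole page, skipping the checksum field (bytes 4..8).
fn page_checksum(page: &[u8; PAGE_SIZE]) -> u32 {
    page.iter()
        .enumerate()
        .filter(|(i, _)| !(4..8).contains(i))
        .fold(0u32, |acc, (_, &b)| acc.wrapping_add(b as u32))
}

/// Compute the checksum and store it in the header.
fn seal(page: &mut [u8; PAGE_SIZE]) {
    let sum = page_checksum(page);
    page[4..8].copy_from_slice(&sum.to_le_bytes());
}

/// Recompute and compare against the stored field.
fn verify(page: &[u8; PAGE_SIZE]) -> bool {
    let stored = u32::from_le_bytes(page[4..8].try_into().unwrap());
    stored == page_checksum(page)
}
```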

Multi-Page Chains

Large catalog payloads and large document block streams are stored as chains.

+---------+      +---------+      +---------+
| page_id |----->| page_id |----->| page_id |
| next=42 |      | next=77 |      | next=0  |
+---------+      +---------+      +---------+

The body bytes of each page are concatenated in chain order to reconstruct the original serialized byte stream.
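Reassembly is a loop over `next_page` links. In this sketch `read_chain` is hypothetical and pages come from an in-memory map rather than file reads. Using 0 as the terminator is unambiguous because page 0 is always the file header and never part of a chain. The result still includes the zero-filled tail of the last page, so a reader relies on counts such as `num_entries` or `num_blocks` to know where the real data ends.

```rust
use std::collections::{HashMap, HashSet};

const PAGE_SIZE: usize = 8192;

/// Follow next_page links from `start`, concatenating each page body (bytes 16..).
/// Returns None if a link points at a missing page or the chain loops.
fn read_chain(pages: &HashMap<u32, [u8; PAGE_SIZE]>, start: u32) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut seen = HashSet::new();
    let mut current = start;
    loop {
        if !seen.insert(current) {
            return None; // cycle guard for corrupted links
        }
        let page = pages.get(&current)?;
        out.extend_from_slice(&page[16..]);
        let next = u32::from_le_bytes(page[12..16].try_into().unwrap());
        if next == 0 {
            return Some(out);
        }
        current = next;
    }
}
```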

Atomic Write Procedure

DocumentStore::save() writes atomically using a sibling temporary file:

1. Create path.tmp.
2. Write the page 0 file header and the page 1 empty catalog.
3. Append all document block chains.
4. Serialize and write the final catalog chain.
5. Close the temporary file.
6. Rename path.tmp to path.

Because the final rename is atomic on the same filesystem, readers either observe the old file or the new complete file, never a partially written database image.
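The procedure maps onto the standard library's `File` and `fs::rename`. A minimal sketch with `atomic_write` as a hypothetical name: the `sync_all` call is an extra durability step assumed here (the steps above only say to close the file), added so the data reaches disk before the rename makes it visible.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `bytes` to `path` through a sibling `<path>.tmp` file and a final rename,
/// so readers see either the old file or the complete new one.
fn atomic_write(path: &Path, bytes: &[u8]) -> io::Result<()> {
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = Path::new(&tmp_name);
    {
        let mut f = File::create(tmp)?;
        f.write_all(bytes)?;
        f.sync_all()?; // assumed extra step: flush contents before the rename
    } // file handle is closed here
    fs::rename(tmp, path) // atomic replace when both paths share a filesystem
}
```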


mq-db - a Markdown-specialized embedded database