Introduction
mq-db treats Markdown documents as structured, hierarchical databases rather than plain text.
It parses Markdown into a flat block list annotated with an interval index (Nested Set / Pre-Post Order), which turns heading-hierarchy questions such as “is this paragraph inside that section?” into an O(1) pair of integer comparisons instead of a tree walk. Documents can be queried with SQL or mq, and persisted to a compact custom page-file format with no SQLite dependency.
This project is under active development and the API may change.
Why Markdown-as-database?
Markdown files already have implicit structure — headings nest sections, code blocks carry a language, front matter carries metadata. mq-db makes that structure queryable directly:
SELECT block_type, count(*) FROM blocks GROUP BY block_type;
.h1
Both engines run against the same underlying block store, so you can pick whichever query language fits the task: SQL for joins, aggregates, and ad-hoc analysis; mq for Markdown-shaped transformations and selectors.
How it fits together
Markdown File(s)
│ CST Parser (mq-markdown)
▼
Block Tree (heading · paragraph · code · list · …)
│ Interval Index + Secondary Indexes
▼
Flat Block Vector (pre/post integers)
│
├── BitmapIndex (block_type)
├── BTreeIndex (pre / post)
├── HashIndex (content / lang / depth)
├── Zone Maps (per-document stats)
│
├── SQL Engine (sqlparser — custom native evaluator)
└── mq Engine (mq-lang evaluator)
Features
- Flat block storage: every Markdown element becomes a typed Block with row-polymorphic properties
- O(1) hierarchy queries: interval index (pre/post) makes ancestor/descendant checks a single integer comparison
- Three-layer secondary indexes: BitmapIndex (block type), BTreeIndex (pre/post), HashIndex (content/lang/depth) for fast SQL predicate pushdown
- Zone Maps: per-document statistics skip irrelevant files before scanning any blocks
- Dual query engines: SQL via a custom sqlparser-based evaluator, and mq via mq-lang
- DDL support: CREATE TABLE, INSERT INTO, DROP TABLE for in-memory custom tables
- Comprehensive SQL function library: string, numeric, null-handling, CASE, and aggregate functions comparable to a general-purpose RDBMS
- mq() scalar function: run an mq program against Markdown content inline in SQL
- Custom page-file persistence: 8 KB fixed pages, checksums, atomic writes
- CLI + interactive REPL + TUI: full terminal experience
Keep reading in Getting Started, or jump straight to the SQL Reference if you already have a .mq-db store.
Getting Started
This section walks through installing mq-db, indexing Markdown into a store file, and querying it from the CLI, REPL, TUI, HTTP server, or the Rust library directly.
The typical workflow is:
- Index one or more Markdown files into a .mq-db store file (Install, CLI).
- Query the store with SQL or mq, either one-shot from the CLI, interactively in the REPL / TUI, or over HTTP via mq-db serve.
- Optionally, embed mq-db directly with the Library API instead of shelling out to the CLI.
mq-db index docs/ --recursive --output store.mq-db
mq-db sql "SELECT block_type, count(*) FROM blocks GROUP BY block_type" --db store.mq-db
Install
Using the Installation Script (Recommended)
curl -fsSL https://raw.githubusercontent.com/harehare/mq-db/main/bin/install.sh | bash
The installer will:
- Download the latest release for your platform
- Verify the binary with a SHA256 checksum
- Install to ~/.local/bin/
- Update your shell profile (bash, zsh, or fish)
After installation, restart your terminal or run:
source ~/.bashrc # or ~/.zshrc, or ~/.config/fish/config.fish
Using Cargo
cargo install mq-db
From Source
# Latest development version
cargo install --git https://github.com/harehare/mq-db.git
Supported Platforms
- Linux: x86_64, aarch64
- macOS: x86_64 (Intel), aarch64 (Apple Silicon)
- Windows: x86_64
Verify
mq-db --version
CLI
Every subcommand operates on a .mq-db store file (--db / -d, default store.mq-db). Output-producing commands accept --format / -F: table (default), json, csv, tsv, markdown, html.
mq-db --help
mq-db <command> --help
index
Index Markdown files or directories into a store file.
mq-db index docs/ --recursive --output store.mq-db
mq-db index README.md DESIGN.md
mq-db index docs/ --no-spans # omit source spans (~21 bytes/block saved)
| Flag | Description |
|---|---|
| paths | Markdown files or directories to index (required) |
| -o, --output <PATH> | Output store file (default store.mq-db) |
| -r, --recursive | Recursively walk directories |
| --no-spans | Do not store source line/column spans |
✓ docs/DESIGN.md
✓ docs/API.md
Indexed 2 files → store.mq-db
list
List all indexed documents.
mq-db list --db store.mq-db
mq-db list --db store.mq-db --format json
┌──────┬────────────────────────────────────────────────────┬────────┬──────────┐
│ ID │ Path / Title │ Blocks │ Tags │
├──────┼────────────────────────────────────────────────────┼────────┼──────────┤
│ 0 │ docs/DESIGN.md │ 142 │ │
│ 1 │ docs/API.md │ 87 │ api, v2 │
└──────┴────────────────────────────────────────────────────┴────────┴──────────┘
2 documents
sql
Run a SQL query over the store. See the Reference for the virtual schema and function library.
mq-db sql "SELECT block_type, count(*) FROM blocks GROUP BY block_type" --db store.mq-db
mq-db sql --file query.sql --db store.mq-db
mq-db sql "SELECT ..." --db store.mq-db --format json
| Flag | Description |
|---|---|
| query | SQL query string (omit when using --file) |
| -f, --file <PATH> | Read SQL from a file |
mq
Run an mq query over the store.
mq-db mq ".h1" --db store.mq-db
mq-db mq 'select(.code_lang == "rust")' --db store.mq-db
mq-db mq ".h1" --db store.mq-db --format markdown
repl
Interactive REPL supporting both query modes; switch with .mode.
mq-db repl --db store.mq-db --mode sql
See REPL for the full command list.
lint
Run structural lint checks (currently: a heading at the given depth immediately followed by a list).
mq-db lint --db store.mq-db --depth 2
✗ 1 violation (H2 immediately followed by list)
file heading
──────────────────────────────────────── ──────────────────────────────
docs/DESIGN.md "Quick Start"
stats
Show store-wide statistics: document/block counts, block-type distribution, code-language distribution.
mq-db stats --db store.mq-db
Documents 5
Blocks 632
Block types
────────────────────────────────────────────────────────
¶ paragraph ████████████████████░░░░ 241 (38%)
# heading ████████░░░░░░░░░░░░░░░░ 89 (14%)
{} code ███████░░░░░░░░░░░░░░░░░ 73 (12%)
• list ██████░░░░░░░░░░░░░░░░░░ 58 (9%)
show
Show the full block structure of one document by ID (see list for IDs).
mq-db show 0 --db store.mq-db
docs/DESIGN.md
title Design Document
blocks 142
pre post type content
──── ──── ──────────────── ──────────────────────────────────────────
0 141 heading H1 Design Document
2 55 heading H2 Architecture
4 21 paragraph The system is built on…
tui
Launch the interactive TUI. See TUI.
mq-db tui --db store.mq-db
serve
Start an HTTP server exposing SQL/mq query endpoints. See HTTP Server.
mq-db serve --db store.mq-db --host 0.0.0.0 --port 8080
REPL
The interactive REPL supports both query modes in a single session.
mq-db repl --db store.mq-db --mode sql
mq-db (.help for commands .quit to exit)
mode: sql (.mode mq | .mode sql)
sql> SELECT content FROM blocks WHERE block_type = 'heading' LIMIT 3;
┌──────────────────┐
│ content │
├──────────────────┤
│ Overview │
│ Architecture │
│ Query Engine │
└──────────────────┘
(3 rows)
sql> .mode mq
→ mq mode
mq> .h2
## Architecture
## Query Engine
Dot commands
| Command | Description |
|---|---|
| .help | List available commands |
| .mode sql | Switch to SQL query mode |
| .mode mq | Switch to mq query mode |
| .quit | Exit the REPL |
The initial mode can be set with --mode sql or --mode mq (default sql).
TUI
mq-db tui opens a full-screen terminal UI for browsing indexed documents and running SQL/mq queries side by side, built with ratatui.
mq-db tui --db store.mq-db
mq-db SQL Tab:switch i:input j/k:nav d/u:scroll q:quit
┌─ Documents ──────────┬─ SQL ────────────────────────────────────────────────┐
│ DESIGN.md │ SELECT block_type, count(*) FROM blocks GROUP BY b_ │
│ 142 blocks ├─ Results ────────────────────────────────────────────┤
│ API.md │ ┌─────────────┬──────────┐ │
│ 87 blocks API │ │ block_type │ count(*) │ │
│ README.md │ ├─────────────┼──────────┤ │
│ 34 blocks │ │ paragraph │ 48 │ │
└──────────────────────┴──────────────────────────────────────────────────────┘
5 docs 632 blocks 3 rows
The left pane lists indexed documents; selecting one shows its full block breakdown (type, pre/post, content preview) in the results pane. The top-right pane accepts a query in the current mode (mq or SQL); running it replaces the results pane with the query output.
Keys
| Key | Action |
|---|---|
| i | Focus query input |
| Esc | Blur input |
| Enter | Run query |
| Tab | Toggle mq / SQL mode |
| j / k (or ↓ / ↑) | Navigate document list |
| d / u (or PageDown / PageUp) | Scroll results down / up |
| g / G | Jump results to top / bottom |
| q / Ctrl+C | Quit |
Color scheme
Block types are color-coded for quick scanning (heading, paragraph, code, list, blockquote, table, frontmatter, html, math, …), using the same warm paper/ink/accent palette as the project site.
HTTP Server
mq-db serve starts an HTTP server (built on axum) exposing SQL and mq query endpoints over the indexed store.
mq-db serve --db store.mq-db # listens on 127.0.0.1:7878
mq-db serve --db store.mq-db --port 8080 # custom port
mq-db serve --db store.mq-db --host 0.0.0.0 --port 8080
Endpoints
| Method | Path | Body | Description |
|---|---|---|---|
| GET | /health | - | {"status":"ok","documents":<n>} |
| POST | /sql | {"query":"SELECT …"} | Execute a SQL query, returns JSON rows |
| POST | /mq | {"code":".h1"} | Evaluate an mq expression, returns {"results":[…]} |
Examples
# Health check
curl http://127.0.0.1:7878/health
# SQL via HTTP
curl -s -X POST http://127.0.0.1:7878/sql \
-H 'Content-Type: application/json' \
-d '{"query":"SELECT block_type, count(*) FROM blocks GROUP BY block_type"}'
# mq via HTTP
curl -s -X POST http://127.0.0.1:7878/mq \
-H 'Content-Type: application/json' \
-d '{"code":".h1"}'
Library API (Rust)
mq-db is usable directly as a Rust library, without shelling out to the CLI.
[dependencies]
mq-db = "0.1"
#![allow(unused)]
use mq_db::{DocumentStore, SqlEngine, MqEngine, block::BlockType};

// Returning a boxed error lets `?` propagate failures; this assumes mq-db's
// error types implement std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // ── Build in memory ──────────────────────────────────────────────────────
    let mut store = DocumentStore::new();
    store.add_file("docs/DESIGN.md")?;
    store.add_str("# Hello\n\n## Architecture\n\nDetails\n")?;

    // Chainable query API: zone-map skip + interval scope + block predicates
    let chunks = store.query()
        .documents(|doc| doc.zone_maps.heading_contents.contains("Architecture"))
        .under_heading("Architecture", Some(2))
        .filter(|b| matches!(b.block_type, BlockType::Paragraph | BlockType::Code))
        .blocks();

    // SQL engine (custom sqlparser-based evaluator, no SQLite dependency)
    let engine = SqlEngine::new(&store)?;
    let out = engine.execute(
        "SELECT content FROM blocks WHERE block_type = 'heading' ORDER BY pre"
    )?;
    print!("{}", out.to_table());

    // mq engine
    let results = MqEngine::eval_store(".h1", &store)?;

    // Structural lint
    let violations = store.query().lint_heading_followed_by(2, &[BlockType::List]);

    // ── Persist / load ───────────────────────────────────────────────────────
    store.save("store.mq-db")?;

    // Full load: all blocks read into memory, indexes built on first SqlEngine use
    let store = DocumentStore::load("store.mq-db")?;

    // Lazy open: catalog only; call load_all_blocks() + load_all_indexes() before SQL
    let mut store = DocumentStore::open("store.mq-db")?;
    store.load_all_blocks()?;
    store.load_all_indexes()?;

    // Catalog-only: for metadata commands (list, stats) that don't need block data
    let store = DocumentStore::load_catalog_only("store.mq-db")?;

    Ok(())
}
Loading strategies
| Function | Loads | Use for |
|---|---|---|
| DocumentStore::new() | Nothing (empty, in-memory) | Building a store from scratch |
| DocumentStore::load() | Catalog + all blocks + indexes | One-shot CLI queries |
| DocumentStore::open() | Catalog only, lazily | Long-lived processes that defer block/index loading |
| DocumentStore::load_catalog_only() | Catalog only | list / stats-style metadata commands |
When using open(), call load_all_blocks() and load_all_indexes() before running any SqlEngine query.
Query builder
store.query() returns a chainable builder that applies the same three index layers used by the SQL engine, in order:
- .documents(|doc| ...): zone-map predicate, skips whole documents
- .under_heading(title, depth) / interval-scope helpers: narrows to a (pre, post) range
- .filter(|block| ...): per-block predicate over the remaining candidates
- .blocks(): materializes the final Vec<&Block>
See Index Layers for how each layer works.
Reference
Technical reference for the SQL surface and the on-disk/in-memory data model.
- Virtual Schema: the documents and blocks tables
- Built-in Functions: mq-db-specific, string, numeric, null-handling, and aggregate functions
- DDL Statements: CREATE TABLE, INSERT INTO, DROP TABLE, and friends
- Example Queries: hierarchy extraction, structural lint, and mixed mq/SQL queries
- Block Model: the Block struct and per-type properties
- Index Layers: zone maps, the interval index, and secondary indexes
- Storage Format: the on-disk page-file layout
For mq language syntax itself (selectors, control flow, pattern matching, …), see the mq documentation.
Virtual Schema
The SQL engine exposes two virtual tables backed directly by the in-memory store — there is no separate schema to migrate.
SELECT id, path, title, tags FROM documents;
SELECT id, document_id, block_type, content, pre, post,
depth, lang, properties FROM blocks;
documents
| Column | Type | Description |
|---|---|---|
| id | integer | Document ID (matches blocks.document_id) |
| path | text | Source file path, or NULL for in-memory-only documents |
| title | text | Front-matter / first-heading title, if any |
| tags | text | Front-matter tags, comma-joined |
blocks
| Column | Type | Description |
|---|---|---|
| id | integer | Block ID |
| document_id | integer | Owning document ID |
| block_type | text | 'heading', 'paragraph', 'code', 'list', 'blockquote', 'table_cell', 'table_row', 'table_align', 'yaml', 'toml', 'html', 'horizontal_rule', 'math', 'definition', 'footnote' |
| content | text | Raw block content |
| pre | integer | Interval-index pre-order boundary |
| post | integer | Interval-index post-order boundary |
| depth | integer | Heading depth (1–6); NULL/0 for non-headings |
| lang | text | Code fence language, when block_type = 'code' |
| properties | text | Remaining block-type-specific properties as JSON |
pre/post are the Nested-Set interval-index boundaries described in Index Layers — they encode heading hierarchy as a pure integer range, which is what the under() function operates on.
Built-in Functions
mq-db-specific
| Function | Description |
|---|---|
| under(pre, post, anc_pre, anc_post) | O(1) interval ancestor check (see Index Layers) |
| mq(program, content) | Run an mq program against Markdown content |
| json_extract(json, path) | Extract a value from a JSON string |
-- Hierarchy query: everything nested under a heading
SELECT b.block_type, b.content
FROM blocks b
WHERE under(b.pre, b.post,
(SELECT pre FROM blocks WHERE block_type = 'heading' AND content = 'Architecture'),
(SELECT post FROM blocks WHERE block_type = 'heading' AND content = 'Architecture'));
-- Run an mq program inline against block content
SELECT mq('.h1 | to_text', content) AS title
FROM blocks WHERE block_type = 'code' AND lang = 'markdown';
String
| Function | Description |
|---|---|
| lower / upper | Case conversion |
| length / len / char_length / character_length | Character count |
| trim / ltrim / rtrim | Strip whitespace, or the given characters |
| concat / concat_ws | Join strings (with optional separator) |
| replace | Replace all occurrences of a substring |
| substring / substr | Extract a substring (1-based, FROM/FOR or comma form) |
| position / instr | Find the 1-based index of a substring (0 if absent) |
| left / right | First/last n characters |
| lpad / rpad | Pad to a fixed length |
| reverse | Reverse a string |
| repeat | Repeat a string n times |
| initcap | Capitalize each word |
| ascii / chr | Char ↔ code point |
| split_part | Extract the nth delimiter-separated field |
Numeric
| Function | Description |
|---|---|
| abs | Absolute value |
| round / trunc / truncate | Round / truncate, with optional decimal scale |
| ceil / ceiling / floor | Round up / down |
| mod | Remainder |
| power / pow / sqrt | Exponentiation / square root |
| exp / ln | e^x / natural log |
| log / log10 / log2 | Logarithm (1-arg = base 10, 2-arg = custom base) |
| sign | -1 / 0 / 1 |
| pi | π |
| greatest / least | Max / min across arguments (ignoring NULL) |
Date/Time
| Function | Description |
|---|---|
| now / current_timestamp | Current UTC date and time |
| current_date | Current UTC date |
| current_time | Current UTC time |
Null handling & control flow
| Function | Description |
|---|---|
| coalesce / ifnull | First non-NULL argument |
| nullif | NULL if the two arguments are equal |
| CASE WHEN … THEN … ELSE … END | Conditional expressions |
| typeof | Runtime type of a value |
Aggregates
Usable with GROUP BY:
| Function | Description |
|---|---|
| count(*) / count(DISTINCT col) | Row / distinct-value count |
| min / max / sum / avg | Standard aggregates |
| group_concat / string_agg(expr[, sep]) | Concatenate group values (default separator ,) |
DDL Statements
mq-db supports a small set of DDL statements for defining custom in-memory tables alongside the built-in documents/blocks virtual tables. Custom tables live only for the process lifetime — they are not persisted to the .mq-db store file.
| Statement | Description |
|---|---|
| CREATE TABLE name AS SELECT … | Create a custom table from a query result |
| CREATE TABLE name (col TYPE, …) | Create an empty custom table with explicit schema |
| INSERT INTO name VALUES (…) | Insert a row into a custom table |
| DROP TABLE name | Drop a custom table |
| SHOW TABLES | List all custom tables |
| DESC name | Show the schema of a custom table |
Examples
# Create from a SELECT result
mq-db sql "CREATE TABLE headings AS SELECT content, depth FROM blocks WHERE block_type = 'heading'" --db store.mq-db
# Create with explicit schema, then insert
mq-db sql "CREATE TABLE notes (id TEXT, body TEXT)" --db store.mq-db
mq-db sql "INSERT INTO notes VALUES ('1', 'Hello world')" --db store.mq-db
# Inspect
mq-db sql "SHOW TABLES" --db store.mq-db
mq-db sql "DESC notes" --db store.mq-db
# Drop
mq-db sql "DROP TABLE notes" --db store.mq-db
Custom tables can be queried and joined exactly like documents/blocks:
SELECT h.content, n.body
FROM headings h
JOIN notes n ON n.id = h.content;
Example Queries
-- All text/code under a specific section (RAG extraction)
SELECT b.block_type, b.content
FROM blocks b
WHERE under(b.pre, b.post,
(SELECT pre FROM blocks WHERE block_type = 'heading' AND content = 'Architecture'),
(SELECT post FROM blocks WHERE block_type = 'heading' AND content = 'Architecture'))
AND b.block_type IN ('paragraph', 'code')
ORDER BY b.pre;
-- Extract H1 title from code block content via the mq() scalar function
SELECT mq('.h1 | to_text', content) AS title
FROM blocks
WHERE block_type = 'code' AND lang = 'markdown';
-- H2 headings immediately followed by a list (structural lint)
SELECT d.path, h.content AS heading
FROM blocks h
JOIN blocks nxt ON nxt.document_id = h.document_id AND nxt.pre = h.pre + 1
JOIN documents d ON d.id = h.document_id
WHERE h.block_type = 'heading' AND h.depth = 2 AND nxt.block_type = 'list';
-- Documents containing Python code
SELECT DISTINCT d.path
FROM documents d JOIN blocks b ON b.document_id = d.id
WHERE b.block_type = 'code' AND b.lang = 'python';
-- Bucket headings by depth and summarize with string/numeric functions
SELECT
CASE WHEN depth <= 1 THEN 'top-level' ELSE 'nested' END AS bucket,
count(*),
group_concat(initcap(trim(content)), ', ') AS headings
FROM blocks
WHERE block_type = 'heading'
GROUP BY CASE WHEN depth <= 1 THEN 'top-level' ELSE 'nested' END;
Mixing mq and SQL
The mq() scalar function lets a SQL query delegate per-row Markdown transformation to mq, which is convenient when a block’s content is itself a Markdown snapshot (e.g. a fenced code block containing Markdown, as in the to_text example above).
Both mq-db sql and mq-db mq accept --format markdown and --format json, so on the CLI the output of one engine can feed a pipeline built around the other.
Block Model
Every Markdown element becomes a Block:
struct Block {
    id: u32,
    document_id: u32,
    block_type: BlockType,   // Heading, Paragraph, Code, List, …
    content: String,
    span: Option<Span>,      // line/column for editor sync
    pre: u32,                // interval index pre-order
    post: u32,               // interval index post-order
    properties: Properties,  // row-polymorphic extra attributes
}
properties is row-polymorphic: different block_types carry different keys.
| Block type | Properties |
|---|---|
| Heading | { "depth": 2, "slug": "architecture" } |
| Code | { "lang": "rust", "meta": "no_run" } |
| List | { "ordered": false, "level": 1, "checked": null } |
| Yaml / Toml | parsed front-matter keys ("title", "tags", …) |
Block types
BlockType covers every CST node mq-markdown produces:
Heading, Paragraph, Code, List, TableCell, TableRow, TableAlign, Blockquote, HorizontalRule, Html, Yaml, Toml, Math, Definition, Footnote.
In SQL, block_type is exposed as the lowercase, snake_case string form (e.g. 'table_cell', 'horizontal_rule').
See Storage Format for the exact on-disk wire encoding of a Block and its properties.
Index Layers
mq-db applies three complementary index layers, cheapest-first:
SQL Query
│
▼
Layer 1 — Zone Maps (document skip) ────skip───▶ ✗ irrelevant docs
│ relevant docs
▼
Layer 2 — Interval Index (section scope)
│ candidate blocks
▼
Layer 3 — Secondary Indexes (block lookup)
│ BitmapIndex · BTreeIndex · HashIndex
│ (no hint ──▶ Full Scan)
▼
Result Rows
Layer 1 — Zone Maps (document-level skip)
Built once per document and stored in the .mq-db file. Checked before any block is read.
Via SQL — SqlEngine derives a skip automatically from the WHERE clause, for a single, non-JOINed SELECT ... FROM blocks:
| WHERE conjunct | Skips documents where… |
|---|---|
| lang = 'X' | code_languages doesn’t contain X |
| depth = N (N > 0) | N exceeds max_heading_depth |
| block_type = 'heading' AND content = 'X' | heading_contents has no case-insensitive match for X |
Via the Rust API — store.query().documents(|doc| ...) lets you filter on any zone-map field yourself (heading_slugs, frontmatter_keys, title, tags, …), not just the patterns SqlEngine recognizes automatically.
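The skip rules in the table above can be sketched as a small predicate over a zone map. This is an illustration only: the `ZoneMap` struct, the `Conjunct` enum, and `can_skip` are hypothetical names, and only the three field names (`max_heading_depth`, `code_languages`, `heading_contents`) come from the text. Case-insensitive matching is simplified to ASCII here.

```rust
use std::collections::BTreeSet;

/// Hypothetical per-document zone map; the real struct in mq-db may differ.
pub struct ZoneMap {
    pub max_heading_depth: u8,
    pub code_languages: BTreeSet<String>,
    pub heading_contents: BTreeSet<String>,
}

/// Hypothetical WHERE conjuncts that allow a document-level skip.
pub enum Conjunct<'a> {
    /// lang = 'X'
    LangEq(&'a str),
    /// depth = N
    DepthEq(u8),
    /// block_type = 'heading' AND content = 'X'
    HeadingContentEq(&'a str),
}

/// Returns true when the document cannot contain a matching block,
/// so none of its blocks need to be read.
pub fn can_skip(zm: &ZoneMap, conjunct: &Conjunct<'_>) -> bool {
    match conjunct {
        Conjunct::LangEq(lang) => !zm.code_languages.contains(*lang),
        Conjunct::DepthEq(n) => *n > 0 && *n > zm.max_heading_depth,
        // ASCII-only case folding; an assumption made for brevity.
        Conjunct::HeadingContentEq(x) => !zm
            .heading_contents
            .iter()
            .any(|h| h.eq_ignore_ascii_case(*x)),
    }
}
```

A document is skipped as soon as any conjunct of an AND-ed WHERE clause reports `true`, since every conjunct must hold for a row to match.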
Layer 2 — Interval Index (section hierarchy)
Heading hierarchy is encoded as (pre, post) pairs via Pre-Post Order (Nested Set) traversal:
# Doc pre=0 · post=11
├── ## Section A pre=2 · post=7
│ ├── Paragraph pre=3 · post=4
│ └── Code pre=5 · post=6
└── ## Section B pre=8 · post=11
└── Paragraph pre=9 · post=10
A is_under B ↔ B.pre < A.pre AND A.post < B.post — O(1), no tree traversal. This is exactly what the SQL under() function and the Rust .under_heading() query-builder method check.
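As a minimal sketch, the containment rule above translates directly into two integer comparisons. The `Interval` type and the free function `is_under` are illustrative names, not mq-db's actual API.

```rust
/// A block's interval-index boundaries (illustrative type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Interval {
    pub pre: u32,
    pub post: u32,
}

/// Returns true when `a` lies strictly inside `b`,
/// i.e. `b.pre < a.pre && a.post < b.post`.
pub fn is_under(a: Interval, b: Interval) -> bool {
    b.pre < a.pre && a.post < b.post
}
```

With the values from the diagram, the paragraph at (3, 4) is under Section A at (2, 7), while the paragraph at (9, 10) is not. A block is never under itself because both comparisons are strict.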
Layer 3 — Secondary Indexes (block-level fast lookup)
| Index | Column(s) | Structure | Complexity |
|---|---|---|---|
| BitmapIndex | block_type | Inverted list per type | O(1) key + O(k) iterate |
| BTreeIndex | pre, post | BTreeMap | O(log n) point, O(log n + k) range |
| HashIndex | content, lang, depth | HashMap | O(1) average |
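To make the three structures concrete, here is a rough sketch that builds stand-ins for them from standard-library collections. The `Row` and `Indexes` types and both functions are hypothetical, and the example keys `by_pre` on `pre` alone, which assumes a single document; mq-db's real index types are not shown in this reference.

```rust
use std::collections::{BTreeMap, HashMap};

/// Minimal block row for this sketch (illustrative only).
pub struct Row {
    pub block_type: &'static str,
    pub pre: u32,
    pub lang: Option<&'static str>,
}

/// Stand-ins for the three secondary indexes; values are positions
/// in the flat block vector.
pub struct Indexes {
    /// Inverted list per block type: O(1) key lookup, O(k) iteration.
    pub by_type: HashMap<&'static str, Vec<usize>>,
    /// Ordered by `pre`: O(log n) point lookup, O(log n + k) range scan.
    pub by_pre: BTreeMap<u32, usize>,
    /// Hash lookup on `lang`: O(1) on average.
    pub by_lang: HashMap<&'static str, Vec<usize>>,
}

pub fn build_indexes(rows: &[Row]) -> Indexes {
    let mut idx = Indexes {
        by_type: HashMap::new(),
        by_pre: BTreeMap::new(),
        by_lang: HashMap::new(),
    };
    for (pos, row) in rows.iter().enumerate() {
        idx.by_type.entry(row.block_type).or_insert_with(Vec::new).push(pos);
        idx.by_pre.insert(row.pre, pos);
        if let Some(lang) = row.lang {
            idx.by_lang.entry(lang).or_insert_with(Vec::new).push(pos);
        }
    }
    idx
}

/// `pre BETWEEN lo AND hi` as an ordered range scan.
pub fn pre_between(idx: &Indexes, lo: u32, hi: u32) -> Vec<usize> {
    if lo > hi {
        return Vec::new(); // BTreeMap::range panics on an inverted range
    }
    idx.by_pre.range(lo..=hi).map(|(_, pos)| *pos).collect()
}
```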
SQL predicate pushdown picks an IndexHint based on the shape of the WHERE predicate:
| WHERE predicate | Index used |
|---|---|
| block_type = '...' | BitmapIndex |
| pre = N | BTreeIndex (point lookup) |
| pre BETWEEN N AND M | BTreeIndex (range scan) |
| content = '...' | HashIndex |
| lang = '...' | HashIndex |
| depth = N | HashIndex |
| anything else | Full scan |
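The mapping in this table amounts to a single match on the predicate shape. The sketch below assumes a hypothetical `Predicate` enum and invents the variant names of `IndexHint`; only the name `IndexHint` itself appears in the text.

```rust
/// Hypothetical shapes of a single WHERE predicate on `blocks`.
pub enum Predicate<'a> {
    BlockTypeEq(&'a str),
    PreEq(u32),
    PreBetween(u32, u32),
    ContentEq(&'a str),
    LangEq(&'a str),
    DepthEq(u8),
    /// Anything the planner does not recognize.
    Other,
}

/// Which index a scan should use (variant names are assumptions).
#[derive(Debug, PartialEq, Eq)]
pub enum IndexHint {
    Bitmap,
    BTreePoint,
    BTreeRange,
    Hash,
    FullScan,
}

/// Picks an index from the predicate shape, falling back to a full scan.
pub fn choose_hint(predicate: &Predicate<'_>) -> IndexHint {
    match predicate {
        Predicate::BlockTypeEq(_) => IndexHint::Bitmap,
        Predicate::PreEq(_) => IndexHint::BTreePoint,
        Predicate::PreBetween(_, _) => IndexHint::BTreeRange,
        Predicate::ContentEq(_) | Predicate::LangEq(_) | Predicate::DepthEq(_) => {
            IndexHint::Hash
        }
        Predicate::Other => IndexHint::FullScan,
    }
}
```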
MQDB Storage Format
Overview
mq-db persists documents in a fixed-size page file. Every file is split into 8192-byte pages. Page 0 is the file header, page 1 is the catalog root, and all remaining pages hold document block data, overflow chains, or catalog continuation pages.
+-----------+-----------+-----------+-----------+-----------+
| Page 0 | Page 1 | Page N | Page N+1 | Page ... |
| FileHeader| Catalog | BlockData | Overflow | Free/Future|
+-----------+-----------+-----------+-----------+-----------+
Multi-page values are stored as singly linked page chains using the next_page field in the page header.
Page Layout
Each page is exactly 8192 bytes.
Page header (16 bytes)
| Offset | Size | Field | Type | Description |
|---|---|---|---|---|
| 0 | 4 | page_type | u32 LE | 0=Free, 1=FileHeader, 2=Catalog, 3=BlockData, 4=Overflow |
| 4 | 4 | checksum | u32 LE | Wrapping sum of all page bytes except bytes 4..8 |
| 8 | 4 | page_id | u32 LE | Zero-based page index |
| 12 | 4 | next_page | u32 LE | 0 means end of chain; otherwise next page index |
Page body (8176 bytes)
| Offset | Size | Description |
|---|---|---|
| 16 | 8176 | Type-specific payload |
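A reader for this layout could look like the following sketch, which decodes the four little-endian header fields and exposes the body slice. The `PageHeader` struct and function names are illustrative, not taken from the mq-db source.

```rust
pub const PAGE_SIZE: usize = 8192;
pub const HEADER_SIZE: usize = 16;

/// The 16-byte page header, decoded (illustrative type).
#[derive(Debug, PartialEq, Eq)]
pub struct PageHeader {
    pub page_type: u32,
    pub checksum: u32,
    pub page_id: u32,
    pub next_page: u32,
}

/// Reads a little-endian u32 at `offset`.
fn read_u32_le(buf: &[u8], offset: usize) -> u32 {
    u32::from_le_bytes([buf[offset], buf[offset + 1], buf[offset + 2], buf[offset + 3]])
}

/// Parses the header; returns None unless `page` is exactly one full page.
pub fn parse_page_header(page: &[u8]) -> Option<PageHeader> {
    if page.len() != PAGE_SIZE {
        return None;
    }
    Some(PageHeader {
        page_type: read_u32_le(page, 0),
        checksum: read_u32_le(page, 4),
        page_id: read_u32_le(page, 8),
        next_page: read_u32_le(page, 12),
    })
}

/// Returns the type-specific payload that follows the header
/// (8176 bytes for a full page). Panics if `page` is shorter than the header.
pub fn page_body(page: &[u8]) -> &[u8] {
    &page[HEADER_SIZE..]
}
```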
File Header Page (page 0)
Page 0 always has page_type = 1 and page_id = 0.
File header body
| Offset in body | Size | Field | Type | Value |
|---|---|---|---|---|
| 0 | 4 | magic | u32 LE | 0x4D514442 ("MQDB") |
| 4 | 4 | version | u32 LE | 1 |
| 8 | 4 | page_size | u32 LE | 8192 |
| 12 | 4 | num_pages | u32 LE | Total pages currently in file |
| 16 | 4 | catalog_start_page | u32 LE | Always 1 |
| 20 | 8156 | reserved | [u8; 8156] | All zero bytes |
Catalog Pages
The catalog always starts at page 1. If the serialized catalog exceeds one page body, additional catalog pages are linked by next_page.
Page 1 (Catalog) --> Page 12 (Catalog) --> Page 18 (Catalog) --> 0
Catalog body format
| Order | Field | Type | Notes |
|---|---|---|---|
| 1 | num_entries | u32 LE | Number of catalog entries |
| 2 | document_id | u32 LE | Repeated per entry |
| 3 | path_present | u8 | 0 absent, 1 present |
| 4 | path_len | u16 LE | Present only when path_present = 1 |
| 5 | path | UTF-8 bytes | Not NUL-terminated |
| 6 | first_block_page | u32 LE | First page of block chain |
| 7 | num_blocks | u32 LE | Number of serialized blocks |
| 8 | zone_map_len | u32 LE | Byte length of encoded zone map |
| 9 | zone_map | [u8 * zone_map_len] | Encoded zone map bytes |
Block Data Pages
A document is serialized as concatenated encoded blocks. The byte stream is cut into 8176-byte chunks.
- The first chunk is stored in a page with page_type = 3 (BlockData).
- Continuation chunks are stored in pages with page_type = 4 (Overflow).
- The last page in the chain has next_page = 0.
first_block_page
|
v
+-------------------+ +-------------------+ +-------------------+
| type=BlockData | -> | type=Overflow | -> | type=Overflow |
| body bytes 0..8175| | next chunk | | final chunk |
+-------------------+ +-------------------+ +-------------------+
Unused bytes at the end of the final page body are zero-filled.
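The chunking rule can be sketched as follows: cut the stream into 8176-byte bodies, tag the first as BlockData and the rest as Overflow, and zero-pad the final body. The function name and return shape are assumptions for illustration, and page IDs and next_page links are left out. In this sketch an empty stream yields no pages, which may not match mq-db's behavior.

```rust
pub const BODY_SIZE: usize = 8176;
pub const PAGE_TYPE_BLOCK_DATA: u32 = 3;
pub const PAGE_TYPE_OVERFLOW: u32 = 4;

/// Splits a serialized block stream into (page_type, zero-padded body) pairs.
pub fn chunk_block_stream(stream: &[u8]) -> Vec<(u32, Vec<u8>)> {
    stream
        .chunks(BODY_SIZE)
        .enumerate()
        .map(|(i, chunk)| {
            // Only the first page of a chain is a BlockData page.
            let page_type = if i == 0 {
                PAGE_TYPE_BLOCK_DATA
            } else {
                PAGE_TYPE_OVERFLOW
            };
            let mut body = chunk.to_vec();
            body.resize(BODY_SIZE, 0); // zero-fill unused trailing bytes
            (page_type, body)
        })
        .collect()
}
```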
Block Wire Format
Each block is encoded independently and concatenated without separators.
| Order | Field | Type | Description |
|---|---|---|---|
| 1 | id | u32 LE | Block ID |
| 2 | document_id | u32 LE | Owning document ID |
| 3 | block_type | u8 | See mapping below |
| 4 | pre | u32 LE | Interval-index left boundary |
| 5 | post | u32 LE | Interval-index right boundary |
| 6 | span_present | u8 | 0 absent, 1 present |
| 7 | start_line | u32 LE | Present only when span exists |
| 8 | start_col | u32 LE | Present only when span exists |
| 9 | end_line | u32 LE | Present only when span exists |
| 10 | end_col | u32 LE | Present only when span exists |
| 11 | content_len | u32 LE | UTF-8 byte length |
| 12 | content | [u8 * content_len] | UTF-8 bytes |
| 13 | num_props | u16 LE | Number of properties |
| 14 | key_len | u8 | Repeated per property |
| 15 | key | [u8 * key_len] | UTF-8 property name |
| 16 | value | PropertyValue | Encoded property value |
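As a rough illustration of the field order above, the following sketch encodes a block that has no properties (it always writes num_props = 0). The `Span` and `Block` types here are simplified stand-ins for the real structs, with `block_type` already reduced to its u8 code.

```rust
/// Source span (simplified stand-in).
pub struct Span {
    pub start_line: u32,
    pub start_col: u32,
    pub end_line: u32,
    pub end_col: u32,
}

/// Simplified block: no properties, block_type as its u8 wire code.
pub struct Block {
    pub id: u32,
    pub document_id: u32,
    pub block_type: u8,
    pub pre: u32,
    pub post: u32,
    pub span: Option<Span>,
    pub content: String,
}

/// Encodes fields 1 to 13 of the wire format with an empty property list.
pub fn encode_block(b: &Block) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&b.id.to_le_bytes());
    out.extend_from_slice(&b.document_id.to_le_bytes());
    out.push(b.block_type);
    out.extend_from_slice(&b.pre.to_le_bytes());
    out.extend_from_slice(&b.post.to_le_bytes());
    match &b.span {
        Some(s) => {
            out.push(1); // span_present
            for v in &[s.start_line, s.start_col, s.end_line, s.end_col] {
                out.extend_from_slice(&v.to_le_bytes());
            }
        }
        None => out.push(0),
    }
    out.extend_from_slice(&(b.content.len() as u32).to_le_bytes());
    out.extend_from_slice(b.content.as_bytes());
    out.extend_from_slice(&0u16.to_le_bytes()); // num_props = 0
    out
}
```

Under this layout a span-less block costs 24 bytes plus its content, and a span adds 16 bytes.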
Block type mapping
| u8 | BlockType |
|---|---|
| 0 | Heading |
| 1 | Paragraph |
| 2 | Code |
| 3 | List |
| 4 | TableCell |
| 5 | TableRow |
| 6 | TableAlign |
| 7 | Blockquote |
| 8 | HorizontalRule |
| 9 | Html |
| 10 | Yaml |
| 11 | Toml |
| 12 | Math |
| 13 | Definition |
| 14 | Footnote |
PropertyValue Encoding
Each property value starts with a one-byte type tag.
| Tag | Variant | Payload |
|---|---|---|
| 0x00 | Null | none |
| 0x01 | String | u32 LE byte_len + UTF-8 bytes |
| 0x02 | Int | i64 LE |
| 0x03 | Float | f64 LE |
| 0x04 | Bool | u8 (0 or 1) |
| 0x05 | Array | u16 LE count + encoded child values |
Arrays are recursive: each element is another complete PropertyValue.
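The tag-plus-payload scheme lends itself to a small recursive encoder. The enum below mirrors the variants in the table; its exact shape in mq-db is an assumption, and the sketch does not guard against strings or arrays that exceed their length fields.

```rust
/// Property value variants from the table above (shape assumed).
#[derive(Debug, Clone, PartialEq)]
pub enum PropertyValue {
    Null,
    String(String),
    Int(i64),
    Float(f64),
    Bool(bool),
    Array(Vec<PropertyValue>),
}

/// Appends the one-byte tag and payload for `value` to `out`.
pub fn encode_value(value: &PropertyValue, out: &mut Vec<u8>) {
    match value {
        PropertyValue::Null => out.push(0x00),
        PropertyValue::String(s) => {
            out.push(0x01);
            out.extend_from_slice(&(s.len() as u32).to_le_bytes());
            out.extend_from_slice(s.as_bytes());
        }
        PropertyValue::Int(i) => {
            out.push(0x02);
            out.extend_from_slice(&i.to_le_bytes());
        }
        PropertyValue::Float(f) => {
            out.push(0x03);
            out.extend_from_slice(&f.to_le_bytes());
        }
        PropertyValue::Bool(b) => {
            out.push(0x04);
            out.push(*b as u8);
        }
        PropertyValue::Array(items) => {
            out.push(0x05);
            out.extend_from_slice(&(items.len() as u16).to_le_bytes());
            // Each element is itself a complete tagged value.
            for item in items {
                encode_value(item, out);
            }
        }
    }
}
```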
ZoneMap Encoding
Zone maps are encoded independently and embedded as opaque bytes inside catalog entries.
| Order | Field | Type |
|---|---|---|
| 1 | max_heading_depth | u8 |
| 2 | num_heading_slugs | u16 LE |
| 3 | heading_slug items | u16 LE len + UTF-8 bytes |
| 4 | num_heading_contents | u16 LE |
| 5 | heading_content items | u16 LE len + UTF-8 bytes |
| 6 | num_code_langs | u16 LE |
| 7 | code_lang items | u16 LE len + UTF-8 bytes |
| 8 | num_frontmatter_keys | u16 LE |
| 9 | frontmatter_key items | u16 LE len + UTF-8 bytes |
| 10 | title_present | u8 |
| 11 | title | u16 LE len + UTF-8 bytes when present |
| 12 | num_tags | u16 LE |
| 13 | tag items | u16 LE len + UTF-8 bytes |
Sets are serialized as sorted UTF-8 strings for deterministic output.
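One way to get the sorted, deterministic output described here is to hold each set in a `BTreeSet`, which iterates in sorted order. The sketch below follows the field order of the table; the `ZoneMap` struct is hypothetical, and treating tags as a set rather than an ordered list is an assumption.

```rust
use std::collections::BTreeSet;

/// Hypothetical in-memory zone map matching the encoded fields.
pub struct ZoneMap {
    pub max_heading_depth: u8,
    pub heading_slugs: BTreeSet<String>,
    pub heading_contents: BTreeSet<String>,
    pub code_langs: BTreeSet<String>,
    pub frontmatter_keys: BTreeSet<String>,
    pub title: Option<String>,
    pub tags: BTreeSet<String>, // assumed to be a set
}

/// Writes a u16 LE length followed by the UTF-8 bytes.
fn put_str(out: &mut Vec<u8>, s: &str) {
    out.extend_from_slice(&(s.len() as u16).to_le_bytes());
    out.extend_from_slice(s.as_bytes());
}

/// Writes a u16 LE count followed by each string in sorted order.
fn put_set(out: &mut Vec<u8>, set: &BTreeSet<String>) {
    out.extend_from_slice(&(set.len() as u16).to_le_bytes());
    for s in set {
        put_str(out, s);
    }
}

pub fn encode_zone_map(zm: &ZoneMap) -> Vec<u8> {
    let mut out = vec![zm.max_heading_depth];
    put_set(&mut out, &zm.heading_slugs);
    put_set(&mut out, &zm.heading_contents);
    put_set(&mut out, &zm.code_langs);
    put_set(&mut out, &zm.frontmatter_keys);
    match &zm.title {
        Some(t) => {
            out.push(1); // title_present
            put_str(&mut out, t);
        }
        None => out.push(0),
    }
    put_set(&mut out, &zm.tags);
    out
}
```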
Checksum Algorithm
The checksum is a simple wrapping sum over every byte in the 8192-byte page except the checksum field itself (page[4..8]).
Pseudo-code:
checksum = 0u32
for i in 0..8192:
if i in [4, 5, 6, 7]:
continue
checksum = checksum.wrapping_add(page[i] as u32)
Verification recomputes the checksum and compares it to the stored checksum field.
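A runnable Rust rendering of the pseudo-code, together with the store and verify steps, might look like this. The function names are illustrative; the sketch assumes the checksum is stored little-endian at bytes 4..8, as the page header table states.

```rust
pub const PAGE_SIZE: usize = 8192;

/// Wrapping sum of every page byte except the checksum field (bytes 4..8).
pub fn page_checksum(page: &[u8; PAGE_SIZE]) -> u32 {
    page.iter()
        .enumerate()
        .filter(|(i, _)| !(4..8).contains(i))
        .fold(0u32, |acc, (_, b)| acc.wrapping_add(*b as u32))
}

/// Computes the checksum and writes it into bytes 4..8 (little-endian).
pub fn store_checksum(page: &mut [u8; PAGE_SIZE]) {
    let sum = page_checksum(page);
    page[4..8].copy_from_slice(&sum.to_le_bytes());
}

/// Recomputes the checksum and compares it to the stored field.
pub fn verify_checksum(page: &[u8; PAGE_SIZE]) -> bool {
    let stored = u32::from_le_bytes([page[4], page[5], page[6], page[7]]);
    stored == page_checksum(page)
}
```

Because the checksum field is excluded from the sum, storing it does not change the value that verification recomputes.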
Multi-Page Chains
Large catalog payloads and large document block streams are stored as chains.
+---------+ +---------+ +---------+
| page_id |----->| page_id |----->| page_id |
| next=42 | | next=77 | | next=0 |
+---------+ +---------+ +---------+
The body bytes of each page are concatenated in chain order to reconstruct the original serialized byte stream.
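Reassembling a chain can be sketched as a loop that follows next_page until it reads 0. This example operates on an in-memory image of the whole file and uses a hypothetical `read_chain` function; it also bounds the walk by the page count so a corrupt, cyclic chain cannot loop forever, which is a precaution added here rather than something the text describes.

```rust
pub const PAGE_SIZE: usize = 8192;
pub const HEADER_SIZE: usize = 16;

/// Reads the next_page field (bytes 12..16, little-endian) of a page.
fn next_page_of(page: &[u8]) -> u32 {
    u32::from_le_bytes([page[12], page[13], page[14], page[15]])
}

/// Follows next_page links from `first_page` and concatenates page bodies.
/// Returns None on an out-of-range page index or a chain longer than the file.
pub fn read_chain(file: &[u8], first_page: u32) -> Option<Vec<u8>> {
    let num_pages = file.len() / PAGE_SIZE;
    let mut out = Vec::new();
    let mut current = first_page as usize;
    // A valid chain can visit each page at most once.
    for _ in 0..num_pages {
        if current >= num_pages {
            return None;
        }
        let page = &file[current * PAGE_SIZE..(current + 1) * PAGE_SIZE];
        out.extend_from_slice(&page[HEADER_SIZE..]);
        let next = next_page_of(page);
        if next == 0 {
            return Some(out); // end of chain
        }
        current = next as usize;
    }
    None
}
```

The returned bytes still include the zero padding of the final page, so a decoder has to rely on counts such as num_blocks to know where the payload ends.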
Atomic Write Procedure
DocumentStore::save() writes atomically using a sibling temporary file:
- Create path.tmp.
- Write page 0 file header and page 1 empty catalog.
- Append all document block chains.
- Serialize and write the final catalog chain.
- Close the temporary file.
- Rename path.tmp to path.
Because the final rename is atomic on the same filesystem, readers either observe the old file or the new complete file, never a partially written database image.
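The write-then-rename pattern can be sketched with the standard library alone. This is not the body of DocumentStore::save(): the `atomic_write` and `tmp_path` helpers are hypothetical, the page-by-page writing is collapsed into a single byte slice, and the `sync_all` call is an extra durability precaution that the steps above do not mention.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Returns the sibling temporary path, e.g. `store.mq-db.tmp`.
pub fn tmp_path(path: &Path) -> PathBuf {
    let mut name = path.as_os_str().to_os_string();
    name.push(".tmp");
    PathBuf::from(name)
}

/// Writes `bytes` to `path` via a sibling temporary file and a final rename.
pub fn atomic_write(path: &Path, bytes: &[u8]) -> io::Result<()> {
    let tmp = tmp_path(path);
    {
        let mut file = File::create(&tmp)?;
        file.write_all(bytes)?;
        file.sync_all()?; // flush to disk before the rename (added precaution)
    } // the temporary file is closed here
    fs::rename(&tmp, path)
}
```

If the process fails before the rename, the original file is left untouched and only a stray temporary file remains.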