Introduction

mq-db treats Markdown documents as structured, hierarchical databases rather than plain text.

It parses Markdown into a flat block list annotated with an interval index (Nested Set / Pre-Post Order), which turns heading-hierarchy questions such as “is this paragraph inside that section?” into an O(1) pair of integer comparisons instead of a tree walk. Documents can be queried with SQL or mq, and persisted to a compact custom page-file format with no SQLite dependency.

This project is under active development and the API may change.

Why Markdown-as-database?

Markdown files already have implicit structure — headings nest sections, code blocks carry a language, front matter carries metadata. mq-db makes that structure queryable directly:

SELECT block_type, count(*) FROM blocks GROUP BY block_type;

.h1

Both engines run against the same underlying block store, so you can pick whichever query language fits the task: SQL for joins, aggregates, and ad-hoc analysis; mq for Markdown-shaped transformations and selectors.

How it fits together

Markdown File(s)
      │  CST Parser (mq-markdown)
      ▼
Block Tree (heading · paragraph · code · list · …)
      │  Interval Index + Secondary Indexes
      ▼
Flat Block Vector (pre/post integers)
      │
      ├── BitmapIndex   (block_type)
      ├── BTreeIndex    (pre / post)
      ├── HashIndex     (content / lang / depth)
      ├── Zone Maps     (per-document stats)
      │
      ├── SQL Engine   (sqlparser — custom native evaluator)
      └── mq Engine    (mq-lang evaluator)

Features

  • Flat block storage — every Markdown element becomes a typed Block with row-polymorphic properties
  • O(1) hierarchy queries — interval index (pre/post) makes ancestor/descendant checks a single integer comparison
  • Three-layer secondary indexes: BitmapIndex (block type), BTreeIndex (pre/post), HashIndex (content/lang/depth) for fast SQL predicate pushdown
  • Zone Maps — per-document statistics skip irrelevant files before scanning any blocks
  • Dual query engines — SQL via a custom sqlparser-based evaluator, and mq via mq-lang
  • DDL support: CREATE TABLE, INSERT INTO, DROP TABLE for in-memory custom tables
  • Comprehensive SQL function library — string, numeric, null-handling, CASE, and aggregate functions comparable to a general-purpose RDBMS
  • mq() scalar function — run an mq program against Markdown content inline in SQL
  • Custom page-file persistence — 8 KB fixed pages, checksums, atomic writes
  • CLI + interactive REPL + TUI — full terminal experience

Keep reading in Getting Started, or jump straight to the SQL Reference if you already have a .mq-db store.

Getting Started

This section walks through installing mq-db, indexing Markdown into a store file, and querying it from the CLI, REPL, TUI, HTTP server, or the Rust library directly.

The typical workflow is:

  1. Index one or more Markdown files into a .mq-db store file (Install, CLI).
  2. Query the store with SQL or mq, either one-shot from the CLI, interactively in the REPL / TUI, or over HTTP via mq-db serve.
  3. Optionally, embed mq-db directly with the Library API instead of shelling out to the CLI.
mq-db index docs/ --recursive --output store.mq-db
mq-db sql "SELECT block_type, count(*) FROM blocks GROUP BY block_type" --db store.mq-db

Install

curl -fsSL https://raw.githubusercontent.com/harehare/mq-db/main/bin/install.sh | bash

The installer will:

  • Download the latest release for your platform
  • Verify the binary with a SHA256 checksum
  • Install to ~/.local/bin/
  • Update your shell profile (bash, zsh, or fish)

After installation, restart your terminal or run:

source ~/.bashrc  # or ~/.zshrc, or ~/.config/fish/config.fish

Using Cargo

cargo install mq-db

From Source

# Latest development version
cargo install --git https://github.com/harehare/mq-db.git

Supported Platforms

  • Linux: x86_64, aarch64
  • macOS: x86_64 (Intel), aarch64 (Apple Silicon)
  • Windows: x86_64

Verify

mq-db --version

CLI

Every subcommand operates on a .mq-db store file (--db / -d, default store.mq-db). Output-producing commands accept --format / -F: table (default), json, csv, tsv, markdown, html.

mq-db --help
mq-db <command> --help

index

Index Markdown files or directories into a store file.

mq-db index docs/ --recursive --output store.mq-db
mq-db index README.md DESIGN.md
mq-db index docs/ --no-spans   # omit source spans (~21 bytes/block saved)

| Flag | Description |
| --- | --- |
| paths | Markdown files or directories to index (required) |
| -o, --output <PATH> | Output store file (default store.mq-db) |
| -r, --recursive | Recursively walk directories |
| --no-spans | Do not store source line/column spans |

  ✓ docs/DESIGN.md
  ✓ docs/API.md

Indexed 2 files → store.mq-db

list

List all indexed documents.

mq-db list --db store.mq-db
mq-db list --db store.mq-db --format json
┌──────┬────────────────────────────────────────────────────┬────────┬──────────┐
│   ID │ Path / Title                                       │ Blocks │ Tags     │
├──────┼────────────────────────────────────────────────────┼────────┼──────────┤
│    0 │ docs/DESIGN.md                                     │    142 │          │
│    1 │ docs/API.md                                        │     87 │ api, v2  │
└──────┴────────────────────────────────────────────────────┴────────┴──────────┘
2 documents

sql

Run a SQL query over the store. See the Reference for the virtual schema and function library.

mq-db sql "SELECT block_type, count(*) FROM blocks GROUP BY block_type" --db store.mq-db
mq-db sql --file query.sql --db store.mq-db
mq-db sql "SELECT ..." --db store.mq-db --format json

| Flag | Description |
| --- | --- |
| query | SQL query string (omit when using --file) |
| -f, --file <PATH> | Read SQL from a file |

mq

Run an mq query over the store.

mq-db mq ".h1" --db store.mq-db
mq-db mq 'select(.code_lang == "rust")' --db store.mq-db
mq-db mq ".h1" --db store.mq-db --format markdown

repl

Interactive REPL supporting both query modes; switch with .mode.

mq-db repl --db store.mq-db --mode sql

See REPL for the full command list.

lint

Run structural lint checks (currently: a heading at the given depth immediately followed by a list).

mq-db lint --db store.mq-db --depth 2
✗  1 violation  (H2 immediately followed by list)

  file                                      heading
  ────────────────────────────────────────  ──────────────────────────────
  docs/DESIGN.md                            "Quick Start"

stats

Show store-wide statistics: document/block counts, block-type distribution, code-language distribution.

mq-db stats --db store.mq-db
  Documents  5
  Blocks     632

  Block types
  ────────────────────────────────────────────────────────
   ¶  paragraph    ████████████████████░░░░   241  (38%)
   #  heading      ████████░░░░░░░░░░░░░░░░    89  (14%)
  {}  code         ███████░░░░░░░░░░░░░░░░░    73  (12%)
   •  list         ██████░░░░░░░░░░░░░░░░░░    58   (9%)

show

Show the full block structure of one document by ID (see list for IDs).

mq-db show 0 --db store.mq-db
  docs/DESIGN.md
  title   Design Document
  blocks  142

  pre   post  type               content
  ────  ────  ────────────────   ──────────────────────────────────────────
     0   141  heading H1         Design Document
     2    55  heading H2         Architecture
     4    21  paragraph          The system is built on…

tui

Launch the interactive TUI. See TUI.

mq-db tui --db store.mq-db

serve

Start an HTTP server exposing SQL/mq query endpoints. See HTTP Server.

mq-db serve --db store.mq-db --host 0.0.0.0 --port 8080

REPL

The interactive REPL supports both query modes in a single session.

mq-db repl --db store.mq-db --mode sql
mq-db  (.help for commands  .quit to exit)
mode: sql  (.mode mq | .mode sql)

sql> SELECT content FROM blocks WHERE block_type = 'heading' LIMIT 3;
┌──────────────────┐
│ content          │
├──────────────────┤
│ Overview         │
│ Architecture     │
│ Query Engine     │
└──────────────────┘
(3 rows)

sql> .mode mq
→ mq mode
mq> .h2
## Architecture
## Query Engine

Dot commands

| Command | Description |
| --- | --- |
| .help | List available commands |
| .mode sql | Switch to SQL query mode |
| .mode mq | Switch to mq query mode |
| .quit | Exit the REPL |

The initial mode can be set with --mode sql or --mode mq (default sql).

TUI

mq-db tui opens a full-screen terminal UI for browsing indexed documents and running SQL/mq queries side by side, built with ratatui.

mq-db tui --db store.mq-db
 mq-db  SQL  Tab:switch  i:input  j/k:nav  d/u:scroll  q:quit
┌─ Documents ──────────┬─ SQL ────────────────────────────────────────────────┐
│ DESIGN.md            │ SELECT block_type, count(*) FROM blocks GROUP BY b_  │
│   142 blocks         ├─ Results ────────────────────────────────────────────┤
│ API.md               │ ┌─────────────┬──────────┐                           │
│   87 blocks  API     │ │ block_type  │ count(*) │                           │
│ README.md            │ ├─────────────┼──────────┤                           │
│   34 blocks          │ │ paragraph   │ 48       │                           │
└──────────────────────┴──────────────────────────────────────────────────────┘
 5 docs  632 blocks  3 rows

The left pane lists indexed documents; selecting one shows its full block breakdown (type, pre/post, content preview) in the results pane. The top-right pane accepts a query in the current mode (mq or SQL); running it replaces the results pane with the query output.

Keys

| Key | Action |
| --- | --- |
| i | Focus query input |
| Esc | Blur input |
| Enter | Run query |
| Tab | Toggle mq / SQL mode |
| j / k (or ↓ / ↑) | Navigate document list |
| d / u (or PageDown / PageUp) | Scroll results down / up |
| g / G | Jump results to top / bottom |
| q / Ctrl+C | Quit |

Color scheme

Block types are color-coded for quick scanning (heading, paragraph, code, list, blockquote, table, frontmatter, html, math, …), using the same warm paper/ink/accent palette as the project site.

HTTP Server

mq-db serve starts an HTTP server (built on axum) exposing SQL and mq query endpoints over the indexed store.

mq-db serve --db store.mq-db              # listens on 127.0.0.1:7878
mq-db serve --db store.mq-db --port 8080  # custom port
mq-db serve --db store.mq-db --host 0.0.0.0 --port 8080

Endpoints

| Method | Path | Body | Description |
| --- | --- | --- | --- |
| GET | /health | (none) | Returns {"status":"ok","documents":<n>} |
| POST | /sql | {"query":"SELECT …"} | Execute a SQL query, returns JSON rows |
| POST | /mq | {"code":".h1"} | Evaluate an mq expression, returns {"results":[…]} |

Examples

# Health check
curl http://127.0.0.1:7878/health

# SQL via HTTP
curl -s -X POST http://127.0.0.1:7878/sql \
  -H 'Content-Type: application/json' \
  -d '{"query":"SELECT block_type, count(*) FROM blocks GROUP BY block_type"}'

# mq via HTTP
curl -s -X POST http://127.0.0.1:7878/mq \
  -H 'Content-Type: application/json' \
  -d '{"code":".h1"}'

Library API (Rust)

mq-db is usable directly as a Rust library, without shelling out to the CLI.

[dependencies]
mq-db = "0.1"

use mq_db::{DocumentStore, SqlEngine, MqEngine, block::BlockType};

// ── Build in memory ──────────────────────────────────────────────────────────
let mut store = DocumentStore::new();
store.add_file("docs/DESIGN.md")?;
store.add_str("# Hello\n\n## Architecture\n\nDetails\n")?;

// Chainable query API: zone-map skip + interval scope + block predicates
let chunks = store.query()
    .documents(|doc| doc.zone_maps.heading_contents.contains("Architecture"))
    .under_heading("Architecture", Some(2))
    .filter(|b| matches!(b.block_type, BlockType::Paragraph | BlockType::Code))
    .blocks();

// SQL engine (custom sqlparser-based evaluator, no SQLite dependency)
let engine = SqlEngine::new(&store)?;
let out = engine.execute(
    "SELECT content FROM blocks WHERE block_type = 'heading' ORDER BY pre"
)?;
print!("{}", out.to_table());

// mq engine
let results = MqEngine::eval_store(".h1", &store)?;

// Structural lint
let violations = store.query().lint_heading_followed_by(2, &[BlockType::List]);

// ── Persist / load ───────────────────────────────────────────────────────────
store.save("store.mq-db")?;

// Full load: all blocks read into memory, indexes built on first SqlEngine use
let store = DocumentStore::load("store.mq-db")?;

// Lazy open: catalog only; call load_all_blocks() + load_all_indexes() before SQL
let mut store = DocumentStore::open("store.mq-db")?;
store.load_all_blocks()?;
store.load_all_indexes()?;

// Catalog-only: for metadata commands (list, stats) that don't need block data
let store = DocumentStore::load_catalog_only("store.mq-db")?;

Loading strategies

| Function | Loads | Use for |
| --- | --- | --- |
| DocumentStore::new() | Nothing (empty, in-memory) | Building a store from scratch |
| DocumentStore::load() | Catalog + all blocks + indexes | One-shot CLI queries |
| DocumentStore::open() | Catalog only, lazily | Long-lived processes that defer block/index loading |
| DocumentStore::load_catalog_only() | Catalog only | list / stats-style metadata commands |

When using open(), call load_all_blocks() and load_all_indexes() before running any SqlEngine query.

Query builder

store.query() returns a chainable builder. The first three calls narrow the candidate set cheapest-first, mirroring the index layers used by the SQL engine; the last call materializes the result:

  1. .documents(|doc| ...) — zone-map predicate, skips whole documents
  2. .under_heading(title, depth) / interval-scope helpers — narrows to a (pre, post) range
  3. .filter(|block| ...) — per-block predicate over the remaining candidates
  4. .blocks() — materializes the final Vec<&Block>

See Index Layers for how each layer works.

Reference

Technical reference for the SQL surface and the on-disk/in-memory data model.

For mq language syntax itself (selectors, control flow, pattern matching, …), see the mq documentation.

Virtual Schema

The SQL engine exposes two virtual tables backed directly by the in-memory store — there is no separate schema to migrate.

SELECT id, path, title, tags FROM documents;

SELECT id, document_id, block_type, content, pre, post,
       depth, lang, properties FROM blocks;

documents

| Column | Type | Description |
| --- | --- | --- |
| id | integer | Document ID (matches blocks.document_id) |
| path | text | Source file path, or NULL for in-memory-only documents |
| title | text | Front-matter / first-heading title, if any |
| tags | text | Front-matter tags, comma-joined |

blocks

| Column | Type | Description |
| --- | --- | --- |
| id | integer | Block ID |
| document_id | integer | Owning document ID |
| block_type | text | 'heading', 'paragraph', 'code', 'list', 'blockquote', 'table_cell', 'table_row', 'table_align', 'yaml', 'toml', 'html', 'horizontal_rule', 'math', 'definition', 'footnote' |
| content | text | Raw block content |
| pre | integer | Interval-index pre-order boundary |
| post | integer | Interval-index post-order boundary |
| depth | integer | Heading depth (1-6); NULL/0 for non-headings |
| lang | text | Code fence language, when block_type = 'code' |
| properties | text | Remaining block-type-specific properties as JSON |

pre/post are the Nested-Set interval-index boundaries described in Index Layers — they encode heading hierarchy as a pure integer range, which is what the under() function operates on.

Built-in Functions

mq-db-specific

| Function | Description |
| --- | --- |
| under(pre, post, anc_pre, anc_post) | O(1) interval ancestor check (see Index Layers) |
| mq(program, content) | Run an mq program against Markdown content |
| json_extract(json, path) | Extract a value from a JSON string |

-- Hierarchy query: everything nested under a heading
SELECT b.block_type, b.content
FROM blocks b
WHERE under(b.pre, b.post,
  (SELECT pre FROM blocks WHERE block_type = 'heading' AND content = 'Architecture'),
  (SELECT post FROM blocks WHERE block_type = 'heading' AND content = 'Architecture'));

-- Run an mq program inline against block content
SELECT mq('.h1 | to_text', content) AS title
FROM blocks WHERE block_type = 'code' AND lang = 'markdown';

String

| Function | Description |
| --- | --- |
| lower / upper | Case conversion |
| length / len / char_length / character_length | Character count |
| trim / ltrim / rtrim | Strip whitespace, or the given characters |
| concat / concat_ws | Join strings (with optional separator) |
| replace | Replace all occurrences of a substring |
| substring / substr | Extract a substring (1-based, FROM/FOR or comma form) |
| position / instr | Find the 1-based index of a substring (0 if absent) |
| left / right | First/last n characters |
| lpad / rpad | Pad to a fixed length |
| reverse | Reverse a string |
| repeat | Repeat a string n times |
| initcap | Capitalize each word |
| ascii / chr | Char ↔ code point |
| split_part | Extract the nth delimiter-separated field |

Numeric

| Function | Description |
| --- | --- |
| abs | Absolute value |
| round / trunc / truncate | Round / truncate, with optional decimal scale |
| ceil / ceiling / floor | Round up / down |
| mod | Remainder |
| power / pow / sqrt | Exponentiation / square root |
| exp / ln | e^x / natural log |
| log / log10 / log2 | Logarithm (1-arg = base 10, 2-arg = custom base) |
| sign | -1 / 0 / 1 |
| pi | π |
| greatest / least | Max / min across arguments (ignoring NULL) |

Date/Time

| Function | Description |
| --- | --- |
| now / current_timestamp | Current UTC date and time |
| current_date | Current UTC date |
| current_time | Current UTC time |

Null handling & control flow

| Function | Description |
| --- | --- |
| coalesce / ifnull | First non-NULL argument |
| nullif | NULL if the two arguments are equal |
| CASE WHEN … THEN … ELSE … END | Conditional expressions |
| typeof | Runtime type of a value |

Aggregates

Usable with GROUP BY:

| Function | Description |
| --- | --- |
| count(*) / count(DISTINCT col) | Row / distinct-value count |
| min / max / sum / avg | Standard aggregates |
| group_concat / string_agg(expr[, sep]) | Concatenate group values (default separator ,) |

DDL Statements

mq-db supports a small set of DDL statements for defining custom in-memory tables alongside the built-in documents/blocks virtual tables. Custom tables live only for the process lifetime — they are not persisted to the .mq-db store file.

| Statement | Description |
| --- | --- |
| CREATE TABLE name AS SELECT … | Create a custom table from a query result |
| CREATE TABLE name (col TYPE, …) | Create an empty custom table with explicit schema |
| INSERT INTO name VALUES (…) | Insert a row into a custom table |
| DROP TABLE name | Drop a custom table |
| SHOW TABLES | List all custom tables |
| DESC name | Show the schema of a custom table |

Examples

# Create from a SELECT result
mq-db sql "CREATE TABLE headings AS SELECT content, depth FROM blocks WHERE block_type = 'heading'" --db store.mq-db

# Create with explicit schema, then insert
mq-db sql "CREATE TABLE notes (id TEXT, body TEXT)" --db store.mq-db
mq-db sql "INSERT INTO notes VALUES ('1', 'Hello world')" --db store.mq-db

# Inspect
mq-db sql "SHOW TABLES" --db store.mq-db
mq-db sql "DESC notes"  --db store.mq-db

# Drop
mq-db sql "DROP TABLE notes" --db store.mq-db

Custom tables can be queried and joined exactly like documents/blocks:

SELECT h.content, n.body
FROM headings h
JOIN notes n ON n.id = h.content;

Example Queries

-- All text/code under a specific section (RAG extraction)
SELECT b.block_type, b.content
FROM blocks b
WHERE under(b.pre, b.post,
  (SELECT pre FROM blocks WHERE block_type = 'heading' AND content = 'Architecture'),
  (SELECT post FROM blocks WHERE block_type = 'heading' AND content = 'Architecture'))
  AND b.block_type IN ('paragraph', 'code')
ORDER BY b.pre;

-- Extract H1 title from code block content via the mq() scalar function
SELECT mq('.h1 | to_text', content) AS title
FROM blocks
WHERE block_type = 'code' AND lang = 'markdown';

-- H2 headings immediately followed by a list (structural lint)
SELECT d.path, h.content AS heading
FROM blocks h
JOIN blocks nxt ON nxt.document_id = h.document_id AND nxt.pre = h.pre + 1
JOIN documents d ON d.id = h.document_id
WHERE h.block_type = 'heading' AND h.depth = 2 AND nxt.block_type = 'list';

-- Documents containing Python code
SELECT DISTINCT d.path
FROM documents d JOIN blocks b ON b.document_id = d.id
WHERE b.block_type = 'code' AND lang = 'python';

-- Bucket headings by depth and summarize with string/numeric functions
SELECT
  CASE WHEN depth <= 1 THEN 'top-level' ELSE 'nested' END AS bucket,
  count(*),
  group_concat(initcap(trim(content)), ', ') AS headings
FROM blocks
WHERE block_type = 'heading'
GROUP BY CASE WHEN depth <= 1 THEN 'top-level' ELSE 'nested' END;

Mixing mq and SQL

The mq() scalar function lets a SQL query delegate per-row Markdown transformation to mq, which is convenient when a block’s content is itself a Markdown snapshot (e.g. a fenced code block containing Markdown, as in the to_text example above).

From the CLI you can also move between the two engines freely: both mq-db sql and mq-db mq support --format markdown and --format json, so the output of one can feed a pipeline built around the other.

Block Model

Every Markdown element becomes a Block:

struct Block {
    id: u32,
    document_id: u32,
    block_type: BlockType,  // Heading, Paragraph, Code, List, …
    content: String,
    span: Option<Span>,     // line/column for editor sync
    pre: u32,               // interval index pre-order
    post: u32,              // interval index post-order
    properties: Properties, // row-polymorphic extra attributes
}

properties is row-polymorphic: different block_types carry different keys.

| Block type | Properties |
| --- | --- |
| Heading | { "depth": 2, "slug": "architecture" } |
| Code | { "lang": "rust", "meta": "no_run" } |
| List | { "ordered": false, "level": 1, "checked": null } |
| Yaml / Toml | parsed front-matter keys ("title", "tags", …) |

Block types

BlockType covers every CST node mq-markdown produces:

Heading, Paragraph, Code, List, TableCell, TableRow, TableAlign, Blockquote, HorizontalRule, Html, Yaml, Toml, Math, Definition, Footnote.

In SQL, block_type is exposed as the lowercase, snake_case string form (e.g. 'table_cell', 'horizontal_rule').

See Storage Format for the exact on-disk wire encoding of a Block and its properties.

Index Layers

mq-db applies three complementary index layers, cheapest-first:

SQL Query
   │
   ▼
Layer 1 — Zone Maps (document skip) ────skip───▶ ✗ irrelevant docs
   │ relevant docs
   ▼
Layer 2 — Interval Index (section scope)
   │ candidate blocks
   ▼
Layer 3 — Secondary Indexes (block lookup)
   │  BitmapIndex · BTreeIndex · HashIndex
   │  (no hint ──▶ Full Scan)
   ▼
Result Rows

Layer 1 — Zone Maps (document-level skip)

Built once per document and stored in the .mq-db file. Checked before any block is read.

Via SQL: SqlEngine derives a skip automatically from the WHERE clause, for a single, non-JOINed SELECT ... FROM blocks:

| WHERE conjunct | Skips documents where… |
| --- | --- |
| lang = 'X' | code_languages doesn’t contain X |
| depth = N (N > 0) | N exceeds max_heading_depth |
| block_type = 'heading' AND content = 'X' | heading_contents has no case-insensitive match for X |

Via the Rust API: store.query().documents(|doc| ...) lets you filter on any zone-map field yourself (heading_slugs, frontmatter_keys, title, tags, …), not just the patterns SqlEngine recognizes automatically.
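
The skip rules in the table above can be sketched as a small predicate over a zone map. This is an illustration only: the field names come from the table, but the ZoneMap struct shape, the Conjunct enum, and can_skip are hypothetical and not the mq-db API, and the case-insensitive match is simplified to ASCII case folding.

```rust
use std::collections::BTreeSet;

/// Hypothetical, trimmed-down zone map holding only the fields used below.
pub struct ZoneMap {
    pub max_heading_depth: u8,
    pub code_languages: BTreeSet<String>,
    pub heading_contents: BTreeSet<String>,
}

/// Hypothetical model of the three WHERE conjunct shapes from the table.
pub enum Conjunct<'a> {
    LangEq(&'a str),
    DepthEq(u8),
    HeadingContentEq(&'a str),
}

/// Returns true when the document cannot contain a matching block,
/// so none of its blocks need to be read.
pub fn can_skip(zone: &ZoneMap, conjunct: &Conjunct<'_>) -> bool {
    match conjunct {
        // lang = 'X': skip when code_languages does not contain X.
        Conjunct::LangEq(lang) => !zone.code_languages.contains(*lang),
        // depth = N (N > 0): skip when N exceeds max_heading_depth.
        Conjunct::DepthEq(n) => *n > 0 && *n > zone.max_heading_depth,
        // heading content = 'X': skip when no heading matches, ignoring ASCII case.
        Conjunct::HeadingContentEq(text) => !zone
            .heading_contents
            .iter()
            .any(|h| h.eq_ignore_ascii_case(text)),
    }
}
```

A conjunct that does not fit any of these shapes would simply never trigger a skip, so the document falls through to the next layer.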

Layer 2 — Interval Index (section hierarchy)

Heading hierarchy is encoded as (pre, post) pairs via Pre-Post Order (Nested Set) traversal:

# Doc                 pre=0  · post=11
├── ## Section A      pre=2  · post=7
│   ├── Paragraph     pre=3  · post=4
│   └── Code          pre=5  · post=6
└── ## Section B      pre=8  · post=11
    └── Paragraph     pre=9  · post=10

A is_under B if and only if B.pre < A.pre AND A.post < B.post. The check is O(1) and needs no tree traversal. This is exactly what the SQL under() function and the Rust .under_heading() query-builder method check.
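
As a minimal sketch of that containment rule, the check reduces to two integer comparisons. The Interval type and the is_under function below are illustrative names, not part of the mq-db API.

```rust
/// A block's interval-index boundaries, as in the `pre` / `post` columns.
/// (Hypothetical helper type for this sketch.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Interval {
    pub pre: u32,
    pub post: u32,
}

/// Returns true when `a` lies strictly inside `b`:
/// `b.pre < a.pre AND a.post < b.post`.
pub fn is_under(a: Interval, b: Interval) -> bool {
    b.pre < a.pre && a.post < b.post
}
```

With the numbers from the diagram, the paragraph at (3, 4) is under Section A at (2, 7), while the paragraph at (9, 10) is not. Because the comparison is strict, a block is never considered to be under itself.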

Layer 3 — Secondary Indexes (block-level fast lookup)

| Index | Column(s) | Structure | Complexity |
| --- | --- | --- | --- |
| BitmapIndex | block_type | Inverted list per type | O(1) key + O(k) iterate |
| BTreeIndex | pre, post | BTreeMap | O(log n) point, O(log n + k) range |
| HashIndex | content, lang, depth | HashMap | O(1) average |
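
To make the table concrete, here is a rough sketch of the three structures built with standard-library collections over a slice of rows. The Row and SecondaryIndexes types are hypothetical stand-ins, not the actual BitmapIndex, BTreeIndex, or HashIndex implementations, and only pre and lang are indexed to keep it short.

```rust
use std::collections::{BTreeMap, HashMap};

/// Minimal stand-in for a block row (hypothetical, not the mq-db `Block`).
pub struct Row {
    pub block_type: &'static str,
    pub pre: u32,
    pub lang: Option<&'static str>,
}

/// Each index maps a key to row positions in the flat block vector.
pub struct SecondaryIndexes {
    by_type: HashMap<&'static str, Vec<usize>>, // inverted list per block type
    by_pre: BTreeMap<u32, usize>,               // ordered, supports range scans
    by_lang: HashMap<&'static str, Vec<usize>>, // hash lookup on lang
}

impl SecondaryIndexes {
    pub fn build(rows: &[Row]) -> Self {
        let mut by_type: HashMap<&'static str, Vec<usize>> = HashMap::new();
        let mut by_pre = BTreeMap::new();
        let mut by_lang: HashMap<&'static str, Vec<usize>> = HashMap::new();
        for (i, row) in rows.iter().enumerate() {
            by_type.entry(row.block_type).or_default().push(i);
            by_pre.insert(row.pre, i);
            if let Some(lang) = row.lang {
                by_lang.entry(lang).or_default().push(i);
            }
        }
        Self { by_type, by_pre, by_lang }
    }

    /// O(1) key lookup, then O(k) to iterate the k matching positions.
    pub fn with_type(&self, block_type: &str) -> &[usize] {
        self.by_type.get(block_type).map(Vec::as_slice).unwrap_or(&[])
    }

    /// O(log n + k) range scan over `pre`; expects `lo <= hi`.
    pub fn pre_between(&self, lo: u32, hi: u32) -> Vec<usize> {
        self.by_pre.range(lo..=hi).map(|(_, &i)| i).collect()
    }

    /// O(1) average lookup on the code-fence language.
    pub fn with_lang(&self, lang: &str) -> &[usize] {
        self.by_lang.get(lang).map(Vec::as_slice).unwrap_or(&[])
    }
}
```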

SQL predicate pushdown picks an IndexHint based on the shape of the WHERE predicate:

| WHERE predicate | Index used |
| --- | --- |
| block_type = '...' | BitmapIndex |
| pre = N | BTreeIndex (point lookup) |
| pre BETWEEN N AND M | BTreeIndex (range scan) |
| content = '...' | HashIndex |
| lang = '...' | HashIndex |
| depth = N | HashIndex |
| anything else | Full scan |
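
The mapping above amounts to a single match on the predicate shape. In the sketch below, the Predicate enum and the IndexHint variant names are assumptions made for illustration; the source only names the IndexHint type, not its variants.

```rust
/// Hypothetical model of the WHERE predicate shapes listed in the table.
#[derive(Debug, PartialEq, Eq)]
pub enum Predicate<'a> {
    BlockTypeEq(&'a str),
    PreEq(u32),
    PreBetween(u32, u32),
    ContentEq(&'a str),
    LangEq(&'a str),
    DepthEq(u8),
    Other,
}

/// Assumed variant names; one per row of the "Index used" column.
#[derive(Debug, PartialEq, Eq)]
pub enum IndexHint {
    Bitmap,
    BTreePoint,
    BTreeRange,
    Hash,
    FullScan,
}

/// Picks the index to use from the shape of the predicate alone.
pub fn choose_hint(predicate: &Predicate<'_>) -> IndexHint {
    match predicate {
        Predicate::BlockTypeEq(_) => IndexHint::Bitmap,
        Predicate::PreEq(_) => IndexHint::BTreePoint,
        Predicate::PreBetween(_, _) => IndexHint::BTreeRange,
        Predicate::ContentEq(_) | Predicate::LangEq(_) | Predicate::DepthEq(_) => IndexHint::Hash,
        Predicate::Other => IndexHint::FullScan,
    }
}
```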

MQDB Storage Format

Overview

mq-db persists documents in a fixed-size page file. Every file is split into 8192-byte pages. Page 0 is the file header, page 1 is the catalog root, and all remaining pages are used for document block data or overflow chains.

+------------+------------+------------+------------+-------------+
| Page 0     | Page 1     | Page N     | Page N+1   | Page ...    |
| FileHeader | Catalog    | BlockData  | Overflow   | Free/Future |
+------------+------------+------------+------------+-------------+

Multi-page values are stored as singly linked page chains using the next_page field in the page header.

Page Layout

Each page is exactly 8192 bytes.

Page header (16 bytes)

| Offset | Size | Field | Type | Description |
| --- | --- | --- | --- | --- |
| 0 | 4 | page_type | u32 LE | 0=Free, 1=FileHeader, 2=Catalog, 3=BlockData, 4=Overflow |
| 4 | 4 | checksum | u32 LE | Wrapping sum of all page bytes except bytes 4..8 |
| 8 | 4 | page_id | u32 LE | Zero-based page index |
| 12 | 4 | next_page | u32 LE | 0 means end of chain; otherwise next page index |

Page body (8176 bytes)

| Offset | Size | Description |
| --- | --- | --- |
| 16 | 8176 | Type-specific payload |
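
As an illustration of this layout, the sketch below splits a raw page into its 16-byte header and 8176-byte body. The PageHeader struct and the split_page function are hypothetical names chosen for this example; only the offsets, sizes, and field names come from the tables above.

```rust
pub const PAGE_SIZE: usize = 8192;
pub const HEADER_SIZE: usize = 16;

/// The four little-endian u32 fields at the start of every page.
#[derive(Debug, PartialEq, Eq)]
pub struct PageHeader {
    pub page_type: u32,
    pub checksum: u32,
    pub page_id: u32,
    pub next_page: u32,
}

/// Reads a little-endian u32 starting at `offset`.
fn read_u32_le(bytes: &[u8], offset: usize) -> u32 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&bytes[offset..offset + 4]);
    u32::from_le_bytes(buf)
}

/// Splits a page into its parsed header and the type-specific body.
pub fn split_page(page: &[u8; PAGE_SIZE]) -> (PageHeader, &[u8]) {
    let header = PageHeader {
        page_type: read_u32_le(page, 0),
        checksum: read_u32_le(page, 4),
        page_id: read_u32_le(page, 8),
        next_page: read_u32_le(page, 12),
    };
    (header, &page[HEADER_SIZE..])
}
```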

File Header Page (page 0)

Page 0 always has page_type = 1 and page_id = 0.

File header body

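| Offset in body | Size | Field | Type | Value |
| --- | --- | --- | --- | --- |
| 0 | 4 | magic | u32 LE | 0x4D514442 ("MQDB") |
| 4 | 4 | version | u32 LE | 1 |
| 8 | 4 | page_size | u32 LE | 8192 |
| 12 | 4 | num_pages | u32 LE | Total pages currently in file |
| 16 | 4 | catalog_start_page | u32 LE | Always 1 |
| 20 | 8156 | reserved | [u8; 8156] | All zero bytes |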
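
A minimal sketch of serializing this body, following the table field by field. The function name encode_file_header_body is an assumption for illustration and does not come from the mq-db source.

```rust
pub const MAGIC: u32 = 0x4D51_4442;
pub const BODY_SIZE: usize = 8176;

/// Builds the 8176-byte file header body for a file with `num_pages` pages.
pub fn encode_file_header_body(num_pages: u32) -> Vec<u8> {
    let mut body = Vec::with_capacity(BODY_SIZE);
    body.extend_from_slice(&MAGIC.to_le_bytes()); // magic, stored low byte first
    body.extend_from_slice(&1u32.to_le_bytes()); // version
    body.extend_from_slice(&8192u32.to_le_bytes()); // page_size
    body.extend_from_slice(&num_pages.to_le_bytes()); // num_pages
    body.extend_from_slice(&1u32.to_le_bytes()); // catalog_start_page
    body.resize(BODY_SIZE, 0); // reserved: 8156 zero bytes
    body
}
```

The five fixed fields take 20 bytes, which leaves 8176 - 20 = 8156 reserved bytes, matching the table.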

Catalog Pages

The catalog always starts at page 1. If the serialized catalog exceeds one page body, additional catalog pages are linked by next_page.

Page 1 (Catalog) --> Page 12 (Catalog) --> Page 18 (Catalog) --> 0

Catalog body format

| Order | Field | Type | Notes |
| --- | --- | --- | --- |
| 1 | num_entries | u32 LE | Number of catalog entries |
| 2 | document_id | u32 LE | Repeated per entry |
| 3 | path_present | u8 | 0 absent, 1 present |
| 4 | path_len | u16 LE | Present only when path_present = 1 |
| 5 | path | UTF-8 bytes | Not NUL-terminated |
| 6 | first_block_page | u32 LE | First page of block chain |
| 7 | num_blocks | u32 LE | Number of serialized blocks |
| 8 | zone_map_len | u32 LE | Byte length of encoded zone map |
| 9 | zone_map | [u8 * zone_map_len] | Encoded zone map bytes |

Block Data Pages

A document is serialized as concatenated encoded blocks. The byte stream is cut into 8176-byte chunks.

  • The first chunk is stored in a page with page_type = 3 (BlockData).
  • Continuation chunks are stored in pages with page_type = 4 (Overflow).
  • The last page in the chain has next_page = 0.
first_block_page
      |
      v
+-------------------+    +-------------------+    +-------------------+
| type=BlockData    | -> | type=Overflow     | -> | type=Overflow     |
| body bytes 0..8175|    | next chunk        |    | final chunk       |
+-------------------+    +-------------------+    +-------------------+

Unused bytes at the end of the final page body are zero-filled.
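
The chunking rule can be sketched as follows. The function chunk_block_stream and its return shape (a page type paired with a padded body) are hypothetical; a real writer would also assign page IDs, next_page links, and checksums.

```rust
pub const BODY_SIZE: usize = 8176;
pub const PAGE_TYPE_BLOCK_DATA: u32 = 3;
pub const PAGE_TYPE_OVERFLOW: u32 = 4;

/// Cuts a document's encoded block stream into page bodies.
/// Returns `(page_type, body)` pairs in chain order; every body is exactly
/// BODY_SIZE bytes, with the final one zero-filled past the end of the data.
/// An empty stream yields no pages in this sketch.
pub fn chunk_block_stream(stream: &[u8]) -> Vec<(u32, Vec<u8>)> {
    stream
        .chunks(BODY_SIZE)
        .enumerate()
        .map(|(i, chunk)| {
            // First chunk is BlockData, every continuation is Overflow.
            let page_type = if i == 0 {
                PAGE_TYPE_BLOCK_DATA
            } else {
                PAGE_TYPE_OVERFLOW
            };
            let mut body = chunk.to_vec();
            body.resize(BODY_SIZE, 0);
            (page_type, body)
        })
        .collect()
}
```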

Block Wire Format

Each block is encoded independently and concatenated without separators.

| Order | Field | Type | Description |
| --- | --- | --- | --- |
| 1 | id | u32 LE | Block ID |
| 2 | document_id | u32 LE | Owning document ID |
| 3 | block_type | u8 | See mapping below |
| 4 | pre | u32 LE | Interval-index left boundary |
| 5 | post | u32 LE | Interval-index right boundary |
| 6 | span_present | u8 | 0 absent, 1 present |
| 7 | start_line | u32 LE | Present only when span exists |
| 8 | start_col | u32 LE | Present only when span exists |
| 9 | end_line | u32 LE | Present only when span exists |
| 10 | end_col | u32 LE | Present only when span exists |
| 11 | content_len | u32 LE | UTF-8 byte length |
| 12 | content | [u8 * content_len] | UTF-8 bytes |
| 13 | num_props | u16 LE | Number of properties |
| 14 | key_len | u8 | Repeated per property |
| 15 | key | [u8 * key_len] | UTF-8 property name |
| 16 | value | PropertyValue | Encoded property value |

Block type mapping

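| u8 | BlockType |
| --- | --- |
| 0 | Heading |
| 1 | Paragraph |
| 2 | Code |
| 3 | List |
| 4 | TableCell |
| 5 | TableRow |
| 6 | TableAlign |
| 7 | Blockquote |
| 8 | HorizontalRule |
| 9 | Html |
| 10 | Yaml |
| 11 | Toml |
| 12 | Math |
| 13 | Definition |
| 14 | Footnote |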

PropertyValue Encoding

Each property value starts with a one-byte type tag.

| Tag | Variant | Payload |
| --- | --- | --- |
| 0x00 | Null | none |
| 0x01 | String | u32 LE byte_len + UTF-8 bytes |
| 0x02 | Int | i64 LE |
| 0x03 | Float | f64 LE |
| 0x04 | Bool | u8 (0 or 1) |
| 0x05 | Array | u16 LE count + encoded child values |

Arrays are recursive: each element is another complete PropertyValue.
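
The tag table translates naturally into a recursive encoder. The enum below mirrors the variant names from the table, but its exact definition and the encode method are assumptions for illustration; the sketch also assumes string lengths fit in a u32 and array counts fit in a u16.

```rust
/// Sketch of a property value with one variant per tag in the table.
#[derive(Debug, Clone, PartialEq)]
pub enum PropertyValue {
    Null,
    String(String),
    Int(i64),
    Float(f64),
    Bool(bool),
    Array(Vec<PropertyValue>),
}

impl PropertyValue {
    /// Appends the one-byte tag followed by the variant's payload to `out`.
    pub fn encode(&self, out: &mut Vec<u8>) {
        match self {
            PropertyValue::Null => out.push(0x00),
            PropertyValue::String(s) => {
                out.push(0x01);
                out.extend_from_slice(&(s.len() as u32).to_le_bytes());
                out.extend_from_slice(s.as_bytes());
            }
            PropertyValue::Int(i) => {
                out.push(0x02);
                out.extend_from_slice(&i.to_le_bytes());
            }
            PropertyValue::Float(f) => {
                out.push(0x03);
                out.extend_from_slice(&f.to_le_bytes());
            }
            PropertyValue::Bool(b) => {
                out.push(0x04);
                out.push(u8::from(*b));
            }
            PropertyValue::Array(items) => {
                out.push(0x05);
                out.extend_from_slice(&(items.len() as u16).to_le_bytes());
                // Each element is itself a complete tagged value.
                for item in items {
                    item.encode(out);
                }
            }
        }
    }
}
```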

ZoneMap Encoding

Zone maps are encoded independently and embedded as opaque bytes inside catalog entries.

| Order | Field | Type |
| --- | --- | --- |
| 1 | max_heading_depth | u8 |
| 2 | num_heading_slugs | u16 LE |
| 3 | heading_slug items | u16 LE len + UTF-8 bytes |
| 4 | num_heading_contents | u16 LE |
| 5 | heading_content items | u16 LE len + UTF-8 bytes |
| 6 | num_code_langs | u16 LE |
| 7 | code_lang items | u16 LE len + UTF-8 bytes |
| 8 | num_frontmatter_keys | u16 LE |
| 9 | frontmatter_key items | u16 LE len + UTF-8 bytes |
| 10 | title_present | u8 |
| 11 | title | u16 LE len + UTF-8 bytes when present |
| 12 | num_tags | u16 LE |
| 13 | tag items | u16 LE len + UTF-8 bytes |

Sets are serialized as sorted UTF-8 strings for deterministic output.
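
Each string set in the layout follows the same pattern: a u16 count, then length-prefixed items in sorted order. A possible sketch of that one building block is shown here; encode_string_set is a hypothetical helper, and using a BTreeSet is just one convenient way to get sorted iteration.

```rust
use std::collections::BTreeSet;

/// Appends a string set as: u16 LE count, then for each item a
/// u16 LE byte length followed by its UTF-8 bytes.
/// Assumes the count and every item length fit in a u16.
pub fn encode_string_set(set: &BTreeSet<String>, out: &mut Vec<u8>) {
    out.extend_from_slice(&(set.len() as u16).to_le_bytes());
    // BTreeSet iterates in ascending order, which makes the output deterministic.
    for item in set {
        out.extend_from_slice(&(item.len() as u16).to_le_bytes());
        out.extend_from_slice(item.as_bytes());
    }
}
```

Inserting the same strings in a different order produces identical bytes, which is the point of serializing sets sorted.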

Checksum Algorithm

The checksum is a simple wrapping sum over every byte in the 8192-byte page except the checksum field itself (page[4..8]).

Pseudo-code:

checksum = 0u32
for i in 0..8192:
    if i in [4, 5, 6, 7]:
        continue
    checksum = checksum.wrapping_add(page[i] as u32)

Verification recomputes the checksum and compares it to the stored checksum field.
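
For reference, the pseudo-code could be written in Rust roughly as below, together with the verification step. The function names page_checksum, stamp_checksum, and verify_checksum are illustrative and are not taken from the mq-db source.

```rust
pub const PAGE_SIZE: usize = 8192;

/// Wrapping sum of every byte in the page except the checksum field (bytes 4..8).
pub fn page_checksum(page: &[u8; PAGE_SIZE]) -> u32 {
    page.iter()
        .enumerate()
        .filter(|(i, _)| !(4..8).contains(i))
        .fold(0u32, |acc, (_, &b)| acc.wrapping_add(u32::from(b)))
}

/// Computes the checksum and stores it little-endian in bytes 4..8.
pub fn stamp_checksum(page: &mut [u8; PAGE_SIZE]) {
    let sum = page_checksum(page);
    page[4..8].copy_from_slice(&sum.to_le_bytes());
}

/// Recomputes the checksum and compares it to the stored field.
pub fn verify_checksum(page: &[u8; PAGE_SIZE]) -> bool {
    let stored = u32::from_le_bytes([page[4], page[5], page[6], page[7]]);
    stored == page_checksum(page)
}
```

Since the stored field is excluded from the sum, stamping a page does not change the value that verification recomputes.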

Multi-Page Chains

Large catalog payloads and large document block streams are stored as chains.

+---------+      +---------+      +---------+
| page_id |----->| page_id |----->| page_id |
| next=42 |      | next=77 |      | next=0  |
+---------+      +---------+      +---------+

The body bytes of each page are concatenated in chain order to reconstruct the original serialized byte stream.
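
Reading a chain back can be sketched as a loop that follows next_page until it reaches 0. The Page struct, the in-memory HashMap of pages, and read_chain are hypothetical stand-ins for real page I/O; the cycle guard is an extra safety measure added for this sketch, not something the format description requires.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory page: just the link and the body bytes.
pub struct Page {
    pub next_page: u32,
    pub body: Vec<u8>,
}

/// Concatenates page bodies in chain order, starting at page `first`.
/// Returns None if a linked page is missing or the chain loops.
/// The result still includes any zero padding from the final page, so a
/// caller would decode only as many bytes or blocks as the catalog says.
pub fn read_chain(pages: &HashMap<u32, Page>, first: u32) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut current = first;
    let mut visited = 0usize;
    while current != 0 {
        // A valid chain can never be longer than the number of pages.
        if visited >= pages.len() {
            return None;
        }
        let page = pages.get(&current)?;
        out.extend_from_slice(&page.body);
        current = page.next_page;
        visited += 1;
    }
    Some(out)
}
```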

Atomic Write Procedure

DocumentStore::save() writes atomically using a sibling temporary file:

  1. Create path.tmp.
  2. Write page 0 file header and page 1 empty catalog.
  3. Append all document block chains.
  4. Serialize and write the final catalog chain.
  5. Close the temporary file.
  6. Rename path.tmp to path.

Because the final rename is atomic on the same filesystem, readers either observe the old file or the new complete file, never a partially written database image.
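
The write-then-rename pattern can be sketched with the standard library alone. This is not the body of DocumentStore::save(); write_atomically is a hypothetical helper that takes the already-serialized file image, and the sync_all() call before the rename is an extra durability precaution assumed here rather than a step listed above.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Writes `bytes` to `<path>.tmp`, then renames it over `path`.
pub fn write_atomically(path: &Path, bytes: &[u8]) -> io::Result<()> {
    // Sibling temporary file, so the rename stays on the same filesystem.
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp_path = PathBuf::from(tmp_name);

    {
        let mut file = File::create(&tmp_path)?;
        file.write_all(bytes)?;
        file.sync_all()?; // assumption: flush to disk before publishing
    } // file is closed here, before the rename

    // Readers see either the old file or the complete new one.
    fs::rename(&tmp_path, path)
}
```

If the process dies before the rename, the previous store file is left untouched and only a stale .tmp file remains.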