MQDB Storage Format

Overview

mq-db persists documents in a single file made of fixed-size 8192-byte pages. Page 0 holds the file header and page 1 is the catalog root. The remaining pages hold document block data, overflow continuations, or additional catalog pages when the catalog does not fit in page 1.

+-------------+-------------+-------------+-------------+-------------+
| Page 0      | Page 1      | Page N      | Page N+1    | Page ...    |
| FileHeader  | Catalog     | BlockData   | Overflow    | Free/Future |
+-------------+-------------+-------------+-------------+-------------+

Multi-page values are stored as singly linked page chains using the next_page field in the page header.

Page Layout

Each page is exactly 8192 bytes.

Page header (16 bytes)

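| Offset | Size | Field     | Type   | Description                                                        |
|--------|------|-----------|--------|--------------------------------------------------------------------|
| 0      | 4    | page_type | u32 LE | 0 = Free, 1 = FileHeader, 2 = Catalog, 3 = BlockData, 4 = Overflow |
| 4      | 4    | checksum  | u32 LE | Wrapping sum of all page bytes except bytes 4..8                   |
| 8      | 4    | page_id   | u32 LE | Zero-based page index                                              |
| 12     | 4    | next_page | u32 LE | 0 means end of chain; otherwise next page index                    |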

Page body (8176 bytes)

| Offset | Size | Description           |
|--------|------|-----------------------|
| 16     | 8176 | Type-specific payload |
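The following is a minimal sketch of reading and writing the 16-byte header. It assumes the whole page is held in memory as a `[u8; 8192]` buffer. `PageHeader`, `read_u32`, and `body` are hypothetical names used for illustration. They are not taken from the mq-db source.

```rust
/// Every page in an mq-db file is this many bytes.
const PAGE_SIZE: usize = 8192;
/// The page body starts right after the 16-byte header.
const HEADER_SIZE: usize = 16;

/// Hypothetical in-memory view of the 16-byte page header.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PageHeader {
    page_type: u32,
    checksum: u32,
    page_id: u32,
    next_page: u32,
}

/// Reads the little-endian u32 stored at `offset`.
fn read_u32(page: &[u8], offset: usize) -> u32 {
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(&page[offset..offset + 4]);
    u32::from_le_bytes(bytes)
}

impl PageHeader {
    /// Decodes the header from bytes 0..16 of a page.
    fn decode(page: &[u8; PAGE_SIZE]) -> Self {
        PageHeader {
            page_type: read_u32(page, 0),
            checksum: read_u32(page, 4),
            page_id: read_u32(page, 8),
            next_page: read_u32(page, 12),
        }
    }

    /// Writes the header into bytes 0..16, leaving the body untouched.
    fn encode(&self, page: &mut [u8; PAGE_SIZE]) {
        page[0..4].copy_from_slice(&self.page_type.to_le_bytes());
        page[4..8].copy_from_slice(&self.checksum.to_le_bytes());
        page[8..12].copy_from_slice(&self.page_id.to_le_bytes());
        page[12..16].copy_from_slice(&self.next_page.to_le_bytes());
    }
}

/// Returns the 8176-byte type-specific payload.
fn body(page: &[u8; PAGE_SIZE]) -> &[u8] {
    &page[HEADER_SIZE..]
}
```

All four fields are plain little-endian u32 values, so decoding is just four fixed-offset reads. The body slice is always exactly 8176 bytes.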

File Header Page (page 0)

Page 0 always has page_type = 1 and page_id = 0.

File header body

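Offsets are relative to the start of the page body (absolute page offset = 16 + body offset).

| Offset in body | Size | Field              | Type       | Value                                                                      |
|----------------|------|--------------------|------------|----------------------------------------------------------------------------|
| 0              | 4    | magic              | u32 LE     | 0x4D514442 ("MQDB" read most-significant byte first; on disk the bytes are 42 44 51 4D) |
| 4              | 4    | version            | u32 LE     | 1                                                                          |
| 8              | 4    | page_size          | u32 LE     | 8192                                                                       |
| 12             | 4    | num_pages          | u32 LE     | Total pages currently in file                                              |
| 16             | 4    | catalog_start_page | u32 LE     | Always 1                                                                   |
| 20             | 8156 | reserved           | [u8; 8156] | All zero bytes                                                             |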
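Here is a sketch of parsing and sanity-checking page 0 under the layout above. `FileHeader`, `HeaderError`, and `parse_file_header` are hypothetical names. Rejecting any version other than 1 is an assumption about how a reader might treat future versions.

```rust
const PAGE_SIZE: usize = 8192;
/// The file header body starts right after the 16-byte page header.
const BODY_OFFSET: usize = 16;
/// Stored little-endian, so the on-disk bytes are 42 44 51 4D.
const MAGIC: u32 = 0x4D51_4442;

/// Hypothetical decoded view of the file header body.
#[derive(Debug, PartialEq, Eq)]
struct FileHeader {
    version: u32,
    page_size: u32,
    num_pages: u32,
    catalog_start_page: u32,
}

#[derive(Debug, PartialEq, Eq)]
enum HeaderError {
    BadMagic(u32),
    UnsupportedVersion(u32),
    BadPageSize(u32),
}

fn read_u32(page: &[u8], offset: usize) -> u32 {
    let mut b = [0u8; 4];
    b.copy_from_slice(&page[offset..offset + 4]);
    u32::from_le_bytes(b)
}

/// Parses page 0 and checks the fixed-value fields.
fn parse_file_header(page: &[u8; PAGE_SIZE]) -> Result<FileHeader, HeaderError> {
    // Translate body-relative offsets from the table into page offsets.
    let field = |body_offset: usize| read_u32(page, BODY_OFFSET + body_offset);
    let magic = field(0);
    if magic != MAGIC {
        return Err(HeaderError::BadMagic(magic));
    }
    let version = field(4);
    if version != 1 {
        return Err(HeaderError::UnsupportedVersion(version));
    }
    let page_size = field(8);
    if page_size as usize != PAGE_SIZE {
        return Err(HeaderError::BadPageSize(page_size));
    }
    Ok(FileHeader {
        version,
        page_size,
        num_pages: field(12),
        catalog_start_page: field(16),
    })
}
```

Checking the magic first lets a reader reject a non-mq-db file before it interprets any other field.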

Catalog Pages

The catalog always starts at page 1. If the serialized catalog exceeds one page body, additional catalog pages are linked by next_page.

Page 1 (Catalog) --> Page 12 (Catalog) --> Page 18 (Catalog) --> 0

Catalog body format

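| Order | Field            | Type               | Notes                                                              |
|-------|------------------|--------------------|--------------------------------------------------------------------|
| 1     | num_entries      | u32 LE             | Number of catalog entries                                          |
| 2     | document_id      | u32 LE             | First field of each entry; rows 2-9 repeat num_entries times       |
| 3     | path_present     | u8                 | 0 absent, 1 present                                                |
| 4     | path_len         | u16 LE             | Present only when path_present = 1                                 |
| 5     | path             | UTF-8 bytes        | path_len bytes, not NUL-terminated; present only when path_present = 1 |
| 6     | first_block_page | u32 LE             | First page of block chain                                          |
| 7     | num_blocks       | u32 LE             | Number of serialized blocks                                        |
| 8     | zone_map_len     | u32 LE             | Byte length of encoded zone map                                    |
| 9     | zone_map         | [u8; zone_map_len] | Encoded zone map bytes                                             |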
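Below is a sketch of decoding the catalog from the concatenated bodies of its page chain. `ByteReader`, `CatalogEntry`, and `decode_catalog` are illustrative names. The zone map is kept as opaque bytes, as the format describes. Because decoding stops after num_entries entries, any zero padding after the last entry is never read. Any path_present value other than 0 or 1 is treated as corruption, which is an assumption.

```rust
/// Hypothetical decoded catalog entry; field names follow the table above.
#[derive(Debug, PartialEq)]
struct CatalogEntry {
    document_id: u32,
    path: Option<String>,
    first_block_page: u32,
    num_blocks: u32,
    zone_map: Vec<u8>, // opaque here; decoded separately
}

/// Minimal little-endian reader; every method returns None on truncation.
struct ByteReader<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> ByteReader<'a> {
    fn take(&mut self, n: usize) -> Option<&'a [u8]> {
        let end = self.pos.checked_add(n)?;
        let bytes = self.buf.get(self.pos..end)?;
        self.pos = end;
        Some(bytes)
    }
    fn read_u8(&mut self) -> Option<u8> {
        Some(self.take(1)?[0])
    }
    fn read_u16(&mut self) -> Option<u16> {
        let b = self.take(2)?;
        Some(u16::from_le_bytes([b[0], b[1]]))
    }
    fn read_u32(&mut self) -> Option<u32> {
        let b = self.take(4)?;
        Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }
}

/// Decodes all catalog entries from the catalog chain's body bytes.
fn decode_catalog(bytes: &[u8]) -> Option<Vec<CatalogEntry>> {
    let mut r = ByteReader { buf: bytes, pos: 0 };
    let num_entries = r.read_u32()?;
    let mut entries = Vec::new();
    for _ in 0..num_entries {
        let document_id = r.read_u32()?;
        let path = match r.read_u8()? {
            0 => None,
            1 => {
                let len = r.read_u16()? as usize;
                Some(String::from_utf8(r.take(len)?.to_vec()).ok()?)
            }
            _ => return None, // unexpected flag value: treat as corrupt
        };
        let first_block_page = r.read_u32()?;
        let num_blocks = r.read_u32()?;
        let zone_map_len = r.read_u32()? as usize;
        let zone_map = r.take(zone_map_len)?.to_vec();
        entries.push(CatalogEntry { document_id, path, first_block_page, num_blocks, zone_map });
    }
    Some(entries)
}
```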

Block Data Pages

A document is serialized as a single byte stream of concatenated encoded blocks. That stream is cut into 8176-byte chunks, one per page body; only the last chunk can be shorter.

  • The first chunk is stored in a page with page_type = 3 (BlockData).
  • Continuation chunks are stored in pages with page_type = 4 (Overflow).
  • The last page in the chain has next_page = 0.

first_block_page
      |
      v
+-------------------+    +-------------------+    +-------------------+
| type=BlockData    | -> | type=Overflow     | -> | type=Overflow     |
| body bytes 0..8175|    | next chunk        |    | final chunk       |
+-------------------+    +-------------------+    +-------------------+

Unused bytes at the end of the final page body are zero-filled.
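The sketch below splits a serialized block stream into a BlockData page followed by Overflow pages. It assumes the chain occupies consecutive page ids starting at a hypothetical `first_page` argument; the format itself only requires valid next_page links. Emitting one empty BlockData page for an empty stream is also an assumption. Checksums are left at 0 here and would be stamped before the pages are written.

```rust
const PAGE_SIZE: usize = 8192;
const HEADER_SIZE: usize = 16;
const BODY_SIZE: usize = PAGE_SIZE - HEADER_SIZE; // 8176
const PAGE_TYPE_BLOCK_DATA: u32 = 3;
const PAGE_TYPE_OVERFLOW: u32 = 4;

/// Hypothetical helper: lays out `stream` as a page chain starting at `first_page`.
fn build_block_chain(stream: &[u8], first_page: u32) -> Vec<[u8; PAGE_SIZE]> {
    // Assumption: an empty stream still gets one (empty) BlockData page.
    let chunks: Vec<&[u8]> = if stream.is_empty() {
        vec![stream]
    } else {
        stream.chunks(BODY_SIZE).collect()
    };
    let last = chunks.len() - 1;
    chunks
        .into_iter()
        .enumerate()
        .map(|(i, chunk)| {
            // Zero-initialised, so unused bytes in the final body stay zero-filled.
            let mut page = [0u8; PAGE_SIZE];
            let page_type = if i == 0 { PAGE_TYPE_BLOCK_DATA } else { PAGE_TYPE_OVERFLOW };
            let page_id = first_page + i as u32;
            let next_page = if i == last { 0 } else { page_id + 1 };
            page[0..4].copy_from_slice(&page_type.to_le_bytes());
            page[8..12].copy_from_slice(&page_id.to_le_bytes());
            page[12..16].copy_from_slice(&next_page.to_le_bytes());
            page[HEADER_SIZE..HEADER_SIZE + chunk.len()].copy_from_slice(chunk);
            page
        })
        .collect()
}
```

A 20000-byte stream, for example, needs three pages: two full 8176-byte bodies and a final page carrying 3648 bytes plus zero padding.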

Block Wire Format

Each block is encoded independently and concatenated without separators.

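| Order | Field        | Type              | Description                                                    |
|-------|--------------|-------------------|----------------------------------------------------------------|
| 1     | id           | u32 LE            | Block ID                                                       |
| 2     | document_id  | u32 LE            | Owning document ID                                             |
| 3     | block_type   | u8                | See mapping below                                              |
| 4     | pre          | u32 LE            | Interval-index left boundary                                   |
| 5     | post         | u32 LE            | Interval-index right boundary                                  |
| 6     | span_present | u8                | 0 absent, 1 present                                            |
| 7     | start_line   | u32 LE            | Present only when span_present = 1                             |
| 8     | start_col    | u32 LE            | Present only when span_present = 1                             |
| 9     | end_line     | u32 LE            | Present only when span_present = 1                             |
| 10    | end_col      | u32 LE            | Present only when span_present = 1                             |
| 11    | content_len  | u32 LE            | UTF-8 byte length of content                                   |
| 12    | content      | [u8; content_len] | UTF-8 bytes                                                    |
| 13    | num_props    | u16 LE            | Number of properties                                           |
| 14    | key_len      | u8                | First field of each property; rows 14-16 repeat num_props times |
| 15    | key          | [u8; key_len]     | UTF-8 property name                                            |
| 16    | value        | PropertyValue     | Encoded property value                                         |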
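Below is a sketch of appending one block in the wire order above. `Block` and `encode_block` are hypothetical names. To keep the example short, each property value is assumed to be pre-encoded PropertyValue bytes (tag byte plus payload), and `block_type` is kept as its raw u8 code.

```rust
/// Hypothetical in-memory block; property values are already-encoded PropertyValue bytes.
struct Block {
    id: u32,
    document_id: u32,
    block_type: u8,
    pre: u32,
    post: u32,
    span: Option<(u32, u32, u32, u32)>, // (start_line, start_col, end_line, end_col)
    content: String,
    props: Vec<(String, Vec<u8>)>,
}

/// Appends `block` to `out` with no separator, field by field.
fn encode_block(block: &Block, out: &mut Vec<u8>) {
    out.extend_from_slice(&block.id.to_le_bytes());
    out.extend_from_slice(&block.document_id.to_le_bytes());
    out.push(block.block_type);
    out.extend_from_slice(&block.pre.to_le_bytes());
    out.extend_from_slice(&block.post.to_le_bytes());
    match block.span {
        None => out.push(0),
        Some((start_line, start_col, end_line, end_col)) => {
            out.push(1);
            for v in [start_line, start_col, end_line, end_col] {
                out.extend_from_slice(&v.to_le_bytes());
            }
        }
    }
    out.extend_from_slice(&(block.content.len() as u32).to_le_bytes());
    out.extend_from_slice(block.content.as_bytes());
    // num_props is a u16 and key_len a u8, so larger inputs cannot be encoded.
    assert!(block.props.len() <= u16::MAX as usize, "too many properties");
    out.extend_from_slice(&(block.props.len() as u16).to_le_bytes());
    for (key, value) in &block.props {
        assert!(key.len() <= u8::MAX as usize, "property key longer than 255 bytes");
        out.push(key.len() as u8);
        out.extend_from_slice(key.as_bytes());
        out.extend_from_slice(value);
    }
}
```

A block without a span occupies 18 fixed bytes before content_len, and a span adds another 16.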

Block type mapping

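| u8 | BlockType      |
|----|----------------|
| 0  | Heading        |
| 1  | Paragraph      |
| 2  | Code           |
| 3  | List           |
| 4  | TableCell      |
| 5  | TableRow       |
| 6  | TableAlign     |
| 7  | Blockquote     |
| 8  | HorizontalRule |
| 9  | Html           |
| 10 | Yaml           |
| 11 | Toml           |
| 12 | Math           |
| 13 | Definition     |
| 14 | Footnote       |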
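The mapping table translates directly into a `#[repr(u8)]` enum. The sketch below follows the table's names and codes. The `from_u8` and `to_u8` helpers are illustrative, and returning `None` for unknown codes (rather than panicking) is an assumption.

```rust
/// Block kinds with their on-disk u8 codes from the mapping table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum BlockType {
    Heading = 0,
    Paragraph = 1,
    Code = 2,
    List = 3,
    TableCell = 4,
    TableRow = 5,
    TableAlign = 6,
    Blockquote = 7,
    HorizontalRule = 8,
    Html = 9,
    Yaml = 10,
    Toml = 11,
    Math = 12,
    Definition = 13,
    Footnote = 14,
}

impl BlockType {
    /// Decodes a block_type byte; codes outside 0..=14 yield None.
    fn from_u8(code: u8) -> Option<Self> {
        use BlockType::*;
        Some(match code {
            0 => Heading,
            1 => Paragraph,
            2 => Code,
            3 => List,
            4 => TableCell,
            5 => TableRow,
            6 => TableAlign,
            7 => Blockquote,
            8 => HorizontalRule,
            9 => Html,
            10 => Yaml,
            11 => Toml,
            12 => Math,
            13 => Definition,
            14 => Footnote,
            _ => return None,
        })
    }

    /// Encodes back to the on-disk byte using the explicit discriminant.
    fn to_u8(self) -> u8 {
        self as u8
    }
}
```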

PropertyValue Encoding

Each property value starts with a one-byte type tag.

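| Tag  | Variant | Payload                             |
|------|---------|-------------------------------------|
| 0x00 | Null    | none                                |
| 0x01 | String  | u32 LE byte_len + UTF-8 bytes       |
| 0x02 | Int     | i64 LE                              |
| 0x03 | Float   | f64 LE                              |
| 0x04 | Bool    | u8 (0 or 1)                         |
| 0x05 | Array   | u16 LE count + encoded child values |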

Arrays are recursive: each element is another complete PropertyValue.
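Here is a sketch of the tagged encoding with a recursive decoder for arrays. The variant names follow the table. The helper `le` and the choice to return `None` for unknown tags or Bool bytes other than 0/1 are assumptions, not documented mq-db behaviour.

```rust
/// Tagged property value; variants mirror the tag table.
#[derive(Debug, Clone, PartialEq)]
enum PropertyValue {
    Null,
    String(String),
    Int(i64),
    Float(f64),
    Bool(bool),
    Array(Vec<PropertyValue>),
}

/// Copies the first N bytes of `buf` into an array, or None if too short.
fn le<const N: usize>(buf: &[u8]) -> Option<[u8; N]> {
    let mut a = [0u8; N];
    a.copy_from_slice(buf.get(..N)?);
    Some(a)
}

impl PropertyValue {
    fn encode(&self, out: &mut Vec<u8>) {
        match self {
            PropertyValue::Null => out.push(0x00),
            PropertyValue::String(s) => {
                out.push(0x01);
                out.extend_from_slice(&(s.len() as u32).to_le_bytes());
                out.extend_from_slice(s.as_bytes());
            }
            PropertyValue::Int(i) => {
                out.push(0x02);
                out.extend_from_slice(&i.to_le_bytes());
            }
            PropertyValue::Float(f) => {
                out.push(0x03);
                out.extend_from_slice(&f.to_le_bytes());
            }
            PropertyValue::Bool(b) => out.extend_from_slice(&[0x04, *b as u8]),
            PropertyValue::Array(items) => {
                assert!(items.len() <= u16::MAX as usize, "array too long for a u16 count");
                out.push(0x05);
                out.extend_from_slice(&(items.len() as u16).to_le_bytes());
                for item in items {
                    item.encode(out); // each element is a complete PropertyValue
                }
            }
        }
    }

    /// Decodes one value from the front of `buf`; returns it with the bytes consumed.
    fn decode(buf: &[u8]) -> Option<(PropertyValue, usize)> {
        let (&tag, rest) = buf.split_first()?;
        let (value, used) = match tag {
            0x00 => (PropertyValue::Null, 0),
            0x01 => {
                let len = u32::from_le_bytes(le(rest)?) as usize;
                let bytes = rest.get(4..4usize.checked_add(len)?)?;
                (PropertyValue::String(String::from_utf8(bytes.to_vec()).ok()?), 4 + len)
            }
            0x02 => (PropertyValue::Int(i64::from_le_bytes(le(rest)?)), 8),
            0x03 => (PropertyValue::Float(f64::from_le_bytes(le(rest)?)), 8),
            0x04 => match *rest.first()? {
                0 => (PropertyValue::Bool(false), 1),
                1 => (PropertyValue::Bool(true), 1),
                _ => return None,
            },
            0x05 => {
                let count = u16::from_le_bytes(le(rest)?) as usize;
                let mut used = 2;
                let mut items = Vec::with_capacity(count);
                for _ in 0..count {
                    let (item, n) = PropertyValue::decode(&rest[used..])?;
                    items.push(item);
                    used += n;
                }
                (PropertyValue::Array(items), used)
            }
            _ => return None, // unknown tag
        };
        Some((value, 1 + used))
    }
}
```

Returning the consumed byte count lets the array case advance through its children. The same count lets a block decoder move on to the next property key.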

ZoneMap Encoding

Zone maps are encoded independently and embedded as opaque bytes inside catalog entries.

| Order | Field                  | Type                                  |
|-------|------------------------|---------------------------------------|
| 1     | max_heading_depth      | u8                                    |
| 2     | num_heading_slugs      | u16 LE                                |
| 3     | heading_slug items     | u16 LE len + UTF-8 bytes              |
| 4     | num_heading_contents   | u16 LE                                |
| 5     | heading_content items  | u16 LE len + UTF-8 bytes              |
| 6     | num_code_langs         | u16 LE                                |
| 7     | code_lang items        | u16 LE len + UTF-8 bytes              |
| 8     | num_frontmatter_keys   | u16 LE                                |
| 9     | frontmatter_key items  | u16 LE len + UTF-8 bytes              |
| 10    | title_present          | u8                                    |
| 11    | title                  | u16 LE len + UTF-8 bytes when present |
| 12    | num_tags               | u16 LE                                |
| 13    | tag items              | u16 LE len + UTF-8 bytes              |

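The heading-slug, heading-content, code-language, front-matter-key, and tag lists are sets. Each is written in sorted order so that equal zone maps always encode to identical bytes.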
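The sketch below shows a zone map encoder that gets the sorted, deterministic ordering for free by storing each set in a `BTreeSet<String>`. `BTreeSet` iterates in byte-wise lexicographic order; whether mq-db sorts the same way is an assumption. `ZoneMap`, `put_str`, and `put_set` are hypothetical names. Using 1/0 for title_present mirrors the other presence flags and is also assumed.

```rust
use std::collections::BTreeSet;

/// Hypothetical in-memory zone map; BTreeSet keeps every set sorted.
#[derive(Default)]
struct ZoneMap {
    max_heading_depth: u8,
    heading_slugs: BTreeSet<String>,
    heading_contents: BTreeSet<String>,
    code_langs: BTreeSet<String>,
    frontmatter_keys: BTreeSet<String>,
    title: Option<String>,
    tags: BTreeSet<String>,
}

/// Writes a u16 LE length prefix followed by the UTF-8 bytes.
fn put_str(out: &mut Vec<u8>, s: &str) {
    assert!(s.len() <= u16::MAX as usize, "string too long for a u16 length");
    out.extend_from_slice(&(s.len() as u16).to_le_bytes());
    out.extend_from_slice(s.as_bytes());
}

/// Writes a u16 LE count followed by each item in sorted order.
fn put_set(out: &mut Vec<u8>, set: &BTreeSet<String>) {
    assert!(set.len() <= u16::MAX as usize, "too many items for a u16 count");
    out.extend_from_slice(&(set.len() as u16).to_le_bytes());
    for item in set {
        put_str(out, item);
    }
}

impl ZoneMap {
    /// Encodes the fields in the order given by the table above.
    fn encode(&self) -> Vec<u8> {
        let mut out = vec![self.max_heading_depth];
        put_set(&mut out, &self.heading_slugs);
        put_set(&mut out, &self.heading_contents);
        put_set(&mut out, &self.code_langs);
        put_set(&mut out, &self.frontmatter_keys);
        match &self.title {
            Some(t) => {
                out.push(1); // assumed: 1 = present
                put_str(&mut out, t);
            }
            None => out.push(0),
        }
        put_set(&mut out, &self.tags);
        out
    }
}
```

Because iteration order depends only on set contents, inserting the same tags in a different order produces identical bytes.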

Checksum Algorithm

The checksum is a simple wrapping sum over every byte in the 8192-byte page except the checksum field itself (page[4..8]).

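Equivalent Rust, where page is the full 8192-byte page buffer:

    fn page_checksum(page: &[u8; 8192]) -> u32 {
        page.iter()
            .enumerate()
            // Skip the checksum field itself (bytes 4..8).
            .filter(|&(i, _)| !(4..8).contains(&i))
            .fold(0u32, |sum, (_, &b)| sum.wrapping_add(u32::from(b)))
    }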

Verification recomputes the checksum and compares it to the stored checksum field.
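The sketch below shows how a writer might stamp the checksum and how a reader might verify it. `stamp_checksum` and `verify_checksum` are hypothetical names. Because bytes 4..8 are excluded from the sum, writing the checksum into the page does not change the value being checked.

```rust
const PAGE_SIZE: usize = 8192;

/// Wrapping byte sum over the page, skipping the checksum field (bytes 4..8).
fn page_checksum(page: &[u8; PAGE_SIZE]) -> u32 {
    page.iter()
        .enumerate()
        .filter(|&(i, _)| !(4..8).contains(&i))
        .fold(0u32, |sum, (_, &b)| sum.wrapping_add(u32::from(b)))
}

/// Computes the checksum and stores it little-endian in bytes 4..8.
fn stamp_checksum(page: &mut [u8; PAGE_SIZE]) {
    let sum = page_checksum(page);
    page[4..8].copy_from_slice(&sum.to_le_bytes());
}

/// Recomputes the checksum and compares it with the stored field.
fn verify_checksum(page: &[u8; PAGE_SIZE]) -> bool {
    let stored = u32::from_le_bytes([page[4], page[5], page[6], page[7]]);
    stored == page_checksum(page)
}
```

A plain byte sum catches most single-byte corruption but is order-insensitive. Swapping two bytes within a page leaves the checksum unchanged, so misplaced data can still verify.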

Multi-Page Chains

Large catalog payloads and large document block streams are stored as chains.

+---------+      +---------+      +---------+
| page_id |----->| page_id |----->| page_id |
| next=42 |      | next=77 |      | next=0  |
+---------+      +---------+      +---------+

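The body bytes of each page are concatenated in chain order to reconstruct the original serialized byte stream. The result also carries the final page's zero padding, so the stream must be decoded using its own counts (num_entries for the catalog, num_blocks for a document) rather than its total length.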
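Here is a sketch of following a chain and concatenating its bodies. A hypothetical in-memory slice `pages`, where `pages[i]` is page i, stands in for the file. Rejecting cycles and out-of-range links is a defensive choice of this sketch, not something the format specifies.

```rust
use std::collections::HashSet;

const PAGE_SIZE: usize = 8192;
const HEADER_SIZE: usize = 16;

/// Follows next_page links from `start` and concatenates every page body.
/// Returns an error on an out-of-range link or a cycle instead of looping forever.
fn read_chain(pages: &[[u8; PAGE_SIZE]], start: u32) -> Result<Vec<u8>, String> {
    let mut out = Vec::new();
    let mut seen = HashSet::new();
    let mut current = start;
    loop {
        if !seen.insert(current) {
            return Err(format!("cycle detected at page {}", current));
        }
        let page = pages
            .get(current as usize)
            .ok_or_else(|| format!("page {} is out of range", current))?;
        out.extend_from_slice(&page[HEADER_SIZE..]);
        let next = u32::from_le_bytes([page[12], page[13], page[14], page[15]]);
        if next == 0 {
            return Ok(out); // next_page = 0 terminates the chain
        }
        current = next;
    }
}
```

Since page 0 is always the file header, a next_page value of 0 can never be a real link, which is what lets it serve as the terminator.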

Atomic Write Procedure

DocumentStore::save() writes atomically using a sibling temporary file:

  1. Create path.tmp.
  2. Write page 0 file header and page 1 empty catalog.
  3. Append all document block chains.
  4. Serialize and write the final catalog chain.
  5. Close the temporary file.
  6. Rename path.tmp to path.

Because rename replaces the destination in a single atomic step when source and destination are on the same filesystem (as POSIX rename(2) guarantees), readers observe either the old file or the new complete file, never a partially written database image.
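The sketch below shows the temp-file-then-rename pattern using only the standard library. `save_atomically` is a hypothetical name. It takes an already-serialized page image, so the step-by-step construction of header, block chains, and catalog is out of scope. The `sync_all` call is not among the steps listed above. It is added because rename atomicity alone does not guarantee the new file's contents have reached disk if the machine crashes.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Writes `pages` to "<path>.tmp" next to `path`, then renames it over `path`.
fn save_atomically(path: &Path, pages: &[[u8; 8192]]) -> io::Result<()> {
    // Sibling temporary file in the same directory, so the rename stays on one filesystem.
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp_path = PathBuf::from(tmp_name);

    let mut file = File::create(&tmp_path)?;
    for page in pages {
        file.write_all(page)?;
    }
    // Flush contents to stable storage before the rename publishes the file.
    file.sync_all()?;
    drop(file); // close the temporary file

    fs::rename(&tmp_path, path)
}
```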