Database Internals · module 1 of 8
Module 01 ~14 min · architecture & data flow

What actually happens when you INSERT a row?

You write one innocent line: db.Insert("Users", row). On the surface it looks trivial — but under the hood, five layers cooperate in milliseconds to make that row findable, durable, and safe from crashes. Let's trace it.

🏥 Hold this picture

Think of a busy hospital ER taking in a new patient. The row (the patient) passes through triage (the Database), reaches a specialist desk (the Table), is written into the official intake log (the WAL) before anyone does anything permanent, and is finally filed into the ward roster (the B+ tree), which lives in cabinets (the pages). If the power fails mid-intake, the intake log tells us exactly which admissions were official.

◆ The key insight

A database is not one thing — it is a stack of specialized layers, each solving one narrow problem. If you know which layer owns a concern, you know where to look when something breaks.

Why should you care?

When you tell an AI "add a column" or "speed up this query," it needs to know which layer the change belongs to. If you can name the layers — storage, index, transaction, query — you can steer the AI with specifics instead of accepting generic, half-right code.

01

The five layers, top to bottom

Every insert falls through these. Each entry below is one specialist with exactly one job.

1 · Public API

The friendly front door: db.Insert(...). Hides everything below it.

Database.cs

2 · Table & Schema

Validates the row against the column types and extracts the primary key.

Table.cs

3 · Transaction + WAL

Records the change in a durable log before applying it — the durability point.

WALManager.cs

4 · B+ Tree Index

Stores the row by key in sorted order so it can be found again in O(log n).

BPlusTree.cs

5 · Page Storage

Lays the bytes onto fixed-size 4 KB pages — the only thing the disk understands.

StorageEngine.cs
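
Layer 5's fixed-size pages reduce disk addressing to plain arithmetic. The sketch below is an illustration only: it assumes pages are stored back to back in one file with no file header, and PageMath and its members are hypothetical names, not taken from StorageEngine.cs.

```csharp
// Hypothetical helper for illustration; not part of the engine.
public static class PageMath
{
    public const int PageSize = 4096; // 4 KB, the page size stated for layer 5

    // Byte offset in the file where a given page starts
    // (assumes contiguous pages and no file header).
    public static long OffsetOf(long pageId) => pageId * PageSize;

    // Which page a given byte offset falls in.
    public static long PageOf(long byteOffset) => byteOffset / PageSize;

    // How many whole pages are needed to hold a payload (rounds up).
    public static long PagesNeeded(long payloadBytes) =>
        (payloadBytes + PageSize - 1) / PageSize;
}
```

Under these assumptions, page 3 starts at byte 12288, and a 4097-byte payload needs two pages even though only one byte spills over.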
02

Read the real code

This engine is a real C#/.NET embedded database. Here's the public API a developer actually calls: the code first, then the same steps in plain English.

README.md — usage
using var db = new Database("mydata.mde", cacheSize: 100);

var columns = new List<ColumnDefinition>
{
    new ColumnDefinition("Id", DataType.Int, false),
    new ColumnDefinition("Name", DataType.String),
    new ColumnDefinition("Age", DataType.Int)
};

var table = db.CreateTable("Users", columns,
                           primaryKeyColumn: "Id");

var row = new DataRow(table.Schema);
row["Id"] = 1;
row["Name"] = "Alice";
row["Age"] = 30;

db.Insert("Users", row);
In plain English

Open (or create) a database file on disk. The cacheSize says how many pages to keep hot in memory.

Describe the schema: three columns, with Id as the primary key. The schema is the contract every row must satisfy.

Build one row, fill its cells by column name, and hand it to db.Insert. That single call is the whole iceberg tip — everything else in this course is what happens beneath it.
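
To make "the schema is the contract" concrete, here is a minimal sketch of a schema check. It does not reproduce the engine's ColumnDefinition or DataRow, whose internals are not shown here; Column, ColumnKind, SchemaCheck and the AllowNull flag are hypothetical stand-ins, and a row is modelled as a simple dictionary.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for illustration; not the engine's own types.
public enum ColumnKind { Int, String }

public sealed record Column(string Name, ColumnKind Kind, bool AllowNull = true);

public static class SchemaCheck
{
    // Returns null when the row satisfies the schema,
    // otherwise a description of the first violation found.
    public static string? FirstViolation(
        IReadOnlyList<Column> schema,
        IReadOnlyDictionary<string, object?> row)
    {
        foreach (var column in schema)
        {
            // A missing cell is treated the same as an explicit null.
            row.TryGetValue(column.Name, out var value);

            if (value is null)
            {
                if (!column.AllowNull)
                    return $"Column '{column.Name}' must not be null.";
                continue;
            }

            var typeMatches = column.Kind switch
            {
                ColumnKind.Int => value is int,
                ColumnKind.String => value is string,
                _ => false
            };
            if (!typeMatches)
                return $"Column '{column.Name}' expects {column.Kind}.";
        }
        return null;
    }
}
```

With a schema of Id (Int, not nullable), Name (String) and Age (Int), a row whose Id holds the text "1" is rejected before it gets anywhere near the log or the index.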

03

Watch the row fall through the stack

[Interactive animation: db.Insert("Users", row), the journey. The row travels as a packet from your code down to durable bytes; the amber step marks where durability is guaranteed.]
04

Inside Table.Insert

Here is the actual insert path inside the Table layer. Notice the order of operations — it is the entire story in fifteen lines.

MiniDatabaseEngine/Table.cs
public void Insert(DataRow row, Transaction? transaction = null)
{
    _lock.EnterWriteLock();
    try
    {
        var key = GetPrimaryKey(row);
        ValidateRowForWrite(row);

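        var keyExists = _index.Search(key) != null;
        // keyPendingInTransaction comes from the HasBufferedValueForKey check
        // (see section 07); the line that computes it is not shown in this excerpt.
        if (keyExists || keyPendingInTransaction)
            throw new InvalidOperationException(
                $"Duplicate primary key value '{key}'.");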

        var serialized = SerializeRow(row);

        if (transaction != null)
        {
            transaction.LogInsert(_schema.TableName, key, serialized);
            return;
        }
        _index.Insert(key, serialized);
    }
    finally
    {
        _lock.ExitWriteLock();
    }
}
Line by line

EnterWriteLock — grab an exclusive lock so no other thread can write this table at the same time (Module 6).

GetPrimaryKey / ValidateRowForWrite — pull out the key and check the row obeys the schema.

_index.Search(key) — reject duplicates. A key already in the tree (or buffered in this transaction) is an error.

SerializeRow — flatten the row to bytes.

transaction != null — if inside a transaction, only log the insert now; the index is updated later, on commit (Module 5). Otherwise, write straight to the index.

finally — release the lock no matter what, even if an exception was thrown.

Separation of concerns. Each layer has one job. Durability is the WAL's job, sort order is the index's job, disk I/O is the storage engine's job. Blurring these boundaries is a common source of bugs in hand-rolled database code.
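
The "log now, apply on commit" branch is easier to see in a stripped-down model. The sketch below is an assumption-laden toy, not the engine's code: ToyTable and ToyTransaction are hypothetical, a SortedDictionary stands in for the B+ tree, keys are plain ints, and locking is left out. The name HasBufferedValueForKey is borrowed from the engine, but its signature here is a guess.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the engine's Transaction; it only buffers inserts.
public sealed class ToyTransaction
{
    private readonly List<(int Key, byte[] Row)> _pending = new();

    public IReadOnlyList<(int Key, byte[] Row)> Pending => _pending;

    public void LogInsert(int key, byte[] row) => _pending.Add((key, row));

    // Name borrowed from the engine; this signature is an assumption.
    public bool HasBufferedValueForKey(int key) =>
        _pending.Exists(p => p.Key == key);

    public void Clear() => _pending.Clear();
}

public sealed class ToyTable
{
    // SortedDictionary stands in for the B+ tree index.
    private readonly SortedDictionary<int, byte[]> _index = new();

    public int Count => _index.Count;

    public void Insert(int key, byte[] row, ToyTransaction? tx = null)
    {
        // A key counts as a duplicate if it is indexed OR buffered in this transaction.
        var duplicate = _index.ContainsKey(key)
                        || (tx?.HasBufferedValueForKey(key) ?? false);
        if (duplicate)
            throw new InvalidOperationException($"Duplicate primary key value '{key}'.");

        if (tx != null)
        {
            tx.LogInsert(key, row); // deferred: the index is untouched until commit
            return;
        }
        _index.Add(key, row);       // no transaction: applied immediately
    }

    // Commit applies every buffered insert to the index.
    public void Commit(ToyTransaction tx)
    {
        foreach (var (key, row) in tx.Pending) _index.Add(key, row);
        tx.Clear();
    }

    // Rollback discards the buffer; the index never saw the rows.
    public void Rollback(ToyTransaction tx) => tx.Clear();
}
```

In this toy, a row inserted inside a transaction is invisible to Count until Commit runs, yet a second insert of the same key in the same transaction is still rejected, because the buffer is consulted as well as the index.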

05

Zoom in: what "serialize to bytes" really means

Step 4, "the row becomes a flat byte array," sounds like hand-waving. It isn't. A disk can't store a tidy { Id: 1, Name: "Alice" } object; it stores bytes. Serialization is the precise rule for turning each field into bytes. Here is exactly what this engine's DataSerializer produces.

[Interactive byte viewer: DataRow → byte[]. Each field is serialized in order. Legend: null-marker (01 = present) · length prefix (strings) · value bytes.]

Three rules are visible here. (1) Every field starts with a null-marker byte — 01 means "a value is present" (a null would be a lone 00). (2) An Int is always exactly 4 bytes, written little-endian (least-significant byte first — that's why 1 shows as 01 00 00 00). (3) A String is variable length, so it's prefixed with its byte-count, then the UTF-8 characters. Fixed-width numbers, length-prefixed text — that's the whole grammar.
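
The three rules translate almost directly into code. This is a minimal sketch rather than the engine's DataSerializer: RowBytes is a hypothetical name, and the width of the string length prefix is an assumption (a 4-byte little-endian integer), since the text above only says the string is prefixed with its byte count.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.Text;

// Hypothetical illustration of the three rules; not the engine's DataSerializer.
public static class RowBytes
{
    public static byte[] Serialize(IReadOnlyList<object?> fields)
    {
        var output = new List<byte>();
        Span<byte> four = stackalloc byte[4]; // scratch space for one 32-bit value

        foreach (var field in fields)
        {
            switch (field)
            {
                case null:
                    output.Add(0x00); // rule 1: a null is a lone 00 marker
                    break;

                case int number:
                    output.Add(0x01); // rule 1: value present
                    // rule 2: exactly 4 bytes, least-significant byte first
                    BinaryPrimitives.WriteInt32LittleEndian(four, number);
                    output.AddRange(four.ToArray());
                    break;

                case string text:
                    var utf8 = Encoding.UTF8.GetBytes(text);
                    output.Add(0x01); // rule 1: value present
                    // rule 3: byte count first (ASSUMED to be a 4-byte LE int), then UTF-8
                    BinaryPrimitives.WriteInt32LittleEndian(four, utf8.Length);
                    output.AddRange(four.ToArray());
                    output.AddRange(utf8);
                    break;

                default:
                    throw new ArgumentException($"Unsupported field type: {field.GetType()}");
            }
        }
        return output.ToArray();
    }
}
```

Under these assumptions, Serialize(new object?[] { 1, "Alice", 30 }) yields 20 bytes: 01 01 00 00 00 for the Id, 01 05 00 00 00 41 6C 69 63 65 for the name, and 01 1E 00 00 00 for the age.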

Try a longer name, or a multi-byte character like é. The length prefix and byte count climb, which shows that text is genuinely variable-width on disk. That is exactly why the storage layer (Module 2) can't assume every row is the same size.
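
The gap between character count and byte count can be checked with the standard UTF-8 encoder. Utf8Width is a hypothetical helper name; Encoding.UTF8.GetByteCount is the standard .NET call.

```csharp
using System.Text;

// Hypothetical helper for illustration.
public static class Utf8Width
{
    // Number of bytes the string occupies once encoded as UTF-8,
    // which is the value a byte-count length prefix would carry.
    public static int ByteCount(string text) => Encoding.UTF8.GetByteCount(text);
}
```

"Zoe" is 3 characters and 3 bytes, while "Zo\u00e9" (Zoé) is still 3 characters but 4 bytes, because é takes two bytes in UTF-8.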

06

The seven steps, at your own pace

The same journey as the animation, frozen so you can re-read it.

1 · You call db.Insert

One method call enters the engine carrying your row.

2 · Database finds the table

Looks up "Users" by name and forwards the row to it.

3 · Table validates the row

Checks each cell against the schema; rejects type mismatches and duplicate keys.

4 · Row is serialized to bytes

The structured row becomes a flat byte array, the form the disk understands.

5 · Bytes appended to the WAL

Written to the durable log first. If we crash after this, recovery can replay it.

6 · Bytes inserted into the B+ tree

Filed by key in sorted order, ready for fast lookups and range scans.

7 · Success returns to you

The row is findable, durable, and safe. Total time: milliseconds.
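
The ordering of steps 5 and 6, and why it makes recovery possible, can be sketched with in-memory stand-ins. Everything here is hypothetical: ToyEngine is not the engine's API, a List plays the WAL (a real one is a file that must be flushed to disk), and a SortedDictionary plays the B+ tree.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy; in-memory stand-ins for the WAL and the B+ tree.
public sealed class ToyEngine
{
    private readonly List<(int Key, byte[] Row)> _wal;
    private readonly SortedDictionary<int, byte[]> _index = new();

    // Pass the log that survived a crash to rebuild the index from it.
    public ToyEngine(List<(int Key, byte[] Row)>? survivingWal = null)
    {
        _wal = survivingWal ?? new List<(int Key, byte[] Row)>();
        foreach (var (key, row) in _wal) _index[key] = row; // recovery: replay the log
    }

    public IReadOnlyList<(int Key, byte[] Row)> Wal => _wal;

    public void Insert(int key, byte[] serializedRow)
    {
        if (_index.ContainsKey(key)) // step 3: reject duplicate keys
            throw new InvalidOperationException($"Duplicate primary key value '{key}'.");

        _wal.Add((key, serializedRow));   // step 5: log first
        _index.Add(key, serializedRow);   // step 6: only then update the index
    }

    public bool TryGet(int key, out byte[]? row) => _index.TryGetValue(key, out row);

    // Keys in [from, to] in sorted order. This toy scans from the smallest key;
    // a real B+ tree would seek to 'from' in O(log n) and walk the leaves.
    public IEnumerable<int> KeysBetween(int from, int to)
    {
        foreach (var key in _index.Keys)
        {
            if (key > to) yield break;
            if (key >= from) yield return key;
        }
    }
}
```

Because every insert reaches the log before the index, constructing a new ToyEngine from the surviving log reproduces the same index, which is the replay that step 5 describes.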

07

Check yourself

Four questions, each followed by the reasoning behind the correct answer.

Scenario
Users report duplicate-key errors even after rolling back a transaction. Which check is firing?
Answer: The buffered-key check. Besides _index.Search(key), the Table checks HasBufferedValueForKey. A key still pending in the current transaction also counts as a duplicate, which can surprise you mid-transaction.
Architecture
You want to add compression for stored rows. Which layer should own it?
Answer: The serialization/storage layer. Compression belongs where bytes are formed. Compressing at the API bypasses validation and durability guarantees below it.
Debugging
A write "succeeded" but disappeared after a crash. What's the likely cause?
Answer: If "success" is returned before the log is physically on disk, a crash can erase a change that was reported as durable. Module 5 is all about getting this right.
Tracing
Put these in the order they happen: (a) bytes on disk, (b) row serialized to bytes, (c) primary key extracted, (d) WAL entry appended.
Answer: (c), (b), (d), (a). Extract the key, serialize the row, append to the WAL (durable), and only then do the bytes ultimately settle onto disk pages.
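
The debugging question turns on the difference between "written" and "flushed to the device". A minimal sketch of an append that does not return until the flush has been requested; DurableLog is a hypothetical name and this is not the engine's WALManager, but FileStream.Flush(bool flushToDisk) is the standard .NET call.

```csharp
using System.IO;

// Hypothetical illustration; not the engine's WALManager.
public static class DurableLog
{
    // Appends one record and returns only after asking the OS
    // to push its buffers for this file through to the disk.
    public static void Append(string path, byte[] record)
    {
        using var stream = new FileStream(
            path, FileMode.Append, FileAccess.Write, FileShare.Read);

        stream.Write(record, 0, record.Length);

        // Flush() alone may leave the bytes in OS buffers;
        // flushToDisk: true also clears those intermediate buffers.
        stream.Flush(flushToDisk: true);
    }
}
```

Reporting success before that flush completes is the failure mode the question describes: the caller believes the write is durable while the bytes still sit in memory.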

Up next: we zoom into the very bottom of the stack — storage. How does the engine physically organize a disk it can only address in fixed-size blocks? Enter pages and extents.