Database Internals · module 5 of 8
Module 05 ~17 min · durability & recovery

Transactions, WAL & Crash Recovery

A bank transfer: debit account A, then credit account B. What if the power dies between those two steps? Without transactions, A is poorer and B is no richer: the money simply evaporates. This module explains how databases make that impossible.

✈️ Hold this picture

An airline check-in desk. You present your tickets (a transaction begins). The agent writes every item into the official logbook before stamping anything on your boarding passes (the write-ahead log). If the power cuts mid-check-in, the next agent reads the logbook and finishes exactly the check-ins marked "complete." Anything without a complete stamp is thrown out. That's how airlines don't lose your luggage — and how databases don't lose your money.

◆ The key insight

Durability requires logging before doing. Apply the change first and hope to log it later, and a crash between the two leaves the change done but unrecorded. The invariant is simple: log first, apply after. That one rule powers crash recovery everywhere.

Why should you care?

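When an AI writes "we'll save the order and send the email in parallel", that's wrong: the email can go out for an order that never committed. When it suggests cache.set() before db.commit(), that's wrong too: a crash or rollback between the two calls leaves the cache serving data the database never stored. Recognizing the "log first" pattern lets you catch real durability bugs in AI-written code at a glance.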

01

First, feel the problem

A transfer is really two steps: take $100 from A, then give $100 to B. Between those steps the money exists nowhere. If the power dies right then, where did it go? Run it both ways — leave the crash switched on — and keep an eye on the system's total.

[Interactive demo: Transfer $100 · A → B. Does the money survive a crash? Account A starts at $500, Account B at $300.]

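
The failure the demo shows can be sketched in a few lines. This is an illustration only: `NaiveBank` and its method are hypothetical names, the balances live in memory, and the "crash" is simulated with an exception. The starting balances of $500 and $300 match the demo.

```csharp
using System;
using System.Collections.Generic;

// Run the transfer twice: once cleanly, once with a crash between the two steps.
Console.WriteLine(NaiveBank.TotalAfterTransfer(crashBetweenSteps: false)); // 800
Console.WriteLine(NaiveBank.TotalAfterTransfer(crashBetweenSteps: true));  // 700

// Hypothetical illustration; not part of the engine shown in this module.
static class NaiveBank
{
    public static decimal TotalAfterTransfer(bool crashBetweenSteps)
    {
        var balances = new Dictionary<string, decimal> { ["A"] = 500m, ["B"] = 300m };
        try
        {
            balances["A"] -= 100m;                       // step 1: debit A
            if (crashBetweenSteps)
                throw new InvalidOperationException("simulated power loss");
            balances["B"] += 100m;                       // step 2: credit B
        }
        catch (InvalidOperationException)
        {
            // Nothing rolls step 1 back: no transaction wraps the two writes.
        }
        return balances["A"] + balances["B"];
    }
}
```

With the crash switched on, the system's total drops from $800 to $700: the debit happened and the credit never did.
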
02

The promise a transaction makes: ACID

A transaction wraps a group of changes in four guarantees. The money demo you just ran was atomicity and durability in action — here are all four.

all or nothing

Atomicity

Every step of the transaction happens, or none does. Never half-done.

invariants hold

Consistency

Rules like "total balance doesn't change" are true before and after.

as if one-at-a-time

Isolation

Concurrent transactions look like they ran in sequence, never mid-mess.

survives anything

Durability

Committed means committed — even if the server explodes one millisecond later.

03

The fix: write it down before you do it

The write-ahead log (WAL) is an append-only file. Every change is recorded there — durably, in order — before the database itself is touched. The ordering is the entire trick. Compare the two possible orderings and what a crash does to each:

apply first, log later
1. apply the change to the database
   💥 CRASH HERE
2. write the change to the log  ← never runs
The change is in the database, but the log never recorded it. Recovery has no idea it happened, and cannot undo it if the transaction should have rolled back. Silent corruption.

log first, apply after
1. write the change to the log, flush
   💥 CRASH HERE
2. apply the change  ← not yet done
The log holds the exact intended state. Recovery replays committed entries and ignores the rest. Whatever the timing of the crash, the database rebuilds correctly. Always safe.
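
The two orderings can be compared with a small simulation. This is a sketch under simplifying assumptions: the `Ordering` class is hypothetical, a list stands in for the log file, a dictionary stands in for the data file, and the crash is modeled by simply never running step 2.

```csharp
using System;
using System.Collections.Generic;

Console.WriteLine(Ordering.CrashBetweenSteps(logFirst: false)); // the apply-first outcome
Console.WriteLine(Ordering.CrashBetweenSteps(logFirst: true));  // the log-first outcome

// Hypothetical illustration; not part of the engine shown in this module.
static class Ordering
{
    // Runs step 1, "crashes" before step 2, and reports what survived.
    public static string CrashBetweenSteps(bool logFirst)
    {
        var log = new List<string>();                 // stands in for the WAL file
        var database = new Dictionary<string, int>(); // stands in for the data file

        if (logFirst) log.Add("SET x=1");
        else database["x"] = 1;
        // crash here: step 2 never runs

        bool logged = log.Count > 0, applied = database.ContainsKey("x");
        return (logged, applied) switch
        {
            (true, false) => "logged, not applied: recovery can replay or discard it",
            (false, true) => "applied, not logged: recovery cannot see or undo it",
            _ => "unreachable in this sketch",
        };
    }
}
```

Only the log-first ordering leaves recovery with something to reason about.
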

Append-only

Entries are only ever added to the end — never edited in place. Appending is fast and crash-friendly.

Sequenced

Every entry carries a sequence number, giving a total order so recovery replays events exactly as they happened.

Durable on commit

A change is only "real" once its record is flushed to disk — that fsync is the heartbeat of durability.
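
Those three properties fit in a very small class. The sketch below is not the engine's WAL manager: `TinyWal`, its text record format, and the temp-file path are all assumptions made for illustration. The one real API it leans on is .NET's `FileStream.Flush(bool flushToDisk)`, which asks the operating system to write its buffers through to the device.

```csharp
using System;
using System.IO;
using System.Text;

string path = Path.Combine(Path.GetTempPath(), $"wal-sketch-{Guid.NewGuid():N}.log");
using (var wal = new TinyWal(path))
{
    wal.Append("txn=1 INSERT users 42");
    wal.Append("txn=1 COMMIT");
    wal.Flush();   // only after this returns may the caller report success
}
Console.WriteLine(File.ReadAllText(path));
File.Delete(path);

// Hypothetical minimal log; a real one would also restore the sequence counter on reopen.
sealed class TinyWal : IDisposable
{
    private readonly FileStream _file;
    private long _nextSequence = 1;

    public TinyWal(string path)
    {
        // Append-only: FileMode.Append positions every write at the end of the file.
        _file = new FileStream(path, FileMode.Append, FileAccess.Write, FileShare.Read);
    }

    // Sequenced: each record gets the next number, giving a total order.
    public long Append(string record)
    {
        long sequence = _nextSequence++;
        byte[] line = Encoding.UTF8.GetBytes($"{sequence}\t{record}\n");
        _file.Write(line, 0, line.Length);
        return sequence;
    }

    // Durable on commit: push buffered bytes all the way to the device.
    public void Flush() => _file.Flush(flushToDisk: true);

    public void Dispose() => _file.Dispose();
}
```

Appending without calling `Flush()` leaves the record in a buffer, where a power cut can still lose it.
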

04

The WAL Crash Lab

Now drive the engine yourself. Begin a transaction, insert a few rows, and watch the WAL (left, on disk) fill up while the database (right, in memory) stays empty — that is deferred writing. Then either Commit to apply the rows, or pull the plug with Crash and press Recover to see exactly what the log can rebuild.

[Interactive lab: WAL Crash Lab (log first · apply after · recover from the log). Left panel: Write-Ahead Log, on disk, survives a crash. Right panel: Database (the index), in memory, lost on crash.]

The whole module in one experiment. Inserts pile up in the log but never touch the database until commit. A crash wipes memory but not the log. Recovery replays only what was committed. Atomicity, durability, and deferred writes — all visible in one panel.

05

Watch a commit happen

The actors talk it out. Notice the order: the WAL is flushed to disk before the index is touched. That flush is the exact moment "maybe" becomes "definitely."

[Interactive chat: #txn-42, a commit, step by step]

06

The commit, in code

Transaction/Transaction.cs — Commit
public void Commit()
{
    _lock.EnterWriteLock();
    try
    {
        // 1. Log the commit
        _walManager.AppendEntry(new WALEntry
        {
            TransactionId = _transactionId,
            OperationType = WALOperationType.Commit
        });

        // 2. Force flush to ensure durability
        _walManager.Flush();

        // 3. Apply buffered writes only AFTER the WAL is durable
        _commitApplyCallback(_entries);

        _state = TransactionState.Committed;
    }
    finally { _lock.ExitWriteLock(); }
}
The ACID-defining moment

AppendEntry(Commit) — write a COMMIT marker into the log.

Flush(): this line is durability. It forces the log to disk. If we crash one instruction later, recovery will replay this transaction.

_commitApplyCallback — only now do the buffered changes land in the B+ tree. Flush before apply. Always.

07

What a log entry remembers

Transaction/WALEntry.cs
public class WALEntry
{
    public long TransactionId { get; set; }
    public WALOperationType OperationType { get; set; }
    public string TableName { get; set; }
    public object? Key { get; set; }
    public byte[]? OldValue { get; set; }
    public byte[]? NewValue { get; set; }
    public long SequenceNumber { get; set; }
}
Enough to redo or undo

OldValue + NewValue together — keep both and you can redo a change or undo it.

SequenceNumber — a total order across all entries, so recovery replays them in exactly the order they happened.

TransactionId — which transaction this belongs to. Recovery groups by this to find complete transactions.
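
To see why keeping both values is enough, here is a small sketch of redo and undo driven by a single logged change. It simplifies the entry shown above: the `Change` record and `Replay` class are hypothetical, values are strings instead of byte arrays, and null is assumed to mean "the key did not exist".

```csharp
using System;
using System.Collections.Generic;

var store = new Dictionary<string, string?> { ["balance:A"] = "500" };
var change = new Change("balance:A", OldValue: "500", NewValue: "400");

Replay.Redo(store, change);
Console.WriteLine(store["balance:A"]);   // 400
Replay.Undo(store, change);
Console.WriteLine(store["balance:A"]);   // 500

// Hypothetical, simplified stand-in for a log entry.
record Change(string Key, string? OldValue, string? NewValue);

static class Replay
{
    // Redo moves the store forward to the new value; undo moves it back to the old one.
    public static void Redo(Dictionary<string, string?> store, Change c) => Set(store, c.Key, c.NewValue);
    public static void Undo(Dictionary<string, string?> store, Change c) => Set(store, c.Key, c.OldValue);

    // Assumption: null stands for "no row" (before an insert, or after a delete).
    private static void Set(Dictionary<string, string?> store, string key, string? value)
    {
        if (value is null) store.Remove(key);
        else store[key] = value;
    }
}
```
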

08

Crash recovery, step by step

The power came back. Press play to watch the engine rebuild a correct state from nothing but the log on disk — replaying committed work, discarding everything else.

[Animation: RecoverFromWAL(), back from the dead]

Transaction/TransactionManager.cs — RecoverFromWAL
public void RecoverFromWAL(Action<WALEntry> applyEntry)
{
    var entries = _walManager.ReadEntriesForRecovery();
    var transactions = new Dictionary<long, List<WALEntry>>();
    var committed = new HashSet<long>();

    foreach (var entry in entries)
    {
        if (entry.OperationType == WALOperationType.Commit)
            committed.Add(entry.TransactionId);
        else if (entry.OperationType != WALOperationType.Rollback)
            Bucket(transactions, entry);
    }

    foreach (var txnId in committed)         // replay ONLY committed
    {
        // A transaction that committed without logging any other entry has no bucket.
        if (!transactions.TryGetValue(txnId, out var txnEntries))
            continue;

        foreach (var entry in txnEntries)
            applyEntry(entry);
    }
}
Group, filter, replay

Read every entry — the whole log, start to finish.

committed.Add(...) — note which transactions reached a COMMIT.

Bucket(...) — pile each non-commit entry under its transaction id.

replay ONLY committed — apply the committed buckets to the index; everything else is thrown away. The log is the single source of truth.
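
The same idea can be run end to end on an in-memory log. Note that this sketch is a variant, not the engine's method: instead of bucketing entries per transaction, it makes one pass to collect committed ids and a second pass that replays in sequence order. `LogRecord`, `Recovery`, and the string operation names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

var log = new List<LogRecord>
{
    new(1, 1, "PUT", "a", "1"),
    new(2, 2, "PUT", "b", "2"),
    new(3, 1, "COMMIT", null, null),
    new(4, 2, "PUT", "c", "3"),
    // crash: transaction 2 never reached COMMIT
};

var rebuilt = Recovery.Rebuild(log);
Console.WriteLine(string.Join(",", rebuilt.Keys));   // a

// Hypothetical, simplified log record.
record LogRecord(long Sequence, long TxnId, string Op, string? Key, string? Value);

static class Recovery
{
    public static Dictionary<string, string> Rebuild(IEnumerable<LogRecord> log)
    {
        var records = new List<LogRecord>(log);

        // Pass 1: note which transactions reached a COMMIT.
        var committed = new HashSet<long>();
        foreach (var r in records)
            if (r.Op == "COMMIT") committed.Add(r.TxnId);

        // Pass 2: replay committed writes in sequence order; skip everything else.
        records.Sort((x, y) => x.Sequence.CompareTo(y.Sequence));
        var state = new Dictionary<string, string>();
        foreach (var r in records)
            if (r.Op == "PUT" && committed.Contains(r.TxnId))
                state[r.Key!] = r.Value!;
        return state;
    }
}
```

Transaction 2 logged two writes but no COMMIT, so neither "b" nor "c" appears in the rebuilt state.
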

09

Checkpoints keep recovery fast

If the log grew forever, recovery would take forever. A checkpoint is the engine tidying up.

Flush all dirty pages

Push every in-memory change out to disk so the data file is current.

Write a Checkpoint entry

Record which transactions were still active at this moment.

Truncate the old WAL

Everything before the checkpoint is now safely in the data file — discard it.

Smaller WAL, faster recovery

Next crash, there's far less log to replay. Recovery time stays bounded.
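
The truncation step can be sketched on an in-memory log. One assumption here goes beyond what the module states: records belonging to transactions that were still active at the checkpoint are kept, on the reasoning that their COMMIT may still arrive. `WalRecord`, `Checkpointer`, and the use of transaction id 0 for the checkpoint record are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var wal = new List<WalRecord>
{
    new(1, 1, "PUT"), new(2, 1, "COMMIT"),
    new(3, 2, "PUT"),                       // transaction 2 is still active
};
var after = Checkpointer.Checkpoint(wal, activeTxnIds: new HashSet<long> { 2 });
Console.WriteLine(string.Join(",", after.Select(r => r.Sequence)));   // 3,4

// Hypothetical, simplified log record.
record WalRecord(long Sequence, long TxnId, string Op);

static class Checkpointer
{
    // Assumes the caller has already flushed all dirty pages to the data file.
    public static List<WalRecord> Checkpoint(List<WalRecord> wal, HashSet<long> activeTxnIds)
    {
        long nextSequence = wal.Count == 0 ? 1 : wal.Max(r => r.Sequence) + 1;

        // Drop records of finished transactions; keep those of still-active ones (assumption).
        var kept = wal.Where(r => activeTxnIds.Contains(r.TxnId)).ToList();

        // Append the checkpoint marker; TxnId 0 is a placeholder meaning "no transaction".
        kept.Add(new WalRecord(nextSequence, 0, "CHECKPOINT"));
        return kept;
    }
}
```
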

"Log first, apply after" is ACID's beating heart. If the commit record is on disk, recovery will replay the change. If it isn't, the change never happened. There is no third state.

Deferred writes make rollback free. This engine doesn't touch the B+ tree until after commit. So rollback has nothing to undo — it just discards the buffer. The trade-off: other transactions can't see your in-flight writes (that's the next module).
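
A minimal sketch of deferred writes shows both halves of that trade-off. `BufferedTransaction` is a hypothetical class, a dictionary stands in for the B+ tree, and the log append and flush that would precede the apply step are left out to keep the example short.

```csharp
using System;
using System.Collections.Generic;

var table = new Dictionary<string, string>();

var t1 = new BufferedTransaction(table);
t1.Put("a", "1");
t1.Rollback();                              // nothing to undo: the table was never touched
Console.WriteLine(table.Count);             // 0

var t2 = new BufferedTransaction(table);
t2.Put("b", "2");
Console.WriteLine(table.ContainsKey("b"));  // False: in-flight writes are invisible
t2.Commit();
Console.WriteLine(table["b"]);              // 2

// Hypothetical illustration; not the engine's Transaction class.
sealed class BufferedTransaction
{
    private readonly Dictionary<string, string> _table;
    private readonly List<KeyValuePair<string, string>> _pending = new();

    public BufferedTransaction(Dictionary<string, string> table) => _table = table;

    // Writes go to a private buffer, not to the shared table.
    public void Put(string key, string value) => _pending.Add(new(key, value));

    public void Commit()
    {
        // A real engine would append and flush a COMMIT record before this loop.
        foreach (var (key, value) in _pending) _table[key] = value;
        _pending.Clear();
    }

    // Rollback only discards the buffer.
    public void Rollback() => _pending.Clear();
}
```
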

10

Check yourself

Scenario
Power cuts after the commit record was written but before the index was updated. On restart, is the row there?
Answer: yes. The COMMIT marker is durably on disk, so recovery replays this transaction onto the index. The crash mid-apply doesn't matter, because replay finishes the job.

Scenario
Power cuts between two inserts in the same transaction, before commit. What survives?
Answer: nothing from that transaction. Logged ≠ committed. Without a COMMIT marker the whole transaction is dropped on replay. That's atomicity: all or nothing.

Debugging
A stress test shows committed data lost after a crash. Where do you look first?
Answer: the flush ordering in the commit path. If "success" returns before the log is physically on disk, a crash erases supposedly-durable data. The flush ordering is the first suspect, every time.

Architecture
Why doesn't rollback need to "undo" anything in this engine?
Answer: because writes are deferred until commit. Nothing was applied yet, so there's nothing to reverse. Discard the buffered entries and you're done: clean and cheap.

Up next: one transaction is tidy. But a real database runs thousands at once. How does the engine keep them from corrupting each other's work? Locks, lock ordering, and the dreaded deadlock.