Module 08 · Finale · ~13 min · metrics & logs

Observability
Watching the engine run

Your database has been running for three days. Was everything fine? Did commits happen? How many rolled back? If you can't answer those in under five seconds, your system isn't observable.

🛩️ Hold this picture

The cockpit instrument panel of a plane. The pilot can't open the cowling and inspect the engine mid-flight; they trust the gauges (metrics) to know what's happening right now, and investigators rely on the flight recorder (structured logs) to reconstruct why. Without instruments, you fly blind and only learn there's a problem when the plane is already going down.

◆ The key insight

Observability has two halves. Metrics — cheap numbers counting events over time — tell you something is wrong. Structured logs — rich, searchable records — tell you what. You need both.

Why should you care?

When AI writes Console.WriteLine("did the thing"), that's not observability — it's debug-print pollution. Recognizing the difference between structured logging and print-debugging is one of the biggest quality markers in professional code.

The cockpit, live

Press Run workload to drive a simulated stream of database operations. The counters tick, structured log lines stream past, and the sparkline tracks write pressure — exactly the signals this engine emits. Then take a snapshot, the way real code reads metrics.

Engine telemetry widget: live metric counters, a structured JSON log stream, and a write-pressure sparkline (flushes + checkpoints).

Print-debugging vs. structured logging

Same event, two worlds apart. One you can grep, filter, and ship to a dashboard. The other you can only squint at.

debug-print pollution

Console.WriteLine("insert done");

No timestamp. No table. No transaction id. Not searchable, not filterable, not shippable. Useless the moment you have more than one of them.

structured logging

{"timestamp":"2026-04-22T10:12:03Z",
 "level":"Information",
 "event":"row.inserted",
 "table":"Users","txn":42}

Greppable by event:row.inserted. Filterable by level. Carries context as fields. Ships straight to Splunk, Datadog, or any other log-query tool.
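To make the contrast concrete, here is a minimal sketch that emits the event above both ways, using only System.Text.Json. The StructuredLog.Format helper is hypothetical, not part of this engine or any logging library; its field names simply mirror the example line.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// The print-debugging version: a bare string with no context.
Console.WriteLine("insert done");

// The structured version: the same event as one parseable JSON line.
Console.WriteLine(StructuredLog.Format("Information", "row.inserted",
    new Dictionary<string, object?> { ["table"] = "Users", ["txn"] = 42 }));

static class StructuredLog
{
    // Hypothetical helper: fixed fields first, then the event's context,
    // serialized together so every value becomes a queryable field.
    public static string Format(string level, string eventName,
                                IReadOnlyDictionary<string, object?> context)
    {
        var fields = new Dictionary<string, object?>
        {
            ["timestamp"] = DateTime.UtcNow.ToString("o"),  // ISO 8601 in UTC, ends in "Z"
            ["level"]     = level,
            ["event"]     = eventName,
        };
        foreach (var (key, value) in context)
            fields[key] = value;
        return JsonSerializer.Serialize(fields);
    }
}
```

Because the helper hands JsonSerializer a single dictionary, context values like "txn" come out as real JSON numbers, not digits buried in a sentence.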

A structured log entry

Observability.cs — DatabaseLogEntry

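using System;
using System.Collections.Generic;
using System.Text.Json;

public sealed class DatabaseLogEntry
{
    public DateTimeOffset Timestamp { get; init; }
        = DateTimeOffset.UtcNow;
    public DatabaseLogLevel Level { get; init; }  // the engine's severity enum (not shown)
    public string EventName { get; init; } = "";
    public string Message { get; init; } = "";
    public IReadOnlyDictionary<string, object?> Properties
        { get; init; } = new Dictionary<string, object?>();
    public string ToJson()  // fixed fields plus every property, as one JSON line
    {
        var fields = new Dictionary<string, object?>
        {
            ["timestamp"] = Timestamp.UtcDateTime.ToString("o"), ["level"] = Level.ToString(),
            ["event"] = EventName, ["message"] = Message,
        };
        foreach (var (key, value) in Properties) fields[key] = value;
        return JsonSerializer.Serialize(fields);
    }
}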

The anatomy

Timestamp — an ISO timestamp, sortable and unambiguous across time zones.

EventName — the stable key. You search event:row.inserted forever, even as the message wording changes.

Properties — arbitrary context as key/value pairs: which table, which transaction, how many pages.

ToJson — flatten it all to one machine-readable line.
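The EventName point is easiest to see in code. This sketch uses a hypothetical, trimmed-down LogRecord record as a stand-in for DatabaseLogEntry and filters with LINQ: two releases word the same insert differently, yet one query on the stable key finds both.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Two releases word the same insert differently; the event key never changes.
var entries = new List<LogRecord>
{
    new LogRecord("row.inserted", "insert done", "Users"),
    new LogRecord("row.inserted", "Inserted 1 row into Users", "Users"),
    new LogRecord("row.deleted", "Deleted 1 row from Orders", "Orders"),
};

// Searching for the text "insert done" finds only the old wording;
// filtering on the stable key finds both inserts.
Console.WriteLine(entries.Count(e => e.Message.Contains("insert done")));  // 1
Console.WriteLine(EventFilter.ByEvent(entries, "row.inserted").Count);     // 2

// Hypothetical, trimmed-down stand-in for DatabaseLogEntry.
record LogRecord(string EventName, string Message, string Table);

static class EventFilter
{
    // Exact match on the event key; message wording is ignored entirely.
    public static List<LogRecord> ByEvent(IEnumerable<LogRecord> entries, string eventName) =>
        entries.Where(e => e.EventName == eventName).ToList();
}
```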

Counters that never lose a tick

Observability.cs — DatabaseMetrics

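using System.Threading;

internal sealed class DatabaseMetrics
{
    private long _inserts;
    private long _flushes;
    private long _checkpoints;

    // One atomic read-modify-write; the flush and checkpoint
    // counters are bumped the same way.
    public void IncrementInserts()
        => Interlocked.Increment(ref _inserts);

    // Each Interlocked.Read is atomic (never torn, even on 32-bit),
    // but the three reads happen one after another, not at one instant.
    public DatabaseMetricsSnapshot Snapshot() => new()
    {
        Inserts     = Interlocked.Read(ref _inserts),
        Flushes     = Interlocked.Read(ref _flushes),
        Checkpoints = Interlocked.Read(ref _checkpoints),
    };
}

// Assumed shape; the snapshot type is not shown in this excerpt.
internal sealed record DatabaseMetricsSnapshot
{
    public long Inserts { get; init; }
    public long Flushes { get; init; }
    public long Checkpoints { get; init; }
}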

Lock-free counting

Interlocked.Increment: an atomic increment that keeps a shared counter correct without taking a lock.

A plain _inserts++ is three steps (read, add, write); two threads can interleave and lose a count. Interlocked does it as one atomic CPU instruction.
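You can watch the lost update happen. In this sketch (the RaceCounters class is hypothetical), two tasks bump a plain counter and an Interlocked counter side by side. The plain total is nondeterministic, so no exact output is shown; on a multi-core machine it typically falls short of 2,000,000, while the Interlocked total always matches.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Two tasks each add 1,000,000 to both counters at the same time.
var counters = new RaceCounters();
counters.RunConcurrently(threads: 2, perThread: 1_000_000);

// Both should be 2,000,000; only the Interlocked one is guaranteed to be.
Console.WriteLine($"plain ++    : {counters.Plain}");
Console.WriteLine($"Interlocked : {counters.Atomic}");

sealed class RaceCounters
{
    private long _plain;
    private long _atomic;

    public long Plain => Interlocked.Read(ref _plain);
    public long Atomic => Interlocked.Read(ref _atomic);

    // Read, add, write: another thread can slip in between the steps.
    public void IncrementPlain() => _plain++;

    // One indivisible read-modify-write.
    public void IncrementAtomic() => Interlocked.Increment(ref _atomic);

    public void RunConcurrently(int threads, int perThread)
    {
        var workers = new Task[threads];
        for (int t = 0; t < threads; t++)
        {
            workers[t] = Task.Run(() =>
            {
                for (int i = 0; i < perThread; i++)
                {
                    IncrementPlain();
                    IncrementAtomic();
                }
            });
        }
        Task.WaitAll(workers);
    }
}
```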

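Snapshot(): copies each counter with Interlocked.Read into one immutable snapshot object. Each value is read atomically, but the three reads happen one after another, so an insert can land between them: the snapshot is very close to a single instant, not exactly one. For a dashboard that skew is harmless. This is what the dashboard's snapshot button calls.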

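Interlocked.Increment is lock-free. It compiles to a single atomic read-modify-write instruction (on x86-64, a LOCK XADD), so no thread ever blocks waiting for another. That's why you can count millions of events per second cheaply. It isn't entirely free under heavy multi-core traffic, because the counter's cache line bounces between cores, but nobody ever waits on a lock.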
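Snapshots become dashboard lines by differencing. A minimal sketch, assuming a hypothetical MetricsSnapshot record and Telemetry.WritePressurePerSecond helper: because counters only grow, write pressure (flushes plus checkpoints per second) is the change between two snapshots divided by the time between them. The numbers are made up for illustration.

```csharp
using System;

// Two snapshots taken 5 seconds apart (made-up numbers for illustration).
var earlier = new MetricsSnapshot(Inserts: 1_000, Flushes: 40, Checkpoints: 2);
var later   = new MetricsSnapshot(Inserts: 1_600, Flushes: 70, Checkpoints: 4);

double pressure = Telemetry.WritePressurePerSecond(earlier, later, TimeSpan.FromSeconds(5));
Console.WriteLine(pressure);   // (30 + 2) writes over 5 s, i.e. 6.4 per second

// Hypothetical stand-in for DatabaseMetricsSnapshot.
record MetricsSnapshot(long Inserts, long Flushes, long Checkpoints);

static class Telemetry
{
    // Counters only grow (until a process restart resets them), so a rate
    // is "change since the earlier snapshot / elapsed time".
    // Write pressure here = flushes + checkpoints, the bytes-to-disk signal.
    public static double WritePressurePerSecond(
        MetricsSnapshot earlier, MetricsSnapshot later, TimeSpan elapsed)
    {
        if (elapsed <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(elapsed), "Elapsed time must be positive.");

        long writes = (later.Flushes - earlier.Flushes)
                    + (later.Checkpoints - earlier.Checkpoints);
        return writes / elapsed.TotalSeconds;
    }
}
```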

The events the engine emits

Each is a stable, greppable key. Search any one of these in production and find every matching moment.

database.opened · table.created · transaction.started · row.inserted · row.updated · row.deleted · database.checkpoint · database.flushed · database.integrity_check
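One way to keep these keys stable is to define each one exactly once. A sketch, assuming a hypothetical DbEvents constants class (the engine's own code may organize this differently): call sites reference the constant, so a typo in the name becomes a compile error instead of a log line nobody can find.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Call sites reference the constant, never a hand-typed string.
Console.WriteLine(DbEvents.RowInserted);   // row.inserted
Console.WriteLine(DbEvents.All().Length);  // 9

// Hypothetical: one constant per event key listed above.
static class DbEvents
{
    public const string DatabaseOpened         = "database.opened";
    public const string TableCreated           = "table.created";
    public const string TransactionStarted     = "transaction.started";
    public const string RowInserted            = "row.inserted";
    public const string RowUpdated             = "row.updated";
    public const string RowDeleted             = "row.deleted";
    public const string DatabaseCheckpoint     = "database.checkpoint";
    public const string DatabaseFlushed        = "database.flushed";
    public const string DatabaseIntegrityCheck = "database.integrity_check";

    // Every key, gathered by reflection (e.g. to check dashboards or alerts
    // only reference events that actually exist).
    public static string[] All() =>
        typeof(DbEvents)
            .GetFields(BindingFlags.Public | BindingFlags.Static)
            .Where(f => f.IsLiteral)
            .Select(f => (string)f.GetRawConstantValue()!)
            .ToArray();
}
```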

Metrics tell you that something is wrong; logs tell you why. A dashboard showing "commits dropped 80%" is the alert. The matching log entries show which transactions failed and how: that's the root cause.

The telemetry pipeline, traced

Where do these signals actually come from? Every write quietly forks into two cheap side channels, a counter bump and a log line, that add only a small cost to the operation itself. Press play to follow one insert through the pipeline, then a later snapshot read.
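Here is that fork as a minimal sketch. TinyTable and its Insert method are hypothetical; the shape to notice is: do the real work, bump a counter with Interlocked, then write one structured JSON line to a sink. The sink is a StringWriter wrapped with TextWriter.Synchronized so that concurrent inserts cannot interleave their lines.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Threading;

var sink = new StringWriter();
var users = new TinyTable("Users", sink);
users.Insert(txn: 42, row: "alice");

Console.WriteLine(users.InsertCount);   // 1
Console.Write(sink.ToString());         // one row.inserted JSON line

// Hypothetical table: storage is just a list; the point is the shape of Insert.
sealed class TinyTable
{
    private readonly string _name;
    private readonly TextWriter _log;
    private readonly List<string> _rows = new();
    private readonly object _rowsGate = new();
    private long _inserts;

    public TinyTable(string name, TextWriter log)
    {
        _name = name;
        _log = TextWriter.Synchronized(log);  // each WriteLine completes before the next starts
    }

    public long InsertCount => Interlocked.Read(ref _inserts);

    public void Insert(long txn, string row)
    {
        // 1. The real work.
        lock (_rowsGate) _rows.Add(row);

        // 2. Metric: a lock-free counter bump.
        Interlocked.Increment(ref _inserts);

        // 3. Log: one structured line carrying the event's context as fields.
        _log.WriteLine(JsonSerializer.Serialize(new Dictionary<string, object?>
        {
            ["timestamp"] = DateTime.UtcNow.ToString("o"),
            ["level"]     = "Information",
            ["event"]     = "row.inserted",
            ["table"]     = _name,
            ["txn"]       = txn,
        }));
    }
}
```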

one insert → two telemetry signals

Why structured wins: grep the logs

Here's the payoff. Because every line is structured data, not prose, you can query it. Filter by event, by level, or by free text and watch the matching count collapse. With Console.WriteLine output, free-text matching is all you get; here you can also match fields exactly. The bar up top shows the equivalent grep you'd run in production.

Log explorer widget: 15 structured entries, filterable by event, level, and free text.

Try: filter to event: row.deleted — instantly you see every deletion across both tables. Now imagine doing that across a million lines from three days of traffic. Structured logs turn "I think something deleted those rows" into "here are the exact four that did, with timestamps."
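The same filter, as code. A minimal sketch with made-up sample lines and a hypothetical LogQuery.Filter helper built on System.Text.Json's JsonDocument: it compares the event field exactly, so a checkpoint whose message merely mentions row.deleted is not counted, whereas a plain substring grep counts it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Made-up sample lines, shaped like the structured entries above.
string[] lines =
{
    "{\"level\":\"Information\",\"event\":\"row.inserted\",\"table\":\"Users\",\"txn\":41}",
    "{\"level\":\"Information\",\"event\":\"row.deleted\",\"table\":\"Users\",\"txn\":42}",
    "{\"level\":\"Information\",\"event\":\"row.deleted\",\"table\":\"Orders\",\"txn\":43}",
    "{\"level\":\"Information\",\"event\":\"database.checkpoint\",\"message\":\"checkpoint after row.deleted burst\"}",
    "{\"level\":\"Warning\",\"event\":\"database.integrity_check\",\"message\":\"integrity check slow\"}",
};

// Substring grep: also catches the checkpoint whose message mentions row.deleted.
Console.WriteLine(lines.Count(l => l.Contains("row.deleted")));             // 3
// Field filter: exactly the deletions.
Console.WriteLine(LogQuery.Filter(lines, eventName: "row.deleted").Count);  // 2
Console.WriteLine(LogQuery.Filter(lines, level: "Warning").Count);          // 1

static class LogQuery
{
    // Hypothetical helper: parse each line and compare fields exactly.
    // A null argument means "don't filter on that field".
    public static List<string> Filter(IEnumerable<string> lines,
                                      string? eventName = null, string? level = null)
    {
        var matches = new List<string>();
        foreach (var line in lines)
        {
            using var doc = JsonDocument.Parse(line);
            var root = doc.RootElement;

            if (eventName is not null &&
                (!root.TryGetProperty("event", out var ev) || ev.GetString() != eventName))
                continue;
            if (level is not null &&
                (!root.TryGetProperty("level", out var lv) || lv.GetString() != level))
                continue;

            matches.Add(line);
        }
        return matches;
    }
}
```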

Check yourself

Scenario

You suspect a query is making storage blow up. Which single metric do you watch live?

Correct. Flushes and checkpoints both mean bytes going to disk. A sudden climb is your "storage is under pressure" signal — exactly what the sparkline above tracks.

Debugging

Your boss asks "were any transactions rolled back last hour?" Can this engine answer directly?

Right, not directly. The engine counts TransactionsStarted but has no rollback counter. This is a great example of designing your telemetry around the questions you'll actually ask.

Architecture

Why not just use Console.WriteLine everywhere for logs?

Exactly. Free-text prints are a dead end at scale. Structured records are queryable data — the difference between "I think" and "I know."

Scenario

Two threads call IncrementInserts at the same instant. Can the counter lose a tick?

Correct. That's the whole point of Interlocked — the increment is indivisible at the hardware level. Plain ++ is a read-modify-write that can interleave and drop counts.

★

You made it through all eight modules

The goal was never to make you a database engineer. It was to make you fluent enough to steer AI and debug production — to look at a slow query, a lost write, or a hung thread and know which layer to interrogate. You can now do that.

Read real databases

SQLite's source is famously readable. Postgres's WAL code is legendary. Now you have the vocabulary to follow them.

Instrument your own code

Start with one counter and one structured log line per boundary: API → DB, DB → disk. Build the habit.
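A starting point for that habit, as a sketch. The Boundary wrapper and the boundary.call event name are hypothetical: every call through a boundary bumps a call counter (plus a failure counter when it throws) and writes one structured line with the outcome and the elapsed time measured by Stopwatch.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text.Json;
using System.Threading;

var sink = new StringWriter();
var apiToDb = new Boundary("api_to_db", sink);

int rows = apiToDb.Call(() => 3);   // stand-in for a real database call
Console.WriteLine(rows);            // 3
Console.WriteLine(apiToDb.Calls);   // 1
Console.Write(sink.ToString());     // one boundary.call JSON line

// Hypothetical wrapper: two counters and one structured line per crossing.
sealed class Boundary
{
    private readonly string _name;
    private readonly TextWriter _log;
    private long _calls;
    private long _failures;

    public Boundary(string name, TextWriter log)
    {
        _name = name;
        _log = TextWriter.Synchronized(log);
    }

    public long Calls => Interlocked.Read(ref _calls);
    public long Failures => Interlocked.Read(ref _failures);

    public T Call<T>(Func<T> operation)
    {
        Interlocked.Increment(ref _calls);
        var timer = Stopwatch.StartNew();
        bool ok = false;
        try
        {
            T result = operation();
            ok = true;
            return result;
        }
        finally
        {
            // Runs on success and on exception, so every call is logged once.
            if (!ok) Interlocked.Increment(ref _failures);
            _log.WriteLine(JsonSerializer.Serialize(new
            {
                timestamp = DateTime.UtcNow.ToString("o"),
                level = ok ? "Information" : "Error",
                @event = "boundary.call",
                boundary = _name,
                ok,
                elapsedMs = timer.Elapsed.TotalMilliseconds,
            }));
        }
    }
}
```

The exception is not swallowed: the caller still sees it, and the log line records that it happened.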

Question every AI suggestion

"Which layer does this belong in?" "How does it affect the WAL?" "What's the lock order?" Push back with specifics.
