Module 04 ~15 min · LINQ & query planning

Indexes & the
Query Plan

The same query can be lightning-fast or painfully slow depending on what the planner decides. You wrote .Where(r => r["Id"] == 42) — did it hit the index, or scan all 10 million rows? The query plan holds the answer.

🛰️ Hold this picture

A GPS route planner. When you ask for directions, it doesn't try every possible road — it reads your request, notices "highway only," and picks an efficient path. A LINQ query like .Where(r => (int)r["Id"] > 5) goes through a planner that asks one question: "Is the filter on the indexed column? If yes, use the B+ tree. If not, scan the whole table."

◆ The key insight

A query is just a request. The planner turns it into a strategy — access path, filters, sort. This split — declare what you want vs. decide how to get it — is exactly why SQL and LINQ feel magical: you describe the destination, the planner picks the route.

Why should you care?

"Why is this query slow?" is the most common question developers ask. Ninety percent of the time the answer is: the planner couldn't use an index. If you can read a query plan, you can fix most slowness yourself — no guessing, no AI roulette.

What an index actually is

An index is a second structure that keeps your keys in sorted order, each pointing at its row — precisely the B+ tree from Module 3. It's the index at the back of a textbook: rather than reading all 900 pages to find "mitochondria," you flip to M. But an index is never free — it is a deliberate trade you make with your eyes open.

the upside

Reads get dramatically faster

A lookup drops from O(n) — read every row — to O(log n). On 10M rows that's ~10 million comparisons versus ~23. This is the entire point.

the cost

Writes get slower

Every insert, update, and delete must also update the index tree — finding the leaf, maybe splitting it. More indexes means slower writes. There's no free lunch.

the cost

It takes real space

The index is genuine data on disk in its own pages. A heavily-indexed table can carry more bytes in its indexes than in the rows themselves.

◆ The optimizer's bargain

An index is a bet: you pay a little on every write so you can win big on reads. Index the columns you actually filter and sort on — never every column, and never blindly.

Why it matters: scan vs. index

Here is the whole argument for optimization in one picture. The same query, run two ways. Pick a table size and a query, hit Run, and count the rows examined by each. Then switch the query to a column with no index and watch what happens.

Scan vs. Indexrows examined to answer one query

query

rows

Full table scanO(n)examined 0

Index lookup · B+ treeO(log n)examined 0

Run the planner

So who decides between those two paths? The query planner. Pick a query below and watch it translate your declarative LINQ into a concrete access path, then see which physical method on Table actually runs — and what it costs.

Query plannerdeclarative intent → physical execution

① LINQ you wrote

→

② Plan chosen

→

③ Physical op

How the planner picks a path

Every query funnels through the same three questions. This little flowchart is the optimization decision — internalize it and you can predict any query's access path and cost before you run it.

Does the filter touch the primary key, Id?

if NO →Full scan · O(n)

if YES

Is it an equality test, Id == x?

if YES →Point lookup · O(log n)

if NO — it's >, <, or a between

Range scan · O(log n + k)

Filters the index can absorb are pushed down into the access path and cost nothing extra. Anything left over — a filter on a non-key column — becomes an in-memory predicate applied to every row the access path returns. Fewer rows returned = less leftover work.

Inside the planner's code

The planner is three small methods. Toggle between them — each is one phase of turning a request into a result.

public TResult Execute<TResult>(Expression expression)
{
    var plan = BuildExecutionPlan(expression);
    IEnumerable<DataRow> rows = ExecuteAccessPath(plan);

    foreach (var predicate in plan.Predicates)
        rows = rows.Where(predicate);            // filter

    if (plan.OrderByColumn != null)              // sort
        rows = plan.IsOrderByDescending
            ? rows.OrderByDescending(r => r[plan.OrderByColumn])
            : rows.OrderBy(r => r[plan.OrderByColumn]);

    return (TResult)(object)rows;
}

Three phases. Pick a door (the access path), filter what comes through, then sort the result. Filtering and sorting always happen after the rows are fetched — in memory.

private QueryExecutionPlan BuildExecutionPlan(Expression expression)
{
    var whereVisitor = new WhereExpressionVisitor(
        _primaryKeyColumn, _primaryKeyType, _primaryKeyComparer);
    whereVisitor.Visit(expression);

    var orderVisitor = new OrderByExpressionVisitor();
    orderVisitor.Visit(expression);

    return new QueryExecutionPlan(
        whereVisitor.Predicates,
        whereVisitor.IndexRange,
        orderVisitor.OrderByColumn,
        orderVisitor.IsDescending);
}

The visitor pattern. A visitor walks the LINQ expression tree and pulls out two things: predicates it can't push down, and an IndexRange it can — when the filter touches the primary key.

private IEnumerable<DataRow> ExecuteAccessPath(QueryExecutionPlan plan)
{
    if (_primaryKeyColumn != null && plan.IndexRange != null)
    {
        if (plan.IndexRange.ExactKey != null)
            return One(_table.SelectByKey(plan.IndexRange.ExactKey));

        if (plan.IndexRange.HasLowerBound || plan.IndexRange.HasUpperBound)
            return _table.SelectByPrimaryKeyRange(
                plan.IndexRange.LowerBound, plan.IndexRange.UpperBound);
    }
    return _table.SelectAll();   // fall back to a full scan
}

The if-ladder that decides everything. Exact key? One jump. A range? Leaf-walk. Neither? Full scan. This single method is the difference between O(log n) and O(n).

Spot the footgun

This query looks like it should use the index — but it won't. Click the line you think breaks the optimization.

Click the line that defeats the index

var x = db.Query("Users")
    .Where(r => ((int)r["Id"]).ToString() == "42")
    .ToList();

hint: the planner can only recognize one specific shape…

Found it. The planner pattern-matches r["Id"] == constant. Wrapping the column in a function call — .ToString() — breaks the pattern, so the IndexRange is never built and it falls back to a full scan. This is the exact same real-world footgun as WHERE TO_STRING(id) = '42' in SQL: never wrap an indexed column in a function in your filter.

The optimization playbook

Everything in this module distills into a handful of habits. These are the moves that turn a slow query fast — and the lens to read any AI-suggested query through.

Filter on an indexed column

If the planner can't map your filter to an index, it full-scans. Here only the primary key is indexed — so filter on Id to stay fast.

point / range

Never wrap an indexed column in a function

Keep filters sargable. WHERE Id = 42 uses the index; WHERE toString(Id) = "42" defeats it and falls back to a scan.

sargability

Push the work into the access path

A key range the index can absorb beats fetching everything and filtering in memory. Let the B+ tree do the narrowing — that's pushdown.

do less work

Remember ORDER BY costs a sort

Sorting happens in memory, after filtering — an extra O(n log n) over the matched rows. Filter hard first so there's less to sort.

O(n log n)

Index for selectivity

An index pays off when it narrows the result a lot. Indexing a column with two values (like a yes/no flag) barely helps — you still touch half the table.

be choosy

Where it lands: a point lookup

Table.cs — SelectByKey

public DataRow? SelectByKey(object key)
{
    _lock.EnterReadLock();
    try
    {
        var serialized = _index.Search(key);
        if (serialized == null)
            return null;
        return DeserializeRow((byte[])serialized);
    }
    finally
    {
        _lock.ExitReadLock();
    }
}

The whole chain

When the plan is a point lookup, this is what runs. EnterReadLock — many readers allowed at once (Module 6).

_index.Search(key) — straight into the B+ tree from Module 3. One descent, O(log n).

DeserializeRow — turn the stored bytes back into a row. Your declarative == 42 ended here, as a single tree walk.

Declarative vs. imperative. LINQ describes what you want; the planner figures out how. This is the single biggest reason databases are productive — you never hand-code the loops.

No secondary indexes here. Only the primary key is indexed. .Where(r => r["Name"] == "Alice") will always full-scan. Knowing this stops you from blaming the AI — or a phantom slowdown — when the real answer is "there was never an index to use."

Check yourself

Scenario

.Where(r => (int)r["Age"] > 25).Where(r => (int)r["Id"] == 7) — which filter helps the index?

Correct. Id is the primary key, so == 7 becomes a point lookup. The Age predicate can't use an index, so it's applied to the single fetched row in memory.

Debugging

A query got slower after someone added an ORDER BY. Why?

Right. The access path is unchanged, but a sort over the result set is added at the end. On large result sets that O(n log n) sort is what you feel.

Architecture

You want a secondary index on Email. Which class changes most?

Exactly. The Table would own one tree per indexed column, and WhereExpressionVisitor would need to choose which index a given filter can use.

Tracing

Given .Where(r => (int)r["Id"] >= 10 && (int)r["Id"] < 20), describe the plan.

Correct. Both comparisons are on the indexed key, so the planner folds them into a single range. The B+ tree finds leaf 10 and walks to 20 — no predicate left over.

Up next: reads are the easy half. Writes that must survive a crash are where it gets interesting. Transactions, the write-ahead log, and how databases never lose your money.

Indexes & theQuery Plan

Why should you care?

What an index actually is

Reads get dramatically faster

Writes get slower

It takes real space

Why it matters: scan vs. index

Run the planner

How the planner picks a path

Inside the planner's code

Spot the footgun

The optimization playbook

Filter on an indexed column

Never wrap an indexed column in a function

Push the work into the access path

Remember ORDER BY costs a sort

Index for selectivity

Where it lands: a point lookup

The whole chain

Check yourself

Indexes & the
Query Plan