Indexes & the
Query Plan
The same query can be lightning-fast or painfully slow depending on what the planner decides. You wrote .Where(r => r["Id"] == 42) — did it hit the index, or scan all 10 million rows? The query plan holds the answer.
Why should you care?
"Why is this query slow?" is the most common question developers ask. Ninety percent of the time the answer is: the planner couldn't use an index. If you can read a query plan, you can fix most slowness yourself — no guessing, no AI roulette.
What an index actually is
An index is a second structure that keeps your keys in sorted order, each pointing at its row — precisely the B+ tree from Module 3. It's the index at the back of a textbook: rather than reading all 900 pages to find "mitochondria," you flip to M. But an index is never free — it is a deliberate trade you make with your eyes open.
Reads get dramatically faster
A lookup drops from O(n) — read every row — to O(log n). On 10M rows that's ~10 million comparisons versus ~23. This is the entire point.
Writes get slower
Every insert, update, and delete must also update the index tree — finding the leaf, maybe splitting it. More indexes means slower writes. There's no free lunch.
It takes real space
The index is genuine data on disk in its own pages. A heavily-indexed table can carry more bytes in its indexes than in the rows themselves.
Why it matters: scan vs. index
Here is the whole argument for optimization in one picture. The same query, run two ways. Pick a table size and a query, hit Run, and count the rows examined by each. Then switch the query to a column with no index and watch what happens.
Run the planner
So who decides between those two paths? The query planner. Pick a query below and watch it translate your declarative LINQ into a concrete access path, then see which physical method on Table actually runs — and what it costs.
How the planner picks a path
Every query funnels through the same three questions. This little flowchart is the optimization decision — internalize it and you can predict any query's access path and cost before you run it.
Id?Id == x?Filters the index can absorb are pushed down into the access path and cost nothing extra. Anything left over — a filter on a non-key column — becomes an in-memory predicate applied to every row the access path returns. Fewer rows returned = less leftover work.
Inside the planner's code
The planner is three small methods. Toggle between them — each is one phase of turning a request into a result.
public TResult Execute<TResult>(Expression expression)
{
var plan = BuildExecutionPlan(expression);
IEnumerable<DataRow> rows = ExecuteAccessPath(plan);
foreach (var predicate in plan.Predicates)
rows = rows.Where(predicate); // filter
if (plan.OrderByColumn != null) // sort
rows = plan.IsOrderByDescending
? rows.OrderByDescending(r => r[plan.OrderByColumn])
: rows.OrderBy(r => r[plan.OrderByColumn]);
return (TResult)(object)rows;
}
private QueryExecutionPlan BuildExecutionPlan(Expression expression)
{
var whereVisitor = new WhereExpressionVisitor(
_primaryKeyColumn, _primaryKeyType, _primaryKeyComparer);
whereVisitor.Visit(expression);
var orderVisitor = new OrderByExpressionVisitor();
orderVisitor.Visit(expression);
return new QueryExecutionPlan(
whereVisitor.Predicates,
whereVisitor.IndexRange,
orderVisitor.OrderByColumn,
orderVisitor.IsDescending);
}
IndexRange it can — when the filter touches the primary key.private IEnumerable<DataRow> ExecuteAccessPath(QueryExecutionPlan plan)
{
if (_primaryKeyColumn != null && plan.IndexRange != null)
{
if (plan.IndexRange.ExactKey != null)
return One(_table.SelectByKey(plan.IndexRange.ExactKey));
if (plan.IndexRange.HasLowerBound || plan.IndexRange.HasUpperBound)
return _table.SelectByPrimaryKeyRange(
plan.IndexRange.LowerBound, plan.IndexRange.UpperBound);
}
return _table.SelectAll(); // fall back to a full scan
}
Spot the footgun
This query looks like it should use the index — but it won't. Click the line you think breaks the optimization.
var x = db.Query("Users") .Where(r => ((int)r["Id"]).ToString() == "42") .ToList();
r["Id"] == constant. Wrapping the column in a function call — .ToString() — breaks the pattern, so the IndexRange is never built and it falls back to a full scan. This is the exact same real-world footgun as WHERE TO_STRING(id) = '42' in SQL: never wrap an indexed column in a function in your filter.
The optimization playbook
Everything in this module distills into a handful of habits. These are the moves that turn a slow query fast — and the lens to read any AI-suggested query through.
Filter on an indexed column
If the planner can't map your filter to an index, it full-scans. Here only the primary key is indexed — so filter on Id to stay fast.
Never wrap an indexed column in a function
Keep filters sargable. WHERE Id = 42 uses the index; WHERE toString(Id) = "42" defeats it and falls back to a scan.
Push the work into the access path
A key range the index can absorb beats fetching everything and filtering in memory. Let the B+ tree do the narrowing — that's pushdown.
Remember ORDER BY costs a sort
Sorting happens in memory, after filtering — an extra O(n log n) over the matched rows. Filter hard first so there's less to sort.
Index for selectivity
An index pays off when it narrows the result a lot. Indexing a column with two values (like a yes/no flag) barely helps — you still touch half the table.
Where it lands: a point lookup
public DataRow? SelectByKey(object key)
{
_lock.EnterReadLock();
try
{
var serialized = _index.Search(key);
if (serialized == null)
return null;
return DeserializeRow((byte[])serialized);
}
finally
{
_lock.ExitReadLock();
}
}
The whole chain
When the plan is a point lookup, this is what runs. EnterReadLock — many readers allowed at once (Module 6).
_index.Search(key) — straight into the B+ tree from Module 3. One descent, O(log n).
DeserializeRow — turn the stored bytes back into a row. Your declarative == 42 ended here, as a single tree walk.
Declarative vs. imperative. LINQ describes what you want; the planner figures out how. This is the single biggest reason databases are productive — you never hand-code the loops.
No secondary indexes here. Only the primary key is indexed. .Where(r => r["Name"] == "Alice") will always full-scan. Knowing this stops you from blaming the AI — or a phantom slowdown — when the real answer is "there was never an index to use."
Check yourself
.Where(r => (int)r["Age"] > 25).Where(r => (int)r["Id"] == 7) — which filter helps the index?Id is the primary key, so == 7 becomes a point lookup. The Age predicate can't use an index, so it's applied to the single fetched row in memory.ORDER BY. Why?Email. Which class changes most?WhereExpressionVisitor would need to choose which index a given filter can use..Where(r => (int)r["Id"] >= 10 && (int)r["Id"] < 20), describe the plan.Up next: reads are the easy half. Writes that must survive a crash are where it gets interesting. Transactions, the write-ahead log, and how databases never lose your money.