Your Build Cache Is Lying to You

Your CI passed. Your tests passed. The build cache said nothing changed. It was wrong.

You shipped a broken build to production because a cache returned stale output from three weeks ago. The bug took two days to find. The fix was one line. The root cause was not your code. It was your build system’s trust model.

This is not a theoretical failure mode. I’ve seen it at Microsoft, in monorepos with hundreds of packages, where a stale cache returned test results that passed against an old version of a shared config. The tests looked green. The config had changed. Nobody declared it as an input. The cache never knew.

The enemy: declared-input trust

Every mainstream JS monorepo build cache works the same way. Turborepo, Nx, and lage (which I co-created at Microsoft) all share the same architecture: you declare what files a task reads, the system hashes those files, and the hash becomes the cache key. If the hash matches a stored entry, the cache returns the stored output without running the task.

The contract is simple. You tell the cache what matters. The cache trusts you.

The problem is also simple. You will forget something. Not today. Not on the obvious files. You’ll forget the weird one: the shared tsconfig.base.json three directories up, the .env.local that a test helper reads at runtime, the codegen config that graphql-codegen pulls in during a build step. You won’t forget because you’re careless. You’ll forget because the relationship between that file and your task is invisible until something breaks.

Here’s what that looks like in code. Simplified, Turborepo computes a cache key like this:

cache_key = blake3(
  command               // "tsc -b"
  + input_file_hashes   // contents of files matching "src/**/*.ts", "tsconfig.json"
  + lockfile_hash       // yarn.lock content hash
  + env_vars            // NODE_ENV, etc.
)

Nx does the same thing with “named inputs.” Lage does the same thing with environmentGlob and package content hashes. The surface syntax differs. The trust model is identical: hash what the developer declared, skip what they didn’t.
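To make the trust model concrete, here is a minimal Rust sketch of a declared-input key. It is not any tool's real implementation. The function names are hypothetical, an in-memory map stands in for the file system, explicit paths stand in for resolved globs, and the standard library's DefaultHasher stands in for a real content hash such as BLAKE3. The lockfile and environment variables are left out to keep it short.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical declared-input key. `files` maps path -> content and stands
/// in for the file system; only paths listed in `declared` are hashed.
fn declared_key(command: &str, declared: &[&str], files: &BTreeMap<String, String>) -> u64 {
    let mut paths = declared.to_vec();
    paths.sort_unstable(); // declaration order must not affect the key
    let mut hasher = DefaultHasher::new(); // stand-in for a real content hash
    command.hash(&mut hasher);
    for path in paths {
        path.hash(&mut hasher);
        files.get(path).hash(&mut hasher); // None if the file is missing
    }
    hasher.finish()
}

/// Returns the key before and after editing a file the task reads but
/// nobody declared (here, a shared tsconfig.base.json).
fn keys_around_undeclared_edit() -> (u64, u64) {
    let mut files = BTreeMap::new();
    files.insert("src/index.ts".to_string(), "export const a = 1;".to_string());
    files.insert("tsconfig.json".to_string(), "{}".to_string());
    files.insert("tsconfig.base.json".to_string(), "{\"strict\": true}".to_string());

    let declared = ["src/index.ts", "tsconfig.json"];
    let before = declared_key("tsc -b", &declared, &files);
    files.insert("tsconfig.base.json".to_string(), "{\"strict\": false}".to_string());
    let after = declared_key("tsc -b", &declared, &files);
    (before, after)
}
```

The two keys returned by keys_around_undeclared_edit are equal: the edit to tsconfig.base.json never reaches the hasher, so the key cannot move.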

The lie in practice

Here’s a concrete scenario. Your team has a monorepo with 200 packages. Package api has a build task that runs tsc -b. The cache key includes src/**/*.ts and tsconfig.json. Builds are fast, CI is green, everyone is happy.

Then a developer on another team adds a feature flag system. The flags live in config/feature-flags.json at the repo root. The api package reads this file at build time through a small codegen step baked into its tsc plugin. The developer who added the flags updated the build script. They did not update the Turborepo pipeline to declare config/feature-flags.json as an input to api#build.

What happens next:

1. CI runs. api#build executes with the new codegen. The cache stores the output, keyed to the current hash of src/**/*.ts + tsconfig.json.
2. A week later, someone changes config/feature-flags.json to disable a flag.
3. CI runs again. Turborepo hashes src/**/*.ts + tsconfig.json. Nothing changed in those files. Cache hit. Turborepo returns the stored output from step 1.
4. The build output still has the old feature flags baked in. Tests run against the stale output. They pass, because the tests don’t check flag values.
5. The stale build ships to production. The feature that should be disabled is still enabled.

The developer who changed the flag file did everything right. The developer who set up the codegen did everything right except declare one input. The cache did exactly what it was told. The result is a production bug that traces back to a missing line in turbo.json.

No tool warned about this. No test caught it. The cache returned a stale result and reported it as a cache hit. The system lied, and the lie was silent.
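The timeline can be replayed with a toy cache. This is a sketch under stated assumptions, not Turborepo's code: DeclaredCache, run_api_build, and replay are hypothetical names, file contents live in an in-memory map, and DefaultHasher stands in for a real content hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, HashMap};
use std::hash::{Hash, Hasher};

type Files = BTreeMap<&'static str, &'static str>; // path -> content

/// Toy declared-input cache (hypothetical; not Turborepo's implementation).
struct DeclaredCache {
    declared: Vec<&'static str>,
    entries: HashMap<u64, String>,
}

impl DeclaredCache {
    fn key(&self, files: &Files) -> u64 {
        let mut hasher = DefaultHasher::new(); // stand-in for a real content hash
        for path in &self.declared {
            path.hash(&mut hasher);
            files.get(path).hash(&mut hasher);
        }
        hasher.finish()
    }

    /// Returns (build output, whether it came from the cache).
    fn build(&mut self, files: &Files) -> (String, bool) {
        let key = self.key(files);
        if let Some(output) = self.entries.get(&key) {
            return (output.clone(), true);
        }
        let output = run_api_build(files);
        self.entries.insert(key, output.clone());
        (output, false)
    }
}

/// The "task": bakes the flag file into its output, declared or not.
fn run_api_build(files: &Files) -> String {
    format!("bundle(flags={})", files["config/feature-flags.json"])
}

/// Replays the timeline; returns the output and hit flag of the second CI run.
fn replay() -> (String, bool) {
    let mut files: Files = BTreeMap::new();
    files.insert("src/index.ts", "export {}");
    files.insert("tsconfig.json", "{}");
    files.insert("config/feature-flags.json", "on");

    let mut cache = DeclaredCache {
        declared: vec!["src/index.ts", "tsconfig.json"],
        entries: HashMap::new(),
    };
    cache.build(&files); // first CI run: miss, output stored
    files.insert("config/feature-flags.json", "off"); // the flag is disabled
    cache.build(&files) // second CI run: same key, stale hit
}
```

Calling replay returns a cache hit whose output still says flags=on, even though the file on "disk" now says off. Nothing in the cache's code path can notice.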

Why this is structural, not accidental

The stale cache bug above looks like a mistake someone made. It’s tempting to say “just be more careful about declaring inputs.” That’s the official answer from every tool that uses this model. Turborepo’s docs say to list all relevant files in your inputs configuration. Nx says to define “named inputs” correctly.

This advice is correct and useless. It’s correct because yes, if you perfectly declare every file every task reads, the cache will be correct. It’s useless because no human, across hundreds of packages maintained by dozens of developers over years, will sustain perfect declarations. The odds that at least one declaration is missing somewhere approach 100% given enough time and enough packages.

The deeper problem is that “declare your inputs” conflates two architecturally different things:

Trust declared inputs. The developer tells the system what matters. The system believes them. If they’re wrong, the system is wrong. This is the Turborepo/Nx/lage model.

Verify observed inputs. The system watches what the task actually reads at runtime. If the developer forgot to declare a file, the system catches it anyway because it observed the file access. This is the BuildXL model. It’s also what rage implements.

These are not two flavors of the same thing. They are fundamentally different trust architectures. One makes correctness a property of human discipline. The other makes correctness a property of the mechanism.

The receipts

Here’s what Turborepo’s cache key computation looks like, simplified from the Turborepo source:

// Turborepo: single-phase, declared inputs only
cache_key = blake3(
    task_command,
    sorted(hash(file) for file in resolve_globs(declared_input_globs)),
    hash(lockfile_relevant_section),
    sorted(env_var_pairs),
)
// If file X was read by the task but not in declared_input_globs,
// it is invisible. A change to X produces the same cache_key.
// The cache returns stale output. No warning.

Now here’s what a two-phase fingerprint looks like. This is the model rage uses, borrowed directly from BuildXL:

// Phase 1: Weak Fingerprint (cheap, runs every lookup)
weak_fp = blake3(
    task_command,
    hash(tool_binary),               // compiler upgrade = new key
    sorted(hash(file) for file in resolve_globs(declared_input_globs)),
    sorted(env_var_pairs),
    sorted(dep_abi_fingerprints),     // upstream package ABI hashes
)

// Phase 2: Strong Fingerprint (runs only on a weak-fingerprint match)
pathset = pathset_store.lookup(weak_fp)  // files the sandbox OBSERVED
                                         // on a previous run of this task
strong_fp = blake3(
    weak_fp,
    sorted(hash(content(file)) for file in pathset.reads),
)
// Cache hit only if strong_fp matches a stored entry.

The difference is one word: pathset. The weak fingerprint uses declared inputs, same as Turborepo. The strong fingerprint adds the files the task actually read, as reported by a file-access sandbox that monitored the process. If a task reads feature-flags.json and nobody declared it, the sandbox catches the read. The pathset includes it. The strong fingerprint includes its content hash. Next time feature-flags.json changes, the strong fingerprint changes, the cache misses, and the task re-runs.

The failure mode is inverted. In Turborepo, a missing declaration is a silent stale cache. In rage, a missing declaration is a one-time cache miss while the sandbox observes the read. The next run is correct and cached. You don’t need perfect declarations. The mechanism compensates.
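Here is a minimal Rust sketch of the two phases, heavily simplified and not rage's actual code. The function names are hypothetical, an in-memory map stands in for the file system, DefaultHasher stands in for BLAKE3, and the tool binary hash, environment variables, and ABI fingerprints from the pseudocode are omitted.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, BTreeSet};
use std::hash::{Hash, Hasher};

type Files = BTreeMap<String, String>; // path -> content, stands in for disk

/// Feeds (path, content) pairs into the hasher in sorted path order.
fn hash_files(hasher: &mut DefaultHasher, paths: &BTreeSet<String>, files: &Files) {
    for path in paths {
        path.hash(hasher);
        files.get(path).hash(hasher); // None if the file is missing
    }
}

/// Phase 1: command plus declared inputs only.
fn weak_fingerprint(command: &str, declared: &BTreeSet<String>, files: &Files) -> u64 {
    let mut hasher = DefaultHasher::new();
    command.hash(&mut hasher);
    hash_files(&mut hasher, declared, files);
    hasher.finish()
}

/// Phase 2: weak fingerprint plus every file a previous run was observed reading.
fn strong_fingerprint(weak: u64, observed_reads: &BTreeSet<String>, files: &Files) -> u64 {
    let mut hasher = DefaultHasher::new();
    weak.hash(&mut hasher);
    hash_files(&mut hasher, observed_reads, files);
    hasher.finish()
}

/// Flips an observed-but-undeclared file. Returns
/// (weak fingerprint unchanged?, strong fingerprint changed?).
fn flag_flip() -> (bool, bool) {
    let flags = "config/feature-flags.json".to_string();
    let src = "src/index.ts".to_string();
    let mut files = Files::new();
    files.insert(src.clone(), "export {}".to_string());
    files.insert(flags.clone(), "on".to_string());
    let declared = BTreeSet::from([src.clone()]);
    let observed = BTreeSet::from([src, flags.clone()]);

    let weak_before = weak_fingerprint("tsc -b", &declared, &files);
    let strong_before = strong_fingerprint(weak_before, &observed, &files);
    files.insert(flags, "off".to_string());
    let weak_after = weak_fingerprint("tsc -b", &declared, &files);
    let strong_after = strong_fingerprint(weak_after, &observed, &files);
    (weak_before == weak_after, strong_before != strong_after)
}
```

In flag_flip the weak fingerprint stays put, exactly as a declared-input key would, while the strong fingerprint moves because the observed read set includes the flag file. That difference is the cache miss you want.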

The cost of observation

This is not free. The sandbox adds overhead. On macOS, rage uses DYLD_INSERT_LIBRARIES to inject a shared library that intercepts filesystem calls. On Linux, it uses eBPF tracepoints. Both record every file the task reads and writes, then ship that log back to the cache layer.

The overhead is real: single-digit milliseconds per task on warm caches, more on cold runs. Turborepo’s flat hash is faster when every input is correctly declared. If you have a small monorepo and a disciplined team that maintains perfect input declarations, the simpler model wins on raw speed.

But “perfect declarations” is a bet against human nature. I maintained lage at Microsoft for years. I watched teams get input declarations wrong. Not because they were bad engineers. Because the codebase evolved, new files appeared, implicit dependencies formed, and nobody updated the pipeline config. The stale cache bugs were always silent, always delayed, always expensive to diagnose.

The sandbox makes the mechanism do the work that humans can’t sustain. The cost is milliseconds. The payoff is correctness you don’t have to think about.

What the sandbox actually sees

When rage runs a task, the sandbox records a Pathset:

pub struct Pathset {
    pub reads: BTreeSet<PathBuf>,   // every file the task read
    pub writes: BTreeSet<PathBuf>,  // every file the task wrote
}

This pathset is stored alongside the weak fingerprint. A single weak fingerprint can have multiple pathsets: the same tsc -b command might follow different code paths under different conditions and read different files. The strong fingerprint phase disambiguates them.

On cache lookup, rage iterates through candidate pathsets for the matching weak fingerprint, hashes the current content of each pathset’s read files, and checks whether any resulting strong fingerprint matches a stored output. A hit means the task’s actual inputs (not just declared inputs) are unchanged. A miss means something the task reads has changed, and it needs to re-run.
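The lookup described above can be sketched as follows. Only the Pathset struct comes from this post. The Cache layout, the store and lookup methods, and the in-memory Files map are assumptions made for illustration, and DefaultHasher again stands in for a real content hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, BTreeSet, HashMap};
use std::hash::{Hash, Hasher};
use std::path::PathBuf;

pub struct Pathset {
    pub reads: BTreeSet<PathBuf>,  // every file the task read
    pub writes: BTreeSet<PathBuf>, // every file the task wrote
}

type Files = BTreeMap<PathBuf, String>; // path -> current content, stands in for disk

/// Hypothetical store layout: candidate pathsets per weak fingerprint,
/// stored outputs per strong fingerprint.
#[derive(Default)]
pub struct Cache {
    pathsets: HashMap<u64, Vec<Pathset>>,
    outputs: HashMap<u64, String>,
}

fn strong_fingerprint(weak: u64, pathset: &Pathset, files: &Files) -> u64 {
    let mut hasher = DefaultHasher::new(); // stand-in for a real content hash
    weak.hash(&mut hasher);
    for path in &pathset.reads {
        path.hash(&mut hasher);
        files.get(path).hash(&mut hasher); // None if the file is missing
    }
    hasher.finish()
}

impl Cache {
    /// Records a finished run: its observed pathset and its output.
    pub fn store(&mut self, weak: u64, pathset: Pathset, files: &Files, output: String) {
        let strong = strong_fingerprint(weak, &pathset, files);
        self.pathsets.entry(weak).or_default().push(pathset);
        self.outputs.insert(strong, output);
    }

    /// Tries each candidate pathset against the current file contents and
    /// returns the first stored output whose strong fingerprint matches.
    pub fn lookup(&self, weak: u64, files: &Files) -> Option<&String> {
        self.pathsets.get(&weak)?.iter().find_map(|pathset| {
            self.outputs.get(&strong_fingerprint(weak, pathset, files))
        })
    }
}
```

After a run is stored, lookup with unchanged files returns the stored output. Change the content of any path in reads and the same call returns None, which is the signal to re-run the task and record a fresh pathset.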

The cache tells the truth because the sandbox saw the truth.

The next post

The two-phase fingerprint and the sandbox that feeds it are the core of what makes rage’s cache correct. How the weak and strong fingerprints interact, how pathsets accumulate and resolve, how ABI fingerprints cut off unnecessary downstream rebuilds: that’s Post 2, “Two-Phase Fingerprinting: BuildXL’s Deep Magic.”

For now, the point is simpler. If your build cache hashes only what you declared, it is exactly as correct as your declarations are complete. In a 200-package monorepo maintained by 40 engineers over 3 years, your declarations are not complete. Your cache is lying to you. You just haven’t caught it yet.