604ms Rebuilds at 50,000 Modules

604ms. Warm rebuild, 50,000 modules. One file changed, one file re-summarized, the rest served from a content-addressed cache. Graph analysis runs on metadata alone and finishes before you lift your finger off the save key.

This is Cloudpack. The architecture borrows more from linker engineering than from anything in the JavaScript ecosystem. Here’s how we got there.

The ceiling

Every JavaScript bundler shares the same foundational architecture: read all source files, analyze the full module graph, emit output. Vite, webpack, esbuild, Rspack, Rolldown. Each generation made the traversal faster. None of them changed what gets traversed.

At 1,000 modules this works. At 10,000, builds take seconds and developers lose flow. At 50,000, CI runs in minutes, hot-reload delays compound into thousands of lost engineer-hours per month, and the problem gets linearly worse as the product grows.

This is not a vendor problem. It is a category problem. Build time scales with total module count, not with what changed. Rewriting the traversal in Rust makes O(n) faster. It does not make it O(Δ).

Compiler vs database

The traditional bundler is a compiler. Source files go in, a bundle comes out, the next build starts over.

Cloudpack is a database. The module graph persists across builds. Every output (dev server, CI artifact, production chunks, per-client delta manifest) is a materialized view of that graph. One truth, many queries. No separate dev/prod pipeline. No divergence.

The module graph is the software. The bundle is a query.

Change one file in a 50,000-module application. A compiler re-reads everything. A database updates one row and re-runs the affected queries.

Three phases

Cloudpack splits the build into three phases, each with a different cost model.

Phase 1: Summarize. Parse each source file into a ~2KB ModuleSummary: exports, imports, side-effect markers, inter-export call edges, ambient global references. Runs in parallel via rayon, one task per file, no shared mutable state. Cached by SHA-256(source). A file that hasn’t changed is never re-parsed. At 50,000 modules the entire summary set is ~100MB. That stays resident in memory with room to spare, though only the largest server-class L3 caches could hold all of it.
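To make the caching rule concrete, here is a minimal sketch of a content-addressed summary cache. It is not Cloudpack's implementation: the `ModuleSummary` fields are cut down to imports and exports, the regex "parser" is a toy, `summarize_all` is a hypothetical name, and a Python thread pool stands in for rayon.

```python
import hashlib
import re
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleSummary:
    # Hypothetical, simplified summary: the real one also carries
    # side-effect markers, call edges, and global references.
    imports: tuple[str, ...]
    exports: tuple[str, ...]


IMPORT_RE = re.compile(r"""from\s+['"]([^'"]+)['"]""")
EXPORT_RE = re.compile(r"export\s+(?:const|function|class)\s+(\w+)")


def summarize(source: str) -> ModuleSummary:
    # Toy stand-in for a real JavaScript parser.
    return ModuleSummary(
        imports=tuple(IMPORT_RE.findall(source)),
        exports=tuple(EXPORT_RE.findall(source)),
    )


def summarize_all(
    sources: dict[str, str], cache: dict[str, ModuleSummary]
) -> tuple[dict[str, ModuleSummary], int]:
    """Return (path -> summary, number of files actually parsed)."""
    # The cache key is the SHA-256 of the file contents, not the path.
    keys = {
        path: hashlib.sha256(src.encode("utf-8")).hexdigest()
        for path, src in sources.items()
    }
    missing = {key: sources[path] for path, key in keys.items() if key not in cache}
    # Each parse task is independent; results are merged after the pool finishes.
    with ThreadPoolExecutor() as pool:
        fresh = list(pool.map(summarize, missing.values()))
    cache.update(zip(missing.keys(), fresh))
    return {path: cache[key] for path, key in keys.items()}, len(missing)
```

Calling `summarize_all` twice with the same sources parses nothing the second time; editing one file parses exactly one.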

Phase 2: Analyze. Read all summaries, build the full module graph. Three passes: BFS/DFS reachability from entry points, call-edge dead code elimination, delivery group assignment. Operates on summaries only, never source. At 50,000 modules this is a bandwidth-bound serial scan finishing in ~50ms. It re-runs only when a summary’s import or export structure changes. Implementation-only edits (function bodies, comments, formatting) skip it entirely.
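The first of those three passes is ordinary graph reachability. A minimal sketch, assuming the summaries have already been reduced to a plain `imports_of` mapping (a hypothetical shape, as is the function name); dead code elimination and delivery groups are not shown.

```python
from collections import deque


def reachable(imports_of: dict[str, list[str]], entry_points: list[str]) -> set[str]:
    """Breadth-first search from the entry points over import edges."""
    alive = set(entry_points)
    queue = deque(entry_points)
    while queue:
        module = queue.popleft()
        for dep in imports_of.get(module, ()):
            if dep not in alive:
                alive.add(dep)
                queue.append(dep)
    return alive
```

Anything not in the returned set is dead: nothing reachable from an entry point imports it. Note that the input is import lists only, never source text.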

Phase 3: Transform. Produce output files for alive, changed modules only. Drives Rolldown or Rspack behind a pluggable TransformEngine interface. Wall-clock time scales as |changed_alive_modules| / cores. Dead modules never reach this component.
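A sketch of how that selection might look. `TransformEngine` is named in the design, but the method signature below is an assumption, and `UppercaseEngine` is a toy stand-in for a real Rolldown or Rspack adapter.

```python
from typing import Protocol


class TransformEngine(Protocol):
    # Assumed shape of the pluggable interface; the real one may differ.
    def transform(self, module_id: str, source: str) -> str: ...


class UppercaseEngine:
    """Toy engine used only to show the plumbing."""

    def transform(self, module_id: str, source: str) -> str:
        return source.upper()


def run_transform(
    engine: TransformEngine,
    sources: dict[str, str],
    alive: set[str],
    changed: set[str],
) -> dict[str, str]:
    # Only modules that are both alive and changed are handed to the engine.
    work = sorted(alive & changed)
    return {module_id: engine.transform(module_id, sources[module_id]) for module_id in work}
```

A changed module that is dead never reaches `engine.transform`, and neither does an alive module that did not change.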

The warm-rebuild path: change one file, re-summarize it (sub-millisecond), confirm no structural changes (skip Phase 2), transform one module. Everything else is a cache read at memory speed.
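One way to make the "no structural changes" check cheap is to fingerprint only the parts of a summary that the graph depends on. The sketch below assumes that approach; `structural_fingerprint` and `plan_rebuild` are hypothetical names, and hashing the import and export lists is an illustration, not a description of what Cloudpack does internally.

```python
import hashlib
import json


def structural_fingerprint(imports: list[str], exports: list[str]) -> str:
    # Import order is kept (it can affect evaluation order);
    # export order is normalized because it does not change the graph.
    payload = json.dumps({"imports": list(imports), "exports": sorted(exports)})
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def plan_rebuild(
    old: tuple[list[str], list[str]], new: tuple[list[str], list[str]]
) -> list[str]:
    """Given (imports, exports) before and after an edit, list the phases to run."""
    phases = ["summarize"]
    if structural_fingerprint(*old) != structural_fingerprint(*new):
        phases.append("analyze")
    phases.append("transform")
    return phases
```

Editing a function body leaves both lists untouched, so the plan is summarize then transform. Adding an import changes the fingerprint and pulls Phase 2 back in.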

What linkers already proved

JavaScript bundling and native-code linking are structurally isomorphic. The linker world hit the equivalent scaling ceiling first and solved it, mostly over the past decade. Cloudpack maps those innovations directly.

Linker Innovation	What It Proved	Cloudpack Equivalent
ThinLTO	Per-module summaries are sufficient for cross-module optimization. Full IR never needs to be in memory simultaneously.	The Summarizer. `ModuleSummary` carries call edges, side-effect markers, reachability data. Same role as LLVM’s `FunctionSummary`.
mold	LLD’s bottleneck was serial data-structure contention, not computation. Concurrent hash maps fixed it.	Parallel summarization via `rayon`. One task per file, zero shared mutable state.
BOLT / Propeller	Static heuristics lose to measured runtime call patterns for code layout. C³ clustering outperforms compile-time guesses.	PGO chunk grouping. Real browser sessions drive chunk layout, no rebuild required.
Incremental linking (mold)	If the tool is fast enough, the OS page cache makes re-running free. Complex on-disk incremental state is overhead.	At 2KB per summary, re-reading the cached summaries of unchanged modules is cheaper than deserializing incremental state files.
Dynamic linking (ld.so)	Late-bound resolution. Load only what’s referenced. Share across processes.	The Adaptive Bundle Service. The service worker is `ld.so`. Content-hashed CDN chunks are shared libraries.

ThinLTO is the key precedent. Before it, LLVM loaded every module’s full IR into memory for link-time optimization. At large scale, this was impractical. ThinLTO proved that lightweight per-module summaries are sufficient for correct cross-module decisions. Cloudpack applies the same proof: you do not need to re-parse 50,000 source files on every build to make correct bundling decisions. Parse each file once, cache its 2KB summary, and operate on summaries after that.

One graph, four views

The module graph is the center of gravity. Every output form is a materialization of the same data.

The dev server reads summaries and serves native ESM directly. CI produces incremental artifacts by transforming only changed, alive modules. Production emits chunked bundles through the full transform engine. The Adaptive Bundle Service computes delta manifests per client, sending only chunks the browser doesn’t already have.
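The delta manifest is the easiest of the four views to picture: it is a set difference over content hashes. A minimal sketch, with a hypothetical `delta_manifest` function and an assumed manifest shape of chunk name to content hash:

```python
def delta_manifest(required_chunks: dict[str, str], client_has: set[str]) -> dict[str, str]:
    """Return only the chunks whose content hash the client does not already hold.

    required_chunks maps chunk name -> content hash for the requested route.
    client_has is the set of content hashes the browser reports as cached.
    """
    return {
        name: content_hash
        for name, content_hash in required_chunks.items()
        if content_hash not in client_has
    }
```

Because chunks are identified by content hash, a chunk that survives a deploy unchanged is never sent again, whatever its name.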

Same graph. Four queries. A bug in dev is a bug in prod because both derive from the same truth.

The receipts

Benchmarks run against synthetic codebases generated from the statistical fingerprint of real production code: file sizes, dependency fan-out, side-effect density, directory structure. Same seed produces byte-identical output. Reproducible.
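The reproducibility property is simple to sketch. The generator below is an illustration only: it models fan-out and nothing else, its parameters are made up rather than taken from the real fingerprint, and the names are hypothetical. What it shows is the mechanism: every random choice flows from one seeded generator, so the same seed yields the same bytes.

```python
import hashlib
import random


def generate_codebase(seed: int, module_count: int, max_fan_out: int = 6) -> dict[str, str]:
    rng = random.Random(seed)  # the only source of randomness
    files: dict[str, str] = {}
    for i in range(module_count):
        # A module imports only lower-numbered modules, so the graph is acyclic.
        fan_out = min(i, rng.randint(0, max_fan_out))
        deps = sorted(rng.sample(range(i), fan_out))
        lines = [f"import {{ v{d} }} from './m{d}.js';" for d in deps]
        lines.append(f"export const v{i} = {rng.randint(0, 999)};")
        files[f"m{i}.js"] = "\n".join(lines) + "\n"
    return files


def digest(files: dict[str, str]) -> str:
    """Hash paths and contents in sorted order to compare two generated trees."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode("utf-8") + b"\0" + files[path].encode("utf-8") + b"\0")
    return h.hexdigest()
```

Comparing `digest(generate_codebase(42, 1000))` across two runs is a cheap way to confirm that a benchmark input really is the same input.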

Analysis pipeline (summarize + analyze, no transform)

Modules	Cold (ms)	Warm (ms)	Speedup	Graph (ms)
100	19	3	6.3x	0
500	73	27	2.7x	3
1,000	141	58	2.4x	7
5,000	694	297	2.3x	38
10,000	1,363	604	2.3x	84

On a warm rebuild (one file changed out of N), N-1 modules are sub-millisecond cache reads. The measurements above stop at 10,000 modules; the 50,000-module figure is a projection via linear regression, not a measurement: 604ms warm, 11x faster than cold.

Full pipeline (with transform)

Modules	Cold (ms)	Warm, +1 change (ms)	Speedup
100	20	15	1.3x
500	131	79	1.7x
1,000	262	172	1.5x
5,000	1,306	897	1.5x
10,000	2,770	1,845	1.5x

Full-pipeline speedup is modest (1.5x vs 2.3x) because the transform engine currently re-processes every alive module regardless of cache. That’s the current bottleneck, and the architecture makes it solvable: the transform is pluggable, and the summary cache already identifies exactly which modules are alive and changed. Once the engine is handed only that set, the work is bounded by O(Δ_alive), not O(n).

The next post covers what those 2KB summaries actually contain: 50KB of source compressed 25x into metadata that carries everything the graph analyzer needs, and why that compression ratio is what makes the rest of the architecture possible.