
Benchmark Results

The benchmark synthesizes a 50,000-file TypeScript workspace nested 10 directories deep, then measures rollback cycles with process.hrtime.bigint().
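The fixture step can be sketched in Node. This is a minimal sketch under an assumed layout: the source does not document how the 50,000 files are distributed, so `synthesizeWorkspace` (a hypothetical name) builds a single chain of 10 nested directories and spreads files round-robin across the levels.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical layout: one chain of nested directories d0/d1/.../d9 with
// files spread round-robin across the levels. The real fixture's layout is
// not documented; this only reproduces "N files, 10 directories deep".
function synthesizeWorkspace(root: string, fileCount: number, depth = 10): void {
  // Create each nesting level once and remember its path.
  const levels: string[] = [];
  let dir = root;
  for (let d = 0; d < depth; d++) {
    dir = path.join(dir, `d${d}`);
    fs.mkdirSync(dir, { recursive: true });
    levels.push(dir);
  }
  // Write small TypeScript modules so the fixture resembles a TS workspace.
  for (let i = 0; i < fileCount; i++) {
    const target = path.join(levels[i % depth], `file${i}.ts`);
    fs.writeFileSync(target, `export const value${i} = ${i};\n`);
  }
}

// Small demo run; the benchmark itself uses 50,000 files.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), "fixture-"));
synthesizeWorkspace(demoRoot, 100);
```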

Final audit results

Runner | Total Block Time | Avg Rollback Latency | Speedup vs Git | Reduction
--- | --- | --- | --- | ---
Git (reset --hard + clean -fd) | 34,784.070 ms | 3,478.407 ms | 1.00× | 0.00%
Manifest restore (dirty-set only) | 9.715 ms | 0.971 ms | 3,580.50× | 99.97%
rsync link-dest restore | 504.942 ms | 50.494 ms | 68.89× | 98.55%
tmpfs dirty-set restore (WSL2) | 0.634 ms | 0.063 ms | 54,851.92× | 100.00%

10 rollback iterations per runner, measured in WSL2 on an XFS loopback test drive for the cleanest filesystem signal.
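The columns above relate in a simple way: average latency is total block time divided by the 10 iterations, speedup is Git's total divided by the runner's total, and reduction is one minus that ratio. A minimal sketch, assuming the edit step is excluded from timing (the source does not say); `timeRollbacks`, `speedupVsGit`, and `reductionPct` are hypothetical names, and published figures may differ in the last digit because they were likely computed from unrounded timings.

```typescript
// Hypothetical harness: times `rollback` over N iterations with
// process.hrtime.bigint(), a monotonic clock in nanoseconds (bigint).
// Assumption: `edit` runs before each rollback and is left untimed.
function timeRollbacks(iterations: number, edit: () => void, rollback: () => void) {
  let totalNs = 0n;
  for (let i = 0; i < iterations; i++) {
    edit();
    const start = process.hrtime.bigint();
    rollback();
    totalNs += process.hrtime.bigint() - start;
  }
  const totalMs = Number(totalNs) / 1e6; // nanoseconds -> milliseconds
  return { totalMs, avgMs: totalMs / iterations };
}

// Table columns derived from total block times.
const speedupVsGit = (gitMs: number, runnerMs: number): number => gitMs / runnerMs;
const reductionPct = (gitMs: number, runnerMs: number): number => (1 - runnerMs / gitMs) * 100;

console.log(speedupVsGit(34784.07, 504.942).toFixed(2)); // 68.89 (rsync row)
console.log(reductionPct(34784.07, 504.942).toFixed(2)); // 98.55
```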

[Figure: Benchmark dashboard]

What the benchmark measures

The benchmark creates a 50,000-file fixture and simulates one agent edit cycle per iteration:

  1. Legacy Runner — mutates a tracked file, creates an untracked scratch file, then runs git reset --hard HEAD followed by git clean -fd. This is the baseline.

  2. Targeted Manifest Restore — tracks modified files in a manifest, restores only those files from a read-only base, and deletes only manifest-listed scratch files.

  3. rsync Targeted Restore — creates a linked working tree with rsync --link-dest, then restores only changed files with an rsync file list.

  4. tmpfs Dirty-Set Restore — keeps the dirty-set rollback cache in /dev/shm on Linux/WSL2 so the files the agent actually touched restore from RAM.
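Runner 2 can be sketched in a few lines of Node. This is a minimal sketch, assuming a manifest that separates modified tracked files from newly created scratch files; `DirtyManifest` and `manifestRestore` are hypothetical names, not Hyperion's API. Only listed paths are opened, so the cost follows the dirty set rather than the 50,000-file tree.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical manifest shape; paths are relative to the workspace root.
interface DirtyManifest {
  modified: string[]; // tracked files the agent changed
  created: string[];  // untracked scratch files the agent added
}

// Reverts only manifest-listed paths; the rest of the tree is never visited.
function manifestRestore(baseDir: string, workDir: string, m: DirtyManifest): void {
  for (const rel of m.modified) {
    const dest = path.join(workDir, rel);
    fs.mkdirSync(path.dirname(dest), { recursive: true });
    // Overwrite the working copy with the pristine copy from the read-only base.
    fs.copyFileSync(path.join(baseDir, rel), dest);
  }
  for (const rel of m.created) {
    // force: true makes this a no-op if the scratch file is already gone.
    fs.rmSync(path.join(workDir, rel), { force: true });
  }
}

// Demo: one tracked file edited, one scratch file created, then restored.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "manifest-demo-"));
const base = path.join(root, "base");
const work = path.join(root, "work");
for (const dir of [base, work]) {
  fs.mkdirSync(path.join(dir, "src"), { recursive: true });
  fs.writeFileSync(path.join(dir, "src", "a.ts"), "export const a = 1;\n");
}
fs.writeFileSync(path.join(work, "src", "a.ts"), "export const a = 2;\n"); // agent edit
fs.writeFileSync(path.join(work, "scratch.tmp"), "notes");                 // scratch file
manifestRestore(base, work, { modified: ["src/a.ts"], created: ["scratch.tmp"] });
```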

The metadata bottleneck

The first implementation used Linux reflinks with cp -a --reflink=always and then deleted and recloned the whole 50,000-file sandbox every turn.

Legacy Git reset average: 3,813.890 ms
Full clone/delete average: 16,332.289 ms (4.3× slower than Git)

Reflinks avoid copying file blocks, but they do not eliminate directory traversal, inode allocation, unlink work, or metadata updates. A real local agent should not throw away an entire tree when it knows which files it touched.
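The sketch below approximates the discarded clone/delete strategy in Node to show where that metadata work comes from. It is not the original code, which shelled out to `cp -a --reflink=always`: here `fs.copyFileSync` with `fs.constants.COPYFILE_FICLONE` requests a per-file copy-on-write clone and falls back to a byte copy where unsupported (`COPYFILE_FICLONE_FORCE` would fail instead, like `--reflink=always`). Unlike `cp -a`, it does not preserve ownership, timestamps, or symlinks. `cloneTree` and `recloneSandbox` are hypothetical names.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Recursively clones `src` into `dest`, one file at a time. Returns file count.
// Each directory still costs a readdir + mkdir and each file its own inode,
// even when the data blocks are shared via reflink.
function cloneTree(src: string, dest: string): number {
  fs.mkdirSync(dest, { recursive: true });
  let files = 0;
  for (const entry of fs.readdirSync(src, { withFileTypes: true })) {
    const from = path.join(src, entry.name);
    const to = path.join(dest, entry.name);
    if (entry.isDirectory()) {
      files += cloneTree(from, to);
    } else if (entry.isFile()) {
      // Requests a copy-on-write reflink; Node falls back to a regular copy
      // where the filesystem lacks support. Symlinks are skipped here.
      fs.copyFileSync(from, to, fs.constants.COPYFILE_FICLONE);
      files++;
    }
  }
  return files;
}

// The discarded strategy: unlink the entire sandbox, then reclone all of it,
// so per-turn cost grows with repository size rather than edit size.
function recloneSandbox(base: string, sandbox: string): number {
  fs.rmSync(sandbox, { recursive: true, force: true });
  return cloneTree(base, sandbox);
}

// Demo on a tiny tree; the first implementation did this with 50,000 files.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "reclone-demo-"));
const base = path.join(root, "base");
fs.mkdirSync(path.join(base, "a", "b"), { recursive: true });
fs.writeFileSync(path.join(base, "top.ts"), "1");
fs.writeFileSync(path.join(base, "a", "b", "deep.ts"), "2");
const cloned = recloneSandbox(base, path.join(root, "sandbox"));
```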

Hyperion’s practical optimization is targeted state reversion: track the agent’s dirty set and revert only those paths. tmpfs demonstrates the upper bound when dirty-set content and metadata operations live in RAM.
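Targeted reversion with a RAM-backed cache can be sketched as copy-on-first-write: save a path's original bytes the first time the agent touches it, and replay only those paths on rollback. This is a minimal sketch, assuming every agent write goes through a hook that runs first; `DirtySet`, `touch()`, `rollback()`, and `dispose()` are hypothetical names, not Hyperion's API. The cache lives in `/dev/shm` (tmpfs) on Linux/WSL2 and falls back to `os.tmpdir()`, which may be disk-backed, elsewhere.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical dirty-set tracker with a tmpfs-backed snapshot cache.
class DirtySet {
  private readonly workDir: string;
  private readonly cacheDir: string;
  // Workspace-relative path -> cached original, or null if it did not exist.
  private readonly entries = new Map<string, string | null>();
  private nextId = 0;

  constructor(workDir: string) {
    this.workDir = workDir;
    const ramDisk = "/dev/shm";
    const parent = process.platform === "linux" && fs.existsSync(ramDisk) ? ramDisk : os.tmpdir();
    this.cacheDir = fs.mkdtempSync(path.join(parent, "dirty-set-"));
  }

  // Assumed hook: call before the agent writes or deletes `rel`.
  touch(rel: string): void {
    if (this.entries.has(rel)) return; // only the first pre-edit state matters
    const abs = path.join(this.workDir, rel);
    if (fs.existsSync(abs)) {
      const cached = path.join(this.cacheDir, String(this.nextId++));
      fs.copyFileSync(abs, cached);
      this.entries.set(rel, cached);
    } else {
      this.entries.set(rel, null); // new scratch file: rollback deletes it
    }
  }

  // Reverts every touched path, clears the dirty set, returns how many paths.
  rollback(): number {
    for (const [rel, cached] of this.entries) {
      const abs = path.join(this.workDir, rel);
      if (cached === null) {
        fs.rmSync(abs, { force: true });
      } else {
        fs.mkdirSync(path.dirname(abs), { recursive: true });
        fs.copyFileSync(cached, abs);
        fs.rmSync(cached);
      }
    }
    const count = this.entries.size;
    this.entries.clear();
    return count;
  }

  // Removes the snapshot cache directory.
  dispose(): void {
    fs.rmSync(this.cacheDir, { recursive: true, force: true });
  }
}
```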

Raw evidence

Sweep: Dirty-set scaling

Shows that rollback time scales with the number of files changed, not the size of the repository. 1,000-file repo, 15-20 iterations, Windows NTFS.

Dirty Files | Git Reset | Manifest Restore | Speedup
--- | --- | --- | ---
1 | 663.229 ms | 4.325 ms | 153.35×
5 | 586.910 ms | 7.830 ms | 74.96×
10 | 657.239 ms | 19.992 ms | 32.87×
50 | 784.373 ms | 171.880 ms | 4.56×
100 | 854.158 ms | 298.908 ms | 2.86×

Git reset time stays within a ~590-855 ms band across a 100× range of dirty counts, because it always inspects the entire working tree. Hyperion’s manifest restore scales linearly instead: each additional dirty file adds roughly 3 ms ((298.908 - 4.325) / 99 ≈ 2.98 ms).
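The per-file cost can be cross-checked against all five rows with an ordinary least-squares fit of manifest restore time on dirty-file count. This is a reader's sanity check on the published numbers, not part of the benchmark suite; `linearFit` is a hypothetical helper. Over all five points the slope comes out to about 3.1 ms per file, close to the ~2.98 ms endpoint estimate.

```typescript
// Ordinary least-squares fit y = slope * x + intercept.
function linearFit(xs: number[], ys: number[]): { slope: number; intercept: number } {
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let sxy = 0; // sum of co-deviations
  let sxx = 0; // sum of squared x deviations
  for (let i = 0; i < n; i++) {
    sxy += (xs[i] - meanX) * (ys[i] - meanY);
    sxx += (xs[i] - meanX) ** 2;
  }
  const slope = sxy / sxx;
  return { slope, intercept: meanY - slope * meanX };
}

// Manifest-restore column from the dirty-set scaling table.
const dirtyFiles = [1, 5, 10, 50, 100];
const restoreMs = [4.325, 7.83, 19.992, 171.88, 298.908];
const fit = linearFit(dirtyFiles, restoreMs);
console.log(`${fit.slope.toFixed(1)} ms per dirty file`); // 3.1 ms per dirty file
```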

Sweep: Repo-size independence

Shows that Hyperion’s rollback time does not grow with repository size, tested at 1,000 and 5,000 files. 10 dirty files, Windows NTFS.

Repo Files | Git Reset | Manifest Restore | Speedup
--- | --- | --- | ---
1,000 | 579.231 ms | 21.326 ms | 27.16×
5,000 | 1,887.470 ms | 13.410 ms | 140.75×

Git reset balloons 3.3× when the repo grows from 1K to 5K files. Hyperion’s manifest restore stays flat at ~13-21ms — it only touches the 10 files in the dirty set regardless of repo size.

Sweep: Agent-search stress

Simulates MCTS-style branching, in which an agent explores multiple alternative code paths. 500-file repo, Windows NTFS.

Branches | Files/Branch | Avg Cycle
--- | --- | ---
5 | 3 | 67.757 ms
8 | 5 | 45.422 ms
10 | 10 | 147.297 ms

Each cycle mutates files across all branches in sequence. An agent testing 10 alternative paths with 10 edits each completes a full cycle in under 150 ms (147.297 ms), fast enough for real-time search loops.
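The cycle described above can be sketched as a loop that applies one branch's edits, scores the result, and rolls back before the next branch. This in-memory version is a hypothetical illustration: `Workspace`, `BranchJournal`, `exploreBranches`, and the `score` callback are made-up names, and a real runner would write to disk and restore through the dirty-set manifest rather than a `Map`.

```typescript
// In-memory stand-in for a workspace: file path -> contents.
type Workspace = Map<string, string>;

// Records each path's pre-edit contents the first time a branch writes it.
// undefined marks a path that did not exist before the branch.
class BranchJournal {
  private readonly saved = new Map<string, string | undefined>();
  private readonly ws: Workspace;

  constructor(ws: Workspace) {
    this.ws = ws;
  }

  write(file: string, contents: string): void {
    if (!this.saved.has(file)) this.saved.set(file, this.ws.get(file));
    this.ws.set(file, contents);
  }

  // Restores only the journaled paths, then clears the journal.
  rollback(): void {
    for (const [file, before] of this.saved) {
      if (before === undefined) this.ws.delete(file);
      else this.ws.set(file, before);
    }
    this.saved.clear();
  }
}

// Explores branches in sequence; returns the index of the best-scoring one.
// `score` is a hypothetical evaluator (e.g. running tests); not from the source.
function exploreBranches(
  ws: Workspace,
  branches: Array<Array<[string, string]>>,
  score: (ws: Workspace) => number,
): number {
  const journal = new BranchJournal(ws);
  let best = -1;
  let bestScore = -Infinity;
  branches.forEach((edits, i) => {
    for (const [file, contents] of edits) journal.write(file, contents);
    const s = score(ws);
    if (s > bestScore) {
      bestScore = s;
      best = i;
    }
    journal.rollback(); // workspace is pristine before the next branch
  });
  return best;
}
```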

Windows-native: No Git required

Hyperion manifest rollback on Windows without any Git dependency. 10,000-file repo, 10 dirty files, 3 iterations.

Metric | Value
--- | ---
Synthesis (one-time) | 31,796 ms
Avg rollback latency | 62.034 ms

Compare this with Git reset on the same machine at 10K files: ~570 ms. On Windows, Hyperion’s manifest rollback is about 9.2× faster (570 / 62.034 ≈ 9.2), with zero Git operations on the hot path.

Next steps