fast-grep — indexed regex search, 6–25× faster than ripgrep

Why this exists

Coding agents call grep on nearly every reasoning step, and each call blocks the next thought. Across a session, slow search compounds into minutes of dead time per task, which makes it one of the largest latency leaks in an agent harness.

Cursor's blog post on indexed regex search lit the spark. Their index is tied to git commits; ours is not. fast-grep watches any directory, keeps its index fresh with a file-system daemon, and works on non-git directories, generated artifacts, and unstaged changes. It's the fastest open-source indexed grep we've measured.

Built at Globant · github.com/gmilano/fast-grep-rust

How fast-grep searches

Five techniques combine to eliminate >99% of file I/O before the regex engine runs.

1. Sparse n-grams

Variable-length substrings whose boundaries fall on rare bigram pairs (computed from the corpus itself). Fewer, longer, more selective posting lists than fixed trigrams.
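
As a rough illustration of the idea, here is a minimal sketch, not the project's actual extractor. It assumes a plain count threshold for "rare" (the hypothetical parameter `rare_at_most`) and emits every span that starts at one rare bigram and ends at the next. The real engine may weight bigrams and pick boundaries differently.

```rust
use std::collections::HashMap;

/// Count how often each byte bigram occurs across the corpus.
pub fn bigram_counts(corpus: &[&[u8]]) -> HashMap<(u8, u8), u32> {
    let mut counts = HashMap::new();
    for doc in corpus {
        for w in doc.windows(2) {
            *counts.entry((w[0], w[1])).or_insert(0) += 1;
        }
    }
    counts
}

/// Split `text` into variable-length n-grams that begin and end on a rare bigram.
/// `rare_at_most` is an illustrative threshold, not a fast-grep parameter.
pub fn sparse_ngrams<'a>(
    text: &'a [u8],
    counts: &HashMap<(u8, u8), u32>,
    rare_at_most: u32,
) -> Vec<&'a [u8]> {
    // Start positions of every bigram considered rare in this corpus.
    let rare: Vec<usize> = text
        .windows(2)
        .enumerate()
        .filter_map(|(i, w)| {
            let n = counts.get(&(w[0], w[1])).copied().unwrap_or(0);
            if n <= rare_at_most { Some(i) } else { None }
        })
        .collect();

    // Each n-gram spans from one rare bigram through the end of the next one,
    // so consecutive n-grams overlap by two bytes.
    rare.windows(2).map(|p| &text[p[0]..p[1] + 2]).collect()
}
```

With a corpus where `aa` is common and `xa` / `ay` are rare, the text `xaaay` yields the single n-gram `xaaay` instead of three fixed trigrams. A text with fewer than two rare bigrams yields nothing, and a caller would need a fallback such as fixed trigrams.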

2. Position masks

Two 8-bit Bloom filters per (n-gram, document) pair encode position-mod-8 and successor character. Drops the false-positive rate to 0.42% before any I/O.
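
The sketch below shows one way such a pair of masks could be built and queried. The struct layout, the `byte % 8` hash for the successor character, and all function names are assumptions made for illustration; only the two-mask idea (position mod 8, successor character) comes from the description above.

```rust
/// Two 8-bit masks for one (n-gram, document) pair.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct PosMask {
    /// Bit i is set if the n-gram occurs at some position p with p % 8 == i.
    pub loc: u8,
    /// Bit i is set if some occurrence is followed by a byte hashing to i.
    pub next: u8,
}

/// Illustrative 3-bit hash of the successor byte (an assumption).
fn next_bit(b: u8) -> u8 {
    1u8 << (b % 8)
}

impl PosMask {
    pub fn record(&mut self, pos: usize, successor: Option<u8>) {
        self.loc |= 1 << (pos % 8);
        if let Some(b) = successor {
            self.next |= next_bit(b);
        }
    }

    /// False means "definitely never followed by `b`"; true means "maybe".
    pub fn may_be_followed_by(&self, b: u8) -> bool {
        self.next & next_bit(b) != 0
    }

    /// Could `other` start exactly `gap` bytes after an occurrence of `self`?
    pub fn may_precede(&self, other: &PosMask, gap: u32) -> bool {
        self.loc.rotate_left(gap % 8) & other.loc != 0
    }
}

/// Build the masks for `gram` by scanning `doc` once (index-build time).
pub fn build_mask(doc: &[u8], gram: &[u8]) -> PosMask {
    let mut mask = PosMask::default();
    if gram.is_empty() {
        return mask;
    }
    for (pos, window) in doc.windows(gram.len()).enumerate() {
        if window == gram {
            mask.record(pos, doc.get(pos + gram.len()).copied());
        }
    }
    mask
}
```

Like any Bloom filter, the masks can return false positives (two different successor bytes may share a bit) but never false negatives, so a document rejected here can be skipped without reading it.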

3. mmap'd posting lists

Binary index files memory-mapped at query time. 17 ms cold load regardless of corpus size: the OS pages in only the lists you touch.
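
To make the "touch only what you need" point concrete, here is a sketch that decodes a single posting list straight out of a byte slice. The on-disk layout (a little-endian `u32` count followed by that many little-endian `u32` document IDs) is an assumption for illustration, not fast-grep's documented format. In a real build the slice would come from a memory map rather than a `Vec`.

```rust
/// Read four bytes as a little-endian u32. Caller guarantees `b.len() >= 4`.
fn le_u32(b: &[u8]) -> u32 {
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

/// Decode the posting list stored at `offset` inside `index`.
/// Assumed layout: [count: u32 LE][doc_id: u32 LE] * count.
/// Returns None if the list would run past the end of the slice.
pub fn read_posting_list(index: &[u8], offset: usize) -> Option<Vec<u32>> {
    let header_end = offset.checked_add(4)?;
    let count = le_u32(index.get(offset..header_end)?) as usize;

    let body_end = header_end.checked_add(count.checked_mul(4)?)?;
    let body = index.get(header_end..body_end)?;

    // Only these bytes are read; with a memory map, only their pages are faulted in.
    Some(body.chunks_exact(4).map(le_u32).collect())
}
```

Because the function takes `&[u8]`, it works unchanged on a mapped file, and the bounds checks keep a truncated or corrupt index from causing a panic.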

4. Line-level offsets

Index stores byte offsets of candidate lines, not just file IDs. Verification jumps directly to the suspicious line instead of scanning the whole file.
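
A minimal sketch of what "jumping directly to the line" can look like with the standard library: seek to a stored byte offset and read up to the next newline. The function name and signature are hypothetical; the real verifier may read from the memory map instead.

```rust
use std::io::{self, BufRead, BufReader, Read, Seek, SeekFrom};

/// Read the single line that starts at `offset`, without scanning anything before it.
/// The trailing newline, if present, is stripped.
pub fn read_line_at<R: Read + Seek>(mut src: R, offset: u64) -> io::Result<Vec<u8>> {
    src.seek(SeekFrom::Start(offset))?;
    let mut line = Vec::new();
    BufReader::new(src).read_until(b'\n', &mut line)?;
    if line.last() == Some(&b'\n') {
        line.pop();
    }
    Ok(line)
}
```

Passing a `std::fs::File` reads one line from disk; passing a `std::io::Cursor` over an in-memory buffer behaves the same way, which keeps the function easy to test.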

5. 4-byte prefix filter

Before invoking the regex engine, fast-grep checks a 4-byte content prefix per candidate line. Eliminates >95% of remaining I/O.
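
The exact semantics of the stored prefix are not spelled out here, so the following is only a sketch under a stated assumption: the index keeps the four bytes found where a match would have to begin, and the query starts with a known literal. The `Candidate` struct and both function names are hypothetical.

```rust
/// One candidate line as the index might describe it (hypothetical layout).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Candidate {
    pub line_offset: u64,
    /// Assumed: the four bytes at the position where the match would start.
    pub prefix: [u8; 4],
}

/// Compare up to four leading bytes of the query literal with the stored prefix.
/// False means the candidate cannot match, so its line never needs to be read.
pub fn prefix_may_match(prefix: [u8; 4], literal: &[u8]) -> bool {
    let n = literal.len().min(4);
    prefix[..n] == literal[..n]
}

/// Keep only the candidates whose lines still need to be read and verified.
pub fn survivors(candidates: &[Candidate], literal: &[u8]) -> Vec<u64> {
    candidates
        .iter()
        .filter(|c| prefix_may_match(c.prefix, literal))
        .map(|c| c.line_offset)
        .collect()
}
```

The check runs entirely on index data, which is why it can cut I/O: only surviving offsets are handed to the regex verifier.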

Live: how a pattern decomposes

Type a regex (or paste a real one) and see how fast-grep would decompose it into trigrams to look up in the index. Falls back to a full scan when there's no usable literal run.

This demo uses fixed 3-grams for clarity. The real engine uses corpus-adaptive sparse n-grams (variable length, bigram-rarity weighted) which produce fewer, more selective lookups — but the decomposition pattern matching shown here (literal vs alternation vs regex) is exactly what src/searcher.rs implements.
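
For readers without the interactive demo, the fixed-trigram decomposition can be sketched as follows. This is a deliberately small illustration, not a copy of `src/searcher.rs`: the function names are hypothetical, it understands only `.`, `*`, `+`, `?`, `^`, `$` and top-level `|`, and it gives up (full scan) on groups, classes, counted repeats, and escapes.

```rust
/// Push the current literal run, if any, onto `runs`.
fn flush(cur: &mut String, runs: &mut Vec<String>) {
    if !cur.is_empty() {
        runs.push(std::mem::take(cur));
    }
}

/// Literal runs that every match of this alternation-free branch must contain.
fn required_runs(branch: &str) -> Vec<String> {
    let (mut runs, mut cur) = (Vec::new(), String::new());
    for c in branch.chars() {
        match c {
            // The preceding character is optional, so it cannot be required.
            '*' | '?' => {
                cur.pop();
                flush(&mut cur, &mut runs);
            }
            // '+' keeps the preceding character but ends the run.
            '+' | '.' | '^' | '$' => flush(&mut cur, &mut runs),
            _ => cur.push(c),
        }
    }
    flush(&mut cur, &mut runs);
    runs
}

fn trigrams(run: &str) -> Vec<String> {
    let chars: Vec<char> = run.chars().collect();
    chars.windows(3).map(|w| w.iter().collect::<String>()).collect()
}

/// Outer Vec: alternation branches (OR). Inner Vec: trigrams that must all be present (AND).
/// None means "no usable literal run, fall back to a full scan".
pub fn plan_lookup(pattern: &str) -> Option<Vec<Vec<String>>> {
    // Unsupported syntax in this sketch: be conservative and scan everything.
    if pattern.chars().any(|c| "()[]{}\\".contains(c)) {
        return None;
    }
    let mut branches = Vec::new();
    for branch in pattern.split('|') {
        let grams: Vec<String> = required_runs(branch)
            .iter()
            .flat_map(|run| trigrams(run))
            .collect();
        if grams.is_empty() {
            return None; // one unconstrained branch makes the whole index lookup useless
        }
        branches.push(grams);
    }
    Some(branches)
}
```

Under these rules `EXPORT_SYMBOL` becomes eleven trigrams in one branch, `static.*inline` becomes the trigrams of `static` plus those of `inline`, `foo|bar` becomes two single-trigram branches, and `a.*b` falls back to a full scan.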

Indexed search pipeline

The 5-stage path a query takes through fast-grep, with realistic numbers from the Linux kernel benchmark (EXPORT_SYMBOL over 81 690 files):

1. Pattern

2. Trigrams

3. Posting lists

4. Position masks

5. Verify

Steps 1–4 take ≈10 ms. Step 5 (the only stage that touches actual file bytes) is where the 4-byte prefix filter eliminates >95% of the I/O. End-to-end this query returns 197 matches in 197 ms — vs 1 553 ms for ripgrep.

Architecture (C4 model)

Three zoom levels, from "where does this fit" to "what runs in the search hot path". Mermaid renders these in your browser.

Level 1 — System Context

Who uses fast-grep, and what does it talk to?

graph TB
  dev["👤 Developer
shell user"]
  agent["🤖 Coding agent
(Cursor, Claude Code, Aider)"]
  fgr(("fast-grep
CLI + optional daemon"))
  fs[(File system
any directory)]

  dev -- "fgr 'pattern' /path" --> fgr
  agent -- "tool-call: search" --> fgr
  fgr -- "mmap reads" --> fs
  fgr -- "watch + read" --> fs

Level 2 — Containers

Inside fast-grep there are two cooperating processes plus the on-disk index.

graph LR
  cli["fgr CLI
(Rust binary, single static)"]
  daemon["fgr daemon
(optional, FS-watcher)"]
  index[(Index files
mmap'd: postings, bitmaps,
lookup, meta.json)]
  fs[(File system)]

  cli -- "build / load" --> index
  cli -- "TCP localhost
(flush before search)" --> daemon
  daemon -- "incremental
updates" --> index
  daemon -- "notify::Watcher
(debounced 3s)" --> fs
  cli -- "verify reads" --> fs

Level 3 — Search Components

The internals of one indexed query, top-to-bottom in execution order.

graph TB
  cli["CLI / argument parser"]
  pat["Pattern decomposer
(extract literal runs &
alternation branches)"]
  load["Index loader
(mmap meta.json + postings)"]
  ngram["Sparse n-gram extractor"]
  posting["Posting list intersection"]
  bloom["Position-mask Bloom filter
(Blackbird locMask + nextMask)"]
  prefix["4-byte content prefix filter"]
  verify["Line-level regex verify"]
  out["Output renderer
(grouped/colored or piped)"]

  cli --> pat --> ngram --> posting --> bloom --> prefix --> verify --> out
  load -.-> posting
  load -.-> bloom

The optional Metal GPU pre-filter (macOS, opt-in via FGR_METAL=1) lives between "prefix" and "verify" and runs literal scans on candidate lines as a Metal compute kernel.
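
The "Posting list intersection" stage in the Level 3 diagram is the classic sorted-list merge. The sketch below assumes posting lists are sorted, de-duplicated `u32` document IDs and intersects the shortest list first. Both assumptions and both function names are illustrative rather than taken from the source code.

```rust
use std::cmp::Ordering;

/// Intersect two sorted, de-duplicated lists of document IDs in O(|a| + |b|).
pub fn intersect(a: &[u32], b: &[u32]) -> Vec<u32> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                out.push(a[i]);
                i += 1;
                j += 1;
            }
        }
    }
    out
}

/// Intersect any number of lists, shortest first so the working set shrinks quickly.
pub fn intersect_all(lists: &[Vec<u32>]) -> Vec<u32> {
    let mut order: Vec<&Vec<u32>> = lists.iter().collect();
    order.sort_by_key(|l| l.len());

    let mut iter = order.into_iter();
    let mut acc = match iter.next() {
        Some(first) => first.clone(),
        None => return Vec::new(),
    };
    for list in iter {
        if acc.is_empty() {
            break; // nothing can match any more
        }
        acc = intersect(&acc, list);
    }
    acc
}
```

Starting from the most selective list is what makes longer, rarer n-grams pay off: the candidate set is already small before the common n-grams are consulted.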

Benchmarks

Linux kernel 6.6 (81 690 files), Apple M1 Pro, warm cache. Numbers from the project's reproducible bench script.

vs ripgrep (no index)

Pattern	fast-grep	ripgrep	Speedup
`TODO`	97 ms	2 463 ms	25×
`printk`	172 ms	2 492 ms	14×
`EXPORT_SYMBOL`	197 ms	1 553 ms	8×
`container_of`	344 ms	2 440 ms	7×
`static.*inline`	394 ms	2 369 ms	6×

vs ugrep (indexed)

Pattern	fast-grep	ugrep	Speedup
`EXPORT_SYMBOL`	197 ms	1 898 ms	9.6×
`TODO`	97 ms	599 ms	6.2×
`static.*inline`	394 ms	1 595 ms	4.0×
`printk`	172 ms	645 ms	3.8×
`container_of`	344 ms	656 ms	1.9×

Index cost

Full build	~60 s (one-time)
Incremental update	<1 s for 10–100 files (75× faster than rebuild)
Index load (mmap)	17 ms
Index size	775 MB postings + 161 MB bitmaps

Testing pyramid

Where the project sits today, and where we want it to go. Numbers reflect cargo test --release output as of the latest release.

Current state

E2E / CLI snapshot tests (0)

Integration (9): searcher (7) + regex correctness corpus (2)

Unit (33): index, sparse, trigram

Target state

What a healthy pyramid for this project looks like — gaps to fill, ranked by ROI:

Tier	Now	Target	Gap (why it matters)
Unit	33	~80	Coverage holes in `persist.rs`, `daemon.rs`, the `output_matches`/`highlight_into` renderer (the recent v0.3.1 work has zero unit coverage).
Property	0	~10	Invariants of the regex decomposer: every literal run of length ≥3 extracted from a pattern should occur in any input that the pattern matches (no false negatives); alternation splits should round-trip. `proptest` is the right tool.
Integration	9	~25	Daemon lifecycle (start → fs change → flush before search), incremental update against committed-then-modified files, --type filter combinations.
CLI snapshot	0	~15	Lock the user-facing TTY output (grouped/colored) and piped output (`path:line:content`) with `insta` snapshots so we don't silently regress the rendering.
Fuzz	0	1 target	`cargo-fuzz` on the verifier: random pattern + random byte buffer should never panic and should match the regex crate's own behavior.
Bench (regression)	baseline JSON	CI gate	Wire `scripts/bench.sh` into a GitHub Action that compares against `benches/baseline-v0.3.1.json` and fails the build if any pattern regresses by >15%.
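
The comparison at the heart of such a CI gate is small. The sketch below assumes the baseline and the current run have already been parsed into maps from pattern to milliseconds. The JSON schema of `benches/baseline-v0.3.1.json` is not shown here, so parsing is left out, and the function name is hypothetical.

```rust
use std::collections::BTreeMap;

/// Return the patterns whose current time exceeds the baseline by more than
/// `tolerance` (0.15 means 15%). Patterns missing from `current` are skipped.
pub fn regressions(
    baseline: &BTreeMap<String, f64>,
    current: &BTreeMap<String, f64>,
    tolerance: f64,
) -> Vec<String> {
    baseline
        .iter()
        .filter_map(|(pattern, &base_ms)| {
            let now_ms = *current.get(pattern)?;
            if now_ms > base_ms * (1.0 + tolerance) {
                Some(pattern.clone())
            } else {
                None
            }
        })
        .collect()
}
```

A CI step would exit non-zero whenever the returned list is non-empty. With the 15% threshold, `printk` going from 172 ms to 200 ms fails the gate, while `TODO` going from 97 ms to 100 ms passes.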

Install

These are the channels we maintain. Community packaging (apt, Fedora, Arch, MacPorts, Chocolatey…) is welcome; see the README.

# Cargo (any platform with a Rust toolchain)
cargo install fast-grep

# Prebuilt binary via cargo-binstall
cargo binstall fast-grep

# Homebrew (macOS / Linux)
brew install gmilano/fast-grep/fast-grep

# Scoop (Windows)
scoop bucket add fast-grep https://github.com/gmilano/scoop-fast-grep
scoop install fast-grep

# Debian / Ubuntu — .deb attached to every release
curl -LO https://github.com/gmilano/fast-grep-rust/releases/latest/download/fast-grep_0.3.1-1_amd64.deb
sudo dpkg -i fast-grep_*_amd64.deb

Then:

# Build the index once (auto-built on first search if missing)
fgr index /path/to/repo --output .fgr

# Search (197 ms for this pattern in the Linux kernel benchmark, warm cache)
fgr "EXPORT_SYMBOL" /path/to/repo --index .fgr

# Watch + auto-update on file changes
fgr daemon start /path/to/repo --output .fgr