fast-grep

Indexed regex search for codebases. 6–25× faster than ripgrep on the Linux kernel benchmark below, under 200 ms for most of its literal queries, and git-independent. Built so coding agents stop waiting for grep.

crates.io · release · MIT

Why this exists

Coding agents call grep on every reasoning step. Each call blocks the next thought. Across a session, slow search compounds into minutes of dead time per task, which makes it one of the largest latency leaks in an agent harness.

Cursor's blog on indexed regex search lit the spark. They tied their index to git commits. We didn't: fast-grep watches any directory, kept fresh by an FS daemon, and works on non-git repos, generated artifacts, and unstaged changes. It's the fastest open-source indexed grep we've measured.

Built at Globant · github.com/gmilano/fast-grep-rust

How fast-grep searches

Five techniques combine to eliminate >99% of file I/O before the regex engine runs.

1. Sparse n-grams

Variable-length substrings whose boundaries fall on rare bigram pairs (computed from the corpus itself). Fewer, longer, more selective posting lists than fixed trigrams.
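
A minimal sketch of the idea, not the engine's actual algorithm: it assumes a bigram is "rare" when its corpus count falls below a fixed threshold, and it cuts a gram from each rare bigram through the next one. The names `bigram_counts`, `sparse_ngrams`, and `threshold` are hypothetical.

```rust
use std::collections::HashMap;

/// Count every adjacent byte pair in the corpus.
fn bigram_counts(corpus: &[u8]) -> HashMap<(u8, u8), u32> {
    let mut counts = HashMap::new();
    for w in corpus.windows(2) {
        *counts.entry((w[0], w[1])).or_insert(0) += 1;
    }
    counts
}

/// Split `text` into variable-length grams whose two ends sit on rare bigrams.
/// Assumption: "rare" means a corpus count below `threshold`.
fn sparse_ngrams<'a>(
    text: &'a [u8],
    counts: &HashMap<(u8, u8), u32>,
    threshold: u32,
) -> Vec<&'a [u8]> {
    // Start offsets of every rare bigram in `text`.
    let rare: Vec<usize> = text
        .windows(2)
        .enumerate()
        .filter(|(_, w)| counts.get(&(w[0], w[1])).copied().unwrap_or(0) < threshold)
        .map(|(i, _)| i)
        .collect();
    // Each gram runs from one rare bigram through the next,
    // so neighbouring grams overlap by that shared bigram.
    rare.windows(2).map(|p| &text[p[0]..p[1] + 2]).collect()
}
```

Under these assumptions, common stretches of text collapse into one long gram, while unusual byte pairs produce the boundaries. A query literal would be split with the same table so that its grams line up with the indexed ones.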

2. Position masks

Two 8-bit Bloom filters per (n-gram, document) pair encode position-mod-8 and successor character. Drops the false-positive rate to 0.42% before any I/O.
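
The two masks can be sketched roughly as follows. This is an illustration under stated assumptions rather than the on-disk format: the successor byte is hashed with a plain `% 8`, and `Masks`, `next_bit`, `build_masks`, and `may_be_adjacent` are hypothetical names.

```rust
/// Masks kept per (n-gram, document) pair.
#[derive(Clone, Copy, Default, Debug, PartialEq)]
struct Masks {
    loc: u8,  // bit (p % 8) is set for every start offset p of the n-gram
    next: u8, // bit next_bit(b) is set for every byte b seen right after it
}

/// Assumed 3-bit hash of the successor byte.
fn next_bit(b: u8) -> u8 {
    1u8 << (b % 8)
}

/// Build both masks for one n-gram in one document.
fn build_masks(doc: &[u8], gram: &[u8]) -> Masks {
    let mut m = Masks::default();
    if gram.is_empty() || gram.len() > doc.len() {
        return m;
    }
    for (p, w) in doc.windows(gram.len()).enumerate() {
        if w == gram {
            m.loc |= 1u8 << (p % 8);
            if let Some(&b) = doc.get(p + gram.len()) {
                m.next |= next_bit(b);
            }
        }
    }
    m
}

/// Could gram `b` start `dist` bytes after gram `a`, with `succ` right after `a`?
/// `false` rules the document out; `true` only means "maybe" (Bloom semantics).
fn may_be_adjacent(a: Masks, b: Masks, dist: u32, succ: u8) -> bool {
    (a.loc.rotate_left(dist % 8) & b.loc) != 0 && (a.next & next_bit(succ)) != 0
}
```

For a query literal `foobar` split into `foo` and `bar`, the check would be `may_be_adjacent(foo, bar, 3, b'b')`: rotating `foo`'s position bits by 3 must hit one of `bar`'s, and `b` must be a recorded successor of `foo`. Documents that contain both grams but not next to each other are dropped without being opened.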

3. mmap'd posting lists

Binary index files memory-mapped at query time. 17ms cold load regardless of corpus size — the OS pages in only the lists you touch.

4. Line-level offsets

Index stores byte offsets of candidate lines, not just file IDs. Verification jumps directly to the suspicious line instead of scanning the whole file.
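
As a rough illustration of why a stored offset helps, the sketch below seeks straight to a candidate line and reads only that line. `read_line_at` is a hypothetical helper, and it assumes the offset points at the first byte of a UTF-8 line.

```rust
use std::io::{BufRead, BufReader, Read, Seek, SeekFrom};

/// Read the single line that starts at `offset`, without scanning earlier bytes.
/// Assumption: `offset` is the byte offset of a line start taken from the index.
fn read_line_at<R: Read + Seek>(mut src: R, offset: u64) -> std::io::Result<String> {
    src.seek(SeekFrom::Start(offset))?;
    let mut line = String::new();
    // read_line stops at the first '\n' (or EOF) and errors on invalid UTF-8.
    BufReader::new(src).read_line(&mut line)?;
    Ok(line.trim_end_matches(|c| c == '\n' || c == '\r').to_string())
}
```

The same function accepts an opened `std::fs::File` or an in-memory `std::io::Cursor`, which makes the behaviour easy to check without touching disk.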

5. 4-byte prefix filter

Before invoking the regex engine, fast-grep checks a 4-byte content prefix per candidate line. Eliminates >95% of remaining I/O.


Live: how a pattern decomposes

Type a regex (or paste a real one) and see how fast-grep would decompose it into trigrams to look up in the index. Falls back to a full scan when there's no usable literal run.

This demo uses fixed 3-grams for clarity. The real engine uses corpus-adaptive sparse n-grams (variable length, weighted by bigram rarity), which produce fewer, more selective lookups. The pattern classification shown here (literal vs alternation vs regex) is exactly what src/searcher.rs implements.
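
The fixed-trigram decomposition can be approximated in a few lines. This is a deliberately conservative sketch, not the code in src/searcher.rs: it assumes that any pattern containing alternation, groups, classes, counted repeats, or escapes falls back to a full scan, whereas the real engine handles alternation. `Plan` and `plan` are hypothetical names.

```rust
/// What the index is asked to do for a query.
#[derive(Debug, PartialEq)]
enum Plan {
    Lookup(Vec<String>), // trigrams that every match must contain
    FullScan,            // no usable literal run
}

fn plan(pattern: &str) -> Plan {
    // Constructs this sketch does not analyse: scan instead of guessing.
    if pattern.chars().any(|c| "|()[]{}\\".contains(c)) {
        return Plan::FullScan;
    }
    let chars: Vec<char> = pattern.chars().collect();
    let mut runs: Vec<Vec<char>> = vec![Vec::new()];
    for (i, &c) in chars.iter().enumerate() {
        // A character followed by `*` or `?` may match zero times,
        // so it cannot be part of a required literal run.
        let optional = matches!(chars.get(i + 1).copied(), Some('*') | Some('?'));
        if ".*+?^$".contains(c) || optional {
            runs.push(Vec::new()); // run boundary
        } else {
            runs.last_mut().unwrap().push(c);
        }
    }
    // Every 3-character window of every run becomes one index lookup.
    let grams: Vec<String> = runs
        .iter()
        .flat_map(|r| r.windows(3))
        .map(|w| w.iter().collect())
        .collect();
    if grams.is_empty() { Plan::FullScan } else { Plan::Lookup(grams) }
}
```

With this sketch, `static.*inline` yields the trigrams of `static` and of `inline`, while `a.b` has no run of three literals and falls back to a full scan.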

Indexed search pipeline

The 5-stage path a query takes through fast-grep, with realistic numbers from the Linux kernel benchmark (EXPORT_SYMBOL over 81 690 files):

1. Pattern
2. Trigrams
3. Posting lists
4. Position masks
5. Verify

Steps 1–4 take ≈10 ms. Step 5 (the only stage that touches actual file bytes) is where the 4-byte prefix filter eliminates >95% of the I/O. End-to-end this query returns 197 matches in 197 ms — vs 1 553 ms for ripgrep.
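
Stage 3 boils down to intersecting sorted lists of document IDs. A minimal sketch, assuming each posting list is a sorted, deduplicated `Vec<u32>`; the real lists are memory-mapped and their encoding is not shown here. `intersect` and `candidates` are hypothetical names.

```rust
use std::cmp::Ordering;

/// Intersect two sorted, deduplicated posting lists with a two-pointer merge.
fn intersect(a: &[u32], b: &[u32]) -> Vec<u32> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                out.push(a[i]);
                i += 1;
                j += 1;
            }
        }
    }
    out
}

/// Documents present in every list. Starting from the shortest list
/// keeps the running result as small as possible.
fn candidates(mut lists: Vec<Vec<u32>>) -> Vec<u32> {
    lists.sort_by_key(|l| l.len());
    let mut it = lists.into_iter();
    let first = match it.next() {
        Some(l) => l,
        None => return Vec::new(),
    };
    it.fold(first, |acc, l| intersect(&acc, &l))
}
```

Only the documents that survive this intersection move on to the position masks, which is why more selective n-grams translate directly into fewer candidates.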

Architecture (C4 model)

Three zoom levels, from "where does this fit" to "what runs in the search hot path". Mermaid renders these in your browser.

Level 1 — System Context

Who uses fast-grep, and what does it talk to?

graph TB
  dev["👤 Developer<br/>shell user"]
  agent["🤖 Coding agent<br/>(Cursor, Claude Code, Aider)"]
  fgr(("fast-grep<br/>CLI + optional daemon"))
  fs[(File system<br/>any directory)]
  dev -- "fgr 'pattern' /path" --> fgr
  agent -- "tool-call: search" --> fgr
  fgr -- "mmap reads" --> fs
  fgr -- "watch + read" --> fs

Level 2 — Containers

Inside fast-grep there are two cooperating processes plus the on-disk index.

graph LR
  cli["fgr CLI<br/>(Rust binary, single static)"]
  daemon["fgr daemon<br/>(optional, FS-watcher)"]
  index[(Index files<br/>mmap'd: postings, bitmaps,<br/>lookup, meta.json)]
  fs[(File system)]
  cli -- "build / load" --> index
  cli -- "TCP localhost<br/>(flush before search)" --> daemon
  daemon -- "incremental<br/>updates" --> index
  daemon -- "notify::Watcher<br/>(debounced 3s)" --> fs
  cli -- "verify reads" --> fs

Level 3 — Search Components

The internals of one indexed query, top-to-bottom in execution order.

graph TB
  cli["CLI / argument parser"]
  pat["Pattern decomposer<br/>(extract literal runs &<br/>alternation branches)"]
  load["Index loader<br/>(mmap meta.json + postings)"]
  ngram["Sparse n-gram extractor"]
  posting["Posting list intersection"]
  bloom["Position-mask Bloom filter<br/>(Blackbird locMask + nextMask)"]
  prefix["4-byte content prefix filter"]
  verify["Line-level regex verify"]
  out["Output renderer<br/>(grouped/colored or piped)"]
  cli --> pat --> ngram --> posting --> bloom --> prefix --> verify --> out
  load -.-> posting
  load -.-> bloom

The optional Metal GPU pre-filter (macOS, opt-in via FGR_METAL=1) lives between "prefix" and "verify" and runs literal scans on candidate lines as a Metal compute kernel.

Benchmarks

Linux kernel 6.6 (81 690 files), Apple M1 Pro, warm cache. Numbers from the project's reproducible bench script.

vs ripgrep (no index)

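| Pattern | fast-grep | ripgrep | Speedup |
|---|---|---|---|
| TODO | 97 ms | 2 463 ms | 25× |
| printk | 172 ms | 2 492 ms | 14× |
| EXPORT_SYMBOL | 197 ms | 1 553 ms | 8× |
| container_of | 344 ms | 2 440 ms | 7× |
| static.*inline | 394 ms | 2 369 ms | 6× |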

vs ugrep (indexed)

| Pattern | fast-grep | ugrep | Speedup |
|---|---|---|---|
| EXPORT_SYMBOL | 197 ms | 1 898 ms | 9.6× |
| TODO | 97 ms | 599 ms | 6.2× |
| static.*inline | 394 ms | 1 595 ms | 4.0× |
| printk | 172 ms | 645 ms | 3.8× |
| container_of | 344 ms | 656 ms | 1.9× |

Index cost

| Metric | Value |
|---|---|
| Full build | ~60 s (one-time) |
| Incremental update | <1 s for 10–100 files (75× faster than rebuild) |
| Index load (mmap) | 17 ms |
| Index size | 775 MB postings + 161 MB bitmaps |

Testing pyramid

Where the project sits today, and where we want it to go. Numbers reflect cargo test --release output as of the latest release.

Current state

0 E2E / CLI snapshot tests
9 integration tests: searcher (7) + regex correctness corpus (2)
33 unit tests: index, sparse, trigram

Target state

What a healthy pyramid for this project looks like — gaps to fill, ranked by ROI:

| Tier | Now | Target | Gap (why it matters) |
|---|---|---|---|
| Unit | 33 | ~80 | Coverage holes in persist.rs, daemon.rs, the output_matches/highlight_into renderer (the recent v0.3.1 work has zero unit coverage). |
| Property | 0 | ~10 | Invariants of the regex decomposer: literal runs of length ≥3 should always intersect-match the input; alternation splits should round-trip. proptest is the right tool. |
| Integration | 9 | ~25 | Daemon lifecycle (start → fs change → flush before search), incremental update against committed-then-modified files, --type filter combinations. |
| CLI snapshot | 0 | ~15 | Lock the user-facing TTY output (grouped/colored) and piped output (path:line:content) with insta snapshots so we don't silently regress the rendering. |
| Fuzz | 0 | 1 target | cargo-fuzz on the verifier: random pattern + random byte buffer should never panic and should match the regex crate's own behavior. |
| Bench (regression) | baseline JSON | CI gate | Wire scripts/bench.sh into a GitHub Action that compares against benches/baseline-v0.3.1.json and fails the build if any pattern regresses by >15%. |
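
The comparison at the heart of the planned bench gate could look roughly like this. It is a sketch under assumptions: timings are already loaded into maps of pattern to milliseconds (parsing the baseline JSON is left out), and `regressions` with its `max_regression` parameter is a hypothetical name, not an existing script or API.

```rust
use std::collections::BTreeMap;

/// Return (pattern, baseline_ms, current_ms) for every pattern whose current
/// timing exceeds the baseline by more than `max_regression` (0.15 = 15%).
/// Patterns missing from `current` are skipped in this sketch.
fn regressions(
    baseline: &BTreeMap<String, f64>,
    current: &BTreeMap<String, f64>,
    max_regression: f64,
) -> Vec<(String, f64, f64)> {
    baseline
        .iter()
        .filter_map(|(pat, &base)| {
            let now = *current.get(pat)?;
            (now > base * (1.0 + max_regression)).then(|| (pat.clone(), base, now))
        })
        .collect()
}
```

A CI step would fail the build whenever the returned list is non-empty and print the offending rows. Whether a pattern missing from the current run should also fail the gate is a policy choice this sketch leaves open.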

Install

All channels we control. Community packaging (apt, Fedora, Arch, MacPorts, Chocolatey…) welcome — see README.

# Cargo (any platform with a Rust toolchain)
cargo install fast-grep

# Prebuilt binary via cargo-binstall
cargo binstall fast-grep

# Homebrew (macOS / Linux)
brew install gmilano/fast-grep/fast-grep

# Scoop (Windows)
scoop bucket add fast-grep https://github.com/gmilano/scoop-fast-grep
scoop install fast-grep

# Debian / Ubuntu — .deb attached to every release
curl -LO https://github.com/gmilano/fast-grep-rust/releases/latest/download/fast-grep_0.3.1-1_amd64.deb
sudo dpkg -i fast-grep_*_amd64.deb

Then:

# Build the index once (auto-built on first search if missing)
fgr index /path/to/repo --output .fgr

# Search — sub-200ms on cached queries
fgr "EXPORT_SYMBOL" /path/to/repo --index .fgr

# Watch + auto-update on file changes
fgr daemon start /path/to/repo --output .fgr
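
For an agent harness, calling the CLI and parsing its piped output might look like the sketch below. It assumes the piped format is one `path:line:content` record per line and that `fgr` is on the PATH. `Hit`, `parse_hit`, and `search` are hypothetical names, and paths that themselves contain `:` (for example Windows drive letters) are not handled.

```rust
use std::process::Command;

/// One match parsed from the piped output.
#[derive(Debug, PartialEq)]
struct Hit {
    path: String,
    line: u64,
    content: String,
}

/// Parse a single `path:line:content` record; returns None if it does not fit.
fn parse_hit(record: &str) -> Option<Hit> {
    // Split at most twice so colons inside the matched content survive.
    let mut parts = record.splitn(3, ':');
    let path = parts.next()?.to_string();
    let line = parts.next()?.parse().ok()?;
    let content = parts.next()?.to_string();
    Some(Hit { path, line, content })
}

/// Run `fgr <pattern> <repo> --index <index>` and collect the parsed hits.
fn search(pattern: &str, repo: &str, index: &str) -> std::io::Result<Vec<Hit>> {
    let out = Command::new("fgr")
        .arg(pattern)
        .arg(repo)
        .arg("--index")
        .arg(index)
        .output()?;
    Ok(String::from_utf8_lossy(&out.stdout)
        .lines()
        .filter_map(parse_hit)
        .collect())
}
```

A harness would call `search("EXPORT_SYMBOL", "/path/to/repo", ".fgr")` and hand the resulting `Hit` values to the model as structured tool output instead of raw text.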