On Mon, Jan 25, 2021 at 12:42 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>> Tangent, but: you might be interested in fd-find, which is an
>> implementation of (paraphrasing here) about 80% of find that's usually
>> several times faster.
>
> Any hint *why* it's faster?

At this point I'm just passing along what I've read on the project pages, but fd claims most of its speed comes from reusing the performance work that went into ripgrep's component pieces (its Rust-based regex and ignore packages). There are two relevant sources from the ripgrep project: a blog post with details and benchmarks, and a summary on ripgrep's project page.

Blog post: https://blog.burntsushi.net/ripgrep/

Summarizing, ripgrep is fast because:
  • It is built on top of Rust's regex engine. Rust's regex engine uses finite automata, SIMD and aggressive literal optimizations to make searching very fast. (PCRE2 support can be opted into with the -P/--pcre2 flag.)
  • Rust's regex library maintains performance with full Unicode support by building UTF-8 decoding directly into its deterministic finite automaton engine.
  • It supports searching with either memory maps or by searching incrementally with an intermediate buffer. The former is better for single files and the latter is better for large directories. ripgrep chooses the best searching strategy for you automatically.
  • It applies your ignore patterns in .gitignore files using a RegexSet. That means a single file path can be matched against multiple glob patterns simultaneously.
  • It uses a lock-free parallel recursive directory iterator, courtesy of crossbeam and ignore.
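
To make the RegexSet point a bit more concrete, here is a rough Python sketch of the general idea: compile all the ignore globs into one regex so each path is checked in a single match call instead of once per pattern. This is only an illustration, not ripgrep's or fd's actual code (those are written in Rust). The names compile_glob_set and first_matching_glob are made up for this example. Python's fnmatch globs are also looser than real .gitignore rules (for instance, `*` here also matches `/`), and a plain alternation reports only the first pattern that matched, whereas a real RegexSet can report all of them.

```python
import fnmatch
import re


def compile_glob_set(globs):
    """Combine several glob patterns into one compiled regex (hypothetical helper).

    Each glob becomes a named alternative (pat0, pat1, ...), so a single
    match call tells us whether any pattern applies to the path.
    """
    parts = []
    for i, glob in enumerate(globs):
        # fnmatch.translate converts a shell-style glob into a regex string.
        parts.append(f"(?P<pat{i}>{fnmatch.translate(glob)})")
    return re.compile("|".join(parts))


def first_matching_glob(glob_set, globs, path):
    """Return the first glob that matches path, or None if no glob matches."""
    m = glob_set.match(path)
    if m is None:
        return None
    # lastgroup is the name of the alternative that matched, e.g. "pat1".
    return globs[int(m.lastgroup[len("pat"):])]


# Assumed example patterns, in the spirit of a small .gitignore file.
ignore_globs = ["*.o", "target/*", "*.log"]
ignore_set = compile_glob_set(ignore_globs)

for path in ["main.o", "target/debug/app", "src/main.rs"]:
    hit = first_matching_glob(ignore_set, ignore_globs, path)
    print(path, "->", "ignored by " + hit if hit else "kept")
```

The benefit grows with the number of patterns: a tree walk tests every path it visits, so replacing N separate glob checks with one combined match saves work on each file.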

Hope that helps,
~Chad