Show HN: A somewhat faster, more lightweight, ripgrep-inspired alternative

athorax · on Sept 1, 2024

I appreciate including the caveat about this being circumstantially faster, but do you have the benchmarks for the usage cases this performs better than rip-grep?

alexpasmantier · on Sept 2, 2024

Hi Thanks for your comment. I uploaded a couple of tests using `hyperfine` to show cases where it might be faster. Will put in more work to do a proper benchmarking session in the days to come.

thealistra · on Sept 2, 2024

Would be good to also do plain text benchmarks, not simple regexes. As I assume this might be the best optimized case

alexpasmantier · on Sept 2, 2024

ericyd · on Sept 2, 2024

I understand from other comments that this was started as a learning project, but I gotta say I can't imagine a case where ripgrep wouldn't be fast enough for my use case. Maybe I'm not enough of a power user. Totally fine to have multiple players in the pattern matching CLI space of course, but performance alone would not convince me to switch

burntsushi · on Sept 2, 2024

ripgrep author here. I'm not sure I totally get the motivation here to be honest. It is certainly more lightweight in the sense that it has fewer features, but it actually has more dependencies than ripgrep and takes about as long to compile (from scratch) on my system. Also, the flags that it does support are overriding long-held custom that are likely to be confusing to users. For example, -f doesn't read patterns from a file. (ripgrep does this to an extent as well, for example, -I/--no-filename instead of grep's -h/--no-filename to allow `rg -h` to adhere to an even stronger custom: show the help output.)

It's also pretty annoying to share screenshots of benchmarks instead of just showing a simple copyable command with a paste of the results.

I also can't quite reproduce at least the curl benchmark:

    $ hyperfine "rg '[A-Z]+_NOBODY' ." "gg '[A-Z]+_NOBODY'" "grep -rE '[A-Z]+_NOBODY' ."
    Benchmark 1: rg '[A-Z]+_NOBODY' .
      Time (mean ± σ):       9.7 ms ±   0.9 ms    [User: 17.1 ms, System: 16.0 ms]
      Range (min … max):     7.4 ms …  13.1 ms    289 runs

    Benchmark 2: gg '[A-Z]+_NOBODY'
      Time (mean ± σ):      13.3 ms ±   1.1 ms    [User: 19.9 ms, System: 13.2 ms]
      Range (min … max):    10.8 ms …  16.3 ms    211 runs

    Benchmark 3: grep -rE '[A-Z]+_NOBODY' .
      Time (mean ± σ):      40.2 ms ±   3.1 ms    [User: 24.3 ms, System: 15.7 ms]
      Range (min … max):    36.4 ms …  49.3 ms    75 runs

    Summary
      rg '[A-Z]+_NOBODY' . ran
        1.36 ± 0.17 times faster than gg '[A-Z]+_NOBODY'
        4.13 ± 0.50 times faster than grep -rE '[A-Z]+_NOBODY' .

But the times here are so fast that this is likely not the most reliable of benchmarks. Doing it on a bigger repo gives a better sense I think:

    $ git remote -v
    origin  git@github.com:nwjs/chromium.src (fetch)
    origin  git@github.com:nwjs/chromium.src (push)
    $ git rev-parse HEAD
    1e57811fe4583ac92d2f277837718486fbb98252
    $ hyperfine "rg -p Openbox ." "gg Openbox ."
    Benchmark 1: rg -p Openbox .
      Time (mean ± σ):     317.4 ms ±   6.6 ms    [User: 1327.6 ms, System: 2335.4 ms]
      Range (min … max):   308.6 ms … 326.3 ms    10 runs

    Benchmark 2: gg Openbox .
      Time (mean ± σ):     734.5 ms ±  13.0 ms    [User: 1336.3 ms, System: 1567.0 ms]
      Range (min … max):   718.1 ms … 756.1 ms    10 runs

    Summary
      rg -p Openbox . ran
        2.31 ± 0.06 times faster than gg Openbox .

I tried other queries. For example, `gg '\w'`, to get a sense of whether the simpler grep implementation is better at dealing with match overhead. But I get a panic on this line in `gg`[1]. It looks like it's assuming that the `ArrayQueue` it uses is never full?

Even with the failure though, we can look at its perf on a checkout of the Linux kernel:

    $ hyperfine -i "rg '\w' ." "gg '\w' ."
    Benchmark 1: rg '\w' .
      Time (mean ± σ):     266.2 ms ±   6.5 ms    [User: 2168.8 ms, System: 807.6 ms]
      Range (min … max):   254.8 ms … 275.4 ms    11 runs

    Benchmark 2: gg '\w' .
      Time (mean ± σ):      1.043 s ±  0.086 s    [User: 3.569 s, System: 0.260 s]
      Range (min … max):    0.904 s …  1.150 s    10 runs

      Warning: Ignoring non-zero exit code.

    Summary
      rg '\w' . ran
        3.92 ± 0.34 times faster than gg '\w' .

I tried finding other cases where `gg` is meaningfully faster, but I didn't turn anything up.

Now, grip-grab is using the same libraries as ripgrep. So why doesn't it have the same performance profile as ripgrep? That is harder to answer, but it's likely not using the libraries in the best way possible. That's largely my failing, since the libraries are poorly documented, complex and sprawling.

[1]: https://github.com/alexpasmantier/grip-grab/blob/92cc5f6dc2f...

oguz-ismail · on Sept 2, 2024

> I'm not sure I totally get the motivation here to be honest.

Who says there has to be one? What's wrong with competition?

burntsushi · on Sept 2, 2024

I didn't say there had to be one. I didn't imply there had to be one. And nowhere did I say, imply or even hint at anything wrong with competition.

oguz-ismail · on Sept 2, 2024

[flagged]

burntsushi · on Sept 2, 2024

Yeah my agenda is correcting or clarifying misleading claims about software I've written. Competition is otherwise great! I think it's a good thing to have more choices.

alexpasmantier · on Sept 2, 2024

@burntsushi

Hi! First of all, thank you for taking the time to write this. I've been using ripgrep for quite some time, and it's an amazing piece of software. Having your comment here is truly an honor.

> I'm not sure I totally get the motivation here to be honest

This is primarily a small project I started to familiarize myself with Rust. I thought that exploring the basics of ripgrep and attempting to build something similar would be a good way to get started.

> Also, the flags that it does support are overriding long-held custom that are likely to be confusing to users

Noted. I'll consider making these changes to avoid potentially confusing anyone.

> It's also pretty annoying to share screenshots of benchmarks instead of just showing a simple copyable command with a paste of the results.

I've updated the documentation with the actual commands and included a copy of the results.

> I also can't quite reproduce at least the curl benchmark

I just ran the curl benchmark again on the same machine (my work laptop, an M3 Apple MacBook), and here are the results:

  $ hyperfine "rg '[A-Z]+_NOBODY' ." "gg '[A-Z]+_NOBODY'" "ggrep -rE '[A-Z]+_NOBODY' ."

  Benchmark 1: rg '[A-Z]+_NOBODY' .
     Time (mean ± σ):      38.5 ms ±   2.2 ms    [User: 18.1 ms, System: 207.3 ms]
     Range (min … max):    33.8 ms …  42.8 ms    72 runs
  
  Benchmark 2: gg '[A-Z]+_NOBODY'
     Time (mean ± σ):      21.8 ms ±   0.8 ms    [User: 15.4 ms, System: 53.1 ms]
     Range (min … max):    20.2 ms …  23.8 ms    115 runs
  
  Benchmark 3: ggrep -rE '[A-Z]+_NOBODY' .
     Time (mean ± σ):      73.3 ms ±   0.9 ms    [User: 26.5 ms, System: 45.7 ms]
     Range (min … max):    70.8 ms …  75.6 ms    41 runs
  
  Summary
     gg '[A-Z]+_NOBODY' ran
       1.77 ± 0.12 times faster than rg '[A-Z]+_NOBODY' .
       3.36 ± 0.13 times faster than ggrep -rE '[A-Z]+_NOBODY' .

> It looks like it's assuming that the `ArrayQueue` it uses is never full?

I used a default maximum size for the queue (configurable via the --max-results argument) to pre-allocate it, as I thought this might improve performance. However, I'm currently not handling errors properly and just allowing the program to panic when the number of results exceeds the set limit.

> So why doesn't it have the same performance profile as ripgrep?

Given the differences in execution times between our benchmarks, I suspect that because ripgrep's (and, by extension, gg's) performance bottleneck is primarily disk I/O, variations in filesystems and underlying storage hardware could explain the significantly different results we're observing. What do you think?

burntsushi · on Sept 2, 2024

It's not disk I/O because we're using hyperfine for measuring. It does warm-up runs first, and unless your machine has a teeny amount of RAM, everything is in cache. You can put your corpus on a ramdisk (usually `/tmp` is on Linux and I believe always `/dev/shm`, IDK about macOS) to verify this.

Since you're running on macOS, I'll do the same. I have an M2 mac mini. My previous benchmarks were on my Linux workstation. Your `curl` benchmark:

    $ hyperfine "rg '[A-Z]+_NOBODY' ." "gg '[A-Z]+_NOBODY'"
    Benchmark 1: rg '[A-Z]+_NOBODY' .
      Time (mean ± σ):      20.3 ms ±   0.7 ms    [User: 18.6 ms, System: 96.0 ms]
      Range (min … max):    18.4 ms …  21.3 ms    126 runs

    Benchmark 2: gg '[A-Z]+_NOBODY'
      Time (mean ± σ):      17.9 ms ±   0.7 ms    [User: 15.6 ms, System: 38.6 ms]
      Range (min … max):    17.0 ms …  19.9 ms    141 runs

    Summary
      gg '[A-Z]+_NOBODY' ran
        1.13 ± 0.06 times faster than rg '[A-Z]+_NOBODY' .

So slightly edged out by `gg` here, but not as big of a difference as you're seeing. What version of ripgrep are you using?

Also, as I said before, these times are pretty short. Try a bigger corpus. For example, in my clone of Linux (also on my M2 mac mini):

    $ git remote -v
    origin  git@github.com:BurntSushi/linux (fetch)
    origin  git@github.com:BurntSushi/linux (push)

    $ git rev-parse HEAD
    84e57d292203a45c96dbcb2e6be9dd80961d981a

    $ hyperfine "rg '[A-Z]+_NOBODY' ." "gg '[A-Z]+_NOBODY'"
    Benchmark 1: rg '[A-Z]+_NOBODY' .
      Time (mean ± σ):     343.3 ms ±   4.2 ms    [User: 359.3 ms, System: 2243.3 ms]
      Range (min … max):   339.0 ms … 352.7 ms    10 runs

    Benchmark 2: gg '[A-Z]+_NOBODY'
      Time (mean ± σ):     351.1 ms ±   4.6 ms    [User: 326.4 ms, System: 1059.1 ms]
      Range (min … max):   348.2 ms … 363.8 ms    10 runs

    Summary
      rg '[A-Z]+_NOBODY' . ran
        1.02 ± 0.02 times faster than gg '[A-Z]+_NOBODY'

It is very interesting that the differences are almost zero on macOS but quite a bit bigger on Linux. That might be worth investigating.

IMO, if you're advertising "circumstantially faster than ripgrep," then you should be able to characterize the circumstances in which that occurs.

burntsushi · on Sept 2, 2024

Oh... I see the problem. It's probably the thread heuristic. When running gg and rg, make sure -T and -j, respectively, are set to the same number. Because I think gg always defaults to `4`. Where as ripgrep is probably defaulting to a higher number. On very small corpora, like curl, this can actually lead to overall slower times due to the overhead of starting the threads.

This also explains why the times are faster on Linux. My Linux workstation has a lot more CPUs than my M2 mac mini. My mac mini has 8 logical CPUs while my Linux box has 24. ripgrep won't necessarily start one thread per core, but at 8 cores, it will indeed start one thread per core. Where as gg will start 4. You can see ripgrep's heuristic here: https://github.com/BurntSushi/ripgrep/blob/e0f1000df67f82ab0...

I suppose thread count heuristics are fair game for benchmarks, but in order to measure those better, you need a bigger variety of corpus sizes. Even with the Linux kernel, the difference between 4 and 8 threads for `gg` is not that big:

    $ hyperfine "gg -T4 '[A-Z]+_NOBODY'" "gg -T8 '[A-Z]+_NOBODY'"
    Benchmark 1: gg -T4 '[A-Z]+_NOBODY'
      Time (mean ± σ):     364.3 ms ±   2.5 ms    [User: 331.1 ms, System: 1108.6 ms]
      Range (min … max):   360.8 ms … 369.1 ms    10 runs

    Benchmark 2: gg -T8 '[A-Z]+_NOBODY'
      Time (mean ± σ):     349.3 ms ±   3.1 ms    [User: 454.2 ms, System: 2056.2 ms]
      Range (min … max):   345.4 ms … 355.8 ms    10 runs

    Summary
      gg -T8 '[A-Z]+_NOBODY' ran
        1.04 ± 0.01 times faster than gg -T4 '[A-Z]+_NOBODY'

But go to a bigger corpus and a difference becomes much more apparent:

    $ hyperfine "gg -T4 '[A-Z]+_NOBODY'" "gg -T8 '[A-Z]+_NOBODY'"
    Benchmark 1: gg -T4 '[A-Z]+_NOBODY'
      Time (mean ± σ):     16.777 s ±  0.351 s    [User: 1.868 s, System: 12.301 s]
      Range (min … max):   16.376 s … 17.396 s    10 runs

    Benchmark 2: gg -T8 '[A-Z]+_NOBODY'
      Time (mean ± σ):     10.273 s ±  0.628 s    [User: 1.931 s, System: 12.215 s]
      Range (min … max):    8.980 s … 11.066 s    10 runs

    Summary
      gg -T8 '[A-Z]+_NOBODY' ran
        1.63 ± 0.11 times faster than gg -T4 '[A-Z]+_NOBODY'

This is on a checkout of the Chromium repository.

The increased variety of benchmarks is important here because you might have a simpler heuristic for thread count that does result in overall marginally faster times in some cases, but this obscures what you're giving up: substantially slower times in other cases. Moreover, the cases where 4 versus 8 threads results in faster times for 4 threads tend to have very small absolute differences. i.e., Not hugely perceptible by humans.

alexpasmantier · on Sept 2, 2024

Ahh! Great catch, and thanks for taking the time to put that in writing.

I did set gg to default to 4 threads, which seemed to be the optimal number on my machine for the typical repo sizes I navigate daily. Increasing the number of threads beyond that often results in unnecessary overhead for my personal use cases.

I appreciate you pointing out the heuristic used in the ripgrep project. From what I understand, it also uses a fixed, machine-dependent number of threads, predetermined regardless of the task at hand (except for single-file tasks).

This is something I was curious about while writing the code but couldn't fully answer due to my limited knowledge of the subject: could we potentially use a filesystem-specific heuristic to estimate the workload and dynamically adjust the number of threads accordingly?

What I mean is a method, perhaps within the ignore crate, to estimate the amount of data to process—such as the number of files, file sizes, or number of lines—based on easily and cheaply accessible filesystem metadata.

burntsushi · on Sept 2, 2024

I'm not aware of one. Any tool that tells you disk space has to actually crawl the directory tree to report it. But that is precisely the thing we want to parallelize.

The only other option I can think of is to dynamically adjust. Maybe after a certain amount of work has completed, spin up more threads. But I'm not sure it's worth doing.

alexpasmantier · on Sept 2, 2024

Looking at inode metadata—specifically the number of links for directory nodes—might iteratively provide a one-step-ahead view of what's left to crawl, allowing for preemptive thread adjustments during recursion.

e.g. looking at the Links: 101 metadata on the `curl` codebase for src:

  $ stat -x src
  
    File: "src"
    Size: 3232         FileType: Directory
    Mode: (0755/drwxr-xr-x)         Uid: (  501/    alex)  Gid: (   20/   staff)
  Device: 1,22   Inode: 5857579    Links: 101
  Access: Tue Aug 27 22:21:23 2024
  Modify: Tue Aug 27 22:21:19 2024
  Change: Tue Aug 27 22:21:19 2024
   Birth: Tue Aug 27 22:21:19 2024

But then that still involves dynamically adjusting and might be kind of overkill for a relatively uncertain benefit...

yoavm · on Sept 2, 2024

The video on the readme doesn't work here on Firefox, Linux. I suspect it's because it's a mov file.

alexpasmantier · on Sept 2, 2024

Changed to mp4, should work now.

alexpasmantier · on Sept 2, 2024

Thanks for the feedback, will update the video!

VeejayRampay · on Sept 2, 2024

those rust utils are always so fast, it's insane

i find that they're also often associated with higher quality READMEs for some reason...

johnisgood · on Sept 1, 2024

Benchmarks? In what cases is it more performant?

alexpasmantier · on Sept 2, 2024

Hi Thanks for your comment. I uploaded a couple of tests using `hyperfine` to show cases where it might be faster. Will put in more work to do a proper benchmarking session in the days to come.

johnisgood · on Sept 2, 2024

Thank you for having added those!

What makes it circumstantially more performant, by the way?

Off-topic, but consider changing "circumstancially" to "circumstantially" in the README; the latter is the correct term.

alexpasmantier · on Sept 2, 2024

> What makes it circumstantially more performant, by the way?

The thread above might help provide the start of an answer.

> Off-topic, but consider changing "circumstancially" to "circumstantially" in the README; the latter is the correct term.

Done, thanks for spotting the typo.

johnisgood · on Sept 2, 2024

Oh okay, thank you, and you're welcome. :)