If you use and like `ag`, I suggest taking a look at ripgrep (`rg`). It seems to be by far the fastest of the three (`ack`, `ag`, `rg`), and it has a pretty interesting codebase (written in Rust).
If you're working in a git repository, then IMO the most appropriate search tool is simply `git grep`. I don't think there's any reason to use ripgrep, ag, ack, etc. in that situation. (Personally, if I'm working with text files, I'm nearly always in a git repo.)
Well, at least one reason is that ripgrep is faster. On simple literal queries they have comparable speed, but beyond that, `git grep` is _a lot_ slower. Here's an example on a checkout of the Linux kernel:
$ time rg '\w+_PM_RESUME' | wc -l
8
real 0.127
user 0.689
sys 0.589
maxmem 19 MB
faults 0
$ time LC_ALL=C git grep -E '\w+_PM_RESUME' | wc -l
8
real 4.607
user 28.059
sys 0.442
maxmem 63 MB
faults 0
$ time LC_ALL=en_US.UTF-8 git grep -E '\w+_PM_RESUME' | wc -l
8
real 21.651
user 2:09.54
sys 0.413
maxmem 64 MB
faults 0
ripgrep supports Unicode by default, so it's actually comparable to the LC_ALL=en_US.UTF-8 variant.
There are other reasons. It's nice to use a single tool for searching in all circumstances, and ripgrep can fill that role. And in case you didn't know: ripgrep respects your .gitignore file.
Thanks! I knew ripgrep was praised in particular for its performance, but I didn't know the difference was that large. The repo I usually work in has 8.7M lines of code, and I had been finding `git grep` performance very adequate (I use it in combination with the Emacs helm library, where it forms part of an incremental search UI and hence gets called multiple times in quick succession in response to changing search input). It should be fun to swap in ripgrep as the helm search backend; I'll give it a try.
I wouldn't recommend anyone use this. Besides the really poor implementation quality of the algorithms (some of which are outright incorrect), the code in that repo is anything but Pythonic: it reimplements a lot of the standard library, avoids list, dict, and set comprehensions, uses indexes instead of iterators, copies things around for no reason, etc. They didn't even bother to run a linter to PEP8-ify it.
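To illustrate the kind of thing I mean (a made-up example in the same spirit, not code taken from that repo):

# Non-Pythonic: manual indexing plus a full list copy on every append.
def evens_unpythonic(numbers):
    result = []
    for i in range(len(numbers)):
        if numbers[i] % 2 == 0:
            result = result + [numbers[i]]
    return result

# Pythonic: iterate directly and use a comprehension.
def evens(numbers):
    return [n for n in numbers if n % 2 == 0]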
To the people commenting that $35 is too much for the content included: an equivalent set of channels from Comcast/XFINITY would cost you almost 3x more. So it's a no-brainer for me, and the fact that I don't have to deal with Comcast is worth even more than saving ~60% of my monthly cable bill.
This of course is only true if you pay for cable TV.
I have Netflix and Amazon and the rest of the internet. I had an HDHomeRun hooked up to an over-the-air antenna, but since I moved I haven't set it up again, as I'd need a huge mast to get reception.
You can also use a VPN-like service to get worldwide streams.
I do pay Comcast for a business internet connection, however, to stay away from the data caps.
Not sure why you're getting downvoted; I came here to make the same comment.
QPM is a useless metric. When talking about distributed systems from an engineering point of view, you always want to use QPS. QPM is simply not fine-grained enough to show whether the traffic is bursty. In this particular case, 1M QPM could mean anything: they might be idle for 50 seconds and then get 100k QPS for the next ten, or they might be getting a steady ~15k QPS the whole time (as is visible on the graph). Distributed systems are designed for the peak workload, not the average one, so misleading numbers like QPM lead to bad design and sizing decisions.
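A quick back-of-the-envelope sketch of the gap, using the numbers from above (the steady figure is idealized):

QUERIES_PER_MINUTE = 1_000_000

# Steady traffic: the load is spread evenly across the minute.
steady_qps = QUERIES_PER_MINUTE / 60        # ~16,667 QPS

# Bursty traffic: idle for 50 s, then all 1M queries arrive in 10 s.
bursty_peak_qps = QUERIES_PER_MINUTE / 10   # 100,000 QPS

# Both systems report "1M QPM", but the bursty one must be
# provisioned for a peak roughly 6x higher.
print(steady_qps, bursty_peak_qps, bursty_peak_qps / steady_qps)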
The only case where you would use QPM, QPD, and similar metrics is when you want to make your numbers look bigger than they are (10M transactions a day sounds better than ~115 transactions a second). But those should be used by sales, not by engineers.
I read it originally as 1M QPS, and thought that was a nice number. It was upon further inspection that I saw it was 1M QPM, and I was no longer intrigued.
I'm not sure if my org qualifies; it depends on how you count it, I guess. We have 80k+ RPS at times at SendGrid, and each request can generate 4 to 8 external events and at least a dozen internal API calls. If you count total internal QPS, that would be something on the order of 80k * 4 * 12 ~= 3.8M QPS (I'd have to check with an operations person to see if that holds up), but I don't know if it's fair to count that. So, let's go back to the 80k RPS. If someone were doing 10x that, I'd be intrigued to learn more about their setup for sure. I imagine the Googles, Facebooks, and Amazons of our industry do this level of traffic.
It's basically the Sieve of Eratosthenes[1] algorithm, and it's possible to do with regular expressions because the numbers here are represented in the unary[2] numeral system, where the number of characters/tokens equals the number itself. It's a common trick for testing various Turing-machine stuff.
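For the curious, the classic unary trick can be sketched in a few lines of Python (this is the well-known composite-matching regex, not necessarily the exact one from the article):

import re

# Matches 0, 1, and any composite written in unary: the group (11+?)
# captures a candidate factor of length >= 2, and \1+ requires the rest
# of the string to be that same factor repeated, i.e. n = k * m with
# k >= 2 and m >= 2.
NOT_PRIME = re.compile(r"^1?$|^(11+?)\1+$")

def is_prime(n):
    return not NOT_PRIME.match("1" * n)

print([n for n in range(20) if is_prime(n)])  # [2, 3, 5, 7, 11, 13, 17, 19]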
Sometimes I find typing grep/cut/awk/etc. easier to remember than custom flags, and thus faster to type. Oftentimes the time I'd spend looking through the man page is better spent just writing a more verbose command line.
+1. You can see the same effect in natural language: modern English has fewer tenses and declensions and makes heavier use of helper words, as contrasted with Old English. Same with Latin vs. the modern Romance languages.
One thing I was pretty annoyed about while testing the (server) beta and alpha was the Cockpit web UI, which is enabled by default. I know it's easy to disable with `systemctl disable cockpit.socket`, but if you select a "minimal/base install" you shouldn't get a full-blown web UI management console installed and enabled by default.
I'm interested in this issue. As I understand it, Cockpit shouldn't be included in a minimal install. It is included in the default Fedora Server install.
If you like, join us on IRC in #cockpit on FreeNode, and we can work through this there.
It's installed by default but not running by default. So the only resources it consumes are a small amount of disk space, plus one listening socket so that it autostarts if you try to use it (by connecting to example.com:8888 or whatever it is).
If it's listening on a socket and spins up on a request sent to the port that socket is bound on, for all intents and purposes, it is running and enabled. This is the same behavior as old inetd-based servers.
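For anyone unfamiliar with the mechanism, here's a rough sketch of systemd-style socket activation (illustrative only, not Cockpit's actual code):

import os, socket

# systemd binds the port itself and starts the service only when a
# client connects, handing over the listening socket as fd 3.
SD_LISTEN_FDS_START = 3

if os.environ.get("LISTEN_FDS") == "1":
    # Socket-activated: adopt the already-listening socket from systemd.
    # In this case, this process didn't exist until a client connected.
    server = socket.socket(fileno=SD_LISTEN_FDS_START)
else:
    # Started by hand: bind and listen ourselves.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("0.0.0.0", 9090))  # Cockpit's default port
    server.listen()

conn, addr = server.accept()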
Correct, if you try to open http://<hostname>:9090/ it will get started automatically. I'm not concerned about the resources it uses; I just don't like having services I don't use installed and listening on ports in a minimal install, especially on servers.
Even if the API gave you strong durability guarantees, it still wouldn't mean much. Disk caches, big enterprise SAN-attached storage, etc. can also "cheat", saying they flushed the cache when they actually didn't.
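To make that concrete, here's the textbook application-level durability dance (a sketch; the function name is mine). Even when every call below succeeds, a cache that lies about flushing can still lose the data on power failure:

import os

def durable_write(path, data):
    # Write bytes to a temp file, fsync it, rename it into place, then
    # fsync the directory so the rename itself is persisted.
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)   # the kernel says the data hit stable storage...
    finally:
        os.close(fd)
    os.rename(tmp, path)
    dfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dfd)  # ...but a lying disk cache below can still drop it
    finally:
        os.close(dfd)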
A trustworthy API would at least allow blaming the right cog in the machinery. As it stands, everyone gets a Get Out Of Jail Free card: the kernel, filesystem drivers, userspace libraries, application developers, and disk controllers can all point at one another, and nothing forces a single direction in that cyclic graph of blame.