Does finding the number of unique elements in a set actually require comparison of each element with everything else? Can't you use a hashtable? For every element, add it to the table (ignore if already exists), and finally, take a count of keys.
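The hashtable approach described above is straightforward; a minimal sketch in Python (where a `set` is a hash table of keys only):

```python
def count_distinct_exact(stream):
    """Exact distinct count: store every unique element in a hash set."""
    seen = set()
    for x in stream:
        seen.add(x)  # no-op if x is already in the set
    return len(seen)

print(count_distinct_exact(["a", "b", "a", "c", "b"]))  # → 3
```

This is exact and fast, but as the replies point out, memory grows with the number of distinct elements.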
Imagine 1PB of data where you expect 30% of it to be unique. That needs ~300TB of RAM to store the unique elements. Keep in mind the keys in the hash table are the elements themselves, so the table itself would be on the order of 300TB. Doing that without that much RAM, even swapping to disk, can be tough.
Using a hashtable is the "normal" approach mentioned in the article. It works of course, but requires memory to store each unique element (or their hashes). If you have less memory available, the described algorithm can still give a very good approximation.
Using a hashtable is effective because you only compare elements within their hash buckets, not the entire set. However, they can become inefficient with very large datasets due to memory usage and processing time, which is where approximate counts shine.
This algorithm still generates a lot of random numbers. I would guess that's much less overhead than hashing, but it still seems like it could be significant.
That is fine when you have say 1 million values and only 1000 are unique.
But when you have 1 million values and about 900 thousand are unique, you are putting more or less the whole data set into memory.
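The algorithm the thread is discussing appears to be a sampling-based distinct-count estimator in the style of the CVM algorithm (this is my reading of the context, not stated explicitly above). Assuming that, a minimal sketch: each element survives in a fixed-size buffer with probability p, p is halved whenever the buffer fills, and len(buffer) / p estimates the distinct count. This is why it "spins a lot of random" but needs only O(buffer) memory even when 90% of the values are unique.

```python
import random

def count_distinct_cvm(stream, buf_size=1000):
    """Approximate distinct count in O(buf_size) memory (CVM-style sampling sketch)."""
    p = 1.0
    buf = set()
    for x in stream:
        buf.discard(x)                 # keep only the most recent decision for x
        if random.random() < p:
            buf.add(x)
        while len(buf) >= buf_size:    # buffer full: thin it and halve p
            buf = {y for y in buf if random.random() < 0.5}
            p /= 2.0
    return round(len(buf) / p)         # unbiased estimate of the distinct count
```

With a buffer of 1000 and 10,000 distinct values, the estimate lands within a few percent of the truth while holding at most 1000 elements in memory at once.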
There are options if you are concerned about journald overusing tmpfs. But yes, it does seem to write the logs somewhere unconditionally, which is not unreasonable design imho.
NASA knew that people knew that planetariums existed, so they didn't paint a realistic sky, because people would think they used a planetarium. Game theory!
It's designed with the luxury of hindsight on long-existing ISAs, avoiding many of their pitfalls, while not attempting to innovate in ways that may or may not work out.
Also the base ISA is very implementer-friendly. As in: requiring few transistors / FPGA LEs, (relatively) easy to write a compiler or emulator for, etc. But that is hardly unique.
The 32-bit and 64-bit flavours are very similar. Oh, and... it's modular.
That doesn't make it 'better' though. E.g. x86 has a looott of legacy cruft, but also a looott of high-quality software. For RISC-V, many of those tools are still being written / adapted / optimized. Likewise, x86 & ARM have many high-performance, efficient and/or low-cost implementations; RISC-V is catching up quickly, but is not (yet) head-to-head with those.
ISAs are complex things, so you can't say one is technically superior to the other.
I would say you're right, though, in that RISC-V enjoys the success it's seeing due to the open specification and licensing model. People generally aren't drawn to RISC-V because of technical innovation.
The base specification (IMAFDC) has little to no innovation; it simply avoids the mistakes of the past. We've got 60 years of experience with RISC-style instruction sets, so it's about consolidation, not innovation.
However, RISC-V is an excellent base upon which to innovate. You can see that in things such as the Vector extension, the memory model developed by industry and academic experts worldwide, and CHERI fine-grained memory protection.
That's largely because if you base a product on Arm or MIPS you have the choice of getting them to actively invest in and support you, or getting sued into oblivion by them.
THAT is why RISC-V is the most innovation-friendly ISA and where most future innovation will happen: innovation comes not only from inside Intel or Arm or MIPS (who have switched to RISC-V now anyway) but from a myriad of possible sources.
Superior in what way? Is English superior to French? I think you can have equally good implementations of modern CPUs regardless of the ISA. The end result is billions of logic gates that, working together, will store, add, subtract, and multiply billions of times per second. Logic gates don't care what language you speak to them as long as they understand.
It's waaay more modern than other ISAs in its design. It can scale from microcontrollers to simple CPUs to GPU-like vector processing to very powerful CPUs without having to add thousands of CISC instructions.
E.g. the ISA is modular. You can use the RV64GC set of instructions to implement a very basic Linux-capable CPU that executes one instruction at a time.
Then you can build an advanced CPU that does OOO and instruction compression and run the same binary *efficiently*.
But some random word in its response can trigger an idea in your mind. Getting an idea from a conversation is not always about getting it directly. It's already in you and you just wanted a trigger.
Some of these GPT engines maintain their own vector DB to do semantic search, others are directly hooked into Bing / Google. So pubmedisearch.com would be one component of a GPT-based engine. We actually have a GPT-based engine here: https://medisearch.io/.