
Does finding the number of unique elements in a set actually require comparison of each element with everything else? Can't you use a hashtable? For every element, add it to the table (ignore if already exists), and finally, take a count of keys.
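In Python the hashtable approach described above is just a set (a minimal sketch, not tied to any particular implementation in the article):

```python
# Insert every element into a hash-based set (duplicates are ignored),
# then count the keys. No pairwise comparison of all elements needed.
def count_distinct(elements):
    seen = set()          # Python's set is a hash table of the elements
    for x in elements:
        seen.add(x)       # no-op if x is already present
    return len(seen)
```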


> Does finding the number of unique elements in a set

That's easy, that's all of them. Sorry, could not resist.

Yes, hashing is the usual method. In a sorted list you can compare to the following element.
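The sorted-list trick mentioned above, sketched in Python: each element only needs to be compared with its immediate successor, so no extra memory beyond the list itself is needed.

```python
# Count distinct values in an already-sorted list by comparing
# each element to the one that follows it.
def count_distinct_sorted(sorted_values):
    if not sorted_values:
        return 0
    count = 1
    for prev, cur in zip(sorted_values, sorted_values[1:]):
        if cur != prev:   # a new value starts a new run
            count += 1
    return count
```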


Imagine 1PB of data where you expect 30% of it to be unique. Storing the unique elements needs around 300TB, since the keys in a hash table are the elements themselves, so a hashtable of perhaps 300TB. Doing that without that much RAM, even swapping to disk, can be tough.


Using a hashtable is the "normal" approach mentioned in the article. It works of course, but requires memory to store each unique element (or their hashes). If you have less memory available, the described algorithm can still give a very good approximation.


Using a hashtable is effective because you only compare elements within their hash buckets, not the entire set. However, they can become inefficient with very large datasets due to memory usage and processing time, which is where approximate counts shine.


This algorithm still generates a lot of random numbers, though. I would guess that this is much less overhead than hashing, but it still seems like it could be significant.


That is fine when you have say 1 million values and only 1000 are unique. But when you have 1 million values and about 900 thousand are unique you are putting more or less the whole data set into memory.


Imagine a million elements. How big must your hashtable be? The article explains it very well, did you miss it? It's a way to save memory.

But to be honest, I implemented it, ran it on Hamlet, and it's very wrong. It's barely useful, but maybe if you just need a vague idea...


How big was your thresh? I found it pretty accurate: https://news.ycombinator.com/item?id=40388878
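For context, the algorithm under discussion (the CVM distinct-elements estimator, where `thresh` is the buffer size) can be sketched roughly like this; details of the published version differ slightly, so treat this as an approximation of the idea:

```python
import random

# Keep at most `thresh` sampled elements; whenever the buffer fills,
# evict each survivor with probability 1/2 and halve the sampling
# probability p. The estimate of the distinct count is |buffer| / p.
def cvm_estimate(stream, thresh):
    p = 1.0
    sample = set()
    for x in stream:
        sample.discard(x)             # re-decide x's membership on every occurrence
        if random.random() < p:
            sample.add(x)
        if len(sample) >= thresh:     # buffer full: thin it out
            sample = {y for y in sample if random.random() < 0.5}
            p /= 2.0
    return len(sample) / p
```

Larger `thresh` means more memory but a tighter estimate; if `thresh` exceeds the true distinct count, the answer is exact because p never drops below 1.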


That part of the man page sounds confusing. Is it unconditional and enabled by default?


You also have the

     RuntimeMaxUse=, RuntimeKeepFree=, RuntimeMaxFileSize=, RuntimeMaxFiles= 

options if you are concerned about journald overusing tmpfs. But yes, it seems unconditional in that it always writes the logs somewhere, which is not an unreasonable design imho.
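For example, in journald.conf (the values here are purely illustrative, not recommendations):

```ini
# /etc/systemd/journald.conf — limits for the volatile (tmpfs-backed) journal
[Journal]
RuntimeMaxUse=64M        # cap total journal size on tmpfs
RuntimeKeepFree=256M     # leave at least this much tmpfs free
RuntimeMaxFileSize=16M   # rotate individual journal files at this size
RuntimeMaxFiles=8        # keep at most this many journal files
```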


Yes - unconditional by default, to be clear. I think if you set Storage=none it will switch it off completely.


NASA knew that people knew that planetariums existed, so they didn't paint a realistic sky, because people would think they used a planetarium. Game theory!


Wait until you switch to unions in rust and ask yourself whether it is a union or a struct.


Oh dear


Wikipedia has something interesting on this (how unions can be implemented using "class hierarchy in object-oriented programming"): https://en.wikipedia.org/wiki/Tagged_union#Class_hierarchies...
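A toy illustration of that wiki section's point, in Python: a tagged union like Shape = Circle(r) | Rect(w, h) expressed as a class hierarchy, where the dynamic type plays the role of the tag and pattern matching becomes dynamic dispatch. (The names here are made up for the example.)

```python
class Shape:
    def area(self):
        raise NotImplementedError

class Circle(Shape):          # one "variant" of the union
    def __init__(self, r):
        self.r = r
    def area(self):
        return 3.141592653589793 * self.r ** 2

class Rect(Shape):            # another "variant"
    def __init__(self, w, h):
        self.w = w
        self.h = h
    def area(self):
        return self.w * self.h

# Instead of matching on a tag, we dispatch on the dynamic type:
shapes = [Circle(1.0), Rect(2.0, 3.0)]
areas = [s.area() for s in shapes]
```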

There is a lengthy blog post about the same stuff, except that the author doesn't seem to have come across the said wiki section yet: https://nandakumar.org/blog/2023/12/paradigms-in-disguise.ht...

Kudos to the dev of datatype99 for showing the problem with such ad-hoc methods in the readme right away.


Is RISC-V technically superior or is it just about the license?


It's simpler than x86/arm, while having a couple improvements compared to minimal alternatives like MIPS based on hindsight & modern chips

See chapter 2: https://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-...


It's designed with the luxury of hindsight on long-established ISAs, avoiding many of their pitfalls, while not attempting to innovate in ways that may or may not work out.

Also the base ISA is very implementer-friendly. As in: requiring few transistors / FPGA LEs, (relatively) easy to write a compiler or emulator for, etc. But that is hardly unique.

The 32b and 64b flavours are very similar. Oh and... it's modular.

That doesn't make it 'better' though. Eg. x86 has a looott of legacy cruft. But also a looott of high-quality software for it. RISC-V: many of those tools are still being written / adapted / optimized. Likewise, x86 & ARM have many high-performance, efficient and/or low-cost implementations. RISC-V is catching up quickly, but not (yet) head-to-head with those.


ISAs are complex things so you can't say one is technically superior to the other.

I would say you're right though in that RISCV enjoys the success it's seeing due to the open specification and licensing model. People generally aren't drawn to RISCV because of technical innovation.


The base specification (IMAFDC) has little to no innovation, simply avoiding the mistakes of the past. We've got 60 years of experience with RISC-style instruction sets, so that's about consolidation not innovation.

However RISC-V is an excellent base upon which to innovate. You can see that in things such as the Vector extension, the memory model developed by industry and academic experts world-wide, and CHERI fine-grained memory-protection.


CHERI was originally based on MIPS, then ported to ARM and later RISC-V.


The place you can get a real physical commercially available CHERI implementation is RISC-V:

https://codasip.com/press-release/2023/10/31/codasip-deliver...

That's largely because if you base a product on Arm or MIPS you have the choice of getting them to actively invest in and support you, or getting sued into oblivion by them.

THAT is why RISC-V is the most friendly ISA to innovation and where most future innovation will happen. Because innovation comes not only from internally inside Intel or Arm or MIPS (who have switched to RISC-V now anyway) but from a myriad of possible sources.


The base RISC-V tends to avoid innovation, because we've been burned in the past (register windows, branch delay slots etc)


This article's just about the technical writing of the manual. It's nothing to do with the ISA itself.


Superior in what way? Is English superior to French? I think you can have equally good implementations of modern CPUs regardless of the ISA. The end result is billions of logic gates that, working together, store, add, subtract and multiply billions of times per second. Logic gates don't care what language you speak to them, as long as they understand it.


It's waaay more modern than other ISAs in its design. It can scale from microcontroller to simple CPU to GPU-like vector processing to very powerful CPUs without having to add thousands of CISC instructions.

E.g. The ISA is modular. You can use the RV64GC set of instructions to implement a very basic Linux-capable CPU that executes one instruction at a time.

Then you can build an advanced CPU that does OOO and instruction compression and run the same binary *efficiently*.


Might sound surprising until you remember there used to be Lisp Machines.


But some random word in its response can trigger an idea in your mind. Getting an idea from a conversation is not always about getting it directly. It's already in you and you just wanted a trigger.


Rubber-ducking is useful, but nobody gives the rubber duck anywhere near as much credit as AI enthusiasts give to chatbots.


You underestimate how much credit I give to rubber ducks ;)


No pipe organ in any of these?


Maybe a stupid question, but how do you compare this against GPT-based search engines?


GPT-based search engines usually use some sort of a database to retrieve context for the LLM to summarize first. This is what people refer to as RAG these days: https://blogs.nvidia.com/blog/what-is-retrieval-augmented-ge....

Some of these GPT engines maintain their own vector DB to do semantic search, others are directly hooked into Bing / Google. So pubmedisearch.com would be one component of a GPT-based engine. We actually have a GPT-based engine here: https://medisearch.io/.
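The RAG flow described above, as a very rough sketch. The retriever here is a toy word-overlap ranker standing in for a real vector DB or search API, and `answer` just returns the retrieved context where a real system would call an LLM; all names are hypothetical.

```python
def retrieve(query, corpus, k=2):
    """Toy retriever: rank documents by word overlap with the query."""
    qwords = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: -len(qwords & set(d.lower().split())))
    return scored[:k]

def answer(query, corpus):
    context = retrieve(query, corpus)
    # A real RAG system would send `context` + `query` to an LLM here;
    # we just return the context to show the data flow.
    return "\n".join(context)
```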

