
I’m a database guy, not an OS guy, so I agree, obviously… But what is the micro-kernel angle?


Likely the idea that filesystems should run as userspace / unprivileged (or at least limited-privilege) processes, which would make them, ultimately, indistinguishable from a form of database engine.

Persistent file systems are essentially key-value stores, usually with optimizations for enumerating keys under a namespace (also known as listing the files in a directory). IMO a big problem with POSIX filesystems is the lack of atomicity and lock guarantees when editing a file. This and a complete lack of consistent networked API are the key reasons few treat file systems as KV stores. It's a pity, really.
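To make the analogy concrete, here is a rough sketch of a key-value interface over a directory tree (the class and method names are purely illustrative, not any real API):

    import os

    class FsKV:
        """Sketch: a directory tree used as a key-value store."""

        def __init__(self, root):
            self.root = root

        def put(self, key, value):
            path = os.path.join(self.root, key)
            parent = os.path.dirname(path)
            if parent:
                os.makedirs(parent, exist_ok=True)
            with open(path, "wb") as f:
                f.write(value)  # note: no atomicity or locking guarantees here

        def get(self, key):
            with open(os.path.join(self.root, key), "rb") as f:
                return f.read()

        def list(self, prefix=""):
            # "enumerating keys under a namespace" is just listing a directory
            return os.listdir(os.path.join(self.root, prefix))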


> "Likely the idea that filesystems should run as userspace / unprivileged (or at least limited privilege) processes which would make them, ultimately, indistinguishable from a form of database engine."

"Userspace vs not" is a different argument from "consistency vs not" or "atomicity vs not" or "POSIX vs not". Someone still needs to solve that problem. Sure instead of SQLite over POSIX you could implement POSIX over SQLite over raw blocks. But you haven't gained anything meaningful.

> Persistent file systems are essentially key-value stores

I think this is reductive enough to be equivalent to "a key-value store is a thin wrapper over the block abstraction, as it already provides a key-value interface, which is just a thin layer over taking a magnet and pointing it at an offset".

Persistent filesystems can be built over key-value stores. This is especially common in distributed filesystems. But they can also bypass a key-value abstraction entirely.

> IMO a big problem with POSIX filesystems is the lack of atomicity

Atomicity requires write-ahead logging + flushing a cache. I fail to see why this needs to be mandatory, when it can be effectively implemented at a higher layer.
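For what it's worth, the usual higher-layer pattern looks something like this: write to a temp file, flush it, then rename over the target. This is only a sketch, and it leans on POSIX rename replacing the destination atomically:

    import os
    import tempfile

    def atomic_replace(path, data):
        # Write the new contents to a temp file in the same directory,
        # flush to disk (durability), then rename over the original
        # (atomicity: readers see either the old file or the new one).
        directory = os.path.dirname(path) or "."
        fd, tmp = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, path)
        except BaseException:
            os.unlink(tmp)
            raise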

> This and a complete lack of consistent networked API

A consistent networked API would require you to hit the metadata server for every operation. No caching. Your system would grind to a halt.

Finally, nothing in the POSIX spec prohibits an atomic filesystem or consistency guarantees. It is just that no one wants to implement these things that way because it overprovisions for one property at the expense of others.


> "Userspace vs not" is a different argument from "consistency vs not" or "atomicity vs not" or "POSIX vs not". Someone still needs to solve that problem. Sure instead of SQLite over POSIX you could implement POSIX over SQLite over raw blocks. But you haven't gained anything meaningful.

This was an attempt to possibly explain the microkernel point GP made, which only really matters below the FS.

> I think this is reductive enough to be equivalent to "a key-value store is a thin wrapper over the block abstraction, as it already provides a key-value interface, which is just a thin layer over taking a magnet and pointing it at an offset".

I disagree with this premise. Key-value stores are an API, not an abstraction over block storage (though many are or can be configured to be so). File systems are essentially a superset of a KV API with a multitude of "backing stores". Saying KV stores are always backed by blocks is overly reductive, no?

> Atomicity requires write-ahead logging + flushing a cache. I fail to see why this needs to be mandatory, when it can be effectively implemented at a higher layer.

You're confusing durability with atomicity. You don't need a log to implement atomicity, you just need a way to lock one or more entities (whatever the unit of atomic updates is). A CoW filesystem in direct mode (zero page caching) would need neither but could still support atomic updates to file (names).

> A consistent networked API would require you to hit the metadata server for every operation. No caching. Your system would grind to a halt.

Sorry, I don't mean consistent in the ACID context, I mean consistent in the loosely defined API shape context. Think NFS or 9P.

I also disagree with this to some degree: pipelined operations would certainly still be possible and performant, but would be rather clunky. End-to-end latency for get->update->write, the common mode of operation, would be pretty awful.

> Finally, nothing in the POSIX spec prohibits an atomic filesystem or consistency guarantees. It is just that no one wants to implement these things that way because it overprovisions for one property at the expense of others.

I didn't say it did, but it doesn't require it, which means it effectively doesn't exist as far as the users of FS APIs are concerned. Rename is the only operation for which POSIX requires atomicity. However, without a CAS-like operation you can't safely implement a lock without several extra syscalls.
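For illustration (just a sketch, with made-up names), here is roughly what a lock built from basic filesystem calls ends up looking like -- an exclusive create acting as the test-and-set, plus the retry and cleanup syscalls around it:

    import os
    import time

    def acquire_lock(lockpath, timeout=5.0):
        # O_CREAT | O_EXCL makes the open fail if the file already exists,
        # which is the closest thing to a test-and-set this interface offers.
        deadline = time.monotonic() + timeout
        while True:
            try:
                fd = os.open(lockpath, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
                os.close(fd)
                return  # lock acquired
            except FileExistsError:
                if time.monotonic() > deadline:
                    raise TimeoutError("could not acquire " + lockpath)
                time.sleep(0.05)

    def release_lock(lockpath):
        os.unlink(lockpath)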


I was confused for a while about where this discussion was going, and what was the broader point. Will try to consolidate thoughts in the interest of making it clearer.

You seem unhappy with POSIX because its guarantees feel incomplete and ad hoc (they are). You like databases because their guarantees are more robust (also true). DBMS over POSIX enables all the guarantees that you like. I'd want to invoke the end-to-end systems argument here and say that this is how systems are supposed to work: POSIX is closer to the hardware, and as a result it is messier. It's the same reason TCP in-order guarantees are layered above the IP layer.

Some of your points re: how the lower layers work seem incorrect, but that doesn't matter in the interest of the big picture. The suggestion (re: microkernels) seems to be that POSIX has a privileged position in the system stack, and that somehow prevents a stronger alternative from existing. I'd say that your gripes with POSIX may be perfectly justified, but nothing prevents a DBMS from owning a complete block device, completely circumventing the filesystem. POSIX is the default, but it is not really privileged by any means.


Is there a DBMS or storage engine intended for a DBMS that does bypass the filesystem altogether? I'm not aware of any, but at the same time I don't have a full grasp of all the storage engines offered.

It almost seems like a ridiculous idea to me for a database component author to want to write their own filesystem instead of improving their DB feature set. I hear the gripes in this thread about filesystems, but they almost sound like service-level user issues, not deeper technical issues. What I mean by that is that the I/O strategies I've seen from the few open-source storage engines I've looked at don't seem at all hindered by the filesystem abstractions that are currently offered. I don't know what a DBMS has to gain from different filesystem abstractions.


There are DBMSes that do their own thing -- superficial ChatGPT queries seem to turn up a few, but I'll not mention them because I don't know much about their internals. I can think of a few reasons, mostly related to wanting more control over how physical media is used. I don't see those arguments made here though.

This paper may be a good read: https://dl.acm.org/doi/10.1145/3341301.3359656 -- it lays down arguments for why not to build a distributed filesystem on top of a regular local filesystem, and some of those arguments could apply to DBMSes.


The filesystem interface is the only privileged interface because it is the one the kernel knows about. E.g., you can already use FUSE and NFS to roll your own FS implementations, but those do not a microkernel make, because the OS is still in the way dictating the implementation.

The safest way to put the FS on a level playing field with other interfaces is to make the kernel not know about it, just as it doesn't know about, say, SQL.


Yeah but that’s an interesting technical point, more suited to 2015 HN - in 2025 we can’t let technical matters get in the way of our Sinophobia…


There is precisely zero Sinophobia in the parent thread. Conflating criticism of a country's government with discrimination against that country's people is a very old state propaganda technique that is deeply evil and you should be ashamed of yourself.


This thread is actually about criticism of Riot Games, not any country's government. But for some reason, whenever Westerners do things to other Westerners, they call each other Chinese. In the not-racist way that one does that.


> This thread is actually about criticism of Riot Games, not any country's government.

And, as anyone remotely familiar with the situation would know, Riot Games is a wholly-owned subsidiary of Tencent, a Chinese company, and all Chinese companies are subject to arbitrary amounts of control by the Chinese government.

> they call each other Chinese. In the not-racist way that one does that.

You just committed the same fallacious propaganda technique as the parent. It's extremely dishonest and malicious. Don't do it.


It’s an interesting rabbit hole to go down. If you use the BLS’s definition of productivity, then computers seem to be a net drag on productivity:

https://usafacts.org/articles/what-is-labor-productivity-and...

Even more surprising for me is that productivity growth declined during the ZIRP era. How did we take all that free money and produce less?


> Even more surprising for me is that productivity growth declined during the ZIRP era. How did we take all that free money and produce less?

This is not correct. Your link is only referring to manufacturing productivity, not overall productivity, which continued to rise. This Economist article has better information: https://archive.vn/6asPb. One hypothesis in the article that seems the most logical to me is that return on investment became so much better in other economic sectors that it siphoned talent from manufacturing.


This is an excellent question. My very unscientific suspicion is that the decreases in average attention span and ability to concentrate zero out the theoretical possible increases in productivity that computers allow.


This is a pretty weird hill to die on, boss. Are you suggesting that all of the software written over the years by emacs users (eg gcc) is “ivory tower”?

As a community of practitioners we should embrace the idea that not all tools have to be “ideal” for all users. Some people like hacking their editor, and some don’t. If software tools sink to the lowest common denominator, like the vast majority of commercial software, we’ll all be worse for it.


1. I ain't willing to die on any hill. In fact I was 100% certain I'll regret commenting negatively on Emacs. Pretty ardent and devoted fans, it seems.

2. The ivory tower thing is dedicated to the parent poster sounding a bit elitist and trying to imply I am doing it wrong and he's doing it right -- which I did not deny, by the way (which is the really funny part), as my central point was "too much freedom is not good".

3. I completely agree with the notion that not all tools are ideal for all users. I used this sub-thread to express a strong opinion that Emacs allows the "too much freedom" thing that actually becomes much more of a hurdle for those of us who just want to get on with it. I was sure it was going to ruffle feathers, which makes my commenting on it fairly stupid, come to think of it, because I was not looking to pick fights, but just to broadcast an apparently unpopular opinion and never engage with replies. Which I failed at spectacularly. :D

> If software tools sink to the lowest common denominator, like the vast majority of commercial software, we’ll all be worse for it.

Here's the part where you and I will disagree. Your statement is correct on its face, but I take issue with it because I take it as a hint that Emacs > all other editors. Which cannot be stated as a fact, ever, not for any editor, not just Emacs.


Corporations that over-hired over the past 10 years needed an excuse to cut the layers of fat and bureaucracy out, and AI came along at just the right time. It doesn’t matter if AI is increasing productivity; what matters is that people think it might be.


Maybe…

I dunno. Anyone who's been at a big company (the kind that is inclined to over-hire) can attest to the massive population of folks who don't seem to have any productive job. Does AI spell doom for those people? I have no idea; their job wasn't to be productive in the first place. Being a warm body to expand the size of some middle manager's fiefdom—that is not a job that can be taken by AI, right?

I guess maybe just engineering jobs are on the chopping block. Doomed, by the nature of their productivity actually being tangible, to be replaced by an AI that does their job worse.


> Being a warm body to expand the size of some middle manager's fiefdom—that is not a job that can be taken by AI, right?

If you over-hired and now need to lay off people, because funding and cheap loans aren't available, then AI provides a convenient excuse. You don't have to admit that you paid people to do nothing/very little for the last ten years.


Maybe in '23 & '24 that argument made sense. But presumably the past two years they've spent laying everyone off has been enough to trim the fat.

Honestly, any company that's still laying off people that they "overhired" during the pandemic needs to take a hard look at their leadership teams, because they've had ample opportunity to be rid of such people.


I don't think it's that complicated to begin with. Free money: focus on growth. No free money and a looming recession: cut, cut, cut. AI is a convenient buzzword to keep shareholders happy, no matter how much you actually invest in it.

I don't think it was a grand conspiracy. Executives are just panicking and don't have much more of a clue than the workers. It's just too bad they can weather the storm regardless while the people relying on paychecks suffer the consequences.


The problem is that over-hired personnel created lots of extra process/org/code/infra complexity, so just firing X% of people may produce significant damage, because the rest won't be able to handle that extra complexity or won't have enough institutional knowledge.


Mergers and acquisitions have been off the charts during this time. No one over hired. They over acquired.


This is just fluff to distract from outsourcing.


More accurate to say that VC and the “startup ideology” have always been at the core of HN - it just so happened that they aligned with OSS ideology during the ZIRP era.


The “external index build” idea seems pretty interesting. How does it work with updates to the underlying data (e.g., new embeddings being added)? For that matter, I guess, how do incremental updates to pgvector’s HNSW indexes work?


IVF indexing can be considered as two phases: computing the centroids (KMeans), and assigning each point to a centroid to build the inverted lists. The most time-consuming part is the KMeans stage, and it can be greatly accelerated with a GPU; 1M 960-dim vectors can be clustered in less than 10s. We did the KMeans phase externally, and the assignment phase inside Postgres. The KMeans part depends only on the data distribution, not on any specific data, so we can sample the data, and inserting/deleting data won't affect the KMeans result significantly. For an update, it's just a matter of assigning the new vector to a specific cluster and appending it to the corresponding list, which is very light compared to inserting into HNSW.
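A rough sketch of the two-phase build described above, in plain numpy (the real pipeline runs KMeans on GPU and does the assignment inside Postgres; the function names here are just illustrative):

    import numpy as np

    def train_centroids(sample, k, iters=20):
        # Phase 1: KMeans on a sample of the vectors (the expensive part).
        centroids = sample[np.random.choice(len(sample), k, replace=False)]
        for _ in range(iters):
            # assign each sampled vector to its nearest centroid
            dists = np.linalg.norm(sample[:, None, :] - centroids[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            for c in range(k):
                members = sample[labels == c]
                if len(members):
                    centroids[c] = members.mean(axis=0)
        return centroids

    def assign(vec, centroids):
        # Phase 2, and every later insert: pick the nearest centroid and
        # append the vector to that inverted list -- cheap compared to HNSW.
        return int(np.linalg.norm(centroids - vec, axis=1).argmin())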


Marc Brooker’s blog on the topic is good: https://brooker.co.za/blog/2021/05/24/metastable.html


That's exactly what I mean, that's why I'm asking what GP meant.


Two questions:

1. What’s the best place to research FTC win rates?

2. Given that Khan is taking an “aggressive” approach to what sorts of cases the FTC should pursue, wouldn’t it make sense for the win rate to be lower under her leadership?


1. You can find the stats (and many other things, like employee counts by type/division, etc) in their annual report

2. No, actually. At least, not this much lower. The win rate is certainly limited by aggression in the extreme, but not as limited in practice as you might think.

Assume for a second the FTC historically only brought 50% of cases it had a 100% chance of winning (IE was not that aggressive).

If it now brought 100% of cases it had a 100% chance of winning, it would be much more aggressive (file twice as many cases), but the win rate would still be 100% :)

Now, obviously, if they were bringing 100% of cases they had a >50% chance of winning before (0% of cases with <50%), and now are bringing 100% of cases they had a >25% chance of winning before (0% of cases <25%), the win rate would go down a lot.
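To put toy numbers on that selection effect (purely illustrative, assuming case strengths are spread uniformly and everything above a threshold gets filed):

    def expected_win_rate(threshold):
        # mean win probability of cases uniformly distributed on [threshold, 1]
        return (threshold + 1.0) / 2.0

    for t in (1.0, 0.5, 0.25):
        print(f"file everything above {t:.0%}: observed win rate ~ {expected_win_rate(t):.0%}")
    # above 100% -> ~100%, above 50% -> ~75%, above 25% -> ~62%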

But the general view of experts (on both "sides", which includes some of their long-time, very good, very experienced, now-retired career attorneys) is, basically:

1. they were much closer to bringing 50% of the 100% win cases before.

2. they haven't moved to bringing 100% of the 100% win cases; they've moved to some totally strange and random distribution that is based more on press and politics. Microsoft/Activision is a good example of this - their argument is very weak (basically that Microsoft is monopolizing the cloud gaming business!?)

3. it would be a lot more effective to start by bringing 100% of the 100% win cases, or 100% of the >95% win cases or whatever, than what they are doing now. You can/should be a lot more aggressive, but still "feared" for winning.

I've put aside any of the other complicated factors to simplify the comment enough to answer your question effectively.

If anyone felt we were at the point where they had been bringing 100% of >50% win rate cases, and were now trying to bring 100% of >40% win rate cases, I think you'd see a very different perception occurring among experts.

The other complex disagreement (though it is a bit of a sideshow given that) is not just about the approach overall, but about the fallout from losing so much.

That argument, roughly, looks something like this on the legal side:

A. The FTC has a tendency to file in a small number of courts (DC, California, Delaware).

B. It gets appealed to roughly the same courts as well

C. You end up in front of the same judges a lot

D. Being humans, they are not immune to bias (though again, these are much better judges than people give them credit for)

E. As a result, bringing crappy (IE 25% win rate) cases in front of them doesn't just affect your current case, but also affects whether the 100% win rate case stays a 100% win rate case.

This is almost certainly true, and you can argue over the degree to which it's true.

We'll call this the "reputation effect on enforcement ability".

Similar argument about companies doing the calculus about their mergers and business practices, IE the "reputation effect on deterrence ability".

So even among the hawks (which I am, for example, but I'm an American-school hawk, not a European-school hawk :P) you end up with a view that you can be a lot more aggressive without either of these reputation effects, and you should start there, not with a randomized distribution that is in some sense more aggressive but is unnecessarily causing serious reputation effects. This will, in turn, likely also enable you to turn 95% wins into 100% wins, etc.


RA’ing people at the top of their band means that they’ve topped out skill-wise. These aren’t senior people; it’s people who will never get promoted past 7 or 8 (which are quite junior) but have been accruing comp adjustments for a long time. Most teams will lay off a 7 that is getting paid more than a 9 by virtue of being at the company for 20 years.


Maybe this is true skill-wise, but there is also productivity to consider. I knew a band 9 who was definitely not going to be promoted to STSM (band 10), but was extremely productive. I'm sure he was paid well, better than most band 10s.

Band 10 and above require good soft skills: people who can persuade the industry and the organization, give TED talks, etc.


Do you think it’s a good idea for employees to voluntarily ask for pay cuts so that they are on the low end of the pay scale for a job grade?


Better to look for another job where you think you might not be as disposable, even if at lower pay.


I never saw any significant compensation increase from staying in the company at the same level. A long tenure at level 7 comes with very stagnant compensation, much lower than a newly hired level 7 and lower than a 9.

