
Right - but SQLite handily beats the case where postgres is on the same box as well. And it's completely reasonable to test technology in the configuration in which it would actually run.

As an industry, we seem to have settled on patterns that are actually quite inefficient. No problem requires us to do things inefficiently just because someone once said databases should run on a different host.


If you're going to run on more than one piece of hardware, something is going to be remote to your single writer database.

As an industry, we've generally decided against "one big box", for reasons that aren't necessarily performance related.


I sometimes dream of a local-first world in which all software works with a local DB and only writes to the cloud as an afterthought, maybe as a backup or a way to pick up work on another machine. It just boggles my mind that so much software nowadays relies on an always-on internet connection for no good reason other than the design itself.

I think people's reaction to cloud vendors is to go local-first. But there's a middle ground: a VPS, a rented server, even self-hosting.

My problem with local-first is that it's fine for solo apps with the occasional sync, but it doesn't work for medium to large datasets, and the stuff I work on is generally real-time and collaborative. To me, multiplayer is one of the strengths of the web.


Changing terminology is hard once a name sticks. But yeah, "eventual propagation" is probably more accurate. I do get the impression that "eventual consistency" often just means "does not have a well-defined consistency model".

Yes, I agree. I don't really believe we can change the terminology. But maybe we can get some people to at least think about the consistency model when using the term.

I suspect you're right, but it's a bit discouraging to consider that an alternative way of framing this is that companies like OpenAI have a huge advantage in this landscape and anything that works will end up behind their API.


Hah - I think it's more likely that he has it timed and it literally takes between 6 and 7 minutes.


These are performance optimizations. SQLite does serialize writes. Avoiding concurrent writes to begin with just avoids some overhead on locking.


"performance optimisation" --- yeees, well, if you don't care about data integrity between your reads and writes. Who knows when those writes you scheduled really get written. And what of rollbacks due to constraint violations? There's we co-locate transactions with code: they are intertwined. But yes, a queue-writer is fine for a wide range of tasks, but not everything.

The problem is that we have to contort our software to make SQLite not suck at writes.


This is just FUD. The reason SQLite does locking to begin with is to avoid data corruption. Almost every statement this blog post makes about concurrency in SQLite is wrong, so it's little surprise that their application doesn't do what they expect.

>Who knows when those writes you scheduled really get written

When a commit completes for a transaction, that transaction has been durably written. No mystery. That's true whether you decide to restrict writes to a single thread in your application or not.
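
To be precise, that guarantee depends on the synchronous setting. A minimal sketch, assuming Python's built-in sqlite3 module and a made-up accounts table:

    import sqlite3

    conn = sqlite3.connect("app.db")
    # With the rollback journal and synchronous=FULL (the compile-time default),
    # commit() returns only once the transaction has been fsync'd to disk.
    # In WAL mode, synchronous=NORMAL is a common choice; a power cut can then
    # roll back the most recent commits, but it never corrupts the database.
    conn.execute("PRAGMA synchronous=FULL")
    with conn:  # opens a transaction, commits on exit
        conn.execute("UPDATE accounts SET balance = balance - 1 WHERE id = ?", (42,))
    # by the time the with-block exits, the commit has completed and the write is durable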


> When a commit completes for a transaction, that transaction has been durably written. No mystery. That's true whether you decide to restrict writes to a single thread in your application or not.

Usually this is true but there are edge cases for certain journaled file systems. IIRC sqlite.org has a discussion on this.


> there are edge cases for certain journaled file systems. IIRC sqlite.org has a discussion on this.

Can't currently find it but I guess it comes under the "if the OS or hardware lies to SQLite, what can it do?" banner?


That might have been it. Overall, the whole "How to corrupt your database" article was quite a good read:

https://sqlite.org/howtocorrupt.html


You're talking about low-level stuff like syncing to the filesystem, journalling the data, and ensuring atomicity is maintained; I am in actual fact not.

Dislocating DML from the code that triggers it creates many problems around ensuring proper data integrity, and it divorces you from the consistent reads of uncommitted data that you may want to tightly control before committing. By punting it to a dedicated writer, you're removing the ability to ensure serialised modification of your data and the ability to cleanly react to integrity errors that may arise. If you don't need that? Go ahead. But it's not FUD. We build relational, ACID-compliant databases this way for a reason.


Oh, I think you're picturing executing your transaction logic and then sending writes off to a background queue. I agree, that's not a general strategy - it only works for certain cases.

I just meant that if you can structure your application to run write transactions in a single thread (the whole transaction and its associated logic, not just deferring writing the end result to a separate thread), then you minimize contention at the SQLite level.
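
A rough sketch of that shape, assuming Python's sqlite3 and made-up table names - a single writer thread owns the only write connection and runs whole transactions handed to it, so callers still see constraint errors and still wait for the commit:

    import queue
    import sqlite3
    import threading
    from concurrent.futures import Future

    write_queue = queue.Queue()

    def writer_loop(db_path):
        conn = sqlite3.connect(db_path)      # only this thread ever writes
        conn.execute("PRAGMA journal_mode=WAL")
        while True:
            txn, done = write_queue.get()
            try:
                with conn:                   # BEGIN ... COMMIT, or ROLLBACK on error
                    txn(conn)                # the whole transaction plus its logic
                done.set_result(None)
            except Exception as e:
                done.set_exception(e)        # constraint violations surface to the caller

    threading.Thread(target=writer_loop, args=("app.db",), daemon=True).start()

    def transfer(conn):
        conn.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 10 WHERE id = 2")

    done = Future()
    write_queue.put((transfer, done))
    done.result()   # blocks until the whole transaction has committed (or re-raises its error)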


> Who knows when those writes you scheduled really get written

I wait for the write to complete before my next read in my application logic, the same as any other bit of code that interacts with a database or does other IO. Just because another thread handles interacting with the writer connection doesn't mean my logic thread just walks away pretending the write finished successfully in 0ms.


SQLite, for the most part, uses polling locks. That means it checks if a lock is available to be taken, and if it's not, it sleeps for a bit, then checks again, until this times out.

This becomes increasingly inefficient as contention increases, as you can easily get into a situation where everyone is sleeping, waiting for others, for a few milliseconds.

Ensuring that all, or most, writes are serialized improves this.
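
You can see (and bound) that sleep-and-retry loop through the busy timeout. A rough sketch, assuming Python's sqlite3 and a made-up jobs table:

    import sqlite3

    conn = sqlite3.connect("app.db", timeout=5.0)   # installs a ~5 second busy handler
    # roughly the same thing in SQL: conn.execute("PRAGMA busy_timeout=5000")
    try:
        with conn:
            conn.execute("INSERT INTO jobs(status) VALUES ('queued')")
    except sqlite3.OperationalError as e:
        # if another connection held the write lock for the whole timeout,
        # SQLite gives up with SQLITE_BUSY ("database is locked")
        print("still locked:", e)

With a single serialized writer the handler rarely has to spin at all; with many concurrent writers, each one burns its wait sleeping.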


Have you tried it?

What you're describing sounds like it would work fine to me. The blog post is misleading imho - it implies that SQLite doesn't handle concurrency at all. In reality, you can perform a bunch of writes in parallel and SQLite will handle running them one after the other internally. This works across applications and processes; you just need to use SQLite to interact with the database. The blog post is also misleading when it implies that the application has to manage access to the database file in some way.

Yes, it's correct that only one of those writes will execute at a time but it's not like you have to account for that in your code, especially in a batch-style process like you're describing. In your Python code, you'll just update a row and it will look like that happens concurrently with other updates.

I'll bet that your call to ChatGPT will take far longer than updating the row, even accounting for time when the write is waiting for its turn in SQLite.

Use WAL-mode for the best performance (and to reduce SQLITE_BUSY errors).
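
Something like this sketch is all it takes from the Python side (table and column names made up) - each worker just connects and runs its UPDATE, and SQLite serializes the writes internally:

    import sqlite3
    import threading

    def worker(row_id, summary):
        # one connection per thread; no application-level locking needed
        conn = sqlite3.connect("articles.db", timeout=30.0)
        conn.execute("PRAGMA journal_mode=WAL")   # readers no longer block the writer
        with conn:
            conn.execute("UPDATE articles SET summary = ? WHERE id = ?",
                         (summary, row_id))
        conn.close()

    threads = [threading.Thread(target=worker, args=(i, f"summary {i}"))
               for i in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()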


I haven't tried it yet - async processing (and even using SQLite) is new to me, so I'm trying to figure out solution patterns that work for now and that I can keep investing my knowledge in to solve future problems.

I will look into WAL mode. I am enjoying using SQLite (and aware that it's not the solution for everything), and I have several upcoming tasks for which I'm planning to use async stuff - and yes, I'm trying to find the balance in how to handle those async tasks (networky HTTP calls being different than running `ffmpeg` locally).


It's always interesting how these large organizations can bring in tens of millions of dollars in excess of expenses, yet still manage to "have no money".

Source: https://assets.mozilla.net/annualreport/2024/b200-mozilla-fo...


Joseph Hellerstein has a series of posts on CRDTs: https://jhellerstein.github.io/blog/crdt-intro/

He very much leans toward them being hard to use in a sensible way. He has some interesting points about using threshold functions over a CRDT to get deterministic reads (i.e. once you observe the value, it doesn't randomly change out from under you). It feels a bit theoretical, though; I wish there were examples of using this approach in a practical application.
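
To make the threshold idea concrete, here's a toy sketch of my own (not from Hellerstein's posts): a grow-only counter CRDT whose only read is "has the total reached N yet?". That predicate is monotone under merges, so once it turns true it can never flip back, whichever replica you ask and whatever updates arrive later:

    # G-Counter: one slot per replica, merge = element-wise max
    class GCounter:
        def __init__(self, replicas):
            self.counts = {r: 0 for r in replicas}

        def increment(self, replica, n=1):
            self.counts[replica] += n

        def merge(self, other):
            for r, c in other.counts.items():
                self.counts[r] = max(self.counts.get(r, 0), c)

        def total(self):
            return sum(self.counts.values())

        def at_least(self, threshold):
            # threshold read: monotone, so a True answer is stable forever
            return self.total() >= threshold

    a, b = GCounter(["a", "b"]), GCounter(["a", "b"])
    a.increment("a", 3)
    b.increment("b", 2)
    a.merge(b)
    print(a.at_least(5))   # True - and no later increment or merge can undo it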


Pijul is a version control system based on a CRDT: https://pijul.org/manual/theory.html#conflicts-and-crdts

It works like you describe, with humans manually resolving conflicts. The conflicts are represented in the data model, so the data model itself converges without conflicts...if that makes sense.


This seems like an unnecessarily negative comment. I've been a user of SQLite for over 20 years now (time flies!), and what you're calling a lack of polish, I would chalk up to Dr. Hipp being conscientious about maintaining compatibility over the long term. So much so that the Library of Congress recommends it for long-term preservation of data.

Long term compatibility (i.e. prioritizing the needs of users vs chasing inevitably changing ideas about what feels polished or pristine), near fanatical dedication to testing and quality, and sustained improvement over decades - these are the actual signs of true craftsmanship in an engineering project.

(plus, I don't agree with you that the storage layer, column format, or SQL implementation are bad).


> I would chalk up to Dr. Hipp being conscientious about maintaining compatibility over the long term.

I agree. I am not suggesting that the SQLite team doesn't know how to make the technology better. Just that they aren't/haven't. Backwards compatibility is a good reason not to.

My original comment was contrasting craftsmanship and utility, since both are somewhat prized on HN, but they aren't the same thing at all. Look at a system like WireGuard. A huge amount of small decisions went into making it as simple and secure as it is. When most developers are confronted with similar decisions, they perform almost randomly and accumulate complexity over the long tail of decisions (it doesn't matter, just pick a way). With WireGuard, every design decision reliably drove toward simplicity (it does matter, choose carefully).


I don't think they ever hesitate to make sqlite better. It's just that they have a different definition of "better" than you.


> contrasting craftsmanship and utility, since both are somewhat prized on HN

I'd say they're prized everywhere, though "craftsmanship" is really subjective. And the HN I usually see seems to have more of a meta of "criticize anything someone tries to build, and rave about IQ" tbh ;)

SQLite works and I don't have to think (too much) about why it works. That is IMO a true hallmark of solid engineering.

