Hacker News | mweidner's comments

Managing "a flat-ish collection of nodes" that can be moved around (without merely deleting and re-inserting nodes) is tricky because of how paragraphs can be split and merged. Notion tackled this for their offline mode: https://www.youtube.com/watch?v=AKDcWRkbjYs

If you take that as a solved problem, do your concerns change?

> Selection & Cursors: Selection across regions is notoriously hard. If "Region A" and "Region B" aren't siblings in a tree, how do we handle a user dragging a selection across both?

You could render them in the DOM as an old-fashioned tree, while internally manipulating your "flat" IR, to make selections work nicely.

This is not too different from how Yjs-ProseMirror works already: Yjs has its own representation of the state as a CRDT tree, which it converts to a separate ProseMirror tree on each update (& it uses a diff algorithm to map local user edits in the other direction).

> Prior Art: Has anyone seen a production system (perhaps in the desktop publishing or CAD world) that successfully treated rich text as a non-hierarchical "content pool"?

This might be how Dato CMS works? https://www.datocms.com/docs/content-modelling (I say this based off of 5 minutes spent watching someone else use it.)

> Are we stuck with trees because they are the "right" abstraction, or just because the browser gives them to us for free?

For lists specifically, I would argue the latter. It's natural to think of a list as a flat sequence of list items, in parallel to any surrounding paragraphs; forcing you to wrap your list items in a UL or OL is (to me) a browser quirk.

I made some progress fighting this in Tiptap: https://github.com/commoncurriculum/tiptap-extension-flat-li... Quill.js already models lists in this "flat" way.
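To make the "flat internally, tree in the DOM" idea concrete, here is a minimal sketch (hypothetical types, not Tiptap's actual API) that projects a flat sequence of list items with indent levels into nested UL markup at render time:

```typescript
// Each list item lives flat in the document, in parallel with paragraphs.
interface FlatItem {
  id: string;
  indent: number; // 0 = top level
  text: string;
}

// Project the flat sequence into nested <ul> markup for the DOM.
// The flat model stays canonical; the tree exists only for rendering.
// (Simplified: real HTML nests a <ul> inside the preceding <li>.)
function renderList(items: FlatItem[]): string {
  let html = "";
  let depth = -1;
  for (const item of items) {
    while (depth < item.indent) { html += "<ul>"; depth++; }
    while (depth > item.indent) { html += "</ul>"; depth--; }
    html += `<li data-id="${item.id}">${item.text}</li>`;
  }
  while (depth >= 0) { html += "</ul>"; depth--; }
  return html;
}

const items: FlatItem[] = [
  { id: "a", indent: 0, text: "fruit" },
  { id: "b", indent: 1, text: "apple" },
  { id: "c", indent: 0, text: "veg" },
];
console.log(renderList(items));
```

Indenting or outdenting an item is then a one-field edit to that item, rather than a tree surgery on surrounding ULs.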


Your reply hits the real tension: a flat model simplifies layout changes, but it shifts complexity into how you map edits and selections. That trade‑off feels worth it if the goal is “safe structure changes” and AI‑driven transforms.

On the split/merge issue: in a flat model, the split/merge doesn’t have to be a structural operation at all. It can live entirely inside the block’s text content. The block keeps the same ID, and only its content changes. That avoids the “delete/reinsert” problem and keeps a stable identity for AI or history.
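One way to sketch that (illustrative, not any library's actual model): store paragraph boundaries as separator characters inside the block's content, so a split or merge is just a character edit and the block's ID never changes:

```typescript
// Hypothetical sketch: a block keeps one stable ID, and a paragraph
// split is an edit inside its content (inserting a paragraph-separator
// character), not a structural tree operation.
const SEP = "\u2029"; // Unicode paragraph separator

interface Block { id: string; content: string; }

function splitAt(block: Block, offset: number): Block {
  return {
    id: block.id,
    content: block.content.slice(0, offset) + SEP + block.content.slice(offset),
  };
}

function mergeAt(block: Block, sepIndex: number): Block {
  // Merging removes a separator; again only the content changes.
  return {
    id: block.id,
    content: block.content.slice(0, sepIndex) + block.content.slice(sepIndex + 1),
  };
}

// The view derives paragraphs on demand; the model never re-creates the block.
const paragraphsOf = (b: Block) => b.content.split(SEP);

let b: Block = { id: "blk-1", content: "HelloWorld" };
b = splitAt(b, 5);
console.log(paragraphsOf(b)); // two paragraphs, same stable block ID
```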

On selection: the cleanest route is to render a normal DOM tree for interaction and treat the flat IR as the truth. So the DOM is just a projection. That buys you native selection and IME behavior without building a custom cursor engine. The only hard part is deciding a consistent reading order (left‑to‑right, top‑to‑bottom, region order), so selection feels predictable even when layout is spatial.

On syncing/CRDT: a flat model can be simpler in a different way. You’re syncing text inside blocks plus lists of IDs in regions. That’s two clear problems instead of one giant nested tree. It doesn’t remove the complexity, but it makes it easier to reason about where conflicts live (content vs layout).
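As a type-level sketch of that separation (names are illustrative), the two conflict surfaces are literally disjoint pieces of state:

```typescript
// 1) text inside blocks, 2) ordered ID lists inside regions.
interface FlatDoc {
  blocks: Map<string, string>;    // block ID -> text content
  regions: Map<string, string[]>; // region ID -> ordered block IDs
}

// A content edit and a layout edit touch disjoint state, so it is
// easy to see where a concurrent conflict can live.
function editBlock(doc: FlatDoc, blockId: string, text: string): void {
  doc.blocks.set(blockId, text); // content conflict surface
}

function moveBlock(doc: FlatDoc, from: string, to: string, blockId: string): void {
  const src = doc.regions.get(from)!;
  src.splice(src.indexOf(blockId), 1);
  doc.regions.get(to)!.push(blockId); // layout conflict surface
}

const doc: FlatDoc = {
  blocks: new Map([["b1", "intro"]]),
  regions: new Map([["A", ["b1"]], ["B", []]]),
};
moveBlock(doc, "A", "B", "b1");       // layout change
editBlock(doc, "b1", "intro, revised"); // concurrent-safe content change
```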

On lists: a flat list of items is closer to how people think. UL/OL is a browser artifact. Quill’s model already shows this is workable, and it makes the “content pool + layout map” idea more consistent.

Using TipTap/ProseMirror as the editing surface (selection, IME, rich text behavior) while keeping a separate IR is a reasonable split: the view stays tree‑shaped, the data stays flat.

So overall: this approach looks less like “throw away trees” and more like “trees become a rendering tool, not the canonical structure.” That’s a meaningful shift, especially if AI or layout transforms are first‑class.


> More specifically, if you can edit different parts of a same document on different devices, then the document should be split across multiple files that can be synced independently

A more robust idea is to store a log of changes in files that are synced with Dropbox/OneDrive/etc. To prevent conflict warnings from Dropbox, you'll want each device to write to a separate file. Readers re-assemble the logs into (some) canonical total order, then reduce over that to get the document state.

The hard part is re-architecting your app to record all changes, instead of just writing out the current state to disk. However, once you do that, it can form the basis for other features like undo/redo, a view of the file's history, etc.

(The changes don't need to be CRDT/OT messages - anything deterministic works, though it's a good idea to make them "rebase-able", i.e., they will still do something reasonable when replayed on top of a collaborator's concurrent changes.)
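A minimal sketch of the reader side, under the assumption that each entry carries a Lamport timestamp and the writing device's ID (in practice ops would be serialized data in the per-device files, not functions):

```typescript
// Each device appends {lamport, deviceId, op} entries to its own log
// file; readers merge all logs, sort them into one canonical total
// order, and reduce over that order to get the document state.
interface Entry { lamport: number; deviceId: string; op: (s: string) => string; }

function canonicalOrder(logs: Entry[][]): Entry[] {
  // Ties on lamport break deterministically by deviceId, so every
  // reader computes the same order.
  return logs.flat().sort((a, b) =>
    a.lamport - b.lamport || a.deviceId.localeCompare(b.deviceId));
}

function replay(logs: Entry[][]): string {
  return canonicalOrder(logs).reduce((state, e) => e.op(state), "");
}

// Ops are plain deterministic functions; "rebase-able" here just means
// they still do something reasonable when replayed after concurrent ops.
const deviceA: Entry[] = [
  { lamport: 1, deviceId: "A", op: (s) => s + "Hello" },
  { lamport: 3, deviceId: "A", op: (s) => s + "!" },
];
const deviceB: Entry[] = [
  { lamport: 2, deviceId: "B", op: (s) => s + " world" },
];
console.log(replay([deviceA, deviceB])); // "Hello world!"
```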


Indeed, Replicache works this way, using server reconciliation (one part of client-side prediction): https://doc.replicache.dev/concepts/how-it-works


As the author, same.

My best guess is:

- Central-server collaborative editing work focuses on Operational Transformation (OT), likely due to inertia (studied since 1989) and the perception that storing an ID per character is inefficient. In fairness, it is, absent the optimizations introduced by RGASplit and Yjs (~2015).

- For decentralized editing, OT is very complicated, and CRDTs took over as the solution of interest (studied since 2005). Storing every operation permanently in a log - needed to use the linked approach without a server - feels inefficient, as does server reconciliation's undo/re-apply process. So CRDT research has focused on avoiding those inefficiencies, sacrificing simplicity along the way, instead of just embracing them as the easy way out.

To me, the "inefficiencies" seem quite manageable. Storage is cheap, text is small, and you probably want a complete op log anyway for auditing and document histories (cf. git). Server reconciliation's undo/re-apply process can be batched aggressively, e.g., only do it a few times per second; that just makes remote ops take a little longer to show up.

Granted, I have not built a complete app around server reconciliation or the linked approach, so perhaps there is a hidden catch. But I am encouraged by the success of Replicache (https://doc.replicache.dev/concepts/how-it-works), which is where I learned of server reconciliation.


I wonder if you could sidestep the inefficiencies of storing IDs for every character by using Redis streams to store the characters, with IDs "represented as delta-compressed macro nodes that are linked together by a radix tree". Each ID is "monotonically incrementing and has two parts: <time>-<counter>. The time is in milliseconds and the counter increases for entries generated in the same millisecond."

This would seem to avoid clashes and also compress the identifier storage efficiently.

https://antirez.com/news/128
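A sketch of that Redis-stream-style ID scheme: the counter disambiguates entries created in the same millisecond, and the pair sorts in creation order, which is what makes it delta-compressible. (Within a single writer these are unique and monotonic; with multiple writers you would still want a device/client component in the ID.)

```typescript
// Generate <time>-<counter> IDs in the style of Redis stream entry IDs.
function makeIdGenerator(now: () => number = Date.now) {
  let lastMs = -1;
  let counter = 0;
  return (): string => {
    const ms = now();
    if (ms === lastMs) counter++;      // same millisecond: bump counter
    else { lastMs = ms; counter = 0; } // new millisecond: reset counter
    return `${ms}-${counter}`;
  };
}

// With a frozen fake clock, IDs in the same millisecond stay distinct
// and ordered:
const gen = makeIdGenerator(() => 1700000000000);
console.log(gen(), gen(), gen()); // 1700000000000-0, -1, -2
```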


This sounds similar to the idea behind articulated (though with IDs of the form UUID-counter instead of time-counter): https://github.com/mweidner037/articulated

I will check out Antirez.


Even in the absence of a central server, you can still avoid CRDT/OT complexity if you have a decentralized way to eventually total order operations & apply them in that order: https://mattweidner.com/2025/05/21/text-without-crdts.html#d...

As others in the comments argue, this is technically a CRDT (though a fully general one); also, undoing/replaying ops is itself non-trivial to implement. However, I hope this is still simpler than using a traditional CRDT/OT for each data type.


Did you perhaps mean to write, “though not a fully general one”?


A decentralized, eventually consistent total order on operations is a fully general CRDT, in the sense that you can put whatever (deterministic) operations you want in the total order and clients will end up in eventually consistent states.

Whether the converged result is at all reasonable is a different question.
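A toy illustration of that "fully general" sense (illustrative names): any deterministic ops converge as long as every client sorts the op set into the same total order before applying it.

```typescript
interface Op { lamport: number; clientId: string; insert: string; }

function apply(ops: Op[]): string {
  // Same comparison function everywhere => same order => same state.
  return [...ops]
    .sort((a, b) => a.lamport - b.lamport || a.clientId.localeCompare(b.clientId))
    .map((op) => op.insert)
    .join("");
}

// Two clients receive the same ops in different network orders but
// still converge, because each applies them in the canonical order.
const ops: Op[] = [
  { lamport: 2, clientId: "x", insert: "world" },
  { lamport: 1, clientId: "y", insert: "Hello " },
];
console.log(apply(ops) === apply([...ops].reverse())); // true
```

Note the converged state here just concatenates in timestamp order; whether that is a *reasonable* merge is exactly the separate question above.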


Ok, now I understand your reply. In retrospect, you were being perfectly clear all along and my confusion was due to me not appreciating the precise definition of a CRDT. Thanks!


Indeed, this is close to what Yjs (popular CRDT library) does: each client instance (~ browser tab) chooses a random 32-bit clientId, and character IDs combine this clientId with local counters. https://github.com/yjs/yjs/blob/987c9ebb5ad0a2a89a0230f3a0c6...

Any given collaborative document will probably only see ~1k clientIds in its lifetime, so the odds of a collision are fairly low, though I'd be more comfortable with a 64-bit ID.
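A back-of-the-envelope check with the birthday approximation, assuming ~1k uniformly random clientIds per document:

```typescript
// P(at least one collision) ≈ 1 - exp(-n(n-1) / 2N) for n random IDs
// drawn from a space of size N.
function collisionProbability(nClients: number, idBits: number): number {
  const space = 2 ** idBits;
  return 1 - Math.exp((-nClients * (nClients - 1)) / (2 * space));
}

console.log(collisionProbability(1000, 32)); // ~1.2e-4: low, but not negligible
console.log(collisionProbability(1000, 64)); // effectively zero
```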


I believe that Automerge internally stores all operations in an eventually consistent total order, which you can use as a substitute for the server in server reconciliation (cf. https://mattweidner.com/2025/05/21/text-without-crdts.html#d...). However, Automerge does not actually do that - instead it processes text operations using a traditional CRDT, RGA, probably because (as you point out) undoing and replaying ops is not trivial to implement.


Indeed, the simple approach of "send your operations to the server and it will apply them in the order it receives them" gives you good-enough conflict resolution in many cases.

It is still tempting to turn to CRDTs to solve the next problem: how to apply server-side changes to a client when the client has its own pending local operations. But this can be solved in a fully general way using server reconciliation, which doesn't restrict your operations or data structures like a CRDT does. I wrote about it here: https://mattweidner.com/2024/06/04/server-architectures.html...
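A sketch of the client side of server reconciliation, in the spirit of Replicache (names and shapes are illustrative, not its actual API): keep the last confirmed server state plus a queue of pending local ops, and replay the pending ops on top of each new server state.

```typescript
type Op = (state: string[]) => string[];

class Client {
  private serverState: string[] = [];
  private pending: Op[] = [];

  localEdit(op: Op): void {
    this.pending.push(op); // optimistic: shown immediately via view()
  }

  // The server confirmed some of our ops (they're baked into newState),
  // possibly interleaved with collaborators' ops.
  onServerUpdate(newState: string[], confirmedCount: number): void {
    this.serverState = newState;
    this.pending = this.pending.slice(confirmedCount);
  }

  // Rendered state = server state with pending ops replayed on top.
  view(): string[] {
    return this.pending.reduce((s, op) => op(s), this.serverState);
  }
}

const c = new Client();
c.localEdit((s) => [...s, "draft line"]);
// The server applies a collaborator's op before ours arrives:
c.onServerUpdate(["their line"], 0);
console.log(c.view()); // ["their line", "draft line"] - rebased, not lost
c.onServerUpdate(["their line", "draft line"], 1);
console.log(c.view()); // same state, now confirmed
```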


Just got to reading this.

> how to apply server-side changes to a client when the client has its own pending local operations

I liked the option of restore-and-replay on top of the updated server state. I'm wondering when this causes perf issues. At the least, local changes should propagate fast after e.g. a network partition, even if the person has queued up a lot of them (say, during a flight).

Anyway, my thinking is that you can avoid many consensus problems by just partitioning data ownership. The like example is interesting in this way. A like count is an aggregate based on multiple data owners, and everyone else just passively follows with read replication. So thinking in terms of shared write access is the wrong problem description, imo, when in reality ”liked posts” is data exclusively owned by all the different nodes doing the liking (subject to a limit of one like per post). A server aggregate could exist but is owned by the server, so no shared write access is needed.
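A small sketch of that ownership framing (illustrative, not a real service): each user exclusively owns the set of posts they've liked, the set enforces "one like per post" for free, and the count is a derived aggregate, so no shared write access is needed.

```typescript
const likesByUser = new Map<string, Set<string>>(); // userId -> liked post IDs

// Only the owning user writes to their own set; a duplicate like is a
// no-op (the "malformed and ignored" case) rather than a conflict.
function like(userId: string, postId: string): void {
  const mine = likesByUser.get(userId) ?? new Set<string>();
  mine.add(postId);
  likesByUser.set(userId, mine);
}

// Anyone (including a server) derives the count by passively reading
// the replicated per-owner sets.
function likeCount(postId: string): number {
  let n = 0;
  for (const liked of likesByUser.values()) if (liked.has(postId)) n++;
  return n;
}

like("alice", "post-1");
like("bob", "post-1");
like("alice", "post-1"); // duplicate: ignored by the set
console.log(likeCount("post-1")); // 2
```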

Similarly, say you have a messaging service. Each participant owns their own messages and others follow. No conflicts are needed. However, you can still break the protocol (say liking twice). Those can be considered malformed and eg ignored. In some cases, you can copy someone else’s data and make it your own: for instance to protect against impersonations: say that you can change your own nickname, and others follow. This can be exploited to impersonate but you can keep a local copy of the last seen nickname and then display a ”changed name” warning.

Anyway, I’m just a layman who wants things to be simple. It feels like CRDTs have been the ultimate nerd-snipe, and when I did my own evaluations I was disappointed with how heavyweight and opaque they were a few years ago (and probably still).


You could avoid the CRDT rules if you only use the LLM on the server. I.e., user comes online and sends their diff to the server, which LLM-merges it into the latest state and then sends that back to all clients.

This doesn't help you do merges client-side during live collaboration (for showing your optimistic local updates), but there the low latency reduces conflicts anyway, so you can fall back on a semantically-imperfect CRDT.


If you have a central server, you don't need CRDTs, which are designed to work even in pure peer-to-peer scenarios. Figma is one example of this [0]:

> Figma isn't using true CRDTs though. CRDTs are designed for decentralized systems where there is no single central authority to decide what the final state should be. There is some unavoidable performance and memory overhead with doing this. Since Figma is centralized (our server is the central authority), we can simplify our system by removing this extra overhead and benefit from a faster and leaner implementation.

> It’s also worth noting that Figma's data structure isn't a single CRDT. Instead it's inspired by multiple separate CRDTs and uses them in combination to create the final data structure that represents a Figma document (described below).

[0] https://www.figma.com/blog/how-figmas-multiplayer-technology...


One challenge is that the algorithms typically used for collaborative text editing (CRDTs and OT) have strict algebraic requirements for what the edit operations do and how they interact. So even if your server is smart enough to process the "Colour" example in a UX-reasonable way, it's very hard to design a corresponding CRDT/OT for optimistic client-side edits.

You can work around this by not using CRDTs/OT. E.g., the server processes operations in receipt order, applying whatever UX logic it wants; clients use a rebase/prediction strategy to still allow optimistic edits on top (cf. https://doc.replicache.dev/concepts/how-it-works). Doing this for text editing has some challenges, but they're separate from the CRDT/OT challenges discussed here.


Author here! Just chiming in to say this is a very underrated comment which I fully cosign. :)

