I remember meeting Bram Cohen (of BitTorrent fame!) around 15 years ago. Around that time I started building web-based distributed collaborative systems, first with Qbix.com, and later spun off a company to build blockchain-based smart contracts through Intercoin.org, etc.
Anyway, I wanted to suggest a radical idea based on my experience:
Merges are the wrong primitive.
What organizations (whether centralized or distributed projects) might actually need is:
1) A graph database of streams and relations
2) Governance per stream, e.g. ACLs
A codebase should be automatically turned into a graph database (functions calling other functions, accessing configs, etc.) so we know exactly what affects what.
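As a rough sketch of what "codebase as graph database" could mean, here is a minimal call-graph extractor using Python's stdlib `ast` module. The sample source and the name-only call resolution are illustrative assumptions; real tooling would also resolve imports, methods, and dynamic dispatch.

```python
# Sketch: extract caller -> callee edges from Python source with the stdlib
# `ast` module. Only links top-level functions called by bare name.
import ast

source = """
def load_config():
    return {}

def handler():
    cfg = load_config()
    return cfg

def main():
    handler()
"""

tree = ast.parse(source)
edges = {}  # caller name -> set of callee names
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        callees = set()
        for sub in ast.walk(node):
            if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                callees.add(sub.func.id)
        edges[node.name] = callees

print(edges)  # {'load_config': set(), 'handler': {'load_config'}, 'main': {'handler'}}
```

Once these edges exist, "what affects what" becomes a graph query rather than a textual diff.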
The notion of changes being "too near" each other, mentioned in the article, is not necessarily what leads to conflicts. Conflicts actually arise from conflicting graph topology and from propagating changes.
People should be able to clone some stream (with permission) and each stream (node in the graph) can be versioned.
Forking should happen into workspaces. Workspaces can be GOVERNED. Publishing some version of a stream just means relating it to your stream. Some people might publish one version, others another.
Rebasing should be a first-class primitive, rather than a form of merging. From a governance point of view, a merge is an extremely privileged operation: some actor can just "push" (or "merge") thousands of commits at once. The more commits, the greater the chance of conflicts.
The same problem occurs with CRDTs. I like CRDTs, but reconciling a big netsplit forces merge strategies that create lots of unintended semantic side effects.
Instead, what if each individual stream were guarded by policies, changes were rate-limited, and people / AIs rejected most proposals by default, occasionally allowing one with M-of-N sign-offs?
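The M-of-N gate itself is tiny. Here is a minimal sketch, where the signer names and threshold are assumptions, not a real protocol:

```python
# Minimal M-of-N sign-off gate for a proposal against a stream.
def approved(signoffs: set, authorized: set, m: int) -> bool:
    """A proposal passes only if at least m distinct authorized actors sign."""
    return len(signoffs & authorized) >= m

# Hypothetical signer set mixing people and bots.
authorized = {"alice", "bob", "carol", "review-bot"}

assert not approved({"alice"}, authorized, m=2)            # rejected by default
assert not approved({"alice", "mallory"}, authorized, m=2) # unauthorized signer ignored
assert approved({"alice", "review-bot"}, authorized, m=2)  # the occasional pass
```

Rate limits and per-stream policy lookup would wrap around this check; the point is that acceptance is an explicit governance decision, not an automatic merge.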
Think of ChatGPT chats used to modify evolving artifacts: people and bots working together. The artifacts are streams. And yes, this can even be done for codebases. It isn't about how "near" things are in a file; it is about whether there is a conflict on the graph. When I modify a specific function or variable, the system knows all of its downstream callers. The same is true for many things besides code. We can also have AI workflows running 24/7, trying out experiments as a swarm in sandboxes, generating tests, and committing the results that pass. But ultimately, each organization decides whether or not to rebase its stream relations onto the next version of something.
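"The system knows all of its downstream callers" is just a reverse reachability query over the call graph. A sketch, assuming caller-to-callee edges like the ones described above (the names are illustrative):

```python
# Sketch: find every transitive caller of a changed symbol, i.e. the
# blast radius of an edit, via BFS over the inverted call graph.
from collections import deque

# Hypothetical caller -> callees edges.
edges = {"main": {"handler"}, "handler": {"load_config"}, "load_config": set()}

# Invert to callee -> callers.
callers = {}
for caller, callees in edges.items():
    for callee in callees:
        callers.setdefault(callee, set()).add(caller)

def blast_radius(changed):
    seen, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for parent in callers.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

print(sorted(blast_radius("load_config")))  # ['handler', 'main']
```

Two edits conflict on the graph when their blast radii intersect, regardless of how far apart they sit in the files.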
That is what I'm building now with https://safebots.ai
PS: if anyone is interested in this kind of stuff, feel free to schedule a Calendly meeting with me on that site. I just got started recently, but I'm dogfooding my own setup and using AI swarms, which accelerates the work tremendously.
This is great! I thought of doing something like this for Karaoke, but was wondering about the copyright implications of doing it server-side.
We already do this for ingesting podcasts and cutting their clips with text being highlighted as people speak. AssemblyAI also supports speaker diarization.
For videos recorded using our own livestreaming studio, we can bypass all this by using Web STT and TTS APIs resulting in perfect timing and diarization without the need for server side models.
It's problematic even client side since you don't have a sync license to show words timed to the song. A bunch of other licenses are needed too for the lyrics themselves and to process the original file into the instrumental.
I'm not a fan of Elon's software endeavors, ever since he bought Twitter and turned it into an even worse cesspool of angry political nonsense than it used to be. I don't like how he's been biasing Grok, etc.
But, what exactly is so bad about Grokipedia? It's a different approach and I think a valid one: trying to do with AI what people have been doing manually at Wikipedia. I'm curious to hear the substantive comparisons.
I think the issue is simply this: Wikipedia trends toward unbiased info through use of the crowd. Grok, with a single owner with an axe to grind, trends toward whatever Elon wants. It's poisoned information under the control of one man; cyberpunk novels have been written about less.
A concrete example: a few weeks ago, Musk was making a big deal about how most of his massive net worth was not held in cash, and by a total coincidence the phrase "primarily derived from equity stakes rather than cash" showed up on his Grokipedia page in the section about net worth. I checked the pages of several other extremely wealthy people and none of them had such a comment.
> wikipedia trends towards unbiased info through use of the crowd
See, this is why people even give a project like Grokipedia the time of day. While in theory anyone can edit Wikipedia, in practice the moderators form a much smaller and weirder cabal, and they reject edits that go against their views. The gap between the naive assertion that Wikipedia distills the wisdom of the crowd and the reality of Wikipedia on any page of note is what provides the psychic permission to even entertain a project with such obvious flaws as Grokipedia.
> and they reject edits that go against their views
Citation needed. See what I did there ;)
They reject edits that go against their views on tone and sourcing, not political views, as far as I am aware. I'm sure it happens from time to time, but unless there's a consistent bias in one direction, this isn't a valid criticism of Wikipedia's political neutrality.
Even if there is rampant bias in Wikipedia, that's a reason to fork it and change the structure and gatekeeping, not to replace it with a techno-authoritarian AI version controlled by a single billionaire. That amplifies the problem from the aggregate bias of 600,000 users who have made an edit in the last 30 days[1] to just one editor who uses AI to make it seem impartial.
I would prefer to fork Wikipedia as well, but in practice I don't think that works, given the many failed Wikipedia forks of the past 20 years. On the internet, the only way to get any alternative to a widely-used source like Wikipedia is to use a significantly different approach. Otherwise, you just look like a cheap knockoff, even to people who might otherwise agree with your approach. Worse is better, after all - worse in most ways, but better or different in at least one innovative way.
It's controlled by a guy who spends all day retweeting white supremacists and lying about his companies. Why should anyone who isn't a white supremacist use it?