- Bespoke software for the group, including: a shared embedding graph of highlights and annotations, IRC chat with @-mentions for members, books, and authors, and a collective bookshelf
I genuinely want to benchmark myself against peers on how long they take to get through a textbook. I am not that into fiction or short fiction books - I generally lose interest.
How long would a technical book take you to complete? Say you have to read Effective Java, 3rd Edition.
I haven't read a technical book in a long time. I read "You Don't Know JS," which is very, very short, in about a week. I read "Cracking the Coding Interview" in 3 weeks.
As for non-fiction, I read "Anarchist Communism" (fairly short) in a week, "Delivered from Distraction" in a month, and "Masters of Doom" in about 2 weeks.
It depends on my interest more than anything. I obliterated the entire Robin Hobb "Assassin's" series, 12 books, in just about two months.
I’ll say this: between store, search, synthesize and share, store and synthesize are consistently the most difficult to nail down.
A society that wishes to succeed in creating an activated and knowledgeable populace should be interested in how to train people to notice better, and to create insightful follows.
In the words of David Deutsch (paraphrasing): knowledge consists of conjecture and error correction.
> Discoveryness. Even if the needed content exists, it’s hard to guarantee that users will find it.
I'm curious what you'll think of the UX layer I applied to embeddings for public perusal. I call it "semantic scrolling" since it's not exactly searching, but moving through the cluster by using <details>/<summary> as a tree.
[1] is a single starting point (press the animated arrow to "wiki-hole") and [2] is the entire collection (books, movies, music, animations, etc.)
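To make the idea more concrete, here is a minimal sketch (my own illustration, not the site's actual code) of rendering a cluster hierarchy as nested <details>/<summary> elements, so a reader "scrolls" semantically by expanding clusters; the render() function, tree structure, and labels are invented for the example:

```python
def render(node: dict) -> str:
    """Render a cluster tree as nested <details>/<summary> HTML.

    node = {"label": str, "children": [node, ...]}; a node without
    children is a leaf (an individual highlight or entry).
    """
    children = node.get("children", [])
    if not children:
        return f"<p>{node['label']}</p>"
    inner = "".join(render(child) for child in children)
    return f"<details><summary>{node['label']}</summary>{inner}</details>"


# Invented example tree; in practice the hierarchy would come from
# clustering the embeddings of the collection.
cluster_tree = {
    "label": "Collection",
    "children": [
        {"label": "Books", "children": [
            {"label": "Effective Java"},
            {"label": "Masters of Doom"},
        ]},
        {"label": "Movies", "children": [
            {"label": "A film highlight"},
        ]},
    ],
}

print(render(cluster_tree))
```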
Cool. I kinda grok the idea of semantic scrolling but I'm having trouble seeing it in action on the site. I think it would be useful in cases where I want to become an expert in a given topic and therefore want to peruse lots of related ideas and create the possibility of serendipitous new neural connections. As for technical documentation, usually people want to find certain information as quickly as possible so that they can get on with their work, so I don't think semantic scrolling would be a good fit on most docs sites. That is, readers won't have the patience to semantically scroll in order to find the info they need.
thank you, i appreciate it! yes, the software is api first, and much of its utility comes from working in other environments like google docs, ios shortcuts, etc.
in essence, the core of the project is a vector database that works for end users who want no-fuss quick capture, semantic and full-text (FTS) search, and the ability to create new relationships with marginalia.
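a rough sketch of what that capture + search loop could look like (illustrative only, not the actual ycb code; the embed() stub stands in for a real embedding model, and SQLite FTS5 is just one way to get full-text search):

```python
import sqlite3
import numpy as np

DIM = 384

def embed(text: str) -> np.ndarray:
    # stub standing in for a real embedding model/API
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(DIM).astype(np.float32)
    return v / np.linalg.norm(v)

con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE entries USING fts5(id, data)")  # full-text index
vectors: dict[str, np.ndarray] = {}                               # id -> embedding

def capture(entry_id: str, data: str) -> None:
    """No-fuss quick capture: one call stores text for both FTS and semantic search."""
    con.execute("INSERT INTO entries VALUES (?, ?)", (entry_id, data))
    vectors[entry_id] = embed(data)

def search(query: str, k: int = 5) -> list[str]:
    """Combine FTS matches with nearest neighbours by cosine similarity."""
    fts_hits = [row[0] for row in con.execute(
        "SELECT id FROM entries WHERE entries MATCH ?", (query,))]
    q = embed(query)
    semantic_hits = sorted(vectors, key=lambda i: -float(vectors[i] @ q))[:k]
    # keep FTS hits first, fill with semantic neighbours, dedupe by id
    return list(dict.fromkeys(fts_hits + semantic_hits))[:k]
```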
i'm not trying to replace tools like obsidian or notion, i think ycb works better with them doing what they do best! i also plan to make the stack self-hostable in the near future :)
I recently changed my thought process on the responsibility of readers and writers. I think it is the responsibility of the writer to say something interesting, and then publish said interesting thing somewhere where it can be reached by readers. I think it’s an element of the modern freneticism of the Internet that we expect instant eyeballs.
Old writing is constantly reinvigorated by new readers. In fact, that is the only way that it can continue to live. It is the responsibility of the reader to decide whether a piece of writing is important or not in their context.
As such, I’ve been blogging quite a bit, because posting on a blog allows for a critical mass of future readers. Anything interesting that I happen to say can be found through a medium that reaches billions of people.
Today, or in the future.
HTTP is basically the printing press. And my website is my media company.
Just because billions of people aren’t consistently reading what’s written, doesn’t mean that the writer has lost their responsibility to write.
A method that has worked well for me: divorced databases.
The first database is a plaintext database that stores rows of (id, data, metadata); the second is a vector database that stores (id, embedding). Whenever a new row is added, the first database makes a POST request to the second. The second database embeds the data and returns the id of its row, and the first database uses that id to store the plaintext.
When searching, the second database is optimized for cosine similarity with an HNSW index. It returns the ids to the first database, which fetches the plaintext to return to the user.
The advantages are that the plaintext data can be A/B tested across multiple embedding models without affecting the source, and each database can be provisioned for a specific task. It also lowers hosting costs and simplifies security, because there only needs to be one central vector database alongside small, purpose-provisioned plaintext databases.
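For a concrete picture, here is a minimal sketch of the split, with in-process classes standing in for the two services; embed() is a stub for a real model, the class names are mine, and hnswlib is one possible HNSW implementation rather than necessarily what the parent uses:

```python
import hnswlib
import numpy as np

DIM = 384

def embed(text: str) -> np.ndarray:
    # stub standing in for a real embedding model
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(DIM).astype(np.float32)
    return v / np.linalg.norm(v)

class VectorStore:
    """The second database: (id, embedding) rows, cosine search via an HNSW index."""
    def __init__(self) -> None:
        self.index = hnswlib.Index(space="cosine", dim=DIM)
        self.index.init_index(max_elements=10_000, ef_construction=200, M=16)
        self.next_id = 0

    def add(self, text: str) -> int:
        # stands in for the POST endpoint: embed the data, store it, return the new id
        vec_id = self.next_id
        self.index.add_items(embed(text)[None, :], np.array([vec_id]))
        self.next_id += 1
        return vec_id

    def search(self, query: str, k: int = 5) -> list[int]:
        k = min(k, self.index.get_current_count())
        labels, _ = self.index.knn_query(embed(query)[None, :], k=k)
        return labels[0].tolist()

class PlaintextDB:
    """The first database: (id, data, metadata) rows; delegates embeddings."""
    def __init__(self, vectors: VectorStore) -> None:
        self.rows: dict[int, dict] = {}
        self.vectors = vectors

    def add(self, data: str, metadata: dict) -> int:
        row_id = self.vectors.add(data)        # the "POST" to the vector database
        self.rows[row_id] = {"data": data, "metadata": metadata}
        return row_id

    def search(self, query: str, k: int = 5) -> list[str]:
        ids = self.vectors.search(query, k)    # vector database returns ids only
        return [self.rows[i]["data"] for i in ids if i in self.rows]
```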
Post co-author here. This is actually something that we are considering implementing in future versions of pgai Vectorizer: you point the vectorizer at database A but tell it to create and store embeddings in database B. You can always do joins across the two databases with Postgres FDWs, and it would solve issues of load management if those are concerns. Neat idea, and one on our radar!
From my last comment in August 2024 [1], I have made progress!
The project I'm developing is called Your Commonbase, a self-organizing scrapbook built around Personal Library Science.
The big updates are:
1. From Fleeting Note to Connected Note with the Entry/Comment Model - An entry has its own marginalia (comments), which are also embedded as first-class data. These comments allow your search model to improve over time and create surprising clusters; see the video in [2] for an example of the Entry/Comment model in a d3 graph (a rough sketch of the model appears at the end of this comment). All the connections are created automatically! Entries go from the "fleeting entries" inbox (not yet linked or commented on) to "main entries" that spread and connect ideas from across the space.
2. Using yCb as a creator - I created a Google Docs extension and have been using it to create Zettelkastens for my blogs. On average each blog post references 12 or more books. This is a real use case of a PKM system beyond doing it for the love of the game [3].
There's much more, including mobile upload, [[links]], and audio upload; you can explore the Notion page in [2] to see the features I've added.
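To illustrate the Entry/Comment model from point 1, here is a rough sketch of how it might be represented; the field names, the fleeting/main rule, and the annotate() helper are my guesses for illustration, not Your Commonbase's actual schema:

```python
from dataclasses import dataclass, field
from typing import Callable

Embedding = list[float]

@dataclass
class Comment:
    text: str
    embedding: Embedding          # marginalia are embedded as first-class data

@dataclass
class Entry:
    text: str
    embedding: Embedding
    comments: list[Comment] = field(default_factory=list)
    links: list["Entry"] = field(default_factory=list)

    @property
    def is_fleeting(self) -> bool:
        # fleeting inbox entries have no links or marginalia yet;
        # commenting on or linking an entry promotes it toward a "main" entry
        return not self.comments and not self.links

    def annotate(self, text: str, embed: Callable[[str], Embedding]) -> Comment:
        comment = Comment(text=text, embedding=embed(text))
        self.comments.append(comment)
        return comment
```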
Each of us is reading sixty books over 2026, five a month, with every book self-selected by the member reading it.
It’s small, six people, all brought in by application only.
You can check out our shared bookshelf here! (Heavy inspiration from Stripe Press)
https://bookshelf-bookclub.vercel.app/book/cmj4pfpom001gqsbj...
(swipe left/right on mobile, up/down arrows on pc :))