He has the advantage of knowing what is there, and he can create an arrangement to license or sell the IP to another entity… if it’s just gathering dust, why not?
The closest explanation to a use case architecture I've seen recently was https://mattboegner.com/knowledge-retrieval-architecture-for... - it basically describes doing knowledge retrieval (keyword parsing) on LLM queries, feeding that to a vector DB to do a similarity search for the top-K documents most similar to the parsed keywords, then feeding that list back into the LLM as potentially useful documents it can reference in its response. It's neat, but it seems a bit hacky. Is that really the killer app for these things?
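That retrieval loop can be sketched in a few lines. This is a toy, not the linked architecture: the bag-of-words `embed` stands in for a real embedding model, and the documents and query are made up. The point is only the shape of the pipeline (embed query, rank documents by similarity, prepend the top-K as context).

```python
import re
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would call an embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_top_k(query: str, documents: list[str], k: int = 2) -> list[str]:
    # The vector-DB step: rank stored documents by similarity to the query.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    # Feed the top-K documents back to the LLM as context for its response.
    context = "\n".join(f"- {d}" for d in retrieve_top_k(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Postgres supports vector similarity search via extensions.",
    "Bananas are rich in potassium.",
    "Vector databases index embeddings for nearest-neighbour search.",
]
print(build_prompt("How do vector databases search embeddings?", docs))
```

In production the sort becomes an approximate-nearest-neighbour index lookup, but the data flow is the same.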
Amazon has their own. Walmart has their own. Target has their own.
Given a list of tens of thousands of products, how can you automatically match the product to a merchant's taxonomy?
I started with a "clever" SQL query to do this, but it turns out that it's way easier to use vector DBs to do this.
1. Get the vector embedding for each taxonomy path and store this
2. Get the vector embedding for a given product using the name and a short description
3. Find the closest matching taxonomy path using vector similarity
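The three steps above can be sketched as follows. This is a minimal toy: the bag-of-words `embed` stands in for a real embedding model (e.g. an embeddings API call), and the taxonomy paths and products are invented for illustration.

```python
import re
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy bag-of-words embedding; swap in a real embedding model in practice.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Step 1: embed each taxonomy path once and store the vectors.
taxonomy = [
    "Electronics > Audio > Headphones",
    "Home & Kitchen > Small Appliances > Coffee Makers",
    "Sports & Outdoors > Cycling > Helmets",
]
taxonomy_vectors = {path: embed(path) for path in taxonomy}

def classify(name: str, description: str) -> str:
    # Step 2: embed the product's name plus a short description.
    product_vec = embed(f"{name} {description}")
    # Step 3: return the taxonomy path with the highest vector similarity.
    return max(taxonomy_vectors, key=lambda p: cosine(product_vec, taxonomy_vectors[p]))

print(classify("AeroBrew 12-Cup", "Programmable drip coffee maker for the kitchen"))
# → Home & Kitchen > Small Appliances > Coffee Makers
```

With a real embedding model the matching is semantic rather than lexical, so a "cordless drill" can land under "Power Tools" even with no shared words.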
It's astonishingly good at this, and it solved a big problem for us: building a unified taxonomy from the various merchant taxonomies.
You can use the same technique to match products with high confidence across merchants by storing the second vector embedding. Now you have a way to determine that product A on Target.com is the same as product A' on Walmart.com is the same as product A'' on Amazon.com by comparing vector similarity.
Could this strategy work to match products across retailers? If so, any tips on getting started with vector databases? I've heard of them but have yet to try one out.
Yes. You compute the embedding for the product name + description from Target.com and then the embedding for the product name + description from Walmart.com. They'll have a very close vector similarity.
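A sketch of that comparison, again with a toy bag-of-words embedding standing in for a real model; the listings and the similarity threshold are made up (in practice you'd tune the cutoff on labelled matching/non-matching pairs).

```python
import re
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy bag-of-words embedding; a real pipeline would use a learned model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

MATCH_THRESHOLD = 0.8  # hypothetical cutoff; tune on labelled pairs

def same_product(listing_a: str, listing_b: str) -> bool:
    # Two retailers' listings of the same product should embed close together.
    return cosine(embed(listing_a), embed(listing_b)) >= MATCH_THRESHOLD

target_listing = "Sony WH-1000XM5 wireless noise canceling headphones black"
walmart_listing = "Sony WH-1000XM5 wireless noise-canceling over-ear headphones, black"
unrelated = "Keurig K-Mini single serve coffee maker"

print(same_product(target_listing, walmart_listing))  # near-duplicate listings match
print(same_product(target_listing, unrelated))        # different products do not
```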
The easiest way to get started is with Supabase, since it has a free tier and the pgvector extension built in.
You calculate the embedding using OpenAI's embeddings API and store the result. Then it's just a vector similarity query in Postgres (trivially easy).
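As a sketch, the pgvector side might look like the following (table and column names are illustrative, and the vector literal is abbreviated; `1536` is the dimension of OpenAI's ada-002 embeddings):

```sql
-- Assumes the pgvector extension is available.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE products (
  id          bigserial PRIMARY KEY,
  name        text,
  description text,
  embedding   vector(1536)  -- store the result of the embeddings API call here
);

-- Find the closest rows by cosine distance (<=> is pgvector's cosine operator).
SELECT id, name
FROM products
ORDER BY embedding <=> '[0.01, -0.02, ...]'::vector
LIMIT 5;
```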
Another way to do this is with the pgml extension. You can run Hugging Face embedding models, which have surpassed OpenAI's at this point. It's pretty fast if you run it on a machine with a GPU for acceleration. On my local desktop with a 3090, I've created embeddings for ~2,000,000 tokens in chunks of ~100 tokens (~450 characters); it took around 20 minutes using the gte-base model, including the insert into an indexed table.
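A sketch of what that looks like with pgml, assuming a `products` table as before (names are illustrative; gte-base produces 768-dimensional vectors, and the model is referenced by its Hugging Face id):

```sql
-- Assumes the pgml extension is installed alongside pgvector.
ALTER TABLE products ADD COLUMN embedding vector(768);

UPDATE products
SET embedding = pgml.embed('thenlper/gte-base', name || ': ' || description)::vector;
```

The embedding runs inside the database, so there's no round trip to an external API.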
Yes, GPU-poor people are just using top-k semantic search to try to fix the issues with low-RAM, low-knowledge LLMs. It's OK for some applications, but other methods need to be investigated.
It's a new type of database that can take in streaming data with very fast (~10ms) response times and output batch data with very fast throughput. To do that, it uses a new columnar file format and compression algorithm. Together, this makes its columnar files 30-50% smaller under most circumstances while decoding just as quickly. That means storage costs are lower and it's 30+% faster assuming the same network bandwidth is used to transfer the data for all columns. And this is a pessimistic scenario, since most queries have a `select column_0, column_1, ...` clause that PancakeDB can leverage better than Parquet, transferring only the exact columns needed!
You can find edge cases (e.g. very long strings of uniformly random bytes) where it's only a few % faster instead of 30%, but in every real-world-resembling scenario I've tried, the advantage is much greater.
This is really neat. One piece of feedback is it begins and ends TeX expressions by saying “dollar” which is distracting. Probably best to strip the TeX syntax while retaining the expressions. Simple ones like O(1) should be understandable aurally, even if complex expressions may not be.
Wow! When someone really believes that Republicans and Democrats are any different... they play with your so-called freedom to choose, you know that, right?
Most people have good health care plans through their employers that pay for most of this stuff. I paid a few hundred dollars to the hospital when my child was born and we were there for several days. There are just a lot of people who, for one reason or another, do not have those benefits. It’s a very multimodal experience here, unfortunately.
Given my experience with both the US and Swedish healthcare systems, I'm not sure I would rate even the absolute best healthcare plans as good. Growing up, my healthcare plan was about the best possible (both parents were doctors in the same hospital so we got a sort of deluxe super plan better than even the regular doctor families got with only one doctor parent), but I would still say it was about as good as Sweden's setup.
I don't mean this as an attack on you, but I think Americans overestimate the quality of their healthcare. The US system really is pretty bad from top to bottom. There are of course people who have it _worse_ (e.g. those not insured at all), but I don't find any aspect of the system particularly impressive. And considering the fact that it's by far the most expensive per capita of any healthcare in the world, it really becomes totally inexcusable. American healthcare is a national disgrace.
But if your employer holds your insurance, they can do all kinds of things. You're afraid to quit because you'd lose your insurance.
I have also read here on HN about someone from SF paying a $4,000 fee per month and still having to pay for a large part of hospitalisation. So it appears to be not so rosy even for the well-off.