Hacker News

I was genuinely wondering recently why the Hacker News site search didn't show a very recent article with very obvious keywords, one that Google, for example, ranked first when I added "HN" to those two keywords in the search field. Is it a matter of indexing, meaning the article was too recent and wasn't yet in the index of HN's search tool (Algolia-based, I believe), while Google had indexed it faster? Or is it purely a matter of the algorithms themselves? The algorithms certainly seem to be the cause when the search is for an old article that should already be indexed. It seems unnatural. Algolia has a free tier for open-source projects, which is very nice of them, and thanks. I genuinely wonder whether those algorithms are really so complex as to justify the comparatively weaker behavior seen in HN's internal search.


https://hn.algolia.com/ only searches the text content of the submitted stories (as in the title and url) and the comments.

It doesn't index the actual article content, nor does it take into account links across sites and content like Google does. Algolia (as self-described) is designed to search for things (like products in an ecommerce store) rather than text with concepts, relations, and entities in a knowledge graph like Google.


Maybe too late now, but I would like to point out that I read the Elasticsearch, Algolia, and Xapiand product descriptions before I made my comment, so I know well what Algolia is for. In the example I gave, I was searching for a headline, not for content inside the comments, and it was a headline that was on the first page of results at HN at that moment. I think I phrased my comment in a polite way towards Algolia, understanding that search has more moving parts than the pattern-matching algorithms of the logical core. PS: I am sincerely grateful for the information you gave in your comment about entities, concepts, and relations in a knowledge graph. This is exactly the kind of info I was looking for when I made the comment, so it was enlightening to learn that, and thank you again.


Google search with site:news.ycombinator.com (and optionally a time limit, which I wish wasn't limited to past hour/day/week/month/year) seems consistently superior to what Algolia provides.

Algolia is a YC company, so I assume that's the main reason it's being used. But the fact that it does such an awful job with such a simply structured site isn't compelling.


Hey latch, I've been working on the Algolia-based HN search and would love to improve it to provide you with a better search experience.

Do you have any specific improvements in mind? Would you mind sharing some queries that don't work well? We can follow up here, and you can also open issues on https://github.com/algolia/hn-search


I have just retested, and the problem I mentioned before doesn't exist anymore. It happened around Christmas time; back then, the search returned results that were not very relevant. Now it shows in 9th position what Google shows in 1st (exactly the article I was searching for back then, still the first match on Google), but that is a minor difference in ordering within the first page, so I have to say the search is working pretty much as expected now. Sorry for not retesting before making that comment, and thanks for keeping the search good. All right, I didn't want to say it, but here it is: russia hypersonic


Well, if I search for "activemq vs rabbitmq" and then switch to "comments", I get 2 results. The 2nd hit is reasonable: "ActiveMQ: Not ready for prime time"

Google gives many more results, and a few on the first page seem quite relevant. Most notably: https://news.ycombinator.com/item?id=5531192

but also: https://news.ycombinator.com/item?id=1657574
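For anyone who wants to reproduce a query like this outside the web UI, here is a minimal Python sketch that builds a request URL for the public HN Search API (the `search` endpoint with its `query`, `tags`, and `hitsPerPage` parameters). The helper name `hn_search_url` is made up for illustration, and nothing here claims to mirror how the site itself issues its queries.

```python
from urllib.parse import urlencode

# Public HN Search API endpoint (relevance-sorted search).
HN_SEARCH_API = "https://hn.algolia.com/api/v1/search"


def hn_search_url(query, tags="story", hits_per_page=20):
    """Build a search URL; `tags` is e.g. "story" or "comment".

    Hypothetical helper: it only assembles the URL and does not
    perform the HTTP request.
    """
    params = {"query": query, "tags": tags, "hitsPerPage": hits_per_page}
    return f"{HN_SEARCH_API}?{urlencode(params)}"


# The comment search described above, expressed as an API URL.
url = hn_search_url("activemq vs rabbitmq", tags="comment")
print(url)
```

Fetching that URL with any HTTP client should return JSON whose `hits` list can be compared by hand against the Google results.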


I just googled "elixir property testing site:news.ycombinator.com" and the first hit was what I wanted.

But it's the 3rd result in Algolia, behind stories that are both older and with fewer votes.
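The Google side of that comparison can be scripted too. The sketch below only builds the search URL; the helper name `google_site_query` is invented here, and the optional `after:` / `before:` date operators are included on the assumption that Google still honors them in the query text, which would sidestep the fixed hour/day/week/month/year presets.

```python
from urllib.parse import urlencode


def google_site_query(terms, site="news.ycombinator.com", after=None, before=None):
    """Build a Google search URL restricted to one site.

    Hypothetical helper. `after` and `before` are YYYY-MM-DD strings
    appended as date operators (assumed to be supported by Google).
    """
    parts = [terms, f"site:{site}"]
    if after:
        parts.append(f"after:{after}")
    if before:
        parts.append(f"before:{before}")
    return "https://www.google.com/search?" + urlencode({"q": " ".join(parts)})


# The query from the comment above, without and with a lower date bound.
print(google_site_query("elixir property testing"))
print(google_site_query("elixir property testing", after="2018-01-01"))
```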


Thanks for sharing, this is a good example where the first 2 hits on Algolia have a "better" textual relevancy (proximity between words is better, because of the "-based" word in the middle) but where the 3rd hit is most probably the one we want to see first because it has more than 100 points while the 2 others have 3.

Let me share that with the team and see whether we can try something.
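To make the trade-off described above concrete, here is a toy Python sketch. It is not Algolia's actual ranking code: the `Hit` type, both ranking functions, the proximity values, and the log-based blend are all assumptions for illustration. Only the point counts (3, 3, and "more than 100", taken here as 101) come from the thread.

```python
import math
from dataclasses import dataclass


@dataclass
class Hit:
    label: str
    proximity: int  # distance between matched query words; lower is better
    points: int     # story score on HN


def tie_break_rank(hits):
    """Textual proximity decides first; points only break exact ties."""
    return sorted(hits, key=lambda h: (h.proximity, -h.points))


def blended_rank(hits, popularity_weight=1.0):
    """Hypothetical alternative: popularity can offset a small proximity gap."""
    def score(h):
        return h.proximity - popularity_weight * math.log10(1 + h.points)
    return sorted(hits, key=score)


hits = [
    Hit("first hit", proximity=1, points=3),
    Hit("second hit", proximity=1, points=3),
    # An extra word ("-based") sits between the matched terms, so proximity is worse.
    Hit("third hit", proximity=2, points=101),
]

print([h.label for h in tie_break_rank(hits)])
print([h.label for h in blended_rank(hits)])
```

With strict tie-breaking the high-scoring story stays last; with the blended score it moves to the top. The weight is a tuning knob: set too high, it would let popular but barely relevant stories drown out exact matches.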



