
comcuoglu · on July 18, 2024

Thank you. It seems largely overlooked that LLMs still sample from a set of tokens based on estimated probability and the given temperature, not on factuality or the "confidence estimate" described in the article. RAG and similar techniques only shift the estimated probabilities in a more factually grounded direction; they do not change the sampling itself.
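To make the point concrete, here is a minimal sketch of plain temperature sampling over a toy three-token vocabulary. The tokens and logits are made up for illustration, and real decoders usually add refinements such as top-k or top-p filtering. The sampler only ever sees the logits, so anything like retrieval can influence the output only by changing those numbers beforehand.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw logits into a probability distribution scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(tokens, logits, temperature=1.0, rng=random):
    """Draw one token at random, weighted only by the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    # Nothing here checks whether a token is true; only the logits matter.
    return rng.choices(tokens, weights=probs, k=1)[0]

tokens = ["Paris", "Lyon", "Berlin"]  # toy vocabulary (hypothetical)
logits = [3.0, 1.0, 0.5]              # toy scores, not from a real model
print(softmax_with_temperature(logits, 0.5))
print(sample_token(tokens, logits, temperature=0.7, rng=random.Random(0)))
```

Lowering the temperature sharpens the distribution toward the highest-scoring token, but even a low-probability (possibly wrong) token can still be drawn.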

comcuoglu · on July 4, 2024

Chart.js (with the geo plugin for the choropleth chart) and three.js for the bubble chart.

comcuoglu · on July 4, 2024

Thanks, it's fixed now.

comcuoglu · on July 4, 2024

Both fixed now, thanks.

DenisM · on July 5, 2024

Also SQL Server and MSSQL

egorfine · on July 5, 2024

Also perhaps MariaDB and MySQL should be joined together as well.
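One way to apply such merges is to normalize labels through an alias table before counting. This is a hypothetical sketch: the `ALIASES` table and the input shape (one list of raw technology labels per job post) are assumptions, and whether MySQL and MariaDB belong in one bucket is a judgment call, as suggested above.

```python
from collections import Counter

# Hypothetical alias table: each raw label (lowercased) maps to one canonical category.
ALIASES = {
    "mssql": "SQL Server",
    "sql server": "SQL Server",
    "microsoft sql server": "SQL Server",
    "mysql": "MySQL/MariaDB",
    "mariadb": "MySQL/MariaDB",
}

def canonicalize(label):
    """Map a raw label to its canonical name; unknown labels pass through trimmed."""
    return ALIASES.get(label.strip().lower(), label.strip())

def count_technologies(postings):
    """postings: iterable of lists of raw technology labels, one list per job post."""
    counts = Counter()
    for labels in postings:
        # A set ensures a post naming both MySQL and MariaDB is counted once.
        counts.update({canonicalize(label) for label in labels})
    return counts

print(count_technologies([["MySQL", "MariaDB"], ["MSSQL"], ["SQL Server", "PostgreSQL"]]))
```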

comcuoglu · on July 4, 2024

This kind of input is exactly what I hoped for when submitting it here, thank you. I agree!

comcuoglu · on July 4, 2024

While this made me laugh and there is some truth to it, the nice thing about the process described in the blog post is that you don't need to know what to count or how to count it: the LLM knows enough to classify postings well enough to get good estimates. Go and Rust are both good examples of words that have multiple meanings and appear as prefixes or suffixes of many other words.
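For contrast, a naive keyword count shows why "Go" and "Rust" are hard to count without understanding context. This sketch is not the method from the blog post; it uses four invented example posts to compare substring matching with whole-word regex matching.

```python
import re

def naive_count(texts, keyword):
    """Count posts whose text contains the keyword anywhere (case-insensitive)."""
    kw = keyword.lower()
    return sum(1 for t in texts if kw in t.lower())

def word_boundary_count(texts, keyword):
    """Count posts where the keyword appears as a whole word (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return sum(1 for t in texts if pattern.search(t))

posts = [  # invented examples
    "Backend engineer, Go and PostgreSQL",
    "We are a Google-backed startup, good to go!",
    "Help us fight rust on industrial pipelines",
    "Trusted partner for logistics",
]

for kw in ("go", "rust"):
    print(kw, naive_count(posts, kw), word_boundary_count(posts, kw))
```

Substring matching picks up "Google" and "Trusted"; whole-word matching removes those but still counts "good to go!" and corrosion as hits. Only one of these four posts actually asks for either language, which is the kind of ambiguity a classifier with context is meant to resolve.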

comcuoglu · on July 4, 2024

In total, 539 jobs said they want Rust experience and 695 want Go experience. I think I should have added another line chart showing the programming language distribution over time, thanks for the idea.
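A line chart like that needs one series per language with a value for every month. Assuming each classified posting is available as a `(month, languages)` pair with months as `"YYYY-MM"` strings (a hypothetical shape, not necessarily how the blog post stores its data), the series could be built like this:

```python
from collections import defaultdict

def language_counts_by_month(classified_posts, languages):
    """classified_posts: iterable of (month, set_of_languages) pairs, e.g. ("2024-06", {"Go"}).

    Returns {language: [(month, count), ...]} with months sorted and missing months as 0.
    """
    months = set()
    counts = defaultdict(int)
    for month, langs in classified_posts:
        months.add(month)
        for lang in langs:
            if lang in languages:
                counts[(lang, month)] += 1
    ordered = sorted(months)  # "YYYY-MM" strings sort chronologically
    return {lang: [(m, counts[(lang, m)]) for m in ordered] for lang in languages}

sample = [("2024-05", {"Go"}), ("2024-05", {"Rust", "Go"}), ("2024-06", {"Rust"})]  # toy data
print(language_counts_by_month(sample, ["Go", "Rust"]))
```

To show a distribution rather than raw counts, each value could be divided by the number of postings in that month.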

jamestimmins · on July 4, 2024

Thanks for looking this up. It's especially interesting because if I search "golang" on LinkedIn jobs, I see 5,185 results (in the US), but only 148 results for "rust".

Hardly scientific, but it shows the risk of using Hacker News to draw overly strong conclusions about language popularity.

comcuoglu · on July 4, 2024

Could you maybe link me one of those? I've googled a bit but didn't find ready-to-use DBs with that data.

pvg · on July 4, 2024

https://news.ycombinator.com/item?id=40644563

comcuoglu · on July 4, 2024

Thank you, looks promising.

pvg · on July 4, 2024

It's handy, but I think for your use case the regular API works fine. For instance, you could have just pulled all the whoishiring posts via

https://hacker-news.firebaseio.com/v0/user/whoishiring.json?...

without the googling hoops. Not that this is very helpful after you're done!
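A minimal sketch of that approach with the official Hacker News Firebase API: the `user/whoishiring` endpoint returns the account's `submitted` item ids, and each `item/<id>` has a `title` and a `kids` list of top-level comments (the individual postings). The title prefix filter and the injectable `fetch` parameter are assumptions made for illustration and testing.

```python
import json
import urllib.request

API_BASE = "https://hacker-news.firebaseio.com/v0"

def fetch_json(path):
    """GET one JSON document from the Hacker News Firebase API, e.g. path='item/8863'."""
    with urllib.request.urlopen(f"{API_BASE}/{path}.json", timeout=10) as resp:
        return json.load(resp)

def whoishiring_threads(fetch=fetch_json, prefix="Ask HN: Who is hiring?"):
    """Yield (id, title, kids) for whoishiring submissions whose title starts with prefix.

    'kids' holds the ids of the top-level comments, i.e. the individual job postings.
    """
    user = fetch("user/whoishiring") or {}
    for item_id in user.get("submitted", []):
        item = fetch(f"item/{item_id}")  # deleted items come back as null (None)
        if item and item.get("title", "").startswith(prefix):
            yield item_id, item["title"], item.get("kids", [])
```

Iterating `whoishiring_threads()` makes one request per submitted item, so for a long history it is worth stopping early or caching responses.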

SushiHippie · on July 4, 2024

https://news.ycombinator.com/item?id=40782787

Also the ClickHouse dataset, which is free.

Google BigQuery can become very expensive.

comcuoglu · on July 4, 2024

I agree. I realized too late that I should have introduced a "Hybrid" category.

equasar · on July 5, 2024

Another way to improve this would be to ask posters to add GLOBAL_REMOTE, COUNTRY_REMOTE, or some other tag indicating that a role is not limited to local remote (remote within the same country only).
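With explicit tags like these, extracting the remote scope becomes a simple lookup instead of interpreting free text such as "REMOTE (US)". This sketch assumes the common pipe-separated header line of a posting; the tag names come from the suggestion above and are not an established HN convention.

```python
# Hypothetical tags, checked in this order.
REMOTE_TAGS = ("GLOBAL_REMOTE", "COUNTRY_REMOTE")

def remote_scope(header):
    """Return the first tag from REMOTE_TAGS that appears as a field of a
    pipe-separated posting header, or None if none is present."""
    fields = {f.strip().upper() for f in header.split("|")}
    for tag in REMOTE_TAGS:
        if tag in fields:
            return tag
    return None

print(remote_scope("Acme | Backend Engineer | GLOBAL_REMOTE | Full-time"))
print(remote_scope("Acme | Berlin | REMOTE (Germany)"))
```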

DoingIsLearning · on July 5, 2024

I would add one more category.

Beyond in-office and N-days-a-week hybrid, _actual_ remote roles break down into:

- Country remote (mostly for taxation/regulatory)

- Time zone remote (remote-first companies, but constrained to within 2 or 3 hours of the HQ time zone)

- Anywhere remote (actual remote but often as a contractor or EoR)
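The categories above, plus the in-office and hybrid ones, could be modeled as an enum, with a rough check of which roles a given candidate can take. The names, the 3-hour default window, and the `commutable` flag are illustrative assumptions; offsets are plain hours from UTC, so daylight saving time is ignored.

```python
from enum import Enum

class Arrangement(Enum):
    ONSITE = "onsite"
    HYBRID = "hybrid"                    # N days a week in the office
    COUNTRY_REMOTE = "country_remote"    # remote, but only within one country
    TIMEZONE_REMOTE = "timezone_remote"  # remote within a few hours of HQ
    ANYWHERE_REMOTE = "anywhere_remote"  # remote with no location limit

def is_eligible(arrangement, *, same_country, candidate_utc_offset, hq_utc_offset,
                max_hours=3, commutable=False):
    """Rough eligibility check for one candidate. Offsets are hours from UTC (DST ignored)."""
    if arrangement in (Arrangement.ONSITE, Arrangement.HYBRID):
        return commutable
    if arrangement is Arrangement.COUNTRY_REMOTE:
        return same_country
    if arrangement is Arrangement.TIMEZONE_REMOTE:
        return abs(candidate_utc_offset - hq_utc_offset) <= max_hours
    return True  # ANYWHERE_REMOTE

# A candidate on UTC-5 applying to a time-zone-remote role at a UTC-8 HQ:
print(is_eligible(Arrangement.TIMEZONE_REMOTE, same_country=False,
                  candidate_utc_offset=-5, hq_utc_offset=-8))
```

Lumping the three remote variants into one "Remote" bucket hides exactly these differences, which is why they matter for the counts.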

comcuoglu · on July 4, 2024

Yes, later this week I will follow up with a post about the animation and the sphere positioning; that graph was the most fun part of writing this blog post. Thank you for your feedback!