mathisd's comments

I believe the author missed another approach to the semantic layer: the one used by the Power BI semantic model and, perhaps most interestingly, Malloy. In these tools, the semantic layer is a thin layer that only defines the following: metric definitions (mostly as aggregation functions) and dimensions of analysis (product category, country, etc.).
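To make that concrete, here is a minimal sketch in Python (not actual Malloy or Power BI DAX syntax; the table and column names are made up) of what such a thin layer boils down to: named metrics as aggregation expressions plus a list of dimensions, compiled into a plain GROUP BY query.

    # Hypothetical thin semantic layer: metrics as aggregation expressions,
    # dimensions as grouping columns, compiled to plain SQL.
    SEMANTIC_MODEL = {
        "table": "orders",
        "metrics": {
            "revenue": "SUM(amount)",
            "order_count": "COUNT(*)",
        },
        "dimensions": ["product_category", "country"],
    }

    def build_query(model, metrics, dimensions):
        """Compile a metric/dimension selection into an aggregation query."""
        select_parts = list(dimensions) + [
            f"{model['metrics'][m]} AS {m}" for m in metrics
        ]
        sql = f"SELECT {', '.join(select_parts)} FROM {model['table']}"
        if dimensions:
            sql += f" GROUP BY {', '.join(dimensions)}"
        return sql

    print(build_query(SEMANTIC_MODEL, ["revenue"], ["country"]))
    # SELECT country, SUM(amount) AS revenue FROM orders GROUP BY country

Everything beyond that (joins, caching, access control) is left to the query engine, which is what keeps the layer thin.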

This blog post makes a much better argument than I could for why Malloy is a really interesting and welcome innovation in the data analytics space: https://carlineng.com/?postid=malloy-intro#blog


Thanks for this. Can you suggest any books that go into these topics with examples?


The visualisations could be improved by binning the number of maintainers (1 / 2-10 / 11+) or by plotting a cumulative distribution (i.e. x% of projects have fewer than y contributors). A rough sketch of both views is below.
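Something like this, with toy data standing in for the article's dataset:

    # Toy maintainer counts; in practice these would come from the real data.
    import numpy as np
    import matplotlib.pyplot as plt

    maintainers = np.array([1, 1, 1, 1, 2, 2, 3, 5, 8, 15, 40])

    # Binned view: 1 / 2-10 / 11+
    counts, _ = np.histogram(maintainers, bins=[0.5, 1.5, 10.5, np.inf])
    plt.bar(["1", "2-10", "11+"], counts)
    plt.ylabel("number of projects")
    plt.show()

    # Cumulative view: share of projects with at most y maintainers
    sorted_m = np.sort(maintainers)
    cum_share = np.arange(1, len(sorted_m) + 1) / len(sorted_m)
    plt.step(sorted_m, cum_share, where="post")
    plt.xlabel("maintainers")
    plt.ylabel("cumulative share of projects")
    plt.show()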


Even that would be misrepresentative... I know of many packages with contributions from hundreds of people where, based on commits, the bulk of the work was still done by 1 or 2 primary maintainers.


I really like DuckDB, but I can't see this being a pattern used for prototyping or for production.


Really cool


Time lost waiting due to unresponsive driver: 35 minutes
Location: Lyon Airport
App: Bolt
Resolution method: harassing Bolt AI support


I have had the same issue with Bolt recently at Lyon airport. I had to wait 45 minutes for a driver who wouldn't answer messages or calls and was waiting on the other side of the airport. Bolt support was awful to reach during those 45 minutes. Drivers should be held accountable for that behaviour by the platform too.


Positron IDE is a VS Code fork intended for the R language. It feels more modern than RStudio, and I was under the impression that it would replace it at some point. That raises two questions: does GitHub Copilot work in Positron IDE, and does your extension?


Right now our assistant is only available in RStudio. We do plan to develop an assistant for Positron-like IDEs in the future though.


Positron is made by Posit, formerly the RStudio company. So I would say it's basically the new RStudio.


Why bother now that there are newer package managers such as uv, which still has a strong lead in performance?


(For fellow JavaScript haters: https://archive.is/Hl4yJ; but this shows collapsed accordions with important content that, of course, won't expand. I caved and visited the original page. But seriously, people, the <details> tag is not deep magic.)

TFA documents work done for and incorporated into Pip about a year ago.

Improvements like this are still worth making because, among other things, tons of people still use Pip and are not even going to look at changing. They are, I can only assume, already running massive CI jobs that dynamically grab the latest version of Pip repeatedly and stuff them into containers, in ways that defeat Pip's own caching, and forcibly check the Internet every time for new versions. Because that's the easiest, laziest thing to write in many cases. This is the only plausible explanation I have for Pip being downloaded an average of 12 million times per day (https://pypistats.org/packages/pip).

They're also worth making exactly because Pip still has a very long way to go in terms of performance improvement, and because experiments like this show that the problem is very much with Pip rather than with Python. Tons of people hyping Uv assume that it must be "rocket emoji, blazing fast, sparkle emoji" because it's written in Rust. Its performance is not in question; but the lion's share of the improvement, in my analysis, is due to other factors.

Documenting past performance gains helps inform the search for future improvements. They aren't going to start over (although I am: https://github.com/zahlman/paper) so changes need to be incremental, and constantly incorporated into the existing terrible design.

Showing off unexpected big-O issues is also enlightening. FTA:

> This was the code to sort installed packages just before the final print.

> There was a quadratic performance bug lurking in that code. The function `env.get_distribution(item)` to fetch the package version that was just installed was not constant time, it looped over all installed packages to find the requested package.

The user would not expect an installation of hundreds of packages to spend a significant amount of time in preparing to state which packages were installed. But Pip has been around since 2008 (https://pypi.org/project/pip/#history) and Ian Bicking may never have imagined environments with hundreds of installed packages, never mind installing hundreds at a time.
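To illustrate the shape of that bug (a sketch of the pattern the quote describes, not pip's actual code): looking up each freshly installed package by scanning the whole environment makes the final report O(n^2) in the number of packages, and the standard fix is to build the lookup index once.

    # Sketch of the quadratic pattern described above, not pip's real code.
    def get_distribution_slow(installed, name):
        for dist in installed:        # linear scan on every call
            if dist["name"] == name:
                return dist

    def report_slow(installed, just_installed):
        lines = []
        for name in just_installed:   # n lookups x O(n) scans = O(n^2)
            dist = get_distribution_slow(installed, name)
            lines.append(f"{dist['name']}-{dist['version']}")
        return sorted(lines)

    def report_fast(installed, just_installed):
        by_name = {dist["name"]: dist for dist in installed}  # built once, O(n)
        return sorted(f"{n}-{by_name[n]['version']}" for n in just_installed)

    installed = [{"name": f"pkg{i}", "version": "1.0"} for i in range(500)]
    names = [d["name"] for d in installed]
    assert report_slow(installed, names) == report_fast(installed, names)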

Finally, documentation like this helps highlight things that have improved in the Python packaging ecosystem, even outside of Pip. In particular:

> Investigation revealed the download is done during the dependency resolution. pip can only discover dependencies after it has downloaded a package, then it can download more packages and discover more dependencies, and repeat. The download and the dependency resolution are fully intertwined.

This is mostly no longer true. While of course the dependency metadata must be downloaded and cannot appear by magic, it is now available separately from the package artifact in a large fraction of cases. Specifically, there is a standard for package indices to provide that information separately (https://peps.python.org/pep-0658/), and per my discussion with Pip maintainers, PyPI does so for wheels. (Source distributions — called sdists — are still permitted to omit PKG-INFO, and dependency specifications in an sdist can still be dynamic since the system for conditional platform-dependent dependencies is apparently not adequate for everyone. But in principle, some projects could have that metadata supplied for their sdists, and nowadays it's relatively uncommon to be forced to install from source anyway.)
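As a rough illustration of what PEP 658 enables (this is an assumption-laden sketch, not how pip consumes it: it assumes PyPI's simple index lists wheel URLs as plain href links and that the core metadata is served at the wheel URL with ".metadata" appended, which older wheels may lack), a resolver can read the Requires-Dist fields without downloading the wheel itself:

    # Sketch: fetch only the core metadata of a wheel, per PEP 658.
    import re
    from urllib.request import urlopen

    project = "rich"  # arbitrary example project
    index_html = urlopen(f"https://pypi.org/simple/{project}/").read().decode("utf-8")
    wheel_urls = re.findall(r'href="([^"]+\.whl)[^"]*"', index_html)

    # Grab the metadata of the most recently listed wheel, not the wheel itself.
    core_metadata = urlopen(wheel_urls[-1] + ".metadata").read().decode("utf-8")
    for line in core_metadata.splitlines():
        if line.startswith("Requires-Dist:"):
            print(line)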



Do we really need this?

