I've been using Python since like 2006, so maybe I just have that generational knowledge and battlefront experience... but whenever I come into threads like this I really feel like an imposter or a fish out of water. Like, am I using the same Python that everyone else is using? I echo your stance - the less overhead and additional tooling the better. A simple requirements.txt file and pip is all I need.
Isn't pip + requirements.txt insufficient for repeatable deployments? You need to pin all dependencies, not just your immediate project dependencies, unless you want some random downstream update to break your build. I guess you can do that by hand... but don't you kind of need some kind of lock file to stay safe/sane?
Now you can install prod requirements or dev requirements or whatever other combination of requirements you have, and you're guaranteed to get the exact same subset of packages, no matter what your transitive dependencies are doing.
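Roughly, the shape of it is something like this (file names are illustrative):

    python -m venv .venv && . .venv/bin/activate
    pip install -r requirements.txt -r requirements-dev.txt   # top-level deps only
    pip freeze > constraints.txt                               # pin the full resolved set
    # later, in any environment:
    pip install -c constraints.txt -r requirements.txt         # or add -r requirements-dev.txt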
You can use pip-compile from pip-tools if you want the file to include exact hashes.
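For example, assuming your top-level dependencies live in a requirements.in file:

    pip-compile --generate-hashes --output-file=requirements.txt requirements.in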
This is true, but now you're explicitly depending on all of your transitive dependencies, which makes updating the project a lot harder. For example, if a dependency stops pulling in a transitive dependency past a certain version, you'll need to either recreate the constraints file by reinstalling everything, or manually remove the dependencies you don't need any more.
Also pip freeze does not emit a constraints file, it emits (mostly) a requirements file. This distinction is rarely important, but when it is, it can cause a lot of problems with this workflow. For example, a constraints file cannot include any information about which extras are installed, which pip freeze does by default. It also can't contain local or file dependencies, so if you have multiple projects that you're developing together it simply won't work. You also can't have installed the current project in editable mode if you want the simple "pip freeze" workflow to work correctly (although in practice that's not so difficult to work around).
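For instance, freeze output from an environment with local and editable installs can contain lines like these (names and paths made up), none of which are valid in a constraints file:

    requests==2.31.0
    mypkg @ file:///home/me/src/mypkg
    -e git+https://github.com/example/otherpkg@abc1234#egg=otherpkg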
Pip-tools does work a bit better, although the last time I used it, it considered the dependency chains for production and for development in isolation, which meant it would install different versions of some packages in production than in development (which was one of the big problems I was trying to solve).
From my experience trying basically every single option in the packaging ecosystem, there aren't really any solutions here. Even Poetry, which is pretty much best-in-class for actually managing dependencies, struggles with workspace-like installations and more complicated build scripts. Which is why I think pretty much every project seems to have its own, subtly unique build/dependency system.
Compare and contrast this with, say, NPM or Cargo, which in 95% of cases just do exactly what you need them to do, correctly, safely, and without having to think about it at all.
> This is true, but now you're explicitly depending on all of your transitive dependencies
They're constraints, not dependencies; they don't need to be installed, and you can just update your requirements as you need and regenerate them.
> Also pip freeze does not emit a constraints file, it emits (mostly) a requirements file. This distinction is rarely important, but when it is, it can cause a lot of problems with this workflow. For example, a constraints file cannot include any information about which extras are installed, which pip freeze does by default
pip freeze does not use extras notation; you just get the extra packages listed as individual dependencies. Yes, there is an important distinction between constraints and requirements, but pip freeze uses a subset of the notation that both formats share.
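For example, if you'd installed requests[socks], the freeze output just contains the expanded, individually pinned packages, something like:

    PySocks==1.7.1
    requests==2.31.0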
> You also can't have installed the current project in editable mode if you want the simple "pip freeze" workflow to work correctly
That's why the workflow I gave to generate the constraints didn't use the -e flag: you generate the constraints separately and can then install however you want, editable or not.
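For example (the extras name is just a placeholder):

    pip install -c constraints.txt -e ".[dev]"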
> From my experience trying basically every single option in the packaging ecosystem, there aren't really any solutions here. Even Poetry, which is pretty much best-in-class for actually managing dependencies, struggles with workspace-like installations and more complicated build scripts. Which is why I think pretty much every project seems to have its own, subtly unique build/dependency system.
People have subtly different use cases that make a big impact on what option is best for them. But I've never been able to fit Poetry into any of my use cases completely, whereas a small shell script to generate constraints automatically out of my requirements has worked exceedingly well for pretty much every use case I've encountered.
'pip freeze' will generate the requirements.txt for you, including all those transitive dependencies.
It's still not great, though, since that only pins version numbers, not hashes.
You probably don't want to manually generate requirements.txt. Instead, list your project's immediate dependencies in the setup.cfg/setup.py file, install that in a venv, and then 'pip freeze' to get a requirements.txt file. To recreate this in a new system, create a venv there, and then 'pip install -c requirements.txt YOUR_PACKAGE'.
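As a sketch, with the package name and file names as placeholders:

    # on the build machine
    python -m venv .venv && . .venv/bin/activate
    pip install .                   # resolves the deps declared in setup.cfg/setup.py
    pip freeze > requirements.txt   # pins everything that actually got installed
                                    # (you may want to strip the project's own line from this file)

    # on the new system
    python -m venv .venv && . .venv/bin/activate
    pip install -c requirements.txt YOUR_PACKAGE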
It was pretty bad before but now it seems like there are a bunch of competing solutions each with their own quirks and problems. It feels like the JavaScript ecosystem.
Ironically, the JavaScript ecosystem is far better than the Python ecosystem when it comes to packaging and dependencies. NPM just does the right thing by default: you define dependencies in one place, and they are automatically fixed unless you choose to update them. Combine that with stuff like workspaces and scripts, and you basically have everything you need for the vast majority of use cases.
Yes, there are also other options like Yarn, which have typically had newer features and different approaches, but pretty much everything that works has been folded back into NPM itself. Unless you really want to live at the bleeding edge for some reason, NPM is perfectly sufficient for all your needs.
In contrast, the closest thing to that in the Python ecosystem is Poetry, which does a lot of things right, but is not supported by Python maintainers, and is still missing a handful of things here and there.
I'm not saying the JS ecosystem as a whole is perfect, but for packaging specifically, it's a lot better than Python.
I mean, a project needs regular care and maintenance, however you organise it. If you're never scheduling time to maintain your dependencies, you're going to be in trouble either way. But at least if you lock your dependencies, you know what will actually get installed, and you can find the buggy or insecure versions.
We found a bug on a Python project I worked on recently that only seemed to happen on certain machines. We couldn't reproduce it in a dev environment, and one machine that was affected suddenly stopped being affected after a while. It turns out the issue was a buggy dependency: one particular build of the project happened to have picked up the buggy version, but later builds used the fixed version and so didn't have a problem. So we'd only see the bug depending on which build the machine had last used, and if someone put a different build on there, it would reset that completely. On our development machines, we used slightly different builds that just happened not to have been affected.
Pinning dependencies wouldn't necessarily have prevented the bug in the first place - sometimes you just have buggy dependencies - but the debugging process would have gone much more quickly and smoothly with a consistent build environment. We could also have been much more confident that the bug wouldn't accidentally come back.
That's definitely a solution, but it comes with its own problems, in particular that you add a significant dependency on what is essentially a middleman organisation trying to manage all possible dependencies. This doesn't scale very well, particularly because there's a kind of M×N problem where M packages can each have N versions which can be depended on. In practice, most distros tend to only support one version of each package, which makes the job easier for the distro maintainer, but makes things harder for everyone else (library authors get bug reports for problems they've already fixed, end users have less ability to choose the versions they need, etc).
In particular, it also makes upgrading a much more complex task. For example, React releases new major versions on a semi-regular basis, each one containing some breaking changes, but not many. Ideally there wouldn't be any, but breaking changes are inevitable with any tool as situations change and the problem space becomes better understood. But because the NPM ecosystem generally uses locked dependency lists, end users can upgrade at their leisure, either with small changes every so often, or only upgrading when there's a good reason to do so. Both sides can be fairly flexible in how they do things without worrying about breaking something accidentally.
Under a Linux distribution model, however, those incremental breaking changes become essentially impossible. That means either projects accumulate cruft that can never be removed, which makes maintainers' and users' lives more complex, or projects have to do occasional "break everything" releases à la Python 2/3 in order to regain order, which is also more work for everyone. There is a lot less flexibility on offer here.
I don't think these sorts of problems disqualify the Linux distribution model entirely - it does do a lot of things well, particularly when it comes to security and long-term care. But there's a set of tradeoffs at play here, and personally I'd rather accept more responsibility for the dependencies that I use, in exchange for having more flexibility in how I use them. And given the popularity of language-specific package repositories that work this way, I get the feeling that this is a pretty common sentiment.
What happens when your distribution only has old versions, or worse, no versions of the libraries you need? Do you hop distributions? Do you layer another distribution like Nix or Anaconda over your base distribution? Do you give up and bundle another entire distribution in a container image?
Updating packages should be strictly left to the developer's discretion. That schedule is up to the developer using the packages, not upstream.
Not to mention that dependencies updating themselves whenever they like to "fix vulnerabilities" is a sure-fire way to break your program and introduce behavioral regressions and new vulnerabilities...
The "Javascript ecosystem" on my personal experience seems to prefeer installing everything in the global environment "for ease of use convenience" and then they wonder how did a random deprecated and vulnerable dependency get inside their sometimes flattened, sometimes nested, non-deterministic dependency chain (I wish the deterministic nested pnpm was the standard...) and (pretend) they did not notice.
That being said, the JavaScript ecosystem has standardized tooling to handle that (npx), which Python doesn't (I wish pipx were part of standard pip); they just pick the convenient footgun approach.
I don't think so. Python is batteries-included, and most packages in the Python ecosystem are not as scattered as npm packages. The number of packages in a typical Python project is much smaller than in a Node.js project. I think that's the reason why people are still happy with simple tools like pip and requirements.txt.
There's a PEP to get a part of it right [1] - at least the dependency-installation and virtualenv side of things - but at the moment the packaging nonsense is still as bad as it has always been.
>> Are pip maintainers on board with this?
> Personally, no. I like the idea in principle, but in practice, as you say, it seems like a pretty major change in behaviour and something I’d expect to be thrashed out in far more detail before assuming it’ll “just happen”.
As if the several half-arsed official solutions that already exist around packaging (the multiple ways to build and publish packages) had deep thinking and design behind them...
Twice bricking my laptop's ability to do Python development because of venv + symlink BS was the catalyst I needed to go all-in on remote dev environments.
I don't drive Python daily, but my other projects thank Python for that.
Lol. You put "simple" and "requirements.txt" unironically next to each other...
I mean, I think you genuinely believe that what you suggest is simple... so I won't pretend not to understand how you might think that. Let me explain:
There's simplicity of performing a process and simplicity of understanding it. It's simple to make more humans; it's very hard to understand how humans work. When you use pip with requirements.txt, you're getting the simple-to-perform part, but you have no idea what stands behind it.
Unfortunately for you, what stands behind it is ugly and not at all simple. Well, you may say that sometimes that's necessary... but in this case it's not. It's the product of a series of failures by the people working on this system: mistakes, misunderstandings, and bad designs that set in motion processes which, in retrospect, became impossible to revert.
There aren't good ways to use Python, but even with what we have today, pip + requirements.txt is not anywhere near the best you can do, if you want simplicity. Do you want to know what's actually simple? Here:
Store links to the wheels of your dependencies in a file. You can even call it requirements.txt if you so want. Use curl or equivalent to download those wheels and extract them into what Python calls "platlib" (finding it is left as an exercise for the reader), removing everything in the scripts and data directories. If you feel adventurous, you can put the scripts into the same directory where the Python binary is installed, but I wouldn't do that if I were you.
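As a minimal sketch of what I mean (assuming requirements.txt holds one wheel URL per line):

    platlib=$(python -c "import sysconfig; print(sysconfig.get_paths()['platlib'])")
    while read -r url; do
      curl -sSLO "$url"
      unzip -q -o "$(basename "$url")" -d "$platlib"
    done < requirements.txt
    rm -rf "$platlib"/*.data    # drop the scripts/data payloads rather than installing them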
Years of being in infra roles taught me that this is the most reliable way to keep nightly builds running quietly, avoiding the various "infra failures" caused by how poorly Python's infra tools behave.
What are specific problems you have with pip + requirements.txt, and why do you believe storing links to wheels is more reliable? Your comment makes your conclusion clear, but I don't follow your argument.
Pip is a huge and convoluted program with tons of bugs. It does a lot more than just download Python packages and unpack them into their destination. Obviously, if you want something simple, then an HTTP client, which constitutes only a tiny fraction of pip, would be a simpler solution, wouldn't it?
In practice, pip may not honor your requirements.txt the way you think it would, even if you require exact versions of packages (which is something you shouldn't do for programs / libraries anyway). This is because pip may install one thing first, along with its dependencies, and then move on to the next item, which may or may not be compatible with what was already installed.
The reason you don't run into situations like this often enough to be upset is that a lot of Python projects don't survive for very long. They become broken beyond repair after a few years of no maintenance, where by maintenance I mean constantly chasing the most recent set of dependencies. Once you try to install an older project using pip and requirements.txt, it's going to explode...