Semantic Versioning - A technique for avoiding "dependency hell" (semver.org)
78 points by mojombo on Dec 15, 2009 | 24 comments


While this is a reasonable (and common) version numbering scheme, it does little to address dependency hell.

A trivial bug fix can be backwards incompatible with systems that have been built to rely on the bug.

The presence of new, unused features in a library can impact the runtime behavior of the library in important ways.

All changes of any kind must be considered to be potentially backwards incompatible for some consumers.

So the rules for deciding whether to increment a version number by 0.0.1, 0.1.0, or 1.0.0 don't hold up. A producer cannot predict the impact of a change to all consumers.
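For reference, the mechanical rule being criticized can be sketched in a few lines of shell (the function name and change-type labels are illustrative; the argument above is that classifying a change as fix/feature/breaking correctly for all consumers is the part a producer can't do):

```shell
# Sketch of the SemVer increment rule: given the current version and the
# kind of change, emit the next version. Names here are illustrative.
bump() {
  local ver=$1 kind=$2 major minor patch
  IFS=. read -r major minor patch <<< "$ver"
  case $kind in
    breaking) echo "$((major + 1)).0.0" ;;        # incompatible API change
    feature)  echo "$major.$((minor + 1)).0" ;;   # backwards-compatible addition
    fix)      echo "$major.$minor.$((patch + 1))" ;;  # backwards-compatible bug fix
  esac
}

bump 1.4.2 breaking   # 2.0.0
bump 1.4.2 feature    # 1.5.0
bump 1.4.2 fix        # 1.4.3
```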


Yes.

Additionally, this system dies horribly once you introduce branching.

Versioning is another problem I want to introduce to the Test Case Wiki. The Test Case Wiki is an idea I've had: a public wiki for hairy problems with numerous poor implementations. E.g. a page on implementing a library for handling times and dates: "Have you thought of the following corner cases?". Software versioning belongs there too.


One way to deal with this is by renaming the artifact itself. So for a project foo-1.0.0 with a branch for Initech, instead of trying to figure out how to increment the version, you make foo-initech-1.0.0, and start incrementing its versions independently until you can get back in sync. This seems to work well as long as foo-initech doesn't itself start getting used as a dependency by anyone other than Initech.


Can you elaborate on the kind of branching you're doing, and what your current favorite solution for that situation is?


At the day job, we commonly see situations where the customer has a bug they need fixed, and they're not willing to wait for the next scheduled release. We give them a one-off release containing only that fix. What do you label it? If you label it 1.0.1, then the next "official" release will be 1.0.2. Worse, what is the label for another release off the branch, after your trunk release of 1.0.2? I find this unsatisfactory because the two releases are not necessarily related to each other. AFAICT, all version systems that involve only numbers and dots will either fail to handle branches, or become horrifically complex.

I've seen products where releases were branches off branches. This happened because the customer is extremely risk averse and we had new code in trunk that they didn't want to test; they wanted known-good code plus a subset of trunk. Management went along with it because the customer is several orders of magnitude larger. We strayed from trunk for so long that trunk got dropped, and one of the branches was designated the new trunk.

My general goals for a version system are:

    1) Provide a unique version for every build.
    2) Give an indication of where this code came from.
I currently advocate a Git-like solution. I don't promise that this is 100% free of corner cases, but I believe it's better than everything else I've seen. Give every build a SHA1 / GUID. If you're running Git, this is just the SHA1 of the tree, when you build. That is the build's "official version". If you want to see where the build came from, git can draw you a pretty picture. When doing the build, allow the user to specify a human-friendly label, as a string. This can be anything, from "v1.0.1" to "v1.0.0 + one-off fix for MegaCorp" or "Bob's developer build for testing foo". In the product, display both versions.
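That scheme can be sketched in shell, assuming you build from a Git checkout (BUILD_LABEL is an illustrative variable name, and the throwaway repo is only there so the sketch runs anywhere):

```shell
# Throwaway repo so the sketch is self-contained; in practice this is your project.
cd "$(mktemp -d)" && git init -q . &&
  git -c user.name=t -c user.email=t@t commit -q --allow-empty -m init

sha=$(git rev-parse HEAD)                  # the build's canonical, unique version
label=${BUILD_LABEL:-"developer build"}    # free-form human label, e.g. "v1.0.1 + one-off for MegaCorp"
echo "version: $label ($sha)"              # display both in the product
```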


Interesting. Can you explain why you're doing one-off fixes instead of creating a general release with the fix that everyone can use? This seems like a bit of a complicated edge case that most people don't need a solution for. Semantic Versioning is very simple and has no intention of solving every possible versioning problem.


I never thought it (one-off fixes and branching) was a good idea. It happened because one customer constitutes a large percentage of our revenue, and the business guys made the decision with little care about the effects it would have on the software.

Yes, this is all a complicated edge case, but in certain contexts (corporate software, where you aren't calling the shots, or, you are calling the shots but aren't willing to tell a customer constituting 40% of your revenue to fuck off), you need a better solution. Additionally, at the start of the project you don't always know whether you'll need to support branching. Often, once you figure out you need branching it's too late to fix. Fixing requires changing the build process, testing time, educating QA and Support about the new process, etc.

I'd rather have a well-understood, general purpose solution ahead of time. IMO, "git versioning" accomplishes this, and it's just as simple.


Consider the other side. (These are all things that happened before I arrived.)

Vendor comes in and says, "yes, our software can do it." Then, everyone finds out that it can't. In comes some expensive customization that doesn't quite work right, but everyone has figured out how to work around it well enough. The company is using the database behind the scenes to provide the functionality we need. As such, it's massive heartache to even think about 5 years of upgrades.

However, a fatal error comes up that can't be worked around because of the custom work done by the vendor at implementation time. We need a patch to the executable. But the idea of just moving up to the new version has everyone here in a cold sweat. Our peers tell us the product has gotten worse rather than better. It's a 9 month implementation effort, with new training, new workarounds for the bugs, new everything.

You still have the code. You're telling me a simple patch is unfeasible?

This is a very common problem in consultingware. I've seen variations on the theme from both sides of the fence.


> hairy problems with numerous, poor implementations. I.e. a page on implementing a library for handling times and dates

You could start with http://naggum.no/lugm-time.html


Not to mention the fact that many open source projects spend a long time in the volatile pre-version 1 stage.


That's the killer for me. I wish projects that release version 0.17.2 would just get on with it and give us a 1.0.

Actually, I guess I'd be happy if the constraint was simply "don't break shit in a patch release, even if you're 0.x".


Hardly a new concept, and one that many companies and projects have used for ages. I'm not criticizing this policy at all, but the impression of taking credit rubs me the wrong way. Yes, there are a couple of sentences in the middle: "This is not a new or revolutionary idea. In fact, you probably do something close to this already." But, actually, I and many others have done _exactly_ this, not just close to it, for a long time.


> Hardly a new concept, and one that many companies and projects have used for ages.

Certainly. And I don't claim to take credit for any of these concepts (and explicitly say so). The whole point here is to give this idea a name and clear spec so that I can tell people my software uses Semantic Versioning instead of writing it all out every time. It's useful to me and my coworkers, so I decided to share it in case others find it useful as well.


This is called the Unix shared library version numbering convention.

* http://apr.apache.org/versioning.html
* http://www.freebsd.org/doc/en/books/developers-handbook/poli...
* http://tldp.org/HOWTO/Program-Library-HOWTO/shared-libraries...
* http://home.bolink.org/ebooks/unix3/mac/ch05_04.htm

If you can make it popular in Ruby world you can call it whatever you please :-) You should be able to find 10 (and probably 20) year old rants about how technology X does not do version numbering right, and should use Unix shared library standard.


In that case, can I suggest making that clear in the initial section of the document? For example, changing the sentence "As a solution to this problem, I propose a simple set of rules and requirements that dictate how version numbers are assigned and incremented" to something that makes it more clear that you're codifying and naming something which was already a widely recognized scheme.

(I'm also having a hard time believing that we don't already have a name for this system, although I have to admit that a short amount of searching did not come up with one.)


I think this is a solution without a problem.

Version numbers are effectively meaningless to most package managers. Sure, they parse the numbers and can do basic comparison about whether something is less than, equal to or greater than a given version. But all a package manager cares about (afaik) is its requirements and conflicts. If you have a way to ensure its requirements exist, and it does not conflict with anything already installed, everything is fine. (Oh, and that it provides what something else requires, which is somewhat the same as a requirement but in a different context)

To solve this you simply build your packages so they will never conflict, both logically and physically on disk. This way you can have any version of any software installed all at the same time. The software is compiled to link against the location of the software it depends on. Symlinks create the basic structure of what the "default" paths should be for a given application, and an alternate path can always be specified manually to call a different version.
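A minimal sketch of that never-conflict layout, using a temp directory as a stand-in for the real install prefix (all names and paths are illustrative):

```shell
root=$(mktemp -d)                                # stand-in for e.g. /opt
mkdir -p "$root/foo-1.2.0/bin" "$root/foo-2.0.0/bin"

# Each version installs under its own prefix, so nothing ever conflicts on disk.
printf '#!/bin/sh\necho 1.2.0\n' > "$root/foo-1.2.0/bin/foo"
printf '#!/bin/sh\necho 2.0.0\n' > "$root/foo-2.0.0/bin/foo"
chmod +x "$root"/foo-*/bin/foo

ln -sfn "$root/foo-2.0.0" "$root/foo-current"    # a symlink selects the "default"

"$root/foo-current/bin/foo"                      # prints 2.0.0
"$root/foo-1.2.0/bin/foo"                        # pin an older version explicitly; prints 1.2.0
```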


It solves a significant problem that I've encountered several times in real world packaging setups. The problem it solves is outlined on the linked website.

Semantic Versioning is also about more than just dependencies. It's about transparently and accurately communicating the impact that a new version will have on your existing code. As a user of a large number of libraries, a more rigorous approach to versioning would make my life immensely simpler.


Interesting idea. I could imagine whole environments of development stacks being defined: e.g. a Rails 2.3.5 production stack, or a Django stack, or a Clojure webdev/ data mining stack, and all according to exact version numbers.

I could see that for Big Ticket projects such as Rails/ Clojure/ Django etc. people would maintain an up-to-date 'semver' database.

The only thing is that the name 'semantic' is not really a winner (it sets people on edge). Maybe something else will survive in the end, unless people are comfortable enough with the pragmatic value of the idea and will look past the name.


Why is there an arbitrary requirement in the middle of it about the format of tags used in a version control system?

Symbol versioning is a better approach, on systems that support it.


I think a lot of the issues of "dependency hell" come from the non-obvious way to specify a specific version of a shared library to link against (in fact, I can't even find out how to do this in the documentation for gcc or ld now, but I know I've done it, and that it's possible).

Take libncurses for example; I'm using it because there was a time when many Linux distros had multiple versions installed. Both ncurses4 and ncurses5 could be available, with ncurses4 often kept around for binary compatibility with previously compiled software.

So you end up with the following files on your machine:

   libncurses.so.5.7
   libncurses.so.5 (symlink to .5.7, maintained by ldconfig)
   libncurses.so (symlink to .5.7, maintained by ldconfig)
Ideally, you'd be able to install ncurses4 on the same machine, and you'd end up with these files also:

   libncurses.so.4.x
   libncurses.so.4 (symlink to .4.x, maintained by ldconfig)
The .so entry is a link, maintained by ldconfig, to the latest version. There is no conflict because shared linkage records a specific versioned filename, resolved at runtime (see below).

So if you compile/link with -lncurses, you get the latest one (which is a reasonable default), as that's what the .so points to. But if you have ncurses4 installed, and you know you want to link against that API, you need to link against the specific file (with a full path and version number in it), rather than use ld.so.cache and gain the advantages of runtime library location resolution.

So the workaround for the inability to specify an exact version to link against, without giving the full path to the library, is to move part of the version number to before the .so and make it part of the library name. So now you link with -llibraryX-2.4, which looks for liblibraryX-2.4.so. No one calls this, or thinks of it as, libraryX-2.4; they think of it as libraryX. Since there is no standard way to name or indicate version numbers with this workaround, you sometimes end up with these different formats:

   libXXX.so.1.2.3
   libXXX-1.so.2.3
   libXXX-1.2.so.3
   libXXX-1.2.so.1.2.3
   libXXX-1.2.3.so
Each library maintainer makes up their own version number format, yet these could all be the exact same library version.

A quick demo to show that when using -llibrary (and -shared), the full path isn't stored in the binary, just the filename and version number, which is resolved at runtime by the dynamic linker in ld.so to an actual full path using the cache created by ldconfig (hardly conclusive, but you get the gist -- could also use binutils stuff here to see this):

   $ strings /etc/ld.so.cache | grep libmagic
   libmagic.so.1
   /usr/lib64/libmagic.so.1
   $ ldd `which rpm` | grep magic
        libmagic.so.1 => /usr/lib64/libmagic.so.1 (0x00007f9dfed95000)
   $ strings `which rpm` | grep libmagic
   libmagic.so.1
In this case, we see that ld.so.cache maintains a library name to filename mapping, rpm mentions just the library name (with version number), and ldd resolves that to a full path, and that there is no mention of the full path in the rpm binary itself.

I'd really like to see a -llibrary=version option that helps with this, allowing a specific A, A.B, or A.B.C (or whatever string is after .so in the filename)... maybe I was thinking of some linker other than GNU ld that works like this.


If you know what version of a library you want to build against and you know the path to it you can link against it. The compilation and linking is no longer automatic because you have to specify your dependent library specifically, but you can put it anywhere you want and it won't conflict with a different version of the same name.

One way to simplify this is to customize your library builds and associate a unique pkgconfig .pc file with them. When you build your application, reference the .pc file for the library version you want. If the application does not support pkgconfig you could write a wrapper that parses it and provides the build/link options you desire. If you're building packages you're doing enough work by hand that this should not be unnecessarily complex.
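A sketch of that pkgconfig approach, with a hypothetical .pc file for a side-by-side ncurses 4 build (the name ncurses4 and all paths are illustrative):

```shell
cd "$(mktemp -d)"                    # scratch workspace
# A custom .pc file describing one specific build of the library:
cat > ncurses4.pc <<'EOF'
prefix=/opt/ncurses-4.2
libdir=${prefix}/lib
includedir=${prefix}/include

Name: ncurses4
Description: side-by-side ncurses 4 build
Version: 4.2
Libs: -L${libdir} -lncurses
Cflags: -I${includedir}
EOF
# Point PKG_CONFIG_PATH at it to get this exact build's link flags:
PKG_CONFIG_PATH=. pkg-config --libs ncurses4
```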


> If you know what version of a library you want to build against and you know the path to it you can link against it. The compilation and linking is no longer automatic because you have to specify your dependent library specifically, but you can put it anywhere you want and it won't conflict with a different version of the same name.

But that's my point: we should be able to get automatic linking. Different versions of the same library already don't conflict because the file names are different:

   libncurses.so.4.0.1
   libncurses.so.5.1.1
> When you build your application, reference the .pc file for the library version you want. If the application does not support pkgconfig you could write a wrapper that parses it and provides the build/link options you desire.

True, I see what you're saying. It always seemed to me that custom pkgconfigs for this case, which generate full paths to the libraries instead of -llibrary options, are a bandaid for not having automatic linking against a specific version.


I too hope one day all this will be automatic (though we still have to specify a version, and keep up to date on what version supports what, so it's hardly automatic for the dev or packager). The real fix to me is to embed all relevant information in the binaries and let the dynamic linker figure out which one should be used at run time. Compile time would merely embed what version was used to build the app. Maybe this is already possible? I'm curious...


While that would be cool, it doesn't seem very pragmatic. I think I'd rather have control over exactly which version is chosen (where I can use implementation details in making a decision), rather than have it decide automatically based on claims expressed in the library metadata.

> though we still have to specify a version, and keep up to date on what version supports what, so it's hardly automatic for the dev or packager

I think the common case is for new development, you'd most likely develop against the most recent release version, but you don't need to keep up to date on which version supports what because you can continue to use the old versions with long-since compiled binaries as long as it can be installed along side the latest version. I think people and distros, in general, are too quick to remove older versions from being installed (or even available), which increases the on-going maintenance requirements of still popular binaries.

On the other hand, during transition periods, distros have been pretty good about this, like the libc5 vs libc6 transition, and by providing compat packages. But this has mainly been an issue with closed source abandonware (Skype for a long time was still using OSS and needed an ancient version of some audio libs).



