Out of curiosity, what makes you say that LXC has better security than Docker? What aspects of LXC's design or implementation provide additional security controls over those of Docker?
When GP said unprivileged containers, they meant user namespaces (that's the terminology LXC uses). Docker doesn't default to user namespaces being used (LXC does) and within Docker it has many limitations that LXC/LXD do not. LXC/LXD can also isolate containers from each other by mapping different uid_maps, but with Docker all containers use the same mapping.
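To make the mapping difference concrete, here is a minimal sketch of how a uid_map translates container uids to host uids. The host ranges (100000, 165536) and the helper names are assumptions picked for illustration; they are not taken from any Docker or LXC default.

```python
from typing import List, Optional, Tuple

# Each entry mirrors one line of /proc/<pid>/uid_map:
# (first uid inside the namespace, first uid on the host, length of the range).
UidMap = List[Tuple[int, int, int]]


def parse_uid_map(text: str) -> UidMap:
    """Parse uid_map-formatted text into (inside, outside, count) tuples."""
    entries = []
    for line in text.strip().splitlines():
        inside, outside, count = (int(field) for field in line.split())
        entries.append((inside, outside, count))
    return entries


def to_host_uid(uid_map: UidMap, container_uid: int) -> Optional[int]:
    """Translate a uid inside the namespace to the uid the host sees.

    Returns None when the uid has no mapping.
    """
    for inside, outside, count in uid_map:
        if inside <= container_uid < inside + count:
            return outside + (container_uid - inside)
    return None


# Hypothetical ranges, for illustration only.
shared = parse_uid_map("0 100000 65536")      # one mapping reused by every container
isolated_a = parse_uid_map("0 100000 65536")  # per-container mappings
isolated_b = parse_uid_map("0 165536 65536")

# Shared mapping: root in both containers is the same host uid.
print(to_host_uid(shared, 0), to_host_uid(shared, 0))          # 100000 100000
# Distinct mappings: root in each container is a different host uid.
print(to_host_uid(isolated_a, 0), to_host_uid(isolated_b, 0))  # 100000 165536
```

With a shared mapping, a process that escapes one container has the same host uid as the processes (and files) of every other container; with distinct mappings it does not.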
Disclaimer: I'm a maintainer of runc, the runtime Docker uses.
I understand that Docker's user namespace support is relatively basic, but the point I was looking at was that if you don't run your contained process as root (i.e. unprivileged) by specifying a USER in the Dockerfile, then I wasn't aware of major differences in the security of an LXC container as against one running under runc.
ofc happy to be corrected, as I'm aware you'd know more about this :)
It really depends whether you want to compare similarly-configured containers or the defaults.
If you compare the defaults, LXC wins overall because they have rootless containers and user namespaces by default (runc has them too -- I implemented them -- but it's not the default in Docker). To be balanced, LXD's isolation of individual containers is not on by default either (because of backwards compatibility requirements) -- but Docker doesn't have an equivalent feature. If you configure a Docker setup to be as-close-as-possible to an LXC setup, then it's much harder to give a definitive answer. Generally, the containers we set up look almost identical from the kernel's point of view so we have similar kernel 0day problems. So it comes down to the security of the runtime in particular.
I am currently working on solving several pretty fundamental security issues that exist both within LXC and runc (and many more programs generally)[1], so it's not like either is perfect (though LXC does have more code to defend against the attacks I'm working on fixing). LXC does make use of more of the kernel hardening work that we (both the LXC folks and myself) have worked on. A trivial example is that LXC uses TIOCGPTPEER (a feature I originally implemented that allows you to avoid certain theoretical attacks by container processes against the runtime) but Docker doesn't use it (and because runc doesn't have a container manager by design we can't implement it in runc). LXC also supports using pidfds (a new feature in Linux 5.1 that Christian Brauner has been working on for a while) which allow much nicer methods of avoiding PID recycling race conditions -- with runc we still use the old pid+starttime method which is prone to well-known (though usually harmless) attacks.
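For readers unfamiliar with the pid+starttime method, here is a rough sketch of the idea. It is not runc's actual code; the function names are made up for illustration, and only the layout of /proc/<pid>/stat is taken from proc(5).

```python
from typing import Tuple


def parse_starttime(stat_line: str) -> int:
    """Return field 22 (starttime, in clock ticks since boot) of /proc/<pid>/stat.

    The comm field (field 2) is wrapped in parentheses and may itself contain
    spaces or parentheses, so split only after the *last* ')'.
    """
    after_comm = stat_line[stat_line.rindex(")") + 1:].split()
    # after_comm[0] is field 3 (state), so field 22 sits at index 22 - 3 = 19.
    return int(after_comm[19])


def process_identity(pid: int) -> Tuple[int, int]:
    """Record (pid, starttime) as a best-effort identity for a process."""
    with open(f"/proc/{pid}/stat") as f:
        return pid, parse_starttime(f.read())


def is_same_process(identity: Tuple[int, int]) -> bool:
    """Check whether the pid still refers to the process we recorded.

    This is only a check: the pid can still be recycled between this call and
    whatever the caller does next (e.g. sending a signal).
    """
    pid, starttime = identity
    try:
        return process_identity(pid)[1] == starttime
    except OSError:
        return False  # the process is gone (or /proc is unavailable)
```

On a Linux host, `is_same_process(process_identity(os.getpid()))` should hold for the running interpreter. The gap between the check and the later action is the kind of race that a pidfd avoids, since the descriptor refers to the process itself rather than to a number that can be reused.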
Funnily enough, I'm actually giving a talk about this topic at the end of this week[2] and was writing slides when I saw this thread. :P
That’s correct. LXC is not “more secure” than Docker in any meaningful way. They are equivalent in their use of the Linux containment “plumbing”: cgroups, namespaces, capabilities, etc.
LXC supports user namespaces where containers are isolated from one another. LXD furthers this by remapping containers on restart (so you can change mappings). And you can "punch out" individual mappings for shared volumes and so on. Docker doesn't support this. In fact the recent expansion of the number of mappings allowed by the kernel is work done by Christian Brauner (an LXC maintainer who I collaborate with) -- this feature is so useful to LXC that they had to add the ability to have more mappings.
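As a rough illustration of what "punching out" a mapping does to a uid_map, here is a sketch. The `punch_out` helper and the ranges are hypothetical and are not LXD's implementation; the point is only that each punched-out id can split one range into up to three entries, which is one way the number of mapping lines grows.

```python
from typing import List, Tuple

# (first id inside the namespace, first id on the host, length of the range)
Entry = Tuple[int, int, int]


def punch_out(uid_map: List[Entry], inside_id: int, host_id: int) -> List[Entry]:
    """Return a new map in which inside_id maps to host_id.

    Every other id keeps the mapping it had in uid_map. If inside_id is not
    covered by any entry, the map is returned unchanged.
    """
    result: List[Entry] = []
    for inside, outside, count in uid_map:
        if not (inside <= inside_id < inside + count):
            result.append((inside, outside, count))
            continue
        before = inside_id - inside   # ids below the punched-out one
        after = count - before - 1    # ids above the punched-out one
        if before:
            result.append((inside, outside, before))
        result.append((inside_id, host_id, 1))
        if after:
            result.append((inside_id + 1, outside + before + 1, after))
    return result


# Hypothetical base range; uid 1000 is mapped straight through to host uid 1000
# so that files on a shared volume owned by that user stay accessible.
base = [(0, 100000, 65536)]
for entry in punch_out(base, 1000, 1000):
    print(*entry)
```

Here one base entry becomes three: the ids below 1000, the single passthrough id, and the ids above it.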
LXC also supports unprivileged operation (which I named "rootless mode"). Docker gained support for this very recently as an experimental feature in 19.03 (still not released), but LXC has supported it (and defaulted to it whenever possible) for years. That said, the Docker implementation is arguably better in some respects when it comes to lack of privilege (thanks to great work from Akihiro Suda and Giuseppe Scrivano), but it's still new.
LXC has also put a lot more effort into fundamental security work (both in-kernel and within LXC).
Disclaimer: I maintain runc, the runtime Docker uses. There is no question that LXC has better engineering in this department. I collaborate with them quite often, but they have more engineers working on fundamental problems within containers.
Thanks for the detailed answer. A few follow-up questions if I may.
1. Docker does support user namespaces today, correct? Your reply seemed to imply that it doesn’t.
2. Once rootless mode is released in Docker stable, the only difference in available security features between lxc and Docker will be the more flexible uid mapping for user namespaces, correct?
3. The flexible uid mapping feature, compared to user namespaces with static mapping as implemented by Docker, is an additional protection against container-to-container attacks, but not against container-to-host attacks. Did I get that right?
4. User namespaces, with or without flexible uid mapping, are considered a less secure containment method than seccomp and selinux/apparmor, all of which Docker/runc and lxc support equally well, correct?
> 1. Docker does support user namespaces today, correct? Your reply seemed to imply that it doesn’t.
Yes (though I don't agree my comment implied that Docker doesn't support user namespaces at all), but it doesn't support having different mappings for individual containers. This has both usability problems (--volume is painful to use) and security problems (inter-container attacks are still possible if you can "break out" of the container or otherwise disrupt the other container).
> 2. Once rootless mode is released in Docker stable, the only difference in available security features between lxc and Docker will be the more flexible uid mapping for user namespaces, correct?
Security features, (arguably) yes. But I would still argue that LXC has more security hardening work put into it than Docker. Of course they've had their own security issues but there definitely are arguments to be made that it isn't identical. I outlined some examples here[1].
Also the default configuration is still going to be run-as-root-without-user-namespaces with Docker (meaning the vast majority of users are running hideously insecurely). LXD and LXC default to using user namespaces. To be fair, both use seccomp and AppArmor/SELinux policies by default -- but depending on seccomp and AppArmor/SELinux is a much worse security position than
> 3. The flexible uid mapping feature, compared to user namespaces with static mapping as implemented by Docker, is an additional protection against container-to-container attacks, but not against container-to-host attacks. Did I get that right?
Yes.
> 4. User namespaces, with or without flexible uid mapping, are considered a less secure containment method than seccomp and selinux/apparmor, all of which Docker/runc and lxc support equally well, correct?
That's not quite true. User namespaces are arguably a much better containment method for containers. There are hundreds of user-namespace related hardening checks within the kernel (as well as the obvious "the euid space is different" protections) which you don't end up taking advantage of if you run in &init_userns. In fact, most kernel developers working in this space (namely Eric Biederman) don't consider security issues to be as serious if you can't exploit them without disabling user namespace protections. CVE-2019-5736 and CVE-2016-9962 were both blocked by using user namespaces.
But yes, there are some breakouts that user namespace support in your kernel has historically caused (and we have seen that many times) -- but that's why both Docker and LXD block unshare(CLONE_NEWUSER) with seccomp. But you can have all three! And (once Docker is configured) then all three support them all equally effectively.
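The seccomp rule mentioned above boils down to a masked comparison on the flags argument. The following is a toy model of that predicate only; it does not install a filter, it is not the actual Docker or LXD profile, and `filter_decision` is a made-up name. The flag values are the ones defined in <linux/sched.h>.

```python
# Namespace flags from <linux/sched.h>.
CLONE_NEWNS = 0x00020000
CLONE_NEWUSER = 0x10000000
CLONE_NEWNET = 0x40000000


def creates_user_namespace(flags: int) -> bool:
    """True when the CLONE_NEWUSER bit is set, whatever else is requested."""
    return (flags & CLONE_NEWUSER) == CLONE_NEWUSER


def filter_decision(syscall: str, flags: int) -> str:
    """Toy stand-in for a seccomp rule: deny unshare() calls that ask for a
    new user namespace and allow everything else."""
    if syscall == "unshare" and creates_user_namespace(flags):
        return "EPERM"
    return "ALLOW"


print(filter_decision("unshare", CLONE_NEWUSER | CLONE_NEWNET))  # EPERM
print(filter_decision("unshare", CLONE_NEWNS))                   # ALLOW
```

The check has to be a mask rather than an equality test, since CLONE_NEWUSER is normally combined with other namespace flags in the same call.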
I appreciate that you have a nuanced position on the topic of Docker security, based on deep expertise. Sadly, that nuance is lost on 99% of the people I see shouting that "Docker is insecure", the same people who presumably downvoted my original comment into oblivion. They are calling Docker insecure not because they understand what you explained (they don't), but because they have heard half-truths or outright fabrications, and are repeating them with absolute conviction, without bothering to argue their point or check even the most basic facts. As someone who has a lot of actual first-hand experience with Docker I find that very frustrating.
So, although I agree with everything you said, and appreciate that you took the time to write it down, I believe that your answer has unintentionally vindicated the many people lurking on this site who hold the widespread, almost cult-like belief that Docker is very insecure - insecure to the level of gross negligence, in a way that you and I understand it isn't.
LXC is daemonless: there is no process hanging around after the container starts. It starts the container, uses any privileges required to set up things like networking and mounts, and then drops privileges.
LXC has had unprivileged container support since 2013, so that part is fairly mature now. 'Unprivileged' in this case means the container process itself is running as a normal user.
LXC does have a container manager though, which is a single process that stays alive for the life of a single container. Within runc (the runtime Docker uses), we don't have a container manager but the downside is that now the upper level needs to keep alive the descriptors and other kernel objects that allow for safe container management by the runtime.
[I maintain runc, and collaborate with the LXC folks.]