> The Docker engine default seccomp profile blocks 44 system calls today, leaving containers running in this default Docker engine configuration with just around 300 syscalls available.
...preventing devs/ops people from running tools like iotop, unless extra capabilities are added.
I'm all in for containers and cgroups/namespaces, but at the moment it's namespace isolation at the price of fewer features. Unless namespaces become first-class citizens in the Linux kernel, it will always be more efficient to just run on VMs or even bare metal. At least for non-planet-scale workloads. :-)
This is because docker makes the fundamental mistake of conflating packaging with isolation. "Packaging" is achieved in docker by using the OS to do an amount of sandboxing and then letting the user perform whatever non-reproducible crap they like before balling the whole thing up and calling it a package.
If instead you make an app author actually figure out what their dependencies are and how to fetch/build them - in a system such as Nix, you not only get reproducible packages, you also get to decide to apply actual os-level isolation on a case by case basis - a developer doesn't necessarily need/want these barriers on their dev machine.
So a system like Android's? Honest question, I don't know if that's a good model for general-purpose Linux systems. There's also Fuchsia, but I'm not sure if it's POSIX.
This is wrong on a fundamental level. Containers are nothing more than regular processes that are launched leveraging some of the kernel’s built-in namespacing features.
When we talk about applying seccomp profiles to containers it just means applying them to the process — exactly how the rest of your system uses them. They are about limiting what the process itself can do, not you as an admin. Denying the ability for sshd to run iotop with SELinux doesn’t stop you from running it.
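To make the "it's just a process attribute" point concrete, here is a minimal sketch, assuming a Linux kernel that exposes the `Seccomp:` field in `/proc/<pid>/status`. The helper name `seccomp_mode` is mine, not from any library. It reports the seccomp mode of whatever process you point it at, containerized or not:

```python
from pathlib import Path

# Meaning of the Seccomp field in /proc/<pid>/status (see proc(5)).
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text):
    """Return the seccomp mode name from the text of a /proc/<pid>/status file."""
    for line in status_text.splitlines():
        # "Seccomp_filters:" on newer kernels does not match this prefix.
        if line.startswith("Seccomp:"):
            value = int(line.split(":", 1)[1].strip())
            return SECCOMP_MODES.get(value, f"unknown ({value})")
    return "not reported"  # kernel without seccomp support, or non-Linux input


if __name__ == "__main__":
    status = Path("/proc/self/status")
    if status.exists():
        print("seccomp mode of this process:", seccomp_mode(status.read_text()))
    else:
        print("/proc/self/status not available (not Linux?)")
```

Run inside a container started with the default profile, I would expect this to say `filter`; run from a plain shell on the host it will most likely say `disabled`. Either way the filter belongs to the process, not to you as the admin.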
Running containers on a bare-metal host is exactly the same as running processes on a bare-metal host.
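If you want to see the "regular process plus namespaces" claim for yourself, here is a rough sketch, assuming Linux with `/proc` mounted. The function names `namespaces` and `shared_namespaces` are made up for illustration. It reads the namespace links under `/proc/<pid>/ns` and compares two processes:

```python
import os


def namespaces(pid="self", proc_root="/proc"):
    """Map namespace type to its identifier, e.g. 'pid' -> 'pid:[4026531836]'."""
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    result = {}
    try:
        names = os.listdir(ns_dir)
    except OSError:
        return result  # no such process, no /proc, or not permitted
    for name in sorted(names):
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # link not readable for this process
    return result


def shared_namespaces(a, b):
    """Return the namespace types whose identifiers are equal in both mappings."""
    return sorted(k for k in a if k in b and a[k] == b[k])


if __name__ == "__main__":
    for kind, ident in namespaces().items():
        print(f"{kind:20s} {ident}")
```

Comparing `namespaces()` for your shell against `namespaces(<pid of a containerized process>)` (you may need root to read another process's links) should show different identifiers for whichever namespaces the container runtime unshared, and identical ones for the rest. Nothing else distinguishes the two processes from the kernel's point of view.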