> Doesn't scale. Having passwords in a plain text file is not a scalable solution for a users directory. It can probably go up to a hundred users, but not much more.
This simply cannot be true. A users directory that is frequently read will be in cache. The file is also fairly simple to parse. Even on old hardware you should be able to parse thousands of users.
It's more of a problem of the number of nodes your users will be accessing than the number of users. It's a PITA to build mechanisms for password changes, for adding and removing users, for unlocking accounts, etc. when you have a distinct file on each server that needs to be changed. Same for adding new nodes to the environment and bringing them up to date on the list of users, and so on.
Fair point, networking always makes things more complicated. I'm not sure it's really that much more complicated, though. Like any concurrency problem with shared state, a single writer and multiple readers keeps things simple: e.g. a single host is authoritative, all changes are made there, and any file change triggers automated scripts that distribute the updated file over NFS, FTP, etc.
As long as new nodes start with the same base configuration/image this seems manageable. Simple to do with inotify, but even prior to that some simple C programs to broadcast that a change is available and listen for a change broadcast and pull the changed file is totally within the realm of a classic sysadmin's skillset.
You would have to lock the file, or guarantee consistency in some other way. Right now, I don't believe Linux does anything about consistency of reads / writes to that file... which is bad, but we pretend not to notice.
So... the system is kind of broken to begin with, and it's kind of pointless to try to assess its performance.
Also, it would obviously make a lot of difference if you had a hundred of users with only a handful being active users, or if you had a hundred of active users. I meant active users. Running programs all the time.
NB. You might have heard about this language called Python. Upon starting, the interpreter reads /etc/passwd (because it needs to populate some "static" data in the os module). I bet a bunch of similar tools do the same thing. If you have a bunch of users all running Python scripts while changes are being made to the user directory... things are going to get interesting.
> You would have to lock the file, or guarantee consistency in some other way.
I think the standard approach to atomicity is to copy, change the copy, then move that copy overwriting the original (edit: file moves are sorta atomic). Not perfect but generally works.
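The copy-modify-rename dance looks roughly like this in Python. This is a generic sketch (the `transform` callback and function name are mine, not from the thread); `os.replace` maps to rename(2), which is atomic when source and destination are on the same filesystem:

```python
import os
import tempfile

def atomic_update(path, transform):
    """Rewrite `path` so readers see either the old contents or the new
    contents, never a partially written file."""
    with open(path) as f:
        old = f.read()
    new = transform(old)
    # The temp file must live in the same directory: rename is only
    # atomic within a single filesystem.
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(new)
            f.flush()
            os.fsync(f.fileno())  # data on disk before the rename
        os.replace(tmp, path)  # atomic overwrite via rename(2)
    except BaseException:
        os.unlink(tmp)
        raise
```

The "not perfect" part: this protects readers, but two concurrent writers can still silently clobber each other's changes, which is one reason tools like vipw take a lock on top of the rename.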
I agree that this approach is not good for a users directory, I'm just disagreeing that the reason it's not good is performance-related.
Moves are atomic, at least within a single filesystem: rename(2) swaps the directory entry in one step, so at no time can a reader get the contents of the old and new files confused. A process that already holds an open file descriptor just keeps reading the old contents. (Confusion by the human operating things is eminently possible.)
Most systems come with "vipw" which does the atomic-rename dance to avoid problems with /etc/passwd. In practice this works fine. Things get more complicated when you have alternate PAM arrangements.
A whole bunch of standard functions like getpwent() are defined to read /etc/passwd, so that can't be changed.
`getpwent()` is not defined to only read `/etc/passwd`. There is only a requirement that there be some abstract "user database" or "password database" (depending on whether you're reading the Linux man pages or the Single Unix Specification man pages).
In practice, `getpwent` on Linux uses the nsswitch mechanism to produce its return values. One can probably stop glibc from touching `/etc/passwd` entirely, provided all lookups go through `getpwent` and you remove `files` from the `passwd` entry in `/etc/nsswitch.conf`.
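On the consumer side, programs that go through the password-database API instead of parsing the file by hand automatically respect whatever nsswitch is configured to do. A minimal sketch using Python's stdlib `pwd` module, which wraps getpwnam/getpwent (the helper name is mine):

```python
import pwd

def lookup_user(name):
    """Resolve a user through the system password database.

    Because this goes through the libc getpwnam() call, it is
    nsswitch-aware: the entry may come from files, LDAP, NIS, etc.
    """
    try:
        entry = pwd.getpwnam(name)
    except KeyError:
        return None  # no such user in any configured backend
    return {
        "name": entry.pw_name,
        "uid": entry.pw_uid,
        "home": entry.pw_dir,
        "shell": entry.pw_shell,
    }
```

`pwd.getpwall()` is the analogous wrapper for enumerating the whole database via getpwent.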
I had >30k users on a bunch of systems in 2001 (I inherited that approach, mind, I'm not -recommending- it).
We moved to LDAP a couple years later because it was a much nicer architecture to deal with overall, but performance and consistency weren't problems in practice.
This file is a public interface exposed by Linux to other programs, so it doesn't matter much whether the kernel caches its contents: e.g. the Python interpreter will read and parse the file on launch anyway. And it's not coming from some "fake" filesystem like procfs or sysfs; it's an actual physical file most of the time.