Puppet requires both a puppet server (Rails) and client, SSL key exchange, firewall rules for the puppet server, proper DNS records for everything, and a host of dependencies, all of which you need to set up before you can actually do anything.
Any system management solution that requires anything more than a bare machine with SSH and sudo is, in my book, not terribly practical, because these are the lowest common denominator on what you'll get from any hosting provider or OS install.
In a nutshell: I shouldn't need to configure my servers before I configure my servers.
Let me endorse both of the above posts. Having now spent many months tinkering with Puppet, I wish to god I had fully understood the value of not using any of Puppet's server features but had instead just shipped the manifests via some other method and applied them locally.
If you need a super-performant version of that strategy (which you probably don't) try googling up Twitter's "murder" project. (Not as violent as it sounds! It's the crow kind of murder.)
Until I read this thread, I didn't even think of that as an option -- everything in the Puppet documentation, and the community, and in what they offer as far as training courses, indicates that Puppet's client-and-server are the light and the way to rightness.
I'll have to take a second look at the Puppet manifest system coupled with Capistrano (which I really like), but it still irks me that there isn't one tool to handle configuration management across a large number of servers.
Any solid, good howtos? We're using Puppet as well and had the same issues. We use Capistrano + Puppet + some custom scripts; we'd like to get rid of the latter.
Is there any howto for setting up a completely clean/empty Linux (Ubuntu / Debian) with RVM as the Ruby installer? But thanks for this; at least now I know there is a solution :)
> Puppet requires both a puppet server (Rails) and client, SSL key exchange, firewall rules for the puppet server, proper DNS records for everything, and a host of dependencies, all of which you need to set up before you can actually do anything.
Not to join the chorus of "that's not actually true," but I do want to take it a step further and say that's not even the way I'd recommend using puppet.
The puppet server gives you an alternate authentication method (SSL vs SSH is a toss-up imho), a fileserver (rsync is better), a "dumb client" model where clients are only given the files and configs they actually need, and a master server to process all the manifests and load them into memory, etc., which might under some circumstances (which I have never encountered) help performance. That's about it. If anyone else has other insights on the benefits of deploying a master I'd be happy to hear them.
If you don't need any of those things, you don't ever need to deploy a puppet server. You need ruby and its dependencies, facter and its dependencies, and possibly a couple of other libraries (ruby-augeas, etc.), most of which are built into modern distributions. Ideally you will put the manifests under revision control as soon as possible. Rsync your manifests and run the puppet client on them directly. That method scales up or down really well and is generally more flexible (see the comment below about the difficulty of testing changes).
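A minimal sketch of what that looks like (paths here are illustrative): keep your manifests in a repo, rsync the tree to each host, then run something like puppet apply /srv/puppet/site.pp locally (on the 0.2x standalone client it's just puppet /srv/puppet/site.pp).

    # /srv/puppet/site.pp -- applied locally on each host after the rsync
    node default {
      package { 'sudo':
        ensure => installed,
      }
      file { '/etc/motd':
        ensure  => file,
        owner   => 'root',
        group   => 'root',
        mode    => '644',
        content => "Managed by puppet (local apply, no puppetmaster)\n",
      }
    }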
> You need ruby and its dependencies, you need facter and its dependencies, and possibly a couple of other libraries (ruby-augeas, etc.),
But how is that nice? Perl, sh, and bash are built into every distribution. Learn from the RVM setup; it's very good. Or better: just deliver OS packages; we didn't get this far with aptitude only to have everyone roll their own!?
On Ubuntu Lucid EC2 images there is no Ruby, and even if there were, I wouldn't want it; I want RVM. So basically, you are saying that to provision my server I need to provision my server...
Another comment said it can be done automatically; is that true, or do we still need to install Ruby etc.? We have it all automated now, but the use for Puppet is kind of gone; by the time Puppet is up and running, > 80% of the required software is already installed...
Puppet is not a server provisioning tool. It's not a "roll-your-own" aptitude; it's a different approach to the problem. Puppet is primarily meant to manage the sort of end-user configuration for which packages only provide defaults. If you're handling all of that with the package manager, or if your need to manage configuration is minimal, then don't use puppet. It doesn't have lots of dependencies, but it does have some, and its language is syntaxy.
I'm not sure why you'd say puppet isn't a server provisioning tool. This managing of configurations you talk about, the way puppet interacts with package managers to install packages -- these are the tasks I think about when I hear "server provisioning."
Provisioning and configuration management are converging, but aren't the same thing. I agree that you can use puppet to help a lot with provisioning servers, but that's not its reason for being.
Provisioning is the process of taking a baremetal server and making it usable. Configuration management is about organizing ongoing changes to the infrastructure.
Red Hat Kickstart and Solaris Jumpstart are examples of provisioning tools that don't help much with configuration management. Cobbler is a provisioning tool built on Kickstart. SystemImager is a cloning-style provisioning tool.
These tools all help you get an OS up and running on a machine. Optionally, most of them can be used to do a great deal of software installation and system configuration beyond the basic OS install. But most of them aren't, for example, especially interested in helping with issues like adding a new apache virtualhost to httpd.conf, or a new DNS zone to named.conf, or a new root cron entry to run on all application servers, or reconfiguring iptables on every system. Tools like puppet and cfengine are very interested in these scenarios.
The line is fuzzy, but when standing on one side or the other it's obviously there somewhere.
Yes, I liked the way it can be scripted, but I guess I didn't understand the purpose then. When you search Google for provisioning servers + rails, you land on Puppetmaster almost exclusively. I'll keep an eye on it anyway. Thanks for the insights.
I have one issue with this: I actually like clients being given only the files and configurations they need. Using rsync or git to propagate the puppet manifests implies that each remote host gets all the files, and in those there may be sensitive information for the other hosts (database passwords, for example).
In this scenario, if one of the hosts is compromised, much more information is leaked than in the case of a puppet server. How do you deal with this problem?
We've been using Puppet to manage our servers for some time now. As a group of developers doing our own operations work, we've found puppet both good and bad.
Setting up puppet was relatively straightforward. We had puppetd auto-updating our servers for a while, but ultimately decided to run it manually when deploying changes. Managing zero-downtime changes was more error-prone with it running.
Some aspects of Puppet have over time proved frustrating to us. The top annoyance is that we never quite figured out a good way to test our puppet changes before checking them into git to deploy them to our puppetmaster. That has led to a number of "fixing errors" type commits. The second annoyance we've found is actually highlighted as a feature: no implicit ordering of operations. While it might sound great to be able to reorganize your configs without fear of breaking the deployment, the tradeoff is that you don't find out your configuration defines its dependencies incorrectly until you try to kick a new server after spending months incrementally adding to your existing servers. For us, at least, an implicit top-to-bottom ordering would lessen that headache.
Despite some of these headaches, simply having our configuration in version control is a huge win for us. We can set up a box much more easily, and we have a comment trail of why changes were made.
If I had to do it again I would probably ditch the puppetmaster altogether and use an rsync server to distribute the entire configuration repository to every server, then run puppet locally to apply changes. This way you can simply modify any local repository and run puppet to apply the desired configuration to any machine you want. When you're happy with the changes you can check them in.
Using the puppetmaster and the puppet fileserver was trickier; essentially I would use FACTER_var="value" to pass in a value that told puppet to use local files rather than central files (which came pretty close to the purely decentralized model anyway).
It's essentially a headless puppet that centers around a workflow of testing changes from an individual checkout of the puppet code on a target server, doing no-op applies of the manifests, and applying the manifest until you're happy enough to commit, push, and roll out.
This won't help with your second annoyance, sadly, but it should definitely help with the first in quickly pinpointing these sorts of issues without having a messy commit history.
Regarding your troubles with ordering of operations: I've found it varies by installation, but each team works to avoid these issues by setting a standard for module development, so that when you include 'ntp' you know exactly what you are getting. I've seen many different ideas on how to accomplish this, all of which made it really easy to include a class without ramifications.
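For example, one pattern I've seen teams standardize on looks roughly like this (file paths illustrative): each module wires up its own package -> config -> service ordering internally, so an include behaves the same on a months-old box and a freshly kicked one.

    class ntp {
      package { 'ntp':
        ensure => installed,
      }

      file { '/etc/ntp.conf':
        ensure  => file,
        source  => 'puppet:///modules/ntp/ntp.conf',
        require => Package['ntp'],          # config is useless until the package exists
      }

      service { 'ntp':
        ensure    => running,
        enable    => true,
        subscribe => File['/etc/ntp.conf'], # implies ordering and restarts on change
      }
    }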
Also, regarding testing. I think this is an issue with both Chef and Puppet. Something I hope someone addresses at some point in the future. I've seen some custom tools with some promise (Chef focused) but perhaps a Vagrant setup might be the best answer these days.
Regarding the non-ordering of dependencies I wonder if it would help to have some sort of shuffle flag. I know that redo has implemented a --shuffle flag to tease out missing dependencies.
You know what, I can't even express the amount of dislike I have for puppet, from variables that have four purposes (whoever thought up :ensure, which can mean "latest", a specific version, whether something should be present at all, or whether a service is running, needs to be shot, buried, and encased in cement) to the DSL that tries to be declarative even though puppet isn't, and allows half-installs when something fails. Chef isn't any better, as it's extremely opinionated - AMQP, you have to use it; the deprecated Merb, you have to use it. Cfengine is in its own world of suck (ever write unportable scripts with no abstractions? Well, you do now). I'm not being snarky; I gave each a fair shot while evaluating them by implementing a provider for a distribution.
There have been a lot of suggestions (here and on other sites) to run puppet with locally-rsync-ed (rsunk?) copies of manifests, but there are a few things which won't work if you do this, unfortunately. The most important is storedconfigs, which (afaict) requires the puppet server to work.
This means you lose a large amount of the power of puppet, whereby you can use configuration from across machines to do things like collect up all the services you run on a set of machines and generate a nagios config, or firewall config, or whatever. Without stored configs I assume it's still possible, but it will require more explicit configuration instead of the more elegant solution a puppet server provides.
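For anyone who hasn't seen it, the storedconfigs/exported-resources trick looks roughly like this (check and host names made up): each node exports a resource describing itself, the monitoring host collects everything that was exported, and the puppetmaster with storedconfigs does the bookkeeping in between.

    # On every monitored node: export a nagios check for this host
    @@nagios_service { "check_ssh_${hostname}":
      check_command       => 'check_ssh',
      host_name           => $fqdn,
      service_description => 'SSH',
      use                 => 'generic-service',
    }

    # On the nagios server: collect every check the other nodes exported
    Nagios_service <<| |>>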
Side note: I've used puppet on a fairly small scale of up to ~50 machines, and just started using it for VMs, and it's pretty straightforward to integrate into a bootstrap install to get ruby and puppet installed so that you can use it to install all the rest of the dependencies. But of course, most of the value is in changes later on rather than at install time, when there are already a huge number of tools to set up or image machines or whatever.
Side side note: I've not used Chef to compare this with.
I'm setting up a single server, and even there, puppet and chef come in very handy. I can reuse the recipes on a local vagrant-managed virtual box OS and test both the server configuration and the deployment.
At the moment I like chef-solo a bit better (because it uses an internal dsl).
I'm just beginning with puppet standalone and chef-solo - are there any longer-term experiences, pitfalls, etc. you can share?
I'll write up more about Chef later, but I really look at the two differently. Puppet is really great at managing infrastructure and server state. Chef is really good at integrating with your application (especially if you are using Ruby). I typically think of Chef as a framework to program your infrastructure against. Puppet is more of the middle manager. :)
Both are easy to test, with Puppet winning slightly thanks to 'puppet apply <manifest>'. Chef-solo is nice but takes a little more to set up (solo.rb and node.json, for example). Either way, test and see what you think will work best for you.
We use puppet at yelp. It's okay, but not perfect (we're using 0.25 on a mix of Centos and Ubuntu). Here are some gotchas and pitfalls I've run into:
It uses a tremendous amount of memory, both the puppetd clients and the puppetmaster server. We were experiencing regular crashes (unrelated to memory usage, AFAICT) when we were on 0.24, so we have init/upstart/ubuntu-process-management-du-jour manage it.
Puppetmasters seem to stop responding and (from what I can tell from lsof) forget about some file descriptors every so often, and we need to hard-restart them, usually using kill -9.
There isn't solid support for distributing files via any method other than the puppet:// scheme (although http support is in the works), which means the puppetmaster must both evaluate the configuration and serve files, and it doesn't seem very efficient at serving files.
The documentation is less than stellar. Valid examples are not included, and there are exceptions to exceptions in the DSL. For example, the defined() function determines if a class or resource has been defined; for resources you do defined(ResourceType[title]), and for classes you do defined("class::name") (defined(Class['class::name']) doesn't work here, even though you specify dependencies using Class["class::name"] syntax). I had to find this out by digging deep in the bug tracker and mailing list. I find the documentation difficult to navigate, there's no unified "here's the syntax" document, and there aren't enough indications of which version of puppet supports which language constructs.
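To make that inconsistency concrete (class and file names here are made up):

    # Resources: pass a resource reference
    if defined(File['/etc/ntp.conf']) {
      notice('ntp.conf is already managed somewhere else')
    }

    # Classes: pass the quoted class name...
    if defined('apache::ssl') {
      notice('apache::ssl has been declared')
    }

    # ...whereas this form, which matches how you write dependencies, did not work for us:
    # if defined(Class['apache::ssl']) { ... }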
The certificate management is extremely subpar. By default the puppet clients connect to a host named puppet, but the puppetmaster generates certificates with a CN of the puppetmaster's own hostname. This made setting up multiple, interchangeable, load-balanced puppetmasters problematic -- the puppet clients complain that the server identity changed between runs. The CN of the puppetmasters should be "puppet". There are options to override the CN and the Subject Alternative Names when the CA and puppetmaster certs are generated, but we had trouble getting them to work -- the problem was only easy to pin down once we realized the fields in the certificates were always being generated wrong. We had to settle on generating a puppetmaster certificate once with the right values, then copying it to all our puppetmasters (really, this is how you manage SSL for a cluster of web servers: you don't have a certificate for each web server with its own hostname in the CN, you have one for *.example.com or www.example.com and every server serves that name). We also had to turn on autosigning, and we clean out the certificate store on the puppetmasters periodically to avoid certificate signing conflicts between puppetmasters. The SSL is a nice feature, and I definitely see it as a necessity for security purposes, but it could be cleaner.
You definitely need multiple puppetmasters if you have a largish environment. I don't consider our environment especially large, but we've had load issues when we ran one puppetmaster. Even distributing the puppet runs using the splay option didn't help.
A guy on my team wrote a function to recursively template a directory of files. This made mass file management easier, otherwise you need to specify each file individually in a file {} stanza.
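(For plain file copies the stock file type can already recurse over a directory; it's templating that forces either one stanza per file or a custom function like his. Roughly, with illustrative paths:)

    # Recursive copy of static files works out of the box:
    file { '/etc/myapp':
      ensure  => directory,
      recurse => true,
      source  => 'puppet:///modules/myapp/etc',
      owner   => 'root',
      group   => 'root',
      mode    => '644',
    }

    # ...but anything rendered from a template still needs its own stanza:
    file { '/etc/myapp/app.conf':
      ensure  => file,
      content => template('myapp/app.conf.erb'),
    }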
We have scripted setting up a puppetmaster and a puppet client, and modified the default (Ubuntu-provided, I believe) init.d script to pass the command-line options related to the next point...
I had issues with the defaults specified in puppet.conf (and puppetd.conf and puppetmaster.conf, or something like that), the section names in the files (they are in .ini format), and getting the command line to override them. It's been a while since I had to deal with this (since we worked around it), but there's a thread at http://www.mail-archive.com/[email protected]/msg0... about the --config command line option. Related to this, we run puppetmaster with a config dir of /etc/puppetmaster and a vardir of /var/lib/puppetmaster. This has made things a lot easier; by default, everything goes in /etc/puppet and /var/lib/puppet, and the files for the puppet client and the puppetmaster get mixed in together when running puppet on the puppetmaster. Since we've scripted both the client config and the puppetmaster config, it's easy to just blow one away and recreate it.
We didn't use custom facter facts or custom functions on the puppetmaster initially, but I recently set up our environment to support them, and if you know ruby (or can muddle through it), it's reasonably easy to extend the capabilities.
We mainly use it to distribute files and create user accounts; we've had problems on and off with anything more advanced (even service management has been a problem at times--things stopping and starting when they shouldn't--but I attribute this to general issues with ubuntu moving between versions of upstart). Having modules that do things like manage apache config, or sudoers, or nagios config might come in handy if you started out using puppet, but when you're moving an already established config to puppet, it's easier to just distribute the files. Especially when distributions like ubuntu (debian?) support a subdir of apache config files that are managed with symlinks.
I don't mean to present it like it's all bad. It has allowed us to centralize and version most of our config and bring new machines into service relatively fast. We were throwing around the idea of using it to configure EC2 instances, but really, I think it would be easier (and faster) to use custom AMIs. We have not had to do this yet, though.
Some of these issues may be fixed in 0.26, we have not gotten around to playing with it yet.
So it has its quirks, and it's not so bad if you really spend time learning it and have enough experience to come up with workarounds for the pain points -- this is no different from any other software package. Considering it's what I know and I'm aware of the quirks, I'd use puppet on other networks. And I'm sure some of the problems we've had with it are because we're doing something non-standard, or using it in a unique way or one that isn't recommended.
I really should write up some of our recipes to help out other people.
Puppet is one of those things I like in principle but that is too much of a PITA to set up. Take the class definitions, for example: they don't appear to offer a great deal more than a shell script, and in the example shown in TFA, for the price of about 6 lines of puppet code we could've just run rsync -e ssh -avz ntpd.conf puppet@server:/etc/ntpd.conf && chown root:root /etc/ntpd.conf && chmod 644 /etc/ntpd.conf.
Of course, in the real world you'd have a tarball you'd rsync over, then use SSH to extract and run the base script, and robert's your father's brother. A lot simpler, and the way I'd automated Solaris admin years ago. Puppet's drawback is that it doesn't offer anything sufficiently compelling for people to change from what they use, and it presents an awful lot of work in its syntax for people getting started. Once it's up and running it's brilliant -- I've seen it. But it seems like so much hard work to get there that it acts as a barrier to entry.
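For comparison, the puppet version of that is roughly the following (reconstructed from memory, not quoted from TFA):

    file { '/etc/ntpd.conf':
      ensure => file,
      owner  => 'root',
      group  => 'root',
      mode   => '644',
      source => 'puppet:///modules/ntp/ntpd.conf',
    }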
The original example is actually kind of bad and doesn't demonstrate puppet's abstraction facilities.
You can define a custom resource, for example "system_file", that provides default "root:root, mask 444" permissions so that you only have to give a source and destination for every file, overriding the default permissions when you want.
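A rough sketch of the sort of thing I mean (the define and the variable names are just illustrative):

    define system_file($source, $owner = 'root', $group = 'root', $mode = '444') {
      file { $name:
        ensure => file,
        owner  => $owner,
        group  => $group,
        mode   => $mode,
        source => "${configfiles}/${source}",
      }
    }

    # root:root, mode 444 unless you say otherwise
    system_file { '/etc/ntpd.conf':
      source => 'ntp/ntpd.conf',
    }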
Where $configfiles might be the puppet server or some other location. One of the things you get with puppet is access to any of the local host properties that can be discovered with facter, so you can dynamically configure something like a source file.
Then you could override that again in individual file resources. If you need several file resources with the same attributes, you can set defaults for them within the scope of a class:
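Something along these lines (again, names illustrative):

    class ntp {
      # Class-scope defaults: every File resource declared in this class
      # picks these up unless it overrides them itself
      File {
        owner => 'root',
        group => 'root',
        mode  => '644',
      }

      file { '/etc/ntp.conf':
        source => "${configfiles}/ntp/ntp.conf",
      }

      file { '/etc/ntp.keys':
        mode   => '600',   # per-resource override of the class default
        source => "${configfiles}/ntp/ntp.keys",
      }
    }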
The real point that should be made about Chef and Puppet is that they are so similar it really doesn't matter which you use. Deciding to use one of them at all is a much more important choice than which one you pick.
I don't know the best way to express this sentiment (feels like there should be a word for it). But really, just use something to automate your infrastructure and your life will be measurably better.
It's been extremely interesting to watch these meta server tools evolve. We're reaching the point where there's not too much of a difference between a scripted network graph and a suite of VMs with cloning abilities. Each technique would have its advantages, though. Perhaps somebody with large-scale infrastructure experience could do a side-by-side comparison?
We configure our production servers and push new releases there with Puppet. I like Puppet: it's fail-safe and reliable.
There is, however, one thing I don't fancy in it. Puppet does not support insecure client–master communication. Requiring SSL communication is OK, but one should be able to switch it off if it brings no value.
We are running our servers on AWS, and we rely solely on AWS security groups to grant and deny access. Puppet's SSL traffic brings no additional security to us; it only complicates matters. For example, we would like to shut down the Puppet master EC2 instances when they are not needed. However, this is not possible, since after start-up the EC2 instances have new IPs, and this breaks the Puppet-signed SSL certificates.
That's correct. The problem is that the EC2 internal IPs change even if the instance has an Elastic IP. EC2 instances use internal IPs when communicating with other EC2 instances (this is a feature of AWS DNS). As a consequence, Puppet clients cannot reach the master via the master's Elastic IP.
That sounds rather inconvenient. Are the external elastic IPs non-routable internally? I mean, if you add the IPs explicitly to hosts files then will traffic to those IPs not work?
Assuming you use a single type and version of the OS (say, ubuntu vXXX), does it make sense to use the OS's native packaging system instead of something like Puppet?
I.e., maintain a private packages repository where you add your custom packages, and have the various servers pull from that repository?
Obviously, this doesn't work if you have different types of servers - but for many servers configured identically, it should work.