Having gone from managing several thousand physical to virtual/cloud instances, ...

movedx · on Dec 10, 2024

> On premise in my opinion needs a dedicated team managing hardware and leverage solutions to provide that as VM's/Containers/etc to teams.

You're assuming that "On premise" equates to "inside our building, in racks we've installed, using power and networking we have to manage." You're correct if that's the case for your business, but my argument is based around the idea that you can use _managed_ hosting providers of physical hardware that'll be either next door to you, in the same city, or close to your users (e.g., you're a business in Germany but your customer base is in London, so you host the servers using a London-based provider.)

The idea that you have to manage hardware is greatly diminished when you consider the availability of managed providers that are dirt cheap.

hmmm-i-wonder · on Dec 11, 2024

That's a good point, and at small and medium scales those are very cost-effective alternatives to cloud or fully managed. Not many managed providers can provide a full equivalent to an on-premise team, and it quickly becomes cheaper to run it yourself once you scale into large dedicated instances and high network traffic. Before then, though, it's often better than the cloud for many situations.
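
As a rough illustration of that crossover, here is a minimal sketch of the break-even arithmetic. It assumes a very simplified cost model that is not from this thread: a flat monthly cost for an in-house team plus a per-server cost, compared against a flat per-server fee from a managed provider. The function and parameter names are hypothetical, and real pricing (bandwidth, tiers, contracts) will not be this linear.

```python
from math import floor


def break_even_servers(team_cost_per_month, self_run_cost_per_server,
                       managed_cost_per_server):
    """Smallest server count at which self-running is strictly cheaper.

    Hypothetical model (an assumption, not real pricing):
      self-run total = team_cost_per_month + self_run_cost_per_server * n
      managed total  = managed_cost_per_server * n
    Returns None if the managed provider is never more expensive per server.
    """
    saving_per_server = managed_cost_per_server - self_run_cost_per_server
    if saving_per_server <= 0:
        # Self-running never pays back the fixed team cost.
        return None
    # Need n > team_cost / saving, so take the next whole server above it.
    return floor(team_cost_per_month / saving_per_server) + 1
```

Plugging in your own quotes gives a first-order feel for where "quickly becomes cheaper" starts for you; below that number the managed provider wins under this model.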

antonvs · on Dec 10, 2024

> On premise in my opinion needs a dedicated team managing hardware and leverage solutions to provide that as VM's/Containers/etc to teams. Another team focused on OS level security and base image, then your dev teams can effectively focus on their app and leverage the automated tools provided by the hardware and OS teams.

Exactly. At which point, you’re essentially reinventing a cloud, usually not very well. If you have access to really good people you can pull this off, and that’s why you see so many people on HN doing the “who needs cloud” flex.

But the reality is that for most companies, managing non-trivial amounts of hardware is not a core competency, and they regularly shoot themselves in the foot by trying it.

apelapan · on Dec 10, 2024

If you are in the cloud, you are going to need a team that understands cloud networking, storage, deployment, security etc. You will need enough people to maintain support rotations and survive normal churn.

It seems like many people/organizations believed that they would be rid of the whole "operations problem" once they shifted all their workloads from on-prem to cloud. They believed that they paid a full team for running cables and replacing broken fans/hard drives/PSUs, when that aspect of on-prem is a tiny (but non-zero) amount of work.

movedx · on Dec 11, 2024

I don't believe a lot of this is required.

OS level security? So, "apt update && apt upgrade", then? I mean, what else are you doing, writing patches for the kernel? Checking every line of code that runs? Are you aware of how effective SELinux and systemd containers are? Just a simple firewall at the OS level? Maybe even just using Tailscale (or the open source Headscale) to introduce zero trust access capabilities.

There's a Terraform provider for Proxmox, which is an excellent hypervisor. Making a template takes less than an hour with configuration.

You do need an Ops person for sure, but an entire _team_?

hmmm-i-wonder · on Dec 11, 2024

>"apt update && apt upgrade",

Across 10k-100k+ servers, all running services and needing to orchestrate restarting across the whole fleet, while providing 0 downtime or impact to thousands of clients with terabytes of data being processed and analyzed at any given time.

Sure, what's so hard about changing a tire? Well, try to do it on an 18-wheeler while it's driving down the highway without any impact to its speed.
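
To make the orchestration part concrete, here is a minimal sketch of how a fleet-wide upgrade is usually batched so that only a bounded fraction of each service is down at once. Everything in it is an assumption for illustration: the `(hostname, service)` inventory shape, the `max_unavailable` fraction, and the `upgrade` / `is_healthy` callbacks are hypothetical stand-ins for whatever tooling you actually run. Real fleets also need connection draining, data rebalancing, and rollback, none of which is shown.

```python
from collections import defaultdict
from math import floor


def plan_rolling_batches(hosts, max_unavailable=0.25):
    """Group hosts into batches for a rolling upgrade.

    hosts: iterable of (hostname, service) pairs (hypothetical inventory).
    In each batch, at most floor(max_unavailable * replicas) hosts of any
    one service are included, with a minimum of 1. Note that a service
    with a single replica is therefore fully down during its batch.
    """
    by_service = defaultdict(list)
    for name, service in hosts:
        by_service[service].append(name)

    batches = []
    for names in by_service.values():
        step = max(1, floor(len(names) * max_unavailable))
        chunks = [names[i:i + step] for i in range(0, len(names), step)]
        # The i-th chunk of every service goes into the i-th batch.
        for i, chunk in enumerate(chunks):
            if i == len(batches):
                batches.append([])
            batches[i].extend(chunk)
    return batches


def rolling_upgrade(batches, upgrade, is_healthy):
    """Upgrade batch by batch; halt at the first batch that is unhealthy.

    upgrade and is_healthy are caller-supplied callbacks (assumptions).
    Returns (upgraded_hosts, failed_batch); failed_batch is None on success.
    """
    done = []
    for batch in batches:
        for host in batch:
            upgrade(host)
        if not all(is_healthy(host) for host in batch):
            return done, batch
        done.extend(batch)
    return done, None
```

Even this toy version shows why "apt update && apt upgrade" stops being a one-liner: someone has to own the inventory, the health checks, and the decision about what happens when a batch fails halfway through.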

> Are you aware of how effective SELinux and systemd containers are? Just a simple firewall at the OS level?

Part of a layered and in-depth system but one that introduces complexity.

>Maybe even just using Tailscale (or the open source Headscale) to introduce zero trust access capabilities.

Tailscale in an enterprise production environment? Not going to pass any sort of security audit and probably violates a number of certifications customers require at the enterprise level for network access controls, visibility and auditing.

Just managing the git/jenkins/spinnaker/terraform infrastructure in dozens of locations deploying to and maintaining tens of thousands of servers/pods requires a 24x7 team on top of the hundreds of teams and tens of thousands of devs using it.

If you're small enough that this doesn't make sense, then you might be small enough that one Ops person can handle the load (one is never enough if you're smart, but...), but you are dealing with a very small amount of infrastructure and services at this point.

CRConrad · on Dec 20, 2024

> Across 10k-100k+ servers

If you "need" that many servers (and aren't Google), you've built your systems massively wrong.

hmmm-i-wonder · on Dec 10, 2024

Absolutely.

My issue is really on the other end of that scale: getting C-suites to recognize when owning that core competency is actually beneficial to the company, even if it's not the focus of the company.

I grew up around companies leveraging vertical integration at the right scales to improve costs. Seeing companies go the opposite direction, trading all those advantages for often never-materializing benefits, is... frustrating.