I have experience running Kubernetes clusters on Hetzner dedicated servers, as well as working with a range of fully or highly managed services like Aurora, S3, and ECS Fargate.
From my experience, the cloud bill on Hetzner can sometimes be as low as 20% of an equivalent AWS bill. However, this cost advantage comes with significant trade-offs.
On Kubernetes with Hetzner, we managed a Ceph cluster using NVMe storage, MariaDB operators, Cilium for networking, and ArgoCD for deploying Helm charts. We had to handle Kubernetes cluster updates ourselves, which included facing a complete cluster failure at one point. We also encountered various bugs in both Kubernetes and Ceph, many of which were documented in GitHub issues and Ceph trackers. The list of tasks to manage and monitor was endless. Depending on the number of workloads and the overall complexity of the environment, maintaining such a setup can quickly become a full-time job for a DevOps team.
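To give a flavor of the routine, here is a minimal sketch of the kind of checks such a setup implies, assuming a kubeadm-based cluster and the stock Ceph CLI (illustrative, not our exact runbook):

  # is a Kubernetes version bump pending, and is the control plane ready for it?
  kubeadm upgrade plan

  # are all nodes healthy after the last round of updates?
  kubectl get nodes -o wide

  # is Ceph in HEALTH_OK, and are any OSDs down or nearfull?
  ceph status
  ceph health detail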
In contrast, using AWS or other major cloud providers allows for a more hands-off setup. With managed services, maintenance often requires significantly less effort, reducing the operational burden on your team.
In essence, with AWS, your DevOps workload is reduced by a significant factor, while on Hetzner, your cloud bill is significantly lower.
Determining which option is more cost-effective requires a thorough TCO (Total Cost of Ownership) analysis. While Hetzner may seem cheaper upfront, the additional hours required for DevOps work can offset those savings.
This is definitely some ChatGPT output being posted here and your post history also has a lot of this "While X, Y also does Z. Y already overlaps with X" output.
I'd like to see your breakdowns as well, given that the cost difference between a 2 vCPU, 4GB configuration (as an example) and a similar configuration on AWS is priced much higher.
It is my output, but I use ChatGPT to fix my spelling and grammar. Maybe my prompt for that should be refined so it doesn't alter the wording too much.
While using ChatGPT to polish your writing is not wrong by any means, reviewing the generated output and re-editing where necessary is essential to avoid a robotic writing style that smells unhuman. For instance, the successive paragraphs "In contrast, using AWS.." and "In essence, with AWS.." leave a bad taste when read back to back.
Why would you want to restrict contributions from people with relevant experience and willingness to share, just because the author ran a spelling and grammar check?
Unless the spelling and grammar is HORRENDOUS, people won't really care. Bad English is the world's most used language; we all deal with it every day.
Just using your browser's built-in proofreader is enough in 99.9% of the cases.
Using ChatGPT to rewrite your ideas will make them feel formulaic (LLMs have a style and people exposed to them will spot it instantly, like a code smell) and usually needlessly verbose.
You can tell it's AI when it refuses to take a side and equivocally considers issues first on one hand and then the other hand, but can't get the number of fingers right.
Or as ChatGPT would put it:
Precise grammar and spelling are undeniably important, but minor imperfections in English rarely obstruct communication. As the most widely used language in the world, English is highly flexible, and most people navigate small errors without issue. For the majority of cases, a browser’s built-in proofreader is entirely sufficient.
On one hand, tools like ChatGPT can be valuable for refining text and ensuring clarity. On the other hand, frequent reliance on such tools can result in writing that feels formulaic, especially to those familiar with AI-generated styles. Balancing the benefits of polished phrasing with the authenticity of your own voice is often the most effective approach.
I could actually hear the different voices in my head as I read the second and third paragraphs, distinct from the first. Your assessment of the inability to take a side is spot on for OpenAI, possibly Gemini too, but not for all LLMs.
It’s overkill for this audience. HN is pretty forgiving of spelling and grammar mistakes, so long as the main information is clear. I’d encourage anyone that wants to share a comment here to not use an LLM to help, but just try your best to write it out yourself.
Really - your comment on its own is good enough without the LLM. (And if you find an error, you can always edit!)
If we really wanted ChatGPT’s input on a topic (or a rewording of your comment), we can always ask ChatGPT ourselves.
Everyone claims it’s a spelling and grammar check, but it’s the OP trying to stretch “we tried running self-managed clusters on Hetzner and it only saved us 20% while being a chore in terms of upkeep” into a full essay that causes all that annoying filler.
You’d assume people would use tools to deliver a better, well-composed message, whereas most people use LLMs to decompress their text into an inefficient representation. Why that is I have no idea, but I’d rather have the raw, unfiltered thought from a fellow human than someone trying to sound fancy and important.
Not to mention that I still find the 20% claim a little suspect.
I've never operated a kubernetes cluster except for a toy dev cluster for reproducing support issues.
One day it broke because of something to do with certificates (not that it was easy to determine the underlying problem). There was plenty of information online about which incantations were necessary to get it working again, but instead I nuked it from orbit and rebuilt the cluster. From then on I did this every few weeks.
A real kubernetes operator would have tooling in place to automatically upgrade certs and who knows what else. I imagine a company would have to pay such an operator.
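For what it's worth, on a kubeadm-based cluster (an assumption; managed distributions differ) the certificate part at least is only a couple of commands these days:

  # show when the control-plane certificates expire
  kubeadm certs check-expiration

  # renew them all, then restart the control-plane components to pick them up
  kubeadm certs renew all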
I run BareMetalSavings.com[0], a toy for ballpark-estimating bare-metal/cloud savings, and the companies that have it hardest to move away from the cloud are those who are highly dependent on Kubernetes.
It's great for the devs but I wouldn't want to operate a cluster.
Ceph is a bastard to run. It's expensive, slow and just not really ready. Yes, I know people use it, but compared to a fully grown-up system (i.e. Lustre [don't, it's RAID 0 in prod] or GPFS [great but expensive]) it's just a massive time sink.
You are much better off having a bunch of smaller file systems exported over NFS, and making sure you have block-level replication. Single-address-space filesystems are OK and convenient, but most of the time they are not worth the admin cost of making them reliable at scale. Like a DB, shard your filesystems, especially as you can easily add mapping logic to Kubernetes to make sure the right storage gets to the right image.
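As a rough sketch of what that looks like per storage node (paths and the client subnet are illustrative; block-level replication, e.g. DRBD, sits underneath the exported filesystem):

  # /etc/exports -- one shard per export
  /srv/shard01  10.0.0.0/24(rw,sync,no_subtree_check)
  /srv/shard02  10.0.0.0/24(rw,sync,no_subtree_check)

  # apply and verify
  exportfs -ra
  exportfs -v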
I agree that it is hideously complicated (to anyone saying “just use Rook,” I’ll counter that if you haven’t read through Ceph’s docs in full, you’re deluding yourself that you know how to run it), but given that CERN uses it at massive scale, I think it’s definitely prod-ready.
Oh it probably is prod ready, I just wouldn't use it unless I had to (i.e. I had the staff to look after it and no money to buy something better).
Whether it's a good fit for general-purpose storage at a small scale is a harder question. It's not easy to get good performance at small scale, and getting there requires a larger number of storage nodes than you'd like.
I mostly agree, but it surprises me that people don't more often consider a solution right in the middle, such as OpenShift. You get a much, much lower DevOps burden while keeping all the power and flexibility of running on bare metal. It's a great hybrid between a fully managed, expensive service and a complete build-your-own. It is expensive enough, though, that for startups it is likely not a good option, but if you have a cluster with at least 72 GB of RAM or 36 CPUs running (about 9 mid-size nodes), you should definitely consider something like OpenShift.
I dunno, I've had to spend like two or three hours each month on updating mine for its entire lifetime (of over 5 years now), and that includes losing entire nodes to hardware failure and spinning up new ones.
Originally it was ansible, and so spinning up a new node or updating all nodes was editing one file (k8s version and ssh node list), and then running one ansible command.
Now I'm using nixos, so updating is just bumping the version number, a hash, and typing "colmena apply".
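Roughly, the update flow now looks like this (a sketch; the pin/hash step depends on how nixpkgs is managed):

  # bump the pinned nixpkgs / kubernetes version and its hash in the config, then:
  nix flake update    # if the pin is a flake input; otherwise edit the hash by hand
  colmena apply       # build and push the new system closure to every node over ssh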
Even migrating the k8s cluster from ansible to nixos was quite easy, I just swapped one node at a time and it all worked.
People are so afraid of just learning basic Linux sysadmin operations, and yet it makes the system way easier to understand and debug, so it pays off.
I had to help someone else with their EKS cluster, and in the end debugging the weird EKS AMI was a nightmare and required spending more time than all the time I've had to spend on my own cluster over the last year combined.
From my perspective, EKS costs more money, gives you a worse k8s (you can't use beta features, their AMI sucks), and pushes you toward a worse understanding of the system, so you can't understand bugs as easily, and when it breaks it's worse.
if the "couple of bucks" ends up being the cost of an entire team, then hire a small team to do it.
Then get mad at them because they don't "produce value", and fold it into a developers job with an even higher level of abstraction again. This is what we always do.
> Determining which option is more cost-effective requires a thorough TCO (Total Cost of Ownership) analysis. While Hetzner may seem cheaper upfront, the additional hours required for DevOps work can offset those savings.
Sure, but the TL;DR is going to be that if you employ n or more sysadmins, the cost savings will dominate, with 2 < n < 7. So past a certain company size, Hetzner starts being cheaper, and it becomes more extreme the bigger you go.
Second, if you have a "big" cost, whatever it is (bandwidth, disk space, essentially anything but compute), the cost savings will dominate faster.
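A back-of-the-envelope version of that first point, with purely illustrative numbers (neither the bills nor the salary come from the thread):

  # say: AWS bill $40k/mo, Hetzner ~20% of that, a sysadmin ~$12k/mo fully loaded
  aws=40000; hetzner=8000; admin=12000
  echo $(( (aws - hetzner) / admin ))   # -> 2, i.e. the savings pay for ~2-3 extra hires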
Not always. Employing sysadmins doesn't mean Hetzner is cheaper, because those "sysadmin/ops type people" are being hired to manage the Kubernetes cluster. And ops people who truly know Kubernetes are not cheap.
Sure, you can get away with legoing some K3S stuff together for a while but one major outage later, and that cost saving might have entirely disappeared.
GPT-4 is, but ChatGPT is fine-tuned to emit sentences that get rated well (by humans, and by raters trained to mimic human evaluation) in a conversational agent context.
I have no idea how fast it was going when it hit, but as it is more than 10x the mass of a 50 cal, I wouldn't rely on the assumption that protection from the latter is also protection from the former.
I think you can pull much of the data you need for such a project from the GH Archive (https://www.gharchive.org/). They have basically captured every event that happened on the platform starting from 2011.
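The download pattern is simple; the URL scheme below is the one documented on gharchive.org, one gzipped JSON file per hour:

  # a single hour of events
  wget https://data.gharchive.org/2015-01-01-15.json.gz

  # a whole day, via shell brace expansion
  wget https://data.gharchive.org/2015-01-01-{0..23}.json.gz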
IMDb works mainly on user ratings from 1 to 10. But personally, I think any such system can be gamed, just like GitHub stars. When I want a GitHub Top 250 (as an equivalent to the IMDb Top 250), I just do a GitHub search with a filter for the language I'm interested in, e.g. Python, and then sort by stars. That works well enough for me.
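For reference, the search I mean is just the web UI with qualifiers along these lines (exact qualifier syntax per GitHub's search docs, worth double-checking):

  # in the GitHub search box: top Python repos by stars
  language:Python stars:>1000 sort:stars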
We have so many distributed X applications nowadays that all try to solve the same problem, either in the same or different ways. I think we first have to come up with a simple, distributed, open-source storage solution. In the cloud, we have things like AWS S3, which is a very reliable distributed storage, but for self-hosting, we have:
Ceph, with which I have a lot of experience, is a very solid and quite bulletproof storage solution that offers the S3 protocol and a filesystem. However, maintaining it in the long run is really challenging. You'd better become a Ceph expert.
SeaweedFS struggles with managing large data groups. It's inspired by an outdated Facebook study (Haystack) and is intended for storing and sharing large images. However, I think it's only average—it has poor documentation, underwhelming performance, and a confusing set of components to install. Its design allows each server process to use one big file for storage, bypassing slow file metadata operations. It offers various access points through gateways.
MinIO has evolved a lot recently, making it hard to evaluate. MinIO relies on many small databases. Currently, it's phasing out some features, like the gateway, and mainly consists of two parts: a command line interface (CLI) and a server. While MinIO's setup is complex, SeaweedFS's setup is much simpler. MinIO also seems to be moving from an open-source model towards a more commercial one, but I have not closely followed this transition.
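For a rough sense of what "simpler setup" means here, this is the single-node bring-up of each from memory (flags may have changed; a sketch, not a deployment guide):

  # SeaweedFS: one binary runs master, volume server and (optionally) an S3 gateway
  weed server -dir=/data -s3

  # MinIO standalone
  minio server /data --console-address :9001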
None of these solutions is simple enough to be the base for a distributed database application. What we really need is something like an ext4 successor, let's call it ext5, with native distributed storage capabilities in the most dead-simple way. ZFS is another good candidate: it has already solved the problem of distributing storage across multiple hard drives within one server very well, but it still lacks a good solution for distributing storage across drives on different servers connected via a network.
Yes, I know there is the CAP theorem, so it is really a hard challenge to solve, but I think we can do better in terms of self-hosted solutions.
> In the cloud, we have things like AWS S3, which is a very reliable distributed storage
Yes, but S3 is basically a standardized protocol at this point. There are many open and commercial alternatives, like Cloudflare R2 (no egress fees). So depending on the reason for self-hosting (such as preventing lock-in), S3 might be the least important thing to actually move away from. It’s way more difficult to migrate away from, e.g., a proprietary DB, sometimes by design.
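In practice most S3 clients just take an endpoint override, so switching providers is often a one-flag change (the bucket name and account id below are placeholders):

  # same CLI, different backend (Cloudflare R2 in this case)
  aws s3 ls s3://my-bucket --endpoint-url https://<accountid>.r2.cloudflarestorage.com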
I only tested for software latency (monitor, keyboard and other hardware latency is not included in Typometer benchmarks). I ran the test on Arch Linux with Xorg + bspwm without a compositor. You can find the full results on my blog https://beuke.org/terminal-latency/.
Compared to similar comparisons from 6 years ago [1] and 3 years ago [2] (by the Zutty author), VTE terminals are still (at least pre-46) bad on the latency front. (They're as high as VS Code, going by beuke's article.) Xterm still rules it. (As pointed out in [2], this is due to direct rendering via Xlib, which comes with the downside of poor throughput.) Alacritty improved significantly, Konsole got worse. About Alacritty: as noted in [2], there were various open tickets about its poor latency, and it wasn't an easy problem to solve. So kudos to the Alacritty devs for succeeding, and to the GNOME devs for improving things in the new version.
Alacritty, Kitty, Zutty, GNOME, others, quite a rejuvenation in terminal development.
>However, if we custom-tune the settings for the lowest latency possible (I chose minlatency = 0 and maxlatency = 1) then we have a new winner. Applying this custom tuning results in an average latency of 5.2 ms, which is 0.1 ms lower than xterm, and that’s with having a much more sane terminal without legacy cruft.
Huh, the devs really weren't lying, Alacritty really got better on the latency front. I started using it for its supposedly better security than xterm, but at the time I think it was quite a lot worse on latency, though the throughput was way better.
Alacritty feels fast but they refuse to add support for tabs or tiling. They just say to go use tmux but that isn't the answer at all.
Kitty is quite nice, but if you SSH into machines a lot, all hell breaks loose if they don't have the kitty terminfo files installed, and installing them isn't always possible. You can override TERM, but I honestly don't have the patience for it.
It doesn’t bother me, I was just interested in whether the benchmark is fair in this respect (it is xorg only, so the answer is yes). I personally believe that 120+ hz gives barely any benefit, though.
Only if it's backed by security-reviewed and fuzzed asm. Go's gzip implementation is slow enough that klauspost is worth importing; following that pattern, I would probably still use klauspost's zstd even if Go had a slow reference implementation in the standard library.
I remember being a kid and wanting to play Far Cry in high quality when it came out. I did not have the money to buy the best graphics card to play it smoothly on high settings. So, I could either play it in low quality with something like 60 FPS or in high quality with something like 15 FPS. Of course, 15 FPS is not enough to play the game properly, but to capture all the beautiful details, I just went with the highest settings and very slowly explored the beach, astounded by all the details. Good memories.
Same. And 20 years later I'm an aging man but do the exact same thing in Cyberpunk 2077. Despite spending a small fortune on a graphics card, it's still either smooth framerate or the gorgeous path traced environments, because I haven't splurged on the 3090/4080 yet. Nothing has changed.
Out of genuine curiosity, what GPU are you running? Friends have recommended Cyberpunk 2077, but I'm only running a 3080. Wondering how many compromises I'll need to make.
I have an (equally aging) 2070 Super. It runs the game just fine at 1440p and perhaps not maxed out. Your 3080 will of course run it even better. No need to upgrade unless you want 60fps path traced or 4K.
Nah, but the physics were. I remember a level in a hospital with one of those privacy curtains between two beds that you could walk through, and that thing would move so beautifully.
Yes, you can rent a VPS for a few dollars from e.g. Hetzner (since Germany is mentioned in the blog post) and run a few wget commands in parallel in a loop against their 200MB setup file to easily reach 1TB a day.
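The arithmetic checks out: 1 TB / 200 MB is roughly 5,000 downloads, i.e. about 3-4 per minute around the clock, which a trivial loop reaches easily (the URL is a placeholder):

  while true; do
    for i in 1 2 3 4; do
      wget -q -O /dev/null "https://example.com/setup-200MB.bin" &   # 4 parallel downloads
    done
    wait
  done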
For a company, this should definitely not be something to worry about. However, if I were able to single out the individual IPs attacking me, I would simply block them, report them (use the abuse form of the hoster of the attacking IP), and call it a day. That way, you can at least hope the hoster will do something about it, either by kicking the attacker off its platform or, if it is some kind of reflection attack, informing the victim so they can close the security hole on their server and remove themselves from the botnet. If your attacks originate from a vast number of different IPs from Russia and China, consider geoblocking.
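Blocking a single offender is a one-liner (the address below is a TEST-NET placeholder); geoblocking whole countries takes an ipset/nftables set fed from a GeoIP list, which is more involved:

  # drop all traffic from one attacking IP
  iptables -A INPUT -s 203.0.113.7 -j DROP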
On Hetzner, you receive an abuse email with the directive to respond appropriately if your root server or VPS is involved in some kind of abuse-related issue. In larger companies this happens quite frequently. I'm not sure what would happen if you ignored such an email.