
Noob question: if I rent one of these dedicated servers, what happens if some hardware fails? Do I need to contact support, or will they detect it automatically? If something needs to be fixed manually (e.g., hard drive, CPU, network), how long does one need to wait with dedicated servers?

I'm used to cloud VMs where if one dies, I can quickly spin up another one effortlessly (I never have to contact support or anything like that).



I don't know about Hetzner but my experience with OVH on dedicated servers and failures is like this: they detect when the server is down, mainly when it's off or doesn't ping, and then they try to boot on their debug distribution and perform some checks on the machine. They don't monitor other health issues however (how would they since you are running your own system?) and therefore don't do anything before they detect a "down" status.

Some failures I experienced and had to monitor/detect myself were: overheating (they replaced the thermal paste when I told them I saw strange readings in the CPU stats), RAID disk failure, and SSD high wear (i.e. partial failure, server still running; they replaced the failed disks after I told them).

Most of the time the issues have been resolved within 1-4 hours on the low-cost Kimsufi and SoYouStart offers, even on weekends and at night. Often, if the server is still running, they will require a shutdown before intervening.

I'm quite happy with this as I am highly technical in those subjects and like to look under the hood, but with dedicated servers you really have to do some more maintenance/monitoring/planning yourself.
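The self-monitoring the parent describes can be a small cron job. A minimal sketch, assuming standard `smartctl` and `sensors` output formats; the helper functions below parse captured sample output so the parsing logic is visible, and the device names and the 85°C threshold are assumptions to adjust per server:

```shell
# Parse a `smartctl -H /dev/sdX`-style health line into OK/FAIL.
check_smart() { grep -q 'PASSED' && echo OK || echo FAIL; }

# Parse `sensors`-style "Core N: +XX.X°C" lines; print HOT at/above the threshold.
check_temp() { awk -v max=85 -F'[+.]' '/^Core/ { if ($2 + 0 >= max) print "HOT" }'; }

# Demo on sample output. In cron you would instead run, e.g.:
#   smartctl -H /dev/sda | check_smart
#   sensors | check_temp
smart_status=$(echo 'SMART overall-health self-assessment test result: PASSED' | check_smart)
temp_status=$(printf 'Core 0:  +92.0°C  (high = +100.0°C)\n' | check_temp)
echo "$smart_status" "$temp_status"
```

Anything other than `OK` (or any `HOT` line) is what you would alert on, since the provider only reacts once the box stops pinging.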


I don't think this is accurate. I have rented an OVH dedicated server (through SoYouStart) for about the last decade, let me share some maintenance experiences:

> They don't monitor other health issues however (how would they since you are running your own system?) and therefore don't do anything before they detect a "down" status.

My server has a hardware raid card. I have had one incident where OVH contacted me and said there was an issue with one of the drives, and that they will reboot the server at X time to replace it. They did so, and the problem was solved with no requests or intervention on my part.

I had another incident where I was told the motherboard died. IIRC, it died around 1am my time and was replaced by 5am my time. They of course turned the system back on for me. I was asleep the whole time, and this was likewise solved with zero requests or intervention on my part.

Besides this, I can count the number of times an internet or power issue made my server unreachable on a single hand. IMO, a great experience for a dirt cheap host.

That all being said: OVH's IPv6 solution is laughably bad and is the single reason I would switch hosts if a better one with a North American presence appears.


What you describe are hardware failures, and as I said they detect hardware failures. When the server goes down, they are on it by themselves.

But some issues are not outright failures, and you have to handle those on your side.

Most of the time the RAID is software-based nowadays, for example.

IPv6 works fine for my many servers at OVH.


I don't see that in what you wrote, but I did see the mention of them not monitoring RAID. Hence one reason why I thought your comment was inaccurate, per my experience.


Technically, they could monitor things like overheating via IPMI (which they most likely use for out-of-band control anyway).


The thing is, it physically wasn't overheating, because a process on the system (Intel/Ubuntu) was heavily throttling the CPU to keep it from overheating. So the machine was almost useless, very slow, but the temperature was OK. When the throttling mechanism was disabled, it did indeed overheat. It's only because of those throttling system processes that I found out about the physical problem.
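Masked throttling like this can be spotted by comparing the CPU's current frequency against its rated maximum under load. A sketch assuming the standard Linux cpufreq sysfs files; the sample values piped in below are illustrative:

```shell
# Print current frequency as a percentage of the maximum (integer percent).
throttle_ratio() { awk '{ printf "%d\n", $1 * 100 / $2 }'; }

# Real usage would feed the sysfs values:
#   paste /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq \
#         /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq | throttle_ratio
ratio=$(echo '800000 3600000' | throttle_ratio)
echo "${ratio}%"
```

A ratio that stays far below 100% while the machine is under sustained load is the "almost useless, very slow" symptom described above, even though the reported temperature looks fine.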


Just like others already replied, hardware monitoring at the OS level is up to you. For disks, they even provide documentation on how best to set up smartd on Linux, and I'm sure they have similar documentation for Windows. In fact, their technical documentation is excellent in general.
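For reference, a smartd setup along those lines is just a few lines of `/etc/smartd.conf`. A minimal sketch; the device names and mail address are placeholders, not from Hetzner's docs:

```
# Monitor all SMART attributes (-a), enable offline data collection (-o on)
# and attribute autosave (-S on), run a short self-test nightly at 02:00,
# and mail on any failure.
/dev/sda -a -o on -S on -s (S/../.././02) -m admin@example.com
/dev/sdb -a -o on -S on -s (S/../.././02) -m admin@example.com
```

With this in place, a degrading disk generates a mail long before the provider's down-detection would ever notice anything.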

But they often go above and beyond for you. I have rented several servers from them for many years, and once or twice I got an e-mail from their datacenter team telling me that they noticed an error LED blinking on one of my servers, actively offering to plan a repair intervention. All I had to do was come up with a downtime window and communicate it to them. Very slick.

I'd say about half of overall value of Hetzner is in their quality support.


In my experience support will gaslight you into thinking it is your problem. I had a Hetzner server that was shutting off at random hours several times per week.

I showed them the sudden loss of power events in the logs. "It must be a problem with your OS modifications that we don't support".

OK, I wiped the machine to the stock image that you provide and it's still having power loss events. "Sure, we'll run a stress test for a couple minutes ... stress test passed OK, it's still your fault!".

The events happen randomly during the week, a stress test is not going to show that. Can you just move me to a different physical machine? "No."

This was over the course of several days, when I had an event coming up that I NEEDED the server for. I ended up going back to Azure and paying 10x the cost, but at least it worked great.


I am not much of a conspiracy theorist, but after going to the Hetzner site to look at my support history I was presented with this:

https://i.imgur.com/3DKc9OC.png

I have never seen this page before when trying to login. Make of that what you will.


To be honest, that's quite a leap in logic: assuming someone from Hetzner is name-searching their brand, found your comment, looked up your account, and then "blocked" your client.

That's some dedicated client response team if so!


I just did this recently with a dedicated server in RAID 6 where a disk failed. They have a page on their wiki to walk you through it, but basically you boot into the rescue system (network boot, activated in the user panel). Then you identify the failed disk by its existing or missing serial number, input that into the support form, and request a replacement. This was done in about 20 minutes; then I rebuilt the RAID, rebooted, and it was fine.
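The serial-number step is a one-liner from the rescue system. A sketch assuming `smartctl -i` output format; the sample text and serial below are illustrative, not a real device:

```shell
# Pull the serial number out of `smartctl -i /dev/sdX` output.
extract_serial() { awk -F': *' '/^Serial Number/ { print $2 }'; }

# Demo on sample output. Real usage: smartctl -i /dev/sda | extract_serial
serial=$(printf 'Device Model:     ExampleDisk 8TB\nSerial Number:    SERIAL123ABC\n' | extract_serial)
echo "$serial"
```

Running this across all drives and comparing against the serials your array expects is how you find which physical disk has vanished; the surviving serial numbers (or the missing one) are what goes into the support form.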


Yes, you contact support and provide them a description of what is faulty (e.g. the disk with its exact serial number), and they will usually replace it within 30-60 minutes.

Provisioning of servers was always quite fast. Same day or the next business day.

My experience is a little dated; I used to order a bunch of dedicated boxes from them for our clients, and with Hetzner we always had the best experience. Also the most bang for the buck.


After spending years with Rackspace, Host Europe and renting a cage I never want dedicated servers. That's all I'm saying.


Noticing hardware issues inside the server is up to you; you then notify Hetzner. In more than 10 years I've never had an issue other than disk degradation after heavy use.

Then you contact support and schedule a disk change: you first deactivate the disk in the RAID (saving the partition geometry, etc.), they replace the disk, and then you rebuild the RAID onto the new disk. That's it. With SSDs you may not even need to do this anymore.
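For a Linux software RAID, the sequence just described maps onto a handful of commands. A sketch with example array and device names (`/dev/md0`, `/dev/sdb1`, etc.), not something to paste blindly against your own layout:

```shell
mdadm --manage /dev/md0 --fail /dev/sdb1     # mark the dying disk as failed
mdadm --manage /dev/md0 --remove /dev/sdb1   # deactivate it in the RAID

# ...support physically swaps the disk...

sfdisk -d /dev/sda | sfdisk /dev/sdb         # copy the partition geometry to the new disk
mdadm --manage /dev/md0 --add /dev/sdb1      # re-add; rebuild starts in the background
cat /proc/mdstat                             # watch rebuild progress
```

The "save geometry" step is the `sfdisk` dump/restore: the new blank disk needs the same partition table as its mirror partner before it can rejoin the array.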


> Then you contact support, appoint disk change, you first deactivate disk on raid (save geometry etc), they replace disk and then you rebuild raid in new disk. That's it.

I imagine this would take time, right? Like not 5 minutes, but maybe 3 hours tops? So, suppose I run a SaaS (that shouldn't be down more than 1 h/day); would renting only one dedicated server qualify as "risky"?


RAID is not a replacement for backups anyway. There are many other ways to lose data besides physical disk failures, and there are also ways to lose the RAID itself in the face of physical disk failure. It's an availability solution: a box can keep running through a disk failure without an outage and without resorting to backups.


You should be running at least three, preferably four (3 + hot spare), mirrored instances if you require that level of uptime, regardless of provider or tech (bare metal/VM).


Well, yes, having a single server is risky in any production context if it's storing any sort of state that can't be easily brought back up on another server.


I'm not sure what "deactivating RAID" means.

They will all be hot-swap disks. You remove the old disk and slide in the new one (or in this case, tell them to do it). The RAID system rebuilds the array in the background over the next few hours.

During that time you will lose data if it's RAID 5 and another disk fails.


> I'm not sure what "deactivating RAID" means.

mdadm --manage <array> --remove <failed disk>

so your machine doesn't have a fit when the disk is detached. Or equivalent.


Presumably he means detaching the disk from your RAID solution so it doesn't freak out when it's physically removed and replaced.


I am using servers with software RAID; as other commenters say, it's not a hot-swap operation, it's a cold swap :D


Rebuilding the RAID doesn't take the server down, but disk performance (i.e. IOPS) will be reduced while it takes place. Running a service on one server always carries some risk, whatever form that server takes.

For example, I have loads of stuff on Linode but always make sure I keep backups off Linode, in case I get a random TOS account shutdown and they stop speaking to me, etc.


Absolutely, some redundancy is necessary if uptime is critical (as well as backups).


It is risky; 3 hours tops is a good estimate. You can schedule the change outside business hours. But a better strategy is to start cloning the disk to a new server at the first failure notice and cancel the old server.



