> I’m referring to running everything inside a single VM that you would have tot...

aidos · on Aug 19, 2024

So the devs don’t have the ability to ssh to your cloud instances and change config? Other than the size issue, I’m still not seeing the difference. Take your point on it needing to start before you have control, but other than that a VM on a dev machine is functionally the same as one in a cloud environment.

In terms of needing to reset, it’s just a matter of git branch, push, reset, merge. In your world that sync complexity happens all the time, in mine just on reset.

Just to be clear, I think it’s interesting to have a healthy discussion about this to see where the tradeoffs are. Feels like the sort of thing where people try to emulate you and buy themselves a bunch of complexity where other options are reasonable.

I have no doubt Stripe does what makes sense for Stripe. I’d also wager than on balance it’s not the best option for most other teams.

PS thanks for chiming in. I appreciate the extra insights and context.

bhuga · on Aug 19, 2024

> So the devs don’t have the ability to ssh to your cloud instances and change config?

They do, but I can see those changes if I'm helping debug, and more importantly, we can set up the most important parts of the dev processes as services that we can update. We can't ssh into a VM on your laptop to do that.

For example, if you start a service on a stripe machine, you're sending an RPC to a dev-runner program that allocates as many ports as are necessary, updates a local envoy to make it routable, sets up a systemd unit to keep it running, and so forth. If I need to update that component, I just deploy it like anything else. If someone configures their host until that dev runner breaks, it fails a healthcheck and that's obvious to me in a support role.

> Just to be clear, I think it’s interesting to have a healthy discussion about this to see where the tradeoffs are. Feels like the sort of thing where people try to emulate you and buy themselves a bunch of complexity where other options are reasonable.

100% Agree! I think we've got something pretty cool, but this stuff is coming from a well-resourced team; keeping the infra for it all running is larger than many startups. There's tradeoffs involved: cost, user support, flexibility on the dev side (i.e. it's harder to add something to our servers than to test out a new kind of database on your local VM) come immediately to mind, but there are others.

There are startups doing lighter-weight, legacy-free versions of what we're doing that are worth exploring for organizations of any size. But remote dev isn't the right call for every company!

aidos · on Aug 19, 2024

Ah! So that’s a spot where we’re talking past each other.

I’d anticipate you would be equally as able to ssh to VMs on dev laptops. That’s definitely a prerequisite for making this work in the same way as you’re currently doing.

The only difference between what you do and what I’m suggesting is the location of the VM. That itself creates some tradeoffs but I would expect absolutely everything inside the machine to be the same.

bhuga · on Aug 19, 2024

> I’d anticipate you would be equally as able to ssh to VMs on dev laptops. That’s definitely a prerequisite for making this work in the same way as you’re currently doing.

Our laptops don't receive connections, but even if they could, folks go on leave and turn them off for 9 months at a time, or they don't get updated for whatever reason, or other nutty stuff.

It's surprisingly common with a few thousand of them out there that laptop management code that removes old versions of a tool is itself removed after months, but laptops still pop up with the old version as folks turn them back on after a very long time, and the old tool lingers. The services the tools interact with have long since stopped working with the old version, and the laptop behaves in unpredictable ways.

This doesn't just apply to hypothetical VMs, but various CLI tools that we deploy to laptops, and we still have trouble there. The VMs are just one example, but a guiding principle for us been that the less that's on the laptop, the more control we have, and thus the better we can support users with issues.

codethief · on Aug 19, 2024

Maybe I'm missing something here but couldn't you just track the whole VM setup (dependencies, dev tools, telemetry and everything) in your monorepo? That is, the VM config would get pulled from master just like everything else, and then the developer would use something like nixos-shell[0] to quickly fire up a VM based on that config that they pulled.

[0]: https://github.com/Mic92/nixos-shell (not to be confused with nix-shell)

bhuga · on Aug 20, 2024

Yes, but this still "freezes" the VM when the user creates it, and I've got no tools to force the software running in it to be updated. It's important that boxes can be updated, not just reliably created.

As just one reason why, many developers need to set up complex test data. We have tools to help with that, but they take time to run and each team has their own needs, so some of them still have manual steps when creating a new dev server. These devs tend to re-use their servers until our company-wide max age. Others, to be fair, spin up a new machine for every branch, multiple times per day, and spinning up a new VM might not be burdensome for them.

codethief · on Aug 21, 2024

Isn't this a matter of not reusing old VMs after a `git pull/checkout`, though? (So not really different from updating any other project dependencies?) Moreover, shouldn't something like nixos-shell take care of this automatically if it detects the VM configuration (Nix config) has changed?

bhuga · on Aug 21, 2024

> Isn't this a matter of not reusing old VMs after a `git pull/checkout`, though?

Yes, but forcing people to rebase is disruptive. Master moves several times per minute for us, so we don't want people needing to upgrade as the speed of git. Some things you have to rebase for: the code you're working on. Other things are the dev environment around your code, and you don't want that to be part of the checkout as much as possible. And as per my earlier comment, setting up a fresh VM can be quite expensive in terms of developer time if test data needs to be configured.

codethief · on Aug 22, 2024

You seem to assume you would have to rebuild the entire VM whenever any code in git changes in any way. I don't think you do: You could simply mount application code (and test data) inside the VM. In my book, the VM would merely serve to pin the most basic dependencies for running your integration / e2e tests and I don't think those would change often, so triggering a VM rebuild should produce a cache hit in 99% of the cases.

bhuga · on Aug 23, 2024

> I don't think those would change often

I think this is where our contexts may differ, and so we end up with different tradeoffs and choices :) The services running on our dev servers are updated dozens of times per day, and they roughly correspond to the non-code parts of a VM.

codethief · on Aug 24, 2024

Or maybe we just used terminology differently. :) Why wouldn't those services be part of the code? After all, I thought we were talking about a monorepo here.