
First off, no, a full rewrite is not only not necessary, but probably the worst possible approach. Do a piece at a time. You will eventually have re-written all the code, but do not ever fall into the trap of a "full re-write". It doesn't work.

But before you re-write one line of code - get some testing in place. Or, a lot of testing. If you have end-to-end tests that run through every feature that is currently used by your customer base, then you have a baseline to safely make changes. You can delete code as long as the tests pass. You can change code as long as the tests pass.
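One way to get that kind of baseline is a characterization ("golden master") test: record what the code does today, then flag any case whose output changes. A minimal sketch in Python, under assumptions: `legacy_invoice_total`, the cases, and the baseline file are hypothetical stand-ins, not anything from the application being discussed.

```python
import json
from pathlib import Path


def legacy_invoice_total(items):
    """Hypothetical stand-in for a legacy function whose behavior must be preserved."""
    total = 0
    for price_cents, quantity in items:
        total += price_cents * quantity
    # Quirk kept on purpose: orders above 100.00 get a flat 5.00 discount.
    if total > 10_000:
        total -= 500
    return total


# Hypothetical inputs; in practice these would come from real usage.
CASES = {
    "empty": [],
    "single": [[1999, 1]],
    "bulk": [[2500, 5]],
}


def snapshot(func):
    """Run every case through func and collect the outputs."""
    return {name: func(items) for name, items in CASES.items()}


def check_against_baseline(func, baseline_path):
    """Record a baseline on the first run; afterwards return the cases whose output changed."""
    current = snapshot(func)
    if not baseline_path.exists():
        baseline_path.write_text(json.dumps(current, indent=2, sort_keys=True))
        return []
    expected = json.loads(baseline_path.read_text())
    return sorted(name for name in CASES if expected.get(name) != current[name])
```

Calling `check_against_baseline(legacy_invoice_total, Path("baseline.json"))` once records today's behavior, quirks included. Any later change that alters an output shows up as a named case rather than as a customer complaint.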

Once you are at that point, start picking off pieces to modernize and improve.

Also, respect the team. Maybe they aren't doing what you would, but they are keeping this beast alive, and probably have invaluable knowledge of how to do so. Don't come in pushing for change... come in embracing that this beast of a codebase makes 20 million a year. So talk about how the team can improve it, and modernize their skills at the same time.

Because if you walk in, saying, "This all sucks, and so do you, let's throw it out", do you really have to wonder why you are hitting resistance?



I fully agree with this, but I think it misses a key step:

As the team’s manager, it’s your job to get buy-in from the executives to gradually fix the mess. You don’t need to tell the team exactly how to fix it, but you gotta get buy-in for space to fix it.

One approach is just to say “every Friday goes to adding tests!” (And then when there’s some reasonable test coverage, make Fridays go to refactorings that are easy with the new tests, and so on).

But this often fails because when Friday comes, something is on fire and management asks to please quickly squeeze this one thing in first.

The only other approach I know of is to get buy in for shipping every change slightly slower, and making the code touched by that change better. Eg they want to add feature X, ok add a test for adjacent existing functionality Y, then maybe make Y a little better, just so adding X will be easier, then build X, also with tests. Enthusiastically celebrate that not only X got shipped but Y also got made better.

If the team is change averse, it’s because they’re risk averse. Likely with good reason; ask for anecdotes to figure out where it comes from. They need to see that risk can be reduced and that execs can be reasonable.

You need the buy-in, both from the execs and the team. Things will go slightly slower in the beginning and it’s worth it. Only you can sell this. The metaphor of “paying off technical debt” is useful here since interest is sky high and you want to bring it under control.


Before anything else, getting buy-in for any kind of major change from the execs is key. Explain the situation and the effects. Have everything in writing, complete with date and signatures. Push back hard every time this commitment gets sabotaged because something is supposedly on fire. Get a guaranteed budget for external trainings and workshops, again in writing. Then talk to the team.

If you cannot get those commitments in writing, or later on get ignored multiple times: run. Your energy and sanity is better spent elsewhere. No need to fight an uphill battle alone – and for what? The company just revealed itself for what it is and you have no future there.

First I’d do that, then think about the engineering part.


To be fair if I was an exec at a company and the new IT lead wants me to commit, in writing, to XYZ, I’d not keep them around long. You can’t run a company on that kind of deep mistrust.

Nothing in the OP suggests abusive management. Incompetence, maybe, but I see no reason to assume that they’ll backtrack on agreements, and a new management hire who immediately starts sowing mistrust is not someone I’d trust to get things to a higher level.


Clearly you've been in a very positive bubble. I envy you but that's not an experience shared by many.

As a programmer contractor and a guy who sometimes gets called to save small businesses due to stalled development (happened 6 times in my 20y career) I'm absolutely not even opening my laptop anymore -- before I see a written commitment from execs (email is enough; I tag/label those and make sure I can easily find them in the future).

Reasons are extremely simple and self-defensive in nature: execs can and do backtrack from agreements all the time. By the time we arrive at an oral agreement they have made 20 other invisible assumptions they never told me about, and when one of them turns out not to be true (example: they thought you can get onboarded in 2 days into a system with 5000+ source files and be productive as a full-blown team member on day #3) they start backtracking faster than you can say "that's not professional".

I don't dispute your positive experience. But please be aware that it's not the norm. Most execs out there treat programmers as slaves with big salaries and nothing more, and we get exactly the treatment you might expect when they have that mindset.

Sorry not sorry but I have to save my own arse first; I was bound to extremely awful contracts when I was much younger and stupider and I am not allowing that ever again.

I can single-handedly make a business succeed with technology, and I have done so. I am not staying anywhere where execs hand-wave away everything with "should be simple and quick, right? k thx bye".


Thanks, that’s what I was aiming for. It’s kind of a litmus test what kind of professionalism you can expect in a place – if any. Especially when they have shown prior incompetence, as in OP’s example.

In all honesty, given that example, if I didn’t get immediate buy-in, I’d throw in the towel right then. Over 15 years of experience show that train wrecks only ever get fixed when they are recognized as such from the start.


> You can’t run a company on that kind of deep mistrust.

Trust has to be earned in some ways (but you can expect some base-level). But I want to argue on another point: as an exec, you can use this kind of writing to also get commitment from the team, to balance things out. But ofc for that there needs to be a fair discussion of priorities and once you have that, there is usually no reason to contractify the outcome.


>if I was an exec at a company and the new IT lead wants me to commit, in writing, to XYZ, I’d not keep them around long. You can’t run a company on that kind of deep mistrust.

Emails are writing, if you're imagining the IT lead walking in with a paper contract I see why you would say that.


That's essentially what the GP was implying, "Have everything in writing, complete with date and signatures."


Nowhere were contracts mentioned. A proper proposal, for example, always has a date and to sign it if agreed to is just professional conduct. I’d be wary of any exec not willing to do that. Instant red flag.


That’s what a proposal is too, it’s not necessarily a demand


That's fair. I've worked at more established places with formal design doc/RFC and sign off processes and it can work well.

After reading the description of the SOP at this shop, the idea that the OP would be able to introduce an additional layer of process requiring multiple stakeholders and management seemed like a bridge too far in my mind :).


Do leads not write proposals or RFCs? I’m not sure why you wouldn’t keep them around long if they laid out their plans in a clear way, and then pitched it to others


"Have everything in writing, complete with date and signatures"

It is possible that the executives won't take well to all of the formality here (writing and signatures). How would you convince them that this is necessary?


"Have everything in writing" is a bad mindset and is not going to save you.

Executives are looking at you as the expert to deliver a good outcome. Which means making good decisions, managing expectations and keeping everyone in the loop.

Generally, if it gets to the point of having to dig up who signed off on what, you've already failed. Often you won't even get the chance to dig up those emails, because delivering a bad outcome is enough for execs to write you off without even needing to hear your excuses.


> because delivering a bad outcome is enough for execs to write you off without even needing to hear your excuses.

What makes you think they are excuses? Constantly chasing moving targets and not having even one of them agreed upon in writing is heaven for bad execs. I've seen it happen a good amount of times, my colleagues too.

I don't view the "you changed requirements 20 times the last month and I can't keep up with your impossible imagined schedule" statement as an excuse.


If the goal is to remove bad execs, then a document trail can help, although I'd suggest starting with some statistics like "over the last 3 months, we moved the goalpost 8 times, which led to an effective throughput of 4 weeks of work being done rather than the expected 12 weeks. How do you think we could improve these conditions?" Collaboration first.

Keeping email threads for reference is probably plenty of data, btw; "signatures" sounds like the wrong approach. Maybe even just summarize the direction given in a wiki document with a change log with time stamps and requesting person, which you can review once in a while, and the sheer length of it might be enough to bring the point across.


Thank you -- good advice to put collaboration first. I sometimes have a problem that I assume the worst right away. But I've met some true villains in my life and career so maybe that's why. I'll do my best to implement your advice.

> and the sheer length of it might be enough to bring the point across.

This one sadly hasn't been true -- I tried it but I get blank stares and sometimes grumbling about making people read long stuff that I can just summarize to them. Maybe there's a way out of this conundrum as well.


Your job is to deliver what the execs consider to be a good outcome.

That includes helping the stakeholders come up with a stable set of requirements. Most of the time when teams are dealing with a lot of requirements change, it's because they never captured the true requirements which usually change at a much slower rate.

Secondly, your job is also to manage expectations, so that execs know what the impact of any changes will be when they request them.

Changes aren't an excuse to deliver late or over budget. These parameters are flexible and new targets should have been agreed when the requirements change was requested.

Execs will usually assess your performance without discussion. There is no venue to bring your cache of documents to prove your innocence after the fact.


We all know the ideal theory. I am talking execs that constantly change requirements, refuse to sign under any stable requirements, and think everything is "quick and easy", and take offense when you try to manage their expectations.

Reasonable people I easily work with. It's the rest who are the problem.


Sounds like you haven't worked in an environment where this happens. You get regarded as 'the expert to deliver a good outcome' sure. But you're ALSO expected to deliver an aggressive roadmap of a whole load of other stuff that people already committed to. Something's got to give.


Dates and signatures are theatrical overkill.

I've yet to work at a place where meeting minutes, sent out to all attendees post-meeting, aren't sufficient for the same purpose (ass covering & continued adherence to The Plan as originally agreed).

I'm sure signature and date places do exist... but, I'd probably be looking for a new job if I worked at one.


The dates and signatures bit is nonsense, but it does help to have things in writing to ensure everyone's on the same page. That just means that when you're discussing things not in writing, you send a written follow up to everyone that's involved immediately afterwards. If it's a meeting, take detailed notes and send them around afterwards. If it's a one on one conversation, just send a follow up email that says something like, "Hi x, I just wanted to memorialize our conversation - here are the main notes that I took. Please let me know if any of this sounds off to you. Thank you."

That doesn't preclude them from not reading that email and later telling you they said something completely different, but at that point you should probably be heading for the door anyway.


Having stuff in writing is essential, for accountability on all sides. The exact format does not matter, neither does what passes as a signature in a company. My example was for broadest possible applicability. The point is the willingness to commit to something in writing and to take the time to reflect on the implications of doing so. If you cannot get that, you’ve already lost. There will be moving targets.

It’s interesting to see how all responses focus on the signature part as problematic due to its supposed formality. Is this an American work culture thing? I see signing off on an agreement as a signal of professional conduct and reliability.


> But this often fails because when Friday comes, something is on fire and management asks to please quickly squeeze this one thing in first.

There's a solution to this problem: nothing goes live on Fridays.

> and making the code touched by that change better.

Getting buy-in from management on this always appeared to me as weird. The alternative is a codebase that can only ever get worse over time. So you either gotta gold plate everything, which will take way longer than allowing for some after-the-fact improvement as needed, or your codebase turns into a pile of shit very quickly and your velocity grinds to a halt very quickly.


> Getting buy-in from management on this always appeared to me as weird. The alternative is a codebase that can only ever get worse over time.

Well that's just the thing: they have no notion of a "bad code base". To them that's an excuse and a negotiation leverage by the programmer to ask for more money. They judge others by themselves I guess.


It just feels like an amateur hour thing.

If my plumber came to me to ask if he can just dry assemble the pipes and leave them that way I'm gonna get a new plumber.


That’s assuming you know something about plumbing. If you don’t, you’ll just nod your head and say ok, that sounds good. The same thing is happening in these businesses. The business owners generally don’t know programming. Terms like “refactoring” mean nothing to them at best and sound like “rewrite from scratch” at worst.


It's scary out there, man. A lot of people in HN judge by US companies and startups but I've only been in that bubble once for a few months and the rest of my 20 years of career has been everywhere else. And it's insanely bad in many places.


Do not waste time with a company that is going to collapse unless they are willing to do whatever it takes.


They are going to collapse making 20M a year, sure.


Revenue is not profit


Not the team’s manager. OP says so in the first line.


Yeah, there's a process. It's something that I've done a bunch of times for a bunch of clients.

There's so much low-hanging fruit there that's so easy to fix _right now_. No version control? Good news! `git init` is free! PHPCS/PHP-CS-fixer can normalise a lot, and is generally pretty safe (especially when you have git now). Yeah, it's overwhelming, but OP said that the software is already making millions - you don't wanna fuck with that.

I've done it, I've written about it, I've given conference talks about it. The real bonus for OP is that the team is small, so there's only a few people to fight over it. It's pretty easy to show how things will be better, but remember that the team are going to resist deleting code not because they're unaware that it's bad, but because they are afraid to jeopardise whatever stability they've found.


Personally, I would never run a linter of any kind on a full codebase that doesn't have tests. After having been bitten by all kinds of bugs over the years, I wouldn't suggest auto-linting any file that you aren't actively working on.

It's rare that linting will actually make the code work better. Granted, it could catch some security bugs. But linters can - and will - introduce new bugs. You just have to ask if it's worth the risk.


This. It's so tempting when a linter warns "This code is misleading; it would be clearer to do it this other way" to think "Easy fix: change it the way the linter suggests." But, make the change, and you may discover (hopefully before delivery) that the code functionality depends on the confusing behavior.


And also starting by fixing the js/css/html front end is likely the safest, as it won't corrupt any customer data & it will be visible when something breaks. That can probably be the next best candidate to do a major overhaul. I'd also hope that a $20M/year project can afford to hire someone senior in addition to these 3 juniors?


> hope that a $20M/year project can afford to hire someone senior

Never underestimate the ability of management to look a gift horse in the mouth while shooting it in the foot.


why would someone senior even want to join this team? Especially someone senior enough to fix this. The productivity is horrible and there's no kudos for fixing something that's lived for 12 years like this.


I was added to a team because, to quote the VP, "they're good but they need some adult supervision"

Mixing skill levels in a team is healthy.


> why would someone senior even want to join this team?

Well, money. Why would someone even want to join any team?


Theoretically a company that's making $20m/year on this can afford to make it worth someone's while to come in and fix it. The problem isn't finding someone who will do it, it's that the company assumes they can continue to get by indefinitely on paying too little.


For the love of refactoring.


git init seems like job #1 because at least then you can delete every commented out line and start a little cleaner.


to lose all the comments? :D that would make it even harder to read


In a project without version control (or one that doesn't trust it enough) there are always whole sub-programs made up of dead code. It's usually some combination of commented-out blocks and functions that are only called from within those commented-out blocks. Removing commented code (not real, descriptive comments) is the first step to eliminating all this dead code, and eliminating dead code buys a ton more flexibility in what you can change safely.
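A rough way to find candidates for that first pass is a script that flags comment lines which look like code rather than prose. A minimal Python sketch, under assumptions: the regex is only a heuristic (a `//` or `#` comment ending in `;`, `{` or `}`), not a PHP parser, and the sample source is made up for illustration. It will miss some cases and flag some false positives, so the output is a review list, not a delete list.

```python
import re

# Heuristic (an assumption, not a parser): a `//` or `#` comment line that ends in
# `;`, `{` or `}` is probably commented-out code rather than a descriptive comment.
COMMENTED_CODE = re.compile(r"^\s*(?://|#).*[;{}]\s*$")


def find_commented_out_code(source):
    """Return (line_number, text) pairs for lines that look like commented-out code."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if COMMENTED_CODE.match(line):
            hits.append((number, line.strip()))
    return hits


# Made-up sample: one descriptive comment and one commented-out block.
SAMPLE = """<?php
// Totals are stored in cents to avoid float rounding.
function total($items) {
    // $total = old_total($items);
    // if ($total > 100) {
    //     apply_discount($total);
    // }
    return array_sum($items);
}
"""

for number, text in find_commented_out_code(SAMPLE):
    print(f"line {number}: {text}")
```

On the sample, the descriptive comment on line 2 is left alone and lines 4 to 7 are flagged. Functions such as the hypothetical `old_total` that are referenced only inside flagged lines are the next candidates to check.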


Fully agreed, I was tasked with using an old library and my first order of business was to make an analysis of dead code branches. The GIT commit removed 17 out of 80 files and about 10-11% of the code in some other files (that were not deleted) and the library works 100% the same -- confirmed by tests that I painstakingly added during the last weeks.

Less code, less confusion.


> It doesn't work.

That's simply not true. I've inherited something just as bad as this. We did a full rewrite and it was quite successful and the company went on to triple the revenue.

> get some testing in place

Writing tests for something that is already not functional will be a waste of time. How do you fix the things that the tests prove are broken? It is better to spend the time figuring out what all the features are, documenting them, and then rewriting, with tests.


The problem with people new to the company starting a rewrite from scratch is that they often are poorly informed on why things were the way they were before. If you start big, you can have bad outcomes where the new system might be objectively worse than the old one... but you are stuck trying to get the new thing out for the next 5 years because too many people sunk too much political capital into it.

As an example, I worked at an ad-tech startup that swapped its tech team out when it had ~100 million in revenue (via acqui-hire shenanigans). The new tech team immediately committed to rewriting the code base into ruby micro-services and were struck by strange old tech decisions like "why does our tracking pixel return a purple image?". The team went so far as to stop anyone from committing to the main service for several years in a vain attempt to speed up the rewrite/architecture migration.

These refactors inevitably failed to produce a meaningful impact to revenue, as a matter of fact the company's revenue had begun to decline. The company eventually did another house cleaning on the tech team and had some minor future successes - but this whole adventure effectively cost their entire Series D round along with 3 years of product development.


You're making a silent assumption that the original team is well informed about why the things are like they are and that they know what they are doing. I think it is not always the case.

I was once on a project where the mess in the original system was the result of the original team not knowing what they were doing and just doing permutation-based programming - applying random changes until it kinda worked. The situation was very similar to that described by the OP. They even chose J2EE just because the CTO heard other companies were using it, despite not having a single engineer who knew J2EE. Overall after a year of development the original system barely even worked (it required manual intervention a few times per day to keep running!), and even an imperfect rewrite done by a student was already better after 2 weeks of coding.

So I believe the level you're starting the rewrite from is quite an important factor.

Then of course there is a whole world of a difference between "They don't know what they are doing" vs "I don't like their tech stack and want to master <insert a shiny new toy here>". The former can be recognized objectively by:

- very high amount of broken functionality

- abysmal pace at which new features are added


The original team may not have been the best at the task, but they still managed to deliver 100 MM in revenue. Sometimes the things they leave behind/ignore simply don’t matter to the business/useful tech.

Particular to ad tech, the lifespan of any particular software is lower than you’d expect (unless you’re Google/Facebook). Technology that pays out big one year will become pretty meh within 3 years. In the case above I’d argue that the new tech team didn’t really understand this dynamic and so they focused on the wrong things such as rewriting functionality that didn’t matter for the future. Or making big bets on aspects of the product which were irrelevant.

To the OP, we don’t know that the lifespan of any of these php files is greater than an individual contract. If the business can be modeled as solve a contract by putting a php file on prod - rewriting may be entirely worthless as the code can be “write once, read never”.


Revenue is a crazy kpi for technical excellence. You should never let a high revenue rely on extremely bad code.


We (my good friend and I who both have 20+ years of experience) were brought in specifically to do the rewrite. We were new to the company. We actually had to rebuild the entire IT department while we were at it as well.

> new tech team immediately committed to rewriting the code base into ruby micro-services

well... sigh.

> These refactors inevitably failed to produce a meaningful impact to revenue

It sounds like less about the refactor itself and more about the skills of the team doing the refactor. You certainly can't expect a refactor to go well if the team makes poor decisions to begin with.


> We were brought in specifically do to the rewrite.

That's the key difference. The stakeholders should always be in on the rewrite.


> It sounds like less about the refactor itself and more about the skills of the team doing the refactor. You certainly can't expect a refactor to go well if the team makes poor decisions to begin with.

This has been my biggest struggle with rewrites where I’m currently working. We have several large, messy old codebases that everyone agrees “need a rewrite” to (1) correct for all the early assumptions in business needs that turned out wrong, (2) deal with old PHP code that is very prone to breakage with every major new PHP version released, and (3) add much needed architectural patterns specific to our needs.

I’ve seen rewrites of portions of the project work when they involve myself and one other mid-level dev who has a grasp on solid sw engineering practices, but when the rest of the (more senior) team get involved on the bigger “full rewrite”, they end up quickly making all the same mistakes that led to the previous project being the mess that it is.

Sure, it will be using fancy new PHP 8 features, and our Laravel framework will force some level of dependency injection, but then you start seeing giant God classes being injected over here and duplicated code copy-pasted over there, all done by “senior” devs you feel you can’t question too strongly.

To that end, an open and collaborative culture in which you start the rewrite with some agreed upon principles, group code reviews and egos kept in check, are all necessary for this to work.


You have a great experience and did a great job indeed. My only question is how does one get 20 years of such experience without horrific flashbacks of “let’s just rewrite it” decisions. Do you do rewrites/redesigns often? What’s your success rate?


I've done what I would consider as four rewrites that I can remember as large events in my life (although not fully what you'd expect). But all are good stories in my opinion.

First one was the above example. It was for the largest hardcore porn company on the planet. Myself and my good friend Jeff rebuilt an already successful business IT department from the ground up and made it even more successful. Ever heard of 'the armory in sf'?

Second was that jeff and I were hired as contractors by Rob @ Pivotal Labs (ceo) to help the CloudFoundry team rewrite itself after he had bought the team and trimmed it down to only the good people. That one was a huge mess. We spent a lot of time deep in shitty ruby code using print statements trying to figure out what 'type' of an object something was and, of course, backfilling tests. It was a fun project and both Jeff and I learned the Pivotal way, which was probably the most enlightening thing I had ever learned about how to develop software correctly from a PM perspective. If you want to improve your skills beyond just slinging code, spend some time figuring their methodology out. Much of it is documented in Pivotal Tracker help documentation and blog posts.

Third one was not really a rewrite, but the original two founders, who were not technical, had tried to hire a guy and got burned because the guy couldn't finish the job. Sadly, they had already paid the person a ton of money and got really nothing functional out of it. We (jeff and I again!) just started over. We did a MVP in 3 months (to the exact date, because we both know how to write stories using pivotal tracker and do proper estimates) and ended up doing $80m in revenue, in our first year with an initial team of 5 people.

Fourth one was three guys (who were also not technical) I kind of randomly met after I moved to Vietnam. They were deploying litecoin asic miners into a giant Vietnamese military telco (technically, they are all military). They had hired another guy to do the IT work and he was messing it all up. They invited me out to help install machines, I came out, rebuilt their networking layout and then proceeded to eventually fix their machines because the software 'firmware' that was on them was horrible. I also added monitoring with prometheus so that I could 'see' issues with all these machines. That first day on the job, they fired the other guy and made me CTO. We ended up deploying in another datacenter as well. It was a really wild experience with a ton more stories.

Life has been, um, interesting. Thanks for reading this far.


Please tell me that you've retired now due to your incredible billing rates and track record of success.


Not everything has been a success. For example, unless you're stupid rich and can afford years of losses, never start/own a night club or you might end up working for the rest of your life to pay off your debts.


The problem is that most developers are crap and self centered on working with the tech they like.

You need to work with someone who doesn't care about filling up their CV with "ruby microservices" and get stuff done.

If I went into a business to do a rewrite and decided to use $shinyNewTech because I want to build up rust experience I'd probably end up wasting years with little results.


The existing app was a large rails monolith. This wasn’t a small 10 person team but a 50 person org. Groups can get funny ideas sometimes.


> why does our tracking pixel return a purple image?

Now I'm really curious, is there some exciting non-obvious reason for a tracking pixel to be purple? Was it #FF00FF or more like #6600DD?


This definitely needs an answer.

In fact, until OP can give us the right answer, we immediately need even wrong answers!

You reading this. Yes, you. Give your best wrong answer below.


My best wrong answer is that there were different colored pixels for different front-end versions, and the app had some radically different responses depending on the version. Maybe MENA would return white, SE Asia green, people who signed up during a sale would return blue, whatever. After a while, the other pixels were removed and only one shade of purple was used for everyone, but the code for processing them was not removed. So now, if the tracking pixel is not a precise shade of purple, some unexpected shenanigans ensue.


I worked on an app once that used two different, equally ancient libraries to a) generate thumbnails and b) create a png from a pdf. While modifying part of this process I started realizing that there were conditions where you'd get a PDF thumbnail at the end, but its output had a red tint to it.

Input looked fine and invoking each step manually worked fine as well.

Come to find out that certain PDFs contained color calibration information that, combined with how we were calling it, would treat ARGB as RGB. The input would have transparency info defined and the thumbnail generator would happily repurpose the alpha channel as the red channel instead.
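That kind of channel mix-up is easy to reproduce in isolation. A minimal Python sketch, assuming one byte per channel stored in A, R, G, B order; the two decoder functions are hypothetical illustrations, not the libraries involved in the story.

```python
def decode_rgb_wrong(pixel_bytes):
    """Misread packed ARGB bytes as if they were RGB by taking the first three bytes."""
    return tuple(pixel_bytes[:3])


def decode_argb(pixel_bytes):
    """Read all four bytes as (alpha, red, green, blue) and return (red, green, blue)."""
    alpha, red, green, blue = pixel_bytes
    return (red, green, blue)


# A fully opaque mid-grey pixel stored as A, R, G, B.
pixel = bytes([255, 128, 128, 128])

print(decode_argb(pixel))       # (128, 128, 128)
print(decode_rgb_wrong(pixel))  # (255, 128, 128)
```

With an opaque pixel the alpha byte is 255, so the misread version ends up with a maxed-out red channel, which is consistent with the red tint described.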


The tracking pixel was made by scaling the company logo down to a 1x1 image.


That's brilliant! That way nobody could accuse you of spying. "It's just our logo. What's all the fuss about?"


Obviously it's because mauve has the most RAM.


Page background where pixel displayed was purple


Accessibility. Protanopia affects cones perceiving red color.


Obviously !! The anti-doppler shift trick /s :)


I did a rewrite of a 30 year old bit of perl/php2 over the last year. Not knowing why things were the way they were was really useful for the younger team members and me to get familiar with the codebase and the business context.


Anecdotal: I asked people why they keep incorrectly using jQuery methods and produce ambiguous, difficult-to-maintain code in the year of 2022. (we still have jQuery as a dependency for legacy code.) The response was that they were not aware that native counterparts like document.querySelectorAll exist in the browser. They just copied the old jQuery code, modified them, and it worked.

I am pretty sure this kind of thing exists in any large legacy codebase.


You don't need comprehensive tests for tests to start delivering value.

Figure out the single most important flow in the application - user registration and checkout in an e-commerce app, for example.

Write an automated end-to-end test for that. You could go with full browser automation using something like Playwright, or you could use code that exercises HTTP endpoints without browser automation. Either is fine.
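For the HTTP-only variant, a minimal sketch using nothing but the Python standard library might look like the following. Everything about the application is an assumption: `ToyShop`, the `/register` and `/checkout` endpoints, and the response fields are hypothetical stand-ins so the example runs on its own. Against a real system, `run_money_flow` would be pointed at a staging URL instead.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class ToyShop(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the real application, so the sketch runs on its own."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        if self.path == "/register":
            body = {"user_id": 1, "email": payload.get("email")}
        elif self.path == "/checkout":
            body = {"order_id": 42, "status": "paid"}
        else:
            self.send_error(404)
            return
        data = json.dumps(body).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # keep test output quiet


# Ignore any proxy settings from the environment; the test only talks to localhost.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def post_json(base_url, path, payload):
    """POST a JSON payload and return (status code, decoded JSON body)."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with OPENER.open(request, timeout=5) as response:
        return response.status, json.loads(response.read())


def run_money_flow(base_url):
    """Walk the flow that makes the money: register, then check out."""
    status, user = post_json(base_url, "/register", {"email": "test@example.com"})
    assert status == 200 and "user_id" in user
    status, order = post_json(base_url, "/checkout", {"user_id": user["user_id"]})
    assert status == 200 and order["status"] == "paid"
    return order


def main():
    server = HTTPServer(("127.0.0.1", 0), ToyShop)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return run_money_flow(f"http://127.0.0.1:{server.server_port}")
    finally:
        server.shutdown()
        server.server_close()
```

The part worth keeping is `run_money_flow`: it knows nothing about how the server is implemented, so the same test keeps working while the code behind the endpoints is cleaned up or replaced.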

Get those running in GitHub Actions (after setting up the git scraping trick I described here: https://news.ycombinator.com/item?id=32884305 )

The value provided here is immense. You now have an early warning system for if someone breaks the flow that makes the money!

You also now have the beginnings of a larger test suite. Adding tests to an existing test suite is massively easier than starting a new test suite from scratch.
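
A rough sketch of the "exercise HTTP endpoints without browser automation" option, using only the Python standard library. Everything specific here is an assumption: the /register and /checkout paths, the JSON shapes, and the StubApp class, which only exists so the sketch runs by itself. In practice you would point check_money_flow at a staging copy of the real application.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubApp(BaseHTTPRequestHandler):
    """Stand-in for the real application so this sketch is self-contained."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        if self.path == "/register":
            body = {"user": payload.get("email"), "status": "registered"}
        elif self.path == "/checkout":
            body = {"user": payload.get("email"), "status": "paid"}
        else:
            self.send_error(404)
            return
        data = json.dumps(body).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # keep test output quiet


def post_json(base_url, path, payload):
    """POST a JSON payload and return (status code, decoded JSON body)."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status, json.loads(response.read())


def check_money_flow(base_url):
    """Register a user, then check out; return the two reported statuses."""
    code, registered = post_json(base_url, "/register", {"email": "a@example.com"})
    assert code == 200, "registration endpoint failed"
    code, paid = post_json(base_url, "/checkout", {"email": "a@example.com"})
    assert code == 200, "checkout endpoint failed"
    return registered["status"], paid["status"]


server = HTTPServer(("127.0.0.1", 0), StubApp)  # port 0 = pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
result = check_money_flow(f"http://127.0.0.1:{server.server_port}")
server.shutdown()
server.server_close()
print(result)
```

One test like this is nowhere near coverage, but it fails loudly if the flow that makes the money breaks, and it gives later tests something to attach to.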


You're assuming the existing flow is working perfectly and I agree with you that testing is a godsend. I constantly yell that testing is great. Heck, I even worked for Pivotal Labs that does TDD and pair development, and loved it.

Let's say you start to write tests and start to see issues crop up. Now what? How do you fix those things?

Github actions!? They don't even have source control to begin with. There are so many steps necessary to just get to that point, why bother?

If the existing code base already has extremely slow movement and people are unwilling to touch anything for fear of breaking it... you're never going to get past that. Let's say you do even fix that one thing... how do you know it isn't breaking something else?

It is a rats nest of compounding issues and all you are doing is putting a bandaid on a gushing open wound. Time to bring in a couple talented developers and start over. Define the MVP that does what they've learned their customers actually need from their 'v1' and go from there. Focus on adding features (with tests) instead of trying to repair a car that doesn't pass the smog test.


> Let's say you start to write tests and start to see issues crop up. Now what? How do you fix those things?

I assumed the tests wouldn't be for correctness, but for compatibility. If issues crop up, you reproduce the issues exactly in the rewrite until you can prove no one depends on them (Chesterton's fence and all).

The backwards-compatibility-at-all-costs approach makes sense if the product has downstream integrations that depend on the current interface. If your product is self-contained, then you're free to take the clean slate approach.


> I assumed the tests wouldn't be for correctness, but for compatibility.

You're assuming that the people coming in to write these tests can even make that distinction. How do you even know what the compatibility should be without really diving deep into the code itself? Given how screwed up the codebase already is, it could be multiple layers of things working against each other. OP mentioned multiple versions of jQuery on the same page as an example.

Writing tests for something like that is really a waste of time. Better to just figure out what's correct and rewrite correct code. Then write tests for that correct code... that's what moves things forward.


> How do you even know what the compatibility should be without really diving deep into the code itself?

You can pretty much black-box the code and only deep dive when there are differences. Here's what I've done in the past for a rewrite of an over-the-network service:

1. Grab happy-path results from prod (wireshark pcap, HTTP Archive, etc), write end-to-end tests based on these to enable development-time tests that would catch the most blatant of regressions.

2. Add a lot of logging to the old system and in corresponding places in the new system. If you have sufficient volumes, you can compare statistical anomalies between the 2 systems.

3. Get production traffic from a port mirror and compare the responses of your rewritten service against the old service, one route at a time. Log any discrepancies and fix them before going live; this is how you catch hard-to-test compat issues.

4. Optionally perform phased roll out, with option to roll back

5. Monitor roll out for an acceptable period, if successful, delete old code/route and move to the next one.

The above makes sense when backwards compatibility is absolutely necessary. The upside, however, is that once you've set up the tooling and the processes, subsequent changes are faster.
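
The comparison in step 3 can be sketched roughly like this. The two service functions are hypothetical stand-ins (in reality they would be network calls to the old and new deployments), and the recorded requests are invented; the point is only the shape of the replay-and-diff loop.

```python
def old_service(method: str, path: str) -> tuple[int, str]:
    """Hypothetical stand-in for the legacy system; returns (status, body)."""
    if path == "/price/42":
        return 200, "19.99"
    if path == "/price/7":
        return 200, "5.00 "  # legacy quirk: a trailing space clients may rely on
    return 404, "not found"


def new_service(method: str, path: str) -> tuple[int, str]:
    """Hypothetical stand-in for the rewrite; returns (status, body)."""
    prices = {"/price/42": "19.99", "/price/7": "5.00"}
    if path in prices:
        return 200, prices[path]
    return 404, "not found"


def compare(recorded, old, new):
    """Replay recorded requests against both systems and collect mismatches."""
    discrepancies = []
    for method, path in recorded:
        expected = old(method, path)
        actual = new(method, path)
        if expected != actual:
            discrepancies.append(
                {"request": (method, path), "old": expected, "new": actual}
            )
    return discrepancies


# Requests captured from production (invented here for illustration).
recorded_requests = [("GET", "/price/42"), ("GET", "/price/7"), ("GET", "/nope")]
report = compare(recorded_requests, old_service, new_service)
for entry in report:
    print(entry)
```

Each discrepancy is then a decision: either reproduce the old behavior in the rewrite, or prove nobody depends on it.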


All of that, while technically correct and possible, is vastly more complicated and time intensive than a rewrite of what the OPs description of the codebase is.


Yes, it absolutely is - but the trade off is a far lower risk of introducing breaking changes. Depending on the industry/market/clients - it may be the right tradeoff


In my eyes, a rewrite won't be introducing breaking changes. It would be to figure out what functionality makes money, then replicate that functionality as best as possible so that the company can continue to make money as well as build upon the product to make even more money.

We're talking about a webapp here, not rocket science.


The biggest problem isn't even the codebase in this situation.

When you keep finding bugs like that while refactoring and making things better, it will demoralise you. The productivity will stop when that happens.

It also requires above average engineers to fix the mess and own it for which there is not much benefit.

Your refactoring broke things? Now it's your turn to fix it and also ship your deliverables which you were originally hired for. Get paged for things that weren't your problem.

If I was a manager and assigned this kind of refactoring work, I will attach a significant bonus otherwise I know my engineers will start thinking of switching to other places unless we pay big tech salaries.

People keep quoting Joel's post about why refactoring is better than rewrite but if your refactor is essentially a rewrite and your team is small or inexperienced - it's not clear which is better.

Parallel construction and slowly replacing things is a lot of unpaid work. Just the sheer complexity of doing it bit by bit for each piece is untenable for a 3 person team where most likely other two might not want to get into it.


> It also requires above average engineers to fix the mess and own it for which there is not much benefit.

That's not true, it doesn't require above average engineers. It requires a tech lead that has the desire and backing to make a change, and engineers willing to listen and change. It doesn't require a 10x engineer to start using version control, or to tell their team to start using version control, for example.


Source control seems like a straightforward first step, regardless of what approach is going to be taken going forward


One would think, but how do you go from source control to deployment on the production server though? If they were editing files on the server directly, there could be a whole mess of symlinks and whatever else on there. Even worse, how do you even test things to see if you break anything?

It is a can of worms.


Just start somewhere. These guys are making changes, actual functional changes and bug fixes in that environment meaning they already have all the problems you imagine are going to get in the way of fixing this mess. So stop fretting and just start small with one tiny thing. It doesn't really matter with what. You don't even need automated tests necessarily. It's a small simple flow that needs 10 minutes to run the same test steps manually for? Write them down and do it manually, I don't care. Just Do it.

Been there, done that. Slightly differently where they had a test server and prod server. So already better except one day I made a change and copied to prod. Yes it was manual. Just scp the files over to prod. And stuff broke. Turned out someone had fixed a bug directly in prod but never made the change on the test server.

First thing I did was to introduce version control and create a script to make deployment automatic, meaning it was just a version control update on prod (also scripting languages here). Magically we never had an issue with bugs reappearing after that.
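
A minimal sketch of what such a "deploy is just a version control update" script could look like, assuming Git, a remote called origin, a branch called main, and a checkout at /srv/app (all of these are assumptions, not details from the story). The status check up front is there to catch exactly the failure described above: someone having edited files directly on prod.

```python
import subprocess


def deploy_commands(repo_dir, branch="main"):
    """Git commands for a fast-forward-only update of the checkout on the server."""
    repo = str(repo_dir)
    return [
        ["git", "-C", repo, "status", "--porcelain"],  # lists local modifications
        ["git", "-C", repo, "fetch", "origin"],
        ["git", "-C", repo, "merge", "--ff-only", f"origin/{branch}"],
    ]


def deploy(repo_dir, branch="main", run=subprocess.run):
    """Refuse to deploy over hand-edited files, then update the checkout."""
    status_cmd, *update_cmds = deploy_commands(repo_dir, branch)
    status = run(status_cmd, capture_output=True, text=True, check=True)
    if status.stdout.strip():
        raise RuntimeError("uncommitted edits on the server:\n" + status.stdout)
    for cmd in update_cmds:
        run(cmd, check=True)


# Dry run: print what would be executed for a hypothetical checkout path.
for command in deploy_commands("/srv/app"):
    print(" ".join(command))
```

The run parameter is only there so the logic can be exercised without touching a real repository; calling deploy("/srv/app") would run the commands for real.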

Pretty simple change and you can go from there.

The above code base was over 20 years old and made use of various different scripting languages and technologies including some of the business logic being in stored procedures. Zero test coverage anywhere. You just 'hide' small incremental changes to make things better in everything you do. Gotta touch this part because they want a change? Well it could break anyhow so make it better and if it breaks, it breaks and you fix it. It needs judgment though. Don't rewrite an entire module when the ask was adding a field somewhere. Make it proportional to the change you need to make and sometimes it's not going to be worth it to make something better. Just leave it.


Not sure the little hammer will fix much. And making folks use a method in new code pisses them off. "You say important I do this your way this time, even though there are 1000 examples of doing it the other way. I feel persecuted and your way is pointless, because it doesn't fix everything anyway. And its slowing me down and making me look bad."

Not rational but folks don't have to explain their feelings. You will be hated.


The little hammer definitely fixes. It does it in the same way as water cut the grand canyon. The beauty is that it works over time.

Now as for how to get the other devs on board, I agree with you that you can't just barge in and tell them everything they are doing is wrong etc. I never said to do that and I'm replying to a specific comment in the thread not the original Ask HN.

I.e. when I write about what I've done in the past, I got buy-in from my boss and my colleagues on what I was going to do. But I didn't just sit there and keep doing what they had done over the past years. I changed lots of other little things too in the same manner.

So if we do want to talk about the original Ask HN and how to get the existing employees not to hate you, you can start by letting them tell you about what they think the problems are. What are their pain points. They might just not know what to do about them but actually see them as problems too. Maybe they've tried things already but failed or got shot down by others in the company. Maybe they did try to introduce version control but their non tech boss shot them down.

Of course it may not work out. Some people really are just stupid and won't listen even if you try to help them and make them part of the solution.


Startups have runway and can die when big-company processes are forced upon them. It can sink them.


I'm not sure where you're pulling that from. There's no mention of startup here. Neither in the original (actually the opposite I'd say, 12 years and just a business unit).

None of what I said is a big-company process in any way. If in your book using source control is a big-company process that will sink a startup then be my guest and I will just hope we never have to work together. Source control is a no-brainer that I even use just for myself, and have used in teams of two and teams of dozens to hundreds. The amount of process around it is what scales with the kind of company. Source control is useful by itself in every single size of company.


Source control is necessary and simple, yes.

Code review, coding standards, required tests for everything, multiple stages of deployment - are not simple and can stall development. Done wrong they can sink a company.

It's easy to read the worst possible construction on what other people write here. It's never a good idea.

Btw I worked at a startup for 8 years. It was still a startup, depending on new investment to meet the monthly. In any case the described dev group was behaving in a way that used to be typical of startups. And even business units in larger organizations have runway.


Yeah, lots of worms... and if things break while refactoring, you are on the hook for scanning through that complex monster at 3 am, finding the issue and fixing it, for no additional pay in most cases.


They can literally copy the whole directory from their local machine to production as a first step for all I care.

How do they test things on production? If there’s a bug how do they revert to the previous version? There are way more issues without source control than with.


Doesn't Git support symlinks? Empty directories could be trouble though, since Git doesn't track them. One would have to put a placeholder file such as .gitkeep into every empty directory before check-in, and add a step at deployment time to remove them again.
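
Something along these lines would cover both halves of that: add placeholders before check-in, strip them at deployment time. The .gitkeep name is just a common convention and the directory names in the demo are invented.

```python
import os
import tempfile
from pathlib import Path

PLACEHOLDER = ".gitkeep"  # a naming convention; Git does not treat it specially


def add_placeholders(root):
    """Create a placeholder file in every empty directory under root."""
    created = []
    for current, dirnames, filenames in os.walk(root):
        if ".git" in dirnames:
            dirnames.remove(".git")  # never descend into Git's own metadata
        if not dirnames and not filenames:
            marker = Path(current) / PLACEHOLDER
            marker.touch()
            created.append(marker)
    return created


def remove_placeholders(root):
    """Delete the placeholder files again, e.g. as a deployment step."""
    removed = []
    for marker in list(Path(root).rglob(PLACEHOLDER)):
        marker.unlink()
        removed.append(marker)
    return removed


# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "uploads" / "cache").mkdir(parents=True)
    (Path(tmp) / "src").mkdir()
    (Path(tmp) / "src" / "index.php").write_text("<?php\n")
    created = [p.relative_to(tmp).as_posix() for p in add_placeholders(tmp)]
    removed_count = len(remove_placeholders(tmp))

print(created, removed_count)
```

Whether the removal step is needed at all depends on whether the application cares about stray files in those directories; on many setups leaving the placeholders in place is harmless.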


"Github actions!? They don't even have source control to begin with."

Right: no point in adding any tests until you've got source control in place. Hence my suggestion for a shortcut to doing that here: https://news.ycombinator.com/item?id=32884305


How do 2 junior devs manage to rewrite the entire product while also meeting the ongoing goals of the business?

You're trying to spec features on a moving target.

Even if they were able to do 50% time on the rewrite you'll never actually get to feature parity.

The only viable plan, unless the company has an appetite to triple the dev headcount, is to set an expectation that features will have an increased dev time, then as you spec new features you also spec out how far you will go into the codebase refactoring what the new features touch.


But it is functional. Grandparent post is suggesting that all the currently used functionality should have tests written for it. It makes sense, as that way they can gather the requirements of a rewrite at the same time.


We don't know that it is functional... maybe the company is only making $20m and should be making $60m. Like I said, we tripled the revenue with a rewrite.

What we did was make the case that we could increase revenue by being able to add valuable features more easily/quickly. We started with a super MVP rewrite that kept the basic valuable features, launched, then spent the rest of our time adding features (with tests). Hugely successful.

The key, of course, will be to get 1-2 top notch developers in place to set things up correctly from the beginning. You're never going to be effective with a few jr's who don't have that level of experience.


> We don't know that it is functional... maybe the company is only making $20m and should be making $60m. Like I said, we tripled the revenue with a rewrite.

It's $20m functional. It's possible it could be better but unless this is the kind of huge org where 20m is nothing (doesn't sound like it) you really need the behaviors documented before you start screwing with it. It's very likely this thing has some pretty complex business logic that is absolutely critical to maintain.


> you really need the behaviors documented before you start screwing with it. It's very likely this thing has some pretty complex business logic that is absolutely critical to maintain.

Nothing I said suggested otherwise. Absolutely critical for whomever is doing a rewrite to understand everything they can about the application and the business, before writing a single line of code.


You sound frustrated that you've joined a company with an absolute stinker of a codebase, because you're confident you could deliver much better results having refactored it first. You're managing a group of people probably enormously under-productive because of the weight of the technical debt they're under. Every change takes months. It's riddled with hard-to-fix bugs. It's insecure. There are serious bus factor problems.

Many of us have been in this exact position before, multiple times. Many of us have seen somebody say "our only choice is a full rewrite" - some of us were the one making that decision. Many of us have seen that decision go disastrously wrong.

For me, the problem was my inability to do what I'm good at: write tests, write implementations that pass that test, etc. Every time I suggested doing something, somebody would have a reason why that would fail because of some unclear piece of the code. So rather than continuously getting blocked, I tried to step into my comfort zone of writing greenfield code. I built a working application that was a much nicer codebase, but it didn't match the original "spec" from customer expectations, so I spent months trying to adjust to that. I basically gave up managing the team because I was so busy writing the code. In the end, I left and the company threw away the rewritten code. They're still in business using the shitty old codebase, with the same development team working on it.

If you really want to do the rewrite, accept how massively risky and stressful it will be. The existing team will spend the whole time trying to prove you were wrong and they were right, so you need to get them to buy into that decision. You need to upskill them in order to apply the patterns you want. And you need to tease apart those bits of the codebase which are genuinely awful from those that for you are merely unfamiliar.

Personally, I would suggest a course for you like https://www.jbrains.ca/training/course/surviving-legacy-code, which gives you a wider range of patterns to apply to this problem.


Maybe this was meant as a reply to the main post?


“I won the lottery, you can too. If you don’t buy a ticket, you’re never gonna win right…?”

There is a lot of evidence rewrites are hard to do well, and especially prone to failure.

…you might pull it off, it’s not impossible, sure. …but are you seriously saying it’s the approach everyone should take because it worked for you once?

Here's my $0.02 of meaningless anecdotal evidence: I’ve done a rewrite twice and it was a disaster once and went fine the second time. 50% strike rate, for me, personally, on a team of 8.

What’s your rate? How big was your team, how big was the project? What was the budget? Did you do it on time and on budget? It’s pretty easy to say, oh yeah, I rewrote some piece of crap that was a few hundred lines in my spare time.

…but the OP is dealing with a poorly documented system that’s very big, and very important and basically working fine. You’re dishing out bad advice here, because you happened to get lucky once.

Poor form.

Good advice: play it safe, use boring technology and migrate things piece by piece.

Big, high risk high reward plays are for things you do when a) things are on fire, or b) the cost of failure is very low, or c) you’re prepared to walk away and get a new job when they don’t work out.


> How do you fix the things that the test prove are broken?

Uhm. The tests don’t do any such things.

> It is better to spend the time figuring out what all the features are, document them

Yes. And the tests you should write are executable documentation showing how things are. It is like taking a plaster cast of a fossil. You don’t go “I think this is how a brachiosaurus fibula should look” and then try to force the bones into that shape. You mould the plaster cast (your tests) to the shape of the fossil (the code running in production). Then if during excavation (the rewrite) something changes or gets jostled you will know immediately that it happened, because the cast (the tests) no longer fits.
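
In code, the plaster cast is what usually gets called a characterization (or golden master) test: record what the current code does, quirks and all, and fail when anything drifts. A small sketch, where legacy_shipping_cost and its odd discount are invented stand-ins for whatever is running in production:

```python
import json


def legacy_shipping_cost(weight_kg, country):
    """Hypothetical stand-in for untested legacy code, quirks included."""
    cost = 5 + 2 * weight_kg
    if country == "US":
        cost = cost - 1
    if weight_kg > 10:
        cost = cost * 0.9  # nobody remembers why; customers may depend on it
    return round(cost, 2)


def rewritten_shipping_cost(weight_kg, country):
    """Hypothetical rewrite that 'cleaned up' the unexplained discount."""
    cost = 5 + 2 * weight_kg
    if country == "US":
        cost = cost - 1
    return round(cost, 2)


CASES = [(1, "US"), (1, "DE"), (12, "US"), (12, "DE"), (0, "DE")]


def record(func, cases):
    """Take the cast: store whatever the code returns today, right or wrong."""
    return {json.dumps(case): func(*case) for case in cases}


def differences(func, golden):
    """Return every recorded case where func no longer matches the cast."""
    drift = {}
    for key, expected in golden.items():
        actual = func(*json.loads(key))
        if actual != expected:
            drift[key] = (expected, actual)
    return drift


golden = record(legacy_shipping_cost, CASES)
unchanged = differences(legacy_shipping_cost, golden)
drifted = differences(rewritten_shipping_cost, golden)
print(sorted(drifted))
```

Note that the recorded values are not asserted to be correct. They only say "this is what production does", which is the whole point of the cast.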


> We did a full rewrite and it was quite successful and the company went on to triple the revenue.

Which sure beats some other company coming along and "rewriting" the same or similar functionality in a competing product and killing your own revenue. But it does come down to how big the codebase is and how long it would take for an MVP to be a realistic replacement. If there are parts that are complex but unlikely to need changing soon you can usually find ways to hide them behind some extra layer. Is there any reason you couldn't just introduce proper processes (source control, PRs, CI/CD etc.) around the existing code though?


Kudos to you for successfully delivering in a similar situation. That said, I think your advice is a bit cavalier. The industry is littered with the carcasses of failed rewrites. The fact that you have done it in one context does not mean that this team can pull it off in another.

I'll also say there's a lot of semantics at play here. What is a "rewrite", what is a "test" vs a "document", what is "functional"? I read your main point being that one should avoid sunk-cost fallacy and find the right places to cut bait and write off unsalvageable pieces. The art of major tech debt cleanup is how big of pieces can you bite off without overwhelming the team or breaking the product.


> Writing tests for something that is already not functional, will be a waste of time.

This is not TDD; it's writing tests to confirm the features that work now. Then, when you make changes, you can get an early warning if something starts going south.


Of course a full rewrite can be successful. This is the problem when people base their entire critical thinking on blog posts. They then go on to preach it everywhere as well!


The blog posts are warnings about what not to do. People naturally want to rebuild when they don't fully understand something or can't grasp its complexity, because writing also helps us understand what it is we are building. But it's a trap: what you've rewritten will never be the same as before, and therein lie the footguns.

The blogs are plainly stating, "even though you feel you should rewrite, you probably shouldn't."


Or some of us have experienced failed rewrites. It can be a potentially expensive mistake.


> get some testing in place

What is really needed (and almost definitely doesn’t exist) is some kind of spec for the software.


Exactly. If they write tests, they will be just doing TDD where the specification becomes a problem in itself.


It is a 12 year old legacy product. What specification exists other than, "Yesterday it did X when I clicked the button, but now it does not do that anymore."


This is the point: I don’t TDD, but I am a big fan of tests. In this case the incorrect spec can be flagged, but all the other incorrect specs will also be there. If your fix doesn’t break a spec, great, but if it does you can check if that spec was correct. It’s a back and forth between code and business requirements.


You must have missed the part where it makes 20M revenue per year.

I gotta love Hacker News, people who think the fact a backend is written in horrid PHP means it is "already not functional" while they spend their days learning something like Haskell that makes them negative revenue per year.


Who knows if that 20m revenue should be 60m? They could be held back greatly by the fact that the developers are not motivated to change anything.

I also don't know Haskell and have no desire to learn it. I prefer to build products in static compiled languages where I can more easily hire developers.


Yep.

It's also a juggling job from hell so keep a cool head and seek support and resources for what needs to be done.

A big first step is to duplicate and isolate, as much as possible, a "working copy" of the production working code.

You now need to maintain the production version, as requests go on, while also carving out feasible chunks to "replace" with better modules.

Obviously you work against the copy, test, test again, and then slide in a replacement to the live production monolith .. with bated breath and a "in case of fubar" plan in the wings.

If it's any consolation, and no, no it isn't, this scenario is surprisingly common in thriving businesses.


This approach is a trap.

Management needs to know that this needs a rewrite and a more capable team, and that pursuing an aggressive roadmap while things are this bad is impossible.

If they say no, and you try to muddle your way through it anyway, you are setting yourself up to fail.

If they say yes, ask for the extra resources necessary to incrementally rewrite. I would bring in new resources to do this with modern approaches and leave the existing team to support the shrinking legacy codebase.


Why would the existing team stick around knowing their jobs would be slowly rewritten into oblivion by others?


Where else are they going to go if they prefer this mess?

Why would they need to be replaced if they’re ultimately convinced to enter the 21st century?


Your suggestion sounds like the strangler fig pattern. While a valuable strategy in some cases, it does present the risk of duplicating poor architecture choices into the new code.

I would normally opt for your suggested approach too. However, based on the description given, I’d most likely recommend a complete rewrite in this case. The architecture appears to be quite poor and the risk of infecting new code with previous bad decision-making may be too great.


Yeah, I agree, full rewrites from scratch are almost never a good approach. You enter a tunnel where you cannot add anything useful to production for months, you have no idea when you can finally ship the whole thing, and when you do, it will be very risky.

Do things progressively. Read the code, figure out the dependencies, find the leaves and start by refactoring those. Do add tests before changing anything, to make sure you know if you change some existing behaviors.

Figuring out such code base as a whole might be overwhelming, but remember that it probably looks much more complicated than it is actually.
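
As a sketch of "figure out the dependencies, find the leaves": once you have even a hand-written map of which file includes which, the standard library can give you a bottom-up order. The file names below are invented, and this reads "leaves" as modules that depend on nothing else; the other reading (modules that nothing else depends on) is also defensible, since those are the safest to change.

```python
from graphlib import TopologicalSorter

# Hypothetical module map: each key depends on the modules in its set.
DEPENDS_ON = {
    "checkout.php": {"cart.php", "db.php", "mailer.php"},
    "cart.php": {"db.php", "pricing.php"},
    "pricing.php": set(),
    "mailer.php": set(),
    "db.php": set(),
}


def leaves(graph):
    """Modules with no dependencies of their own."""
    return sorted(name for name, deps in graph.items() if not deps)


def refactor_order(graph):
    """A bottom-up order: every module appears after everything it depends on."""
    return list(TopologicalSorter(graph).static_order())


leaf_modules = leaves(DEPENDS_ON)
order = refactor_order(DEPENDS_ON)
print(leaf_modules)
print(order)
```

If the map contains a cycle, static_order raises graphlib.CycleError, which in a codebase like this is useful information in itself.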


In a team with only two people working on the monster it seems reasonable that they’d be able to manage two development streams at the same time.


This is the correct answer.

For an additional perspective see this classic: https://dhemery.com/articles/resistance_as_a_resource/


All good points, but…

This is a clear case where he needs to look for another job IMMEDIATELY.

Here’s why…

1. The problems listed are too technical and almost impossible to communicate to a non-technical audience meaning business or c-suite.

2. The fixes will not result (any time soon) in a change that’s meaningful to business like increased revenue or speed to market. Business will not reward you if you are successful or provide resources to get the job done unless the value is apparent to them (See #1).

Employment is a game. Winning that game means knowing when to get out and when to stay.

It’s time to plan your exit both for your own sanity and the good of your family.


+1. also start by adding git first and have a test env set up.

A new person who complains about existing code and proposes "Rewrite everything" in week one will not be met with __respect__.


+1. came here to say this! it's in prod, making money; bring up the discussion of full rewrite with the management at your own peril. learn to tame the beast by pruning one dead/redundant function at a time, that's the best you can do, both for the project and for yourself!


My first instinct was "get some testing in place" too. That served me well in recent projects where I was in a similar situation. I was wondering if anyone has any advice on how to make sure your tests are... comprehensive? I was fortunate enough to have full flow tests in place from the beginning and a great team which knew the intricacies of the subject matter. We made lists of usecases and then tried to find orthogonal test cases. But that was my naive approach wondering if there are better methods out there. Especially if there is zero testing.


One more thing I’d add; for the love of all that is holy make sure the tests run lightning quick.

What you want to do first is reduce the cost and risk of making changes to as close to zero as possible.

Then, come up with a broad system design that defines higher levels of abstraction. Your goal is not to redesign the system from scratch but to specify the existing hierarchies which are currently implicit in the code. Are there different modules that naturally emerge? Ok, what are they?

Once you have a sense of what the destination will look like, make tiny changes to get just one module done. Move in little bits at a time, to build up evidence that things can work.

The way to change a culture is to set such a strong positive example that people naturally want to follow. Telling other people their work sucks is not that example, but first pitching in to speed up development cycles can make everyone happy.

And lastly you have at least some responsibility to inform management of the risk they aren’t aware of. Things will go much better for you if you tell your manager that the codebase was built in a way that makes future changes expensive and risky, and this is fine for where the business was but at some point it makes sense to invest in shifting the development velocity/risk curve of the business.


> The way to change a culture is to set such a strong positive example that people naturally want to follow. Telling other people their work sucks is not that example, but first pitching in to speed up development cycles can make everyone happy.

This is the part I'm having the most trouble with. What if you are at a place which is not software minded? Any tips on making them understand?


“Never rewrite” is a popular cargo-cult that sprang from a well known blog article that made the rounds some years ago. The urge to rewrite can be a naive impulse for sure, but there are LOTS of cases where new and better technology can result in tremendous gains, or where a code base is simply too far gone to redeem. The biggest successes of my career have almost all been ground up rewrites of existing products using new technology or techniques that resulted in orders-of-magnitude improvements in performance and ROI. If you can make incremental improvements that’s great, but sometimes it’s just not possible to rewrite “a piece at a time” because there are no pieces, just one big ball of mud. To the original author: If you don’t rewrite this mess, your competitors will. I’d say: lay out the case for an overhaul, stand your ground, don’t implement any new features until you’ve got a clear path to reducing technical debt, and if you can’t get buy-in to an overhaul just leave. What you’re describing sounds like a textbook scenario for burnout and there are lots of other opportunities where you can work on things in ways that you’ll actually enjoy.


This. So much.

I'd argue that the first order of business is getting the code committed to SCM. Then you can coach the team on new branches (features/bugs), and build the culture of using the SCM. Do this before going to the execs and giving the 10,000 meter view.

Go to the execs and get buy-in on the scope of what you need. I'd recommend articulating it in terms of risk reduction. You have a $20M revenue stream, and little control/testing over the machinery that generates this. You'll work on implementing a plan to get this under control (have an outline of this ready, and note that you need to assess more to fill in the details). You need space/time/resources to get this done.

Then get the testing in place. Make this part of the culture of SCM use. Reward the team for developing sanity/functionality tests. Get CI/CD going (a simple one, that just works). From this you can articulate (with the team's input), coding/testing standards to be adhered to.

After all this, start identifying the problematic low hanging fruit. Work each problem (have a clear problem statement, a limited scope for the problem, and a desired solution). You are not there to boil the ocean (rewrite the entire thing). You are there to make their engineering processes better, and move them to a more productive environment. Any low hanging fruit will have a specific need/risk attached to it. Like "we drop tables/columns regularly using user input." Based upon the culture you created with SCM/testing, you can have the team develop the tests for expected and various corner cases. From that, you can replace the low hanging fruit.

Keep doing that until the fruit is no longer low hanging. Prove to the execs that you can manage/solve problems. Once you have that done, you can make a longer term roadmap appeal, that is, start looking at what version 2.0 (or whatever number) would look like, and what internal bits you need to change to get there.

Basically, evolution, not revolution. Easier for execs under pressure to deliver results to swallow. Explain in terms of risks/costs and benefits, though in the near term, most of the focus sounds like it should be on risk reduction.


Not good advice. I have been in OP's shoes: I inherited a project that was a clusterf, and did a full re-write. It was a lot of work (more than anticipated), but eventually it was very successful.

The original code was just not salvageable. (It was quickly done as a fast hack, and it would break left and right, causing outages).

Just make sure the OP understands what the OG system is trying to do, and what it will take to re-write it into something sane. Don't start before understanding all the caveats of the system/project you are trying to re-write.


Do it in small pieces and you'll be there forever - it'll never get done.

Map out the functionality related to the (hard) requirements and kick off replacing the product(s) with something modern and boring.


> Also, respect the team. Maybe they aren't doing what you would, but they are keeping this beast alive, and probably have invaluable knowledge of how to do so. Don't come in pushing for change...

Yes, 3 people creating a revenue of $20 million/year is impressive.

But what if 1, let alone 2 of them quit and/or fall ill? That's way too much risk for this type of revenue.

If a new team member needs a year to just understand how the code is organized, then a well structured and documented rewrite certainly is necessary.


Something this messy is highly likely to have many security vulnerabilities. Maybe start with a scan or pentest and use that as additional justification to get things in order. 20M a year also means that this company can't afford for this application to be compromised.


The strangler pattern of rewriting individual pieces is also what leads to 3-4 incompatible versions of jQuery. You could start with one key page and rewrite it in React or whatever your preference is, but if you never manage to kill one of the old dependencies you are just making an even more tangled web.

I would try to identify how entangled some of the dependencies are and start my rewrite with the goal of getting rid of them. But yeah, I agree that version control and testing are going to be key here, as any backsliding will probably result in the idea of future refactoring being viewed negatively.


This sounds like solid advice. A rewrite would be a world of hurt, particularly if you don’t have buy-in from the existing team.

Regarding the team: junior they may be, as he says, but they’re rolling with a multi-million dollar product. If they’re keeping the product going and continuing to add business value, then they’re doing something right. Their engineering practices might be questionable, but they seem to have a solid product.

However, getting testing in place is going to be a challenge. I’ve encountered systems that sound similar to this one (perfectly functional, zero discernible architecture, not remotely designed with any kind of testing in mind.) It’ll be difficult to convince the suits that introducing testing has any real value when you’re starting from zero.

The first thing that comes to mind is the strangler fig pattern. Sounds like a useful idea in this instance.

> …an alternative [to a re-write] is to gradually create a new system around the edges of the old, letting it grow slowly over several years until the old system is strangled.[0]

[0] https://martinfowler.com/bliki/StranglerFigApplication.html
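For what it's worth, the core of the strangler fig idea can be sketched as a thin routing layer in front of the old system. This is a minimal Python sketch under my own assumptions, not anything from the linked article: `StranglerRouter`, `legacy_app`, `new_invoices`, and the `/invoices` path are all made-up names. Everything goes to the legacy handler by default, and individual path prefixes are handed to new code as they are rewritten.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def legacy_app(path: str) -> str:
    """Stand-in for the existing system (hypothetical)."""
    return f"legacy:{path}"


def new_invoices(path: str) -> str:
    """Stand-in for a freshly rewritten piece (hypothetical)."""
    return f"new:{path}"


class StranglerRouter:
    """Sends migrated path prefixes to new handlers, everything else to legacy."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy
        self._migrated: Dict[str, Handler] = {}

    def migrate(self, prefix: str, handler: Handler) -> None:
        """Register new code as the owner of one path prefix."""
        self._migrated[prefix] = handler

    def handle(self, path: str) -> str:
        # Check longer prefixes first so the most specific registration wins.
        for prefix in sorted(self._migrated, key=len, reverse=True):
            if path.startswith(prefix):
                return self._migrated[prefix](path)
        return self._legacy(path)


router = StranglerRouter(legacy_app)
router.migrate("/invoices", new_invoices)
```

With this in place, `router.handle("/invoices/42")` is served by the new code while `router.handle("/login")` still reaches the old system. The old system is "strangled" once nothing falls through to the default any more.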


This is exactly the right advice. A full rewrite might look good on the resume but will be a late, error-prone disaster.

Start with tests; I can't emphasize this enough.


> You can delete code as long as the tests pass.

It's true that poorly maintained code contains a lot of pieces which should be deleted, but if tests were added post hoc it is hard to be sure that they cover all use cases.

After adding basic tests I would suggest improving logging to get a good understanding of how the software is used. Better to store the logs in a database which allows quick queries over all the data you have (I'd personally use ClickHouse but there are other options). But even with good logs you need to wait and collect enough data, otherwise you can miss rare but important use cases, e.g. something which happens only during the tax season.
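A minimal sketch of what such usage logging could look like, in Python with SQLite standing in for ClickHouse purely so the example is self-contained. The table `usage_log`, the functions `open_usage_log`, `record_usage`, and `usage_by_month`, and the feature name are all assumptions for illustration. The point is that a per-month count makes seasonal features visible before anyone deletes them.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List, Optional, Tuple


def open_usage_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (and if needed create) the hypothetical usage log table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS usage_log ("
        "ts TEXT NOT NULL, feature TEXT NOT NULL, user_id TEXT)"
    )
    return conn


def record_usage(conn: sqlite3.Connection, feature: str,
                 user_id: Optional[str] = None,
                 ts: Optional[datetime] = None) -> None:
    """Store one 'this feature was used' event with an ISO 8601 timestamp."""
    ts = ts or datetime.now(timezone.utc)
    conn.execute(
        "INSERT INTO usage_log (ts, feature, user_id) VALUES (?, ?, ?)",
        (ts.isoformat(), feature, user_id),
    )
    conn.commit()


def usage_by_month(conn: sqlite3.Connection, feature: str) -> List[Tuple[str, int]]:
    """Return (YYYY-MM, count) pairs so seasonal usage stands out."""
    rows = conn.execute(
        "SELECT substr(ts, 1, 7) AS month, COUNT(*) FROM usage_log "
        "WHERE feature = ? GROUP BY month ORDER BY month",
        (feature,),
    )
    return [(month, count) for month, count in rows]
```

A feature that shows zero calls for ten months and a spike in April is exactly the kind of code that looks dead until you have a full year of data.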


Basically every time I decided for a full rewrite I ended up thinking "thank god I made that decision, the new architecture is much simpler" (and no, it didn't just seem simpler to me).


The big rewrite works - but only if you have a team you can trust. You need a new team of seniors to pair with the current team, and you need to promise the current team a promotion at the end of the task.

Committing to an iterative approach is what I do when I don't have enough authority/political tokens and I can't afford a rewrite.

Over time it gets less and less priority from the business, and you end up with half the codebase being crap and half being ok, and maintaining stuff is even harder.


Agreed, full rewrite is a horrible idea. Source: worked on a rewrite of a project that was like this: PHP from 2003, 7 figures in revenue, written by someone who was not a developer, no version control or testing. And it failed horribly.

I have tactical suggestions, but the strategy is simple: move toward more modern software practices, one step at a time.

But first, the elephant in the room. You say you need to help the project

> without managing [the team] directly

Who does? How can you help them?

Because you don't have direct authority, all the tactics and suggestions mentioned here won't be as helpful as they would if you were the manager in charge. And it's hard to offer concrete advice without knowing exactly how you are connected. Are you a principal in the same company who wants to help? A peer of the manager? A peer of the team members? Each of these would call for a different approach.

And how much time do you have to help? Is this something you are doing in the shadows? Part of your job? Your entire job?

With that said, here's my list of what to try to influence the team to implement. Don't worry about best of breed for the tools, just pick what the company uses. If the tool isn't in use at the company, pick something you and the team are familiar with. If there is nothing in that set, pick the industry standard (which I try to supply).

1. version control. Git if you don't have any existing solution. GitHub or GitLab are great places to store your git repos

2. bug tracker. You have to have a place to keep track of issues. GitHub issues is adequate, but there are a ton of options. This would be an awesome place to try to get buy-in from the team about whichever one they like, because the truth is it doesn't matter which particular bug tracker you use, just that you use one.

3. a build tool so you have one click deploys. A SaaS tool like CircleCI or GitHub Actions is fine. If you require "on prem", Jenkins is a fine place to start. But you want to be able to deploy quickly.

4. a staging environment. This is a great place to manually test things and debug issues without affecting production. Building this will also give you confidence that you understand how the system is deployed, and can wrap that into the build tool config.

5. testing. As the parent comment mentions, end to end testing can give you so much confidence. It can be easy to get overwhelmed when adding testing to an existing large, crufty codebase. I'd focus on two things: unit testing some of the weird logic, which is a relatively quick win, and setting up at least 1-2 end to end tests through core flows (login, purchase path, etc). In my experience, setting up the first one of each of these is the toughest, then it gets progressively easier. I don't know what the industry standard for unit testing in PHP is any more, but I have used PHPUnit in the past. Not sure about end to end testing either.

6. Documentation. This might be higher, depending on what your relationship with the team is, but few teams will say no to someone helping out with doc. You can document high level arch, deployment processes, key APIs, interfaces, data stores, and more. Capture this in google docs or a wiki.

7. data migrations. Having some way to automatically roll database changes forward and back is a huge help for moving faster. This looks like a viable PHP option: https://laravel.com/docs/9.x/migrations which might let you also introduce a framework "via the side door". This is last because it is least important and possibly more intrusive.
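To show what "automatically roll database changes forward and back" means mechanically, here is a minimal sketch in Python with SQLite. It is not Laravel's API, just the bare idea behind such tools; `MIGRATIONS`, `schema_migrations`, `migrate_up`, `rollback_last`, and the example tables are all hypothetical names chosen for illustration.

```python
import sqlite3
from typing import List, Optional, Tuple

# Each migration is (name, up_sql, down_sql). Names sort in application order.
MIGRATIONS: List[Tuple[str, str, str]] = [
    ("001_create_customers",
     "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
     "DROP TABLE customers"),
    ("002_create_orders",
     "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL)",
     "DROP TABLE orders"),
]


def _applied(conn: sqlite3.Connection) -> List[str]:
    """Names of migrations already recorded in the bookkeeping table."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (name TEXT PRIMARY KEY)")
    return [row[0] for row in conn.execute("SELECT name FROM schema_migrations ORDER BY name")]


def migrate_up(conn: sqlite3.Connection) -> List[str]:
    """Apply every migration not yet recorded; return the names that ran."""
    done = set(_applied(conn))
    ran = []
    for name, up_sql, _down_sql in MIGRATIONS:
        if name not in done:
            conn.execute(up_sql)
            conn.execute("INSERT INTO schema_migrations (name) VALUES (?)", (name,))
            ran.append(name)
    conn.commit()
    return ran


def rollback_last(conn: sqlite3.Connection) -> Optional[str]:
    """Undo the most recently applied migration; return its name, or None."""
    applied = _applied(conn)
    if not applied:
        return None
    last = applied[-1]
    down_sql = next(down for name, _up, down in MIGRATIONS if name == last)
    conn.execute(down_sql)
    conn.execute("DELETE FROM schema_migrations WHERE name = ?", (last,))
    conn.commit()
    return last
```

Running `migrate_up` twice is harmless because the second call finds nothing left to apply, which is what makes it safe to wire into a deploy script.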

None of these are about changing the code (except maybe the last one), but they all wrap the code in a blanket of safety. There's the added bonus that it might not trigger sensitivities of the team because you aren't touching "their code". After implementing, the team should be able to move faster and with more confidence.

Since you are not the direct manager, you want to help the team get better through your influence and through small steps. That will build trust and allow you to suggest bigger ones, such as bringing in a framework or building abstraction layers.


Agree with this approach 100%


Yes, same. Sometimes devs see Frankenstein code, get all emotional, and adopt a full-rewrite-or-die attitude. Maybe take a step back and migrate piece by piece.


> First off, no, a full rewrite is not only not necessary, but probably the worst possible approach. Do a piece at a time. You will eventually have re-written all the code, but do not ever fall into the trap of a "full re-write". It doesn't work.

I've seen systems where the entirety of the codebase is such a mess, but is so tightly coupled with the business domain, that a rewrite feels impossible in the first place. Furthermore, because these systems are often already working, as opposed to some hypothetical new rewrite, new features also get added on top of the old systems, meaning that even if you could rewrite them, by the time you would have done so, it would already be out of date and wouldn't do everything that the new thing would do (the alternative to which would be making any development 2x larger due to needing to implement things both in the old and new versions, the new one perhaps still not having all of the building blocks in place).

At the same time, these legacy systems are often a pain to maintain, have scalability and stability challenges and absolutely should not be viewed as a "live" codebase that can have new features added on top of it, because at that point you're essentially digging your own grave deeper and deeper, waiting for the complexity to come crumbling down. I say that as someone who has been pulled into such projects, to help and fix production environments after new functionality crippled the entire system, and nobody else knew what to do.

I'd say there is no winning here. A full rewrite is often impossible, a gradual migration oftentimes is too complex and not viable, whereas building on top of the legacy codebase is asking for trouble.

> But before you re-write once line of code - get some testing in place. Or, a lot of testing. If you have end-to-end tests that run through every feature that is currently used by your customer base, then you have a baseline to safely make changes. You can delete code as long as the tests pass. You can change code as long as the tests pass.

This is an excellent point, though! Testing is definitely what you should begin with when inheriting a legacy codebase, regardless of whether you want to rewrite it or not. It should help you catch new changes breaking old functionality and be more confident in your own code's impact on the project as a whole.

But once again, oftentimes you cannot really test a system.

What if you have a service that calls 10 other services, which interact with the database or other external integrations, with tight coupling between all of the different parts? You might try mocking everything, but at that point you're spending more time making sure that the mocking framework works as expected, rather than testing your live code. Furthermore, eventually your mocked data structures will drift out of sync to what the application actually does.

Well, you might try going the full integration test approach, where you'd have an environment that would get tests run against it. But what if you cannot easily create such an environment? If there are no database migrations in place, your only option for a new environment will be cloning an existing one. Provided that there is a test environment to do it from (that is close enough to prod) or that you can sufficiently anonymize production data if you absolutely need to use it as the initial dump source, you might just run into issues with reproducibility regardless. What if you have multiple features that you need to work on and test simultaneously, some of which might alter the schema?

If you go for the integration testing approach, you might run into a situation where you'll need multiple environments, each of which will need their own tests, which might cause significant issues in regards to infrastructure expenses and/or software licensing costs/management, especially if it's not built on FOSS. Integration tests are still good, they are also reasonably easy to do in many of the modern projects (just launch a few containers for CI, migrate and seed the database, do your tests, tear everything down afterwards), but that's hard to do in legacy projects.
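For reference, the "migrate and seed the database, do your tests, tear everything down" loop that is easy in modern projects looks roughly like the following. This is a minimal Python sketch with an in-memory SQLite database standing in for a throwaway container; `fresh_database`, `transfer`, `balance_of`, the `accounts` schema, and the seed rows are all hypothetical. A legacy system with no migrations has no equivalent of `SCHEMA` to run, which is exactly the difficulty described above.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator

# Hypothetical schema and seed data; a real project would run its migrations here.
SCHEMA = (
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL, balance_cents INTEGER NOT NULL)"
)
SEED = [("a@example.com", 1000), ("b@example.com", 0)]


@contextmanager
def fresh_database() -> Iterator[sqlite3.Connection]:
    """Create, migrate and seed a throwaway database, then tear it down."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        conn.executemany("INSERT INTO accounts (email, balance_cents) VALUES (?, ?)", SEED)
        conn.commit()
        yield conn
    finally:
        conn.close()


def balance_of(conn: sqlite3.Connection, email: str) -> int:
    """Current balance in cents for one seeded account."""
    (cents,) = conn.execute(
        "SELECT balance_cents FROM accounts WHERE email = ?", (email,)
    ).fetchone()
    return cents


def transfer(conn: sqlite3.Connection, src: str, dst: str, cents: int) -> None:
    """Code under test (hypothetical): move money between two accounts."""
    with conn:  # commits on success, rolls back if an exception escapes
        if balance_of(conn, src) < cents:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance_cents = balance_cents - ? WHERE email = ?", (cents, src))
        conn.execute(
            "UPDATE accounts SET balance_cents = balance_cents + ? WHERE email = ?", (cents, dst))
```

Each test opens its own `fresh_database()`, so one test's writes never leak into another and the results are reproducible.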

Not only that, but you might not even be fully aware how to write the tests for all of your old functionality - either you need to study the whole system in depth (which might not be conceivable), or you might miss out on certain bits that need to be tested and therefore have spotty test coverage, letting bugs slip through.

> Once you are at that point, start picking off pieces to modernize and improve.

It helps to be optimistic, but for a plethora of reasons, many won't get that far. Ideally this is what people should strive for and it should be doable, but in these older projects typically the companies maintaining them have other issues in regards to development practices and reluctance to introduce tools/approaches that might help them improve things, simply because they view that currently things are working "good enough", given that the system is still generating profits.

Essentially, be aware of the fact that attempts to improve the system might make things worse in the short term, before they'll get better in the long term, which might reflect negatively upon you, unless you have sufficient buy-in to do this. Furthermore, expect turnover to be a problem, unless there's a few developers who are comfortable maintaining the system as is (which might present a different set of challenges).

Ideally, start with documentation about how things should work, typical use cases, edge cases etc.

Then move on to tests, possibly focusing on unit tests at first and only working with integration tests when you have the proper CI/environment setup for this (vs having tests that randomly fail or are useless).

After that, consider breaking the system up into modules and routing certain requests to the new system. Many won't get this far and I wouldn't fault you for exploring work in environments that set you up for success, instead of ones where failure is a looming possibility.


i'd do it that way too,

- tests to cement interfaces

- gradually write module supporting this interface

- replace module on test clone and bench / retest it

when this module is ok, do another
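A minimal sketch of "tests to cement interfaces", in Python: the same cases are run against the old and the new implementation, and any difference is reported. The pricing rule, the function names `legacy_price` and `new_price`, the `CASES` list, and the `mismatches` helper are all invented for illustration; the legacy behaviour is treated as the specification, quirks included.

```python
from typing import Callable, List, Tuple

PriceFn = Callable[[int, int], int]


def legacy_price(unit_cents: int, qty: int) -> int:
    """Stand-in for the existing module (hypothetical): 10% off for 10+ items."""
    total = unit_cents * qty
    if qty >= 10:
        total = total - total // 10
    return total


def new_price(unit_cents: int, qty: int) -> int:
    """Candidate replacement that must honour the same interface and results."""
    subtotal = unit_cents * qty
    discount = subtotal // 10 if qty >= 10 else 0
    return subtotal - discount


# Inputs chosen around the discount boundary plus a rounding case and a zero price.
CASES: List[Tuple[int, int]] = [(100, 1), (100, 9), (100, 10), (999, 10), (0, 5)]


def mismatches(old: PriceFn, new: PriceFn,
               cases: List[Tuple[int, int]]) -> List[Tuple[Tuple[int, int], int, int]]:
    """Return (case, old_result, new_result) for every case where the two disagree."""
    out = []
    for case in cases:
        expected, actual = old(*case), new(*case)
        if expected != actual:
            out.append((case, expected, actual))
    return out
```

The new module is ready to be swapped in on the test clone once `mismatches(legacy_price, new_price, CASES)` comes back empty; after the swap, the same cases keep guarding the interface.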


Huh. You are literally saying do a full rewrite. But it's also the worst idea?

Edit: A full rewrite always meant replacing every part of a system. Whether you do it gradually doesn't really matter.


"Whether you do it gradually doesn't really matter."

It absolutely DOES matter. A gradual rewrite is much more likely to work than a stop-the-press rewrite.


It's still a rewrite. The crux of the statement I made.


The problem with a classic full rewrite is that the existing system is thrown away immediately. None of the existing features are available in production until the rewrite adds them back in, and they often come back incomplete, buggy, changed beyond all recognition, or a combination of all of these. That obviously sucks and is the reason the classic rewrite is rarely done. However, it is clear that something must happen.


"Full rewrite" is a description of the end state, not the process.

The best way to do a full rewrite is incrementally, with test support and consideration for natural separation of internal subsystems.


The best way to do a rewrite may be incremental, but the terminology of "full rewrite" doesn't usually refer to an incremental rewrite, it refers to starting from scratch.


I don't think that's true -- a "full" rewrite is used in contrast to a "partial" rewrite, where only part of the system is replaced. It's called a "full" rewrite because the goal from the start is to fully replace the system with new code.

Consider that if this were not true, then there would be no way to describe an incremental full rewrite, nor any way to describe a from-scratch replacement of a subsystem.

I've written on this topic before, for example https://increment.com/software-architecture/exit-the-haunted...


He’s saying to Ship of Theseus the codebase. Don’t build a new ship and then burn down the old ship. Replace the old ship piece by piece in place.


That only works if the new pieces correspond to old pieces. If there's no good structure to build on, the units to be replaced will constrain the architecture of the new ship.

At some point you end up trying to change a pumpkin boat into an aircraft carrier, and there's no obvious way you can do that one piece at a time.


> If there's no good structure to build on, the units to be replaced will constrain the architecture of the new ship.

Which is why you do it in stages: add scaffolding until local rewrites are possible, then rewrite the business logic, then tear the scaffolding down.


That's a good analogy actually. Scaffolding is a kind of temporary test structure that you can use to maintain function while you figure out something better.


Maybe there are some underlying architectural problems that need to be addressed, but it would be impossible to make those changes from the current situation. It sounds like it is impossible to even know what code is live vs sitting on the server. How do you even know you have a firm grasp on the current architecture when it is unclear what code is even running the product?

There is a lot of low hanging fruit to be addressed that will likely lead to meaningful improvements. Once the code is in better shape and some unfortunate legacy pattern is identified, then it can be considered time to re-tool the architecture.


Agreed. The first thing to do is figure out WTF is going on. This is perhaps the hardest kind of thing to do as a developer.


Full rewrite generally means: stop the presses, we are gonna migrate this whole thing from here to there, and no new features until it's done (hint: it never gets done).


I’ve only ever witnessed ship-of-Theseus style migrations and those also never get done.


Does not compute... Ship of Theseus is just regular old development. Of course it never gets done, but new features aren't put on hold.


I mean like “we want to replace X with Y”. Y incrementally starts replacing X, but 100% migration is never achieved, meaning double the API surface area exists indefinitely.

Because the migration doesn’t block new features, that means the org gets tired and reallocates the effort elsewhere before it’s ever done, with no immediate consequences. Rinse and repeat.


I think you've not witnessed Ship of Theseus, but "build Ship2 next to Ship1 and start using Ship2 while Ship1 is still being used and keep saying you're going to migrate to Ship2 eventually but meanwhile Ship1 and Ship2 diverge and now you have 2 ships".

I recently witnessed this mess and it is an enormous mess. Don't build Ship2 in the first place. Instead, replace Ship1's mast and sails, and rudder etc until you've replaced all the parts in Ship1. That's the SoT approach.


Right but how do you replace the masts? Don’t you have to build mast2 and then tear down mast1 if you want to have continuous propulsion?


Yes, can you see how that's quite different from building a second ship?


In my comment, X and Y are different masts, not different ships.


I understand now.


A "full rewrite" means that after the completion of the rewrite, the old code has been fully replaced by new code.

What you're describing is a "stop-the-world" rewrite.


I think you are being needlessly pedantic. Everyone understands that "full rewrite" means "restarting from scratch" in this context, especially since the poster was very clear that eventually everything will be touched.


They’re saying to do it, eventually, incrementally and not all at once.


And critically… never completing an incremental rewrite doesn’t matter. Everything remained working the whole time and continues to make the company money. And as a bonus, you were also able to make feature changes that the business wanted at the same time. It’s classic XP, when the money runs out, the system still works!



