> there's not a lot of empirical evidence to support claims that automated tests will improve developer productivity or code quality.
If you'd said "there aren't any published papers _proving_ this" I would've agreed, simply because I don't know whether such studies exist (they might); but I'd argue that automated tests obviously improve code quality and productivity. And also ramp-up time, which is important in fast-growing teams (my case).
I'm replying to this and also some of the sibling comments.
No, automated development-time tests do not necessarily equate to higher quality, given that they:
- Take time away from other activities
- May pass when they shouldn't, lulling developers into a false sense of confidence
- Increase the time taken for certain CI/CD operations, sometimes by hours or days!
- Add a large volume of "test" code that makes certain refactoring operations take a lot longer, which is a real cost
People work in some environments where tests DID help, and then they generalise that to ALL work environments.
Automated build-time tests are less important, and entire categories may be unnecessary when:
- The product is written in a strongly-typed language that catches most errors at compile time.
- High-level "linting" tools are used to further improve code quality before the executable ever runs.
- Feedback from the production environment is more relevant, e.g. web applications whose performance issues are monitored by an APM but which have no critical code-stability concerns at build time. Think typical web apps where a crash has no real consequences, not finance applications where a small error could mean Real Money.
- Time-to-market is more important than chasing 100% robustness at the expense of meeting release dates.
You often see people from a PHP or Python background go on and on about the importance of tests, but those are dynamically typed languages with lots of unexpected pitfalls that only testing will uncover. Meanwhile entire teams do just fine without any automated tests when using languages like C#, Java, or Rust.
Types do a great job of ensuring you’re getting generally the right kind of data in and out of different functions/methods, but they don’t ensure that the program is actually doing the right thing. Strong typing just helps ensure that the program isn’t doing specific categories of the wrong thing (in the same way that Rust and Go make it difficult to write memory-unsafe code). I mean, Go and Rust both have testing capabilities built in. If testing weren’t necessary, why would they build that in?
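To make that concrete, here’s a minimal sketch in Python with type hints (the days_until function is hypothetical; any typed language makes the same point). The signature satisfies any type checker, the logic is still wrong, and only the test notices:

```python
from datetime import date

def days_until(deadline: date, today: date) -> int:
    # Fully annotated and type-checks cleanly, but the operands are
    # swapped: for a future deadline this returns a negative count.
    return (today - deadline).days

def test_days_until() -> None:
    # Fails against the buggy body above (returns -9, not 9), which is
    # exactly the class of error the type system cannot see.
    assert days_until(date(2024, 1, 10), date(2024, 1, 1)) == 9
```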
Beyond the most trivial of projects, I can’t think of a single project that I’ve worked on (or even looked at on GitHub) that has weighed the costs and benefits of automated testing and decided it just wasn’t worth it.
Conversely, I've actually never seen automated tests "in the wild". Every project I've come across has had zero tests, or close to it.
I'm dealing with a migration of a legacy code base right now, today, that could probably use some tests! But even suggesting this is a complete non-starter: The time required to write the tests vastly exceeds the time and budget allocated to the migration project. I'll simply enable an APM tool that costs $50 a month and call it a day. If something crashes in UAT we'll catch it. If not, we'll definitely catch it in PRD. If not, then it's probably not worth dealing with!
I've added tests to some of my projects in the past and caught virtually nothing with them. I always run my code "through its paces" at least a few times with a REPL, debugger, or some sort of tracing tool before it's ever committed to the repo.
The kind of bugs I find in production would never be found by typical automated test suites. Things like logic errors due to misunderstanding the requirements, or deadlocks in a database that only manifest with an obscure multi-user workflow combination.
As a random example, a significant issue in one codebase was that a distinct sort was case-insensitive but should have been case-sensitive. It was dropping a few hundred items out of hundreds of millions. Will your tests find this issue? What if I told you nobody noticed for years, and that the processing where this manifested takes 4 hours?
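To make the failure mode concrete, here's a toy reconstruction of the mechanism (a sketch in Python; the real code wasn't Python and these names are made up):

```python
items = ["alpha", "Alpha", "ALPHA", "beta"]

# Case-insensitive distinct: the three spellings of "alpha" collapse
# into one bucket, silently dropping two items.
insensitive = {s.lower() for s in items}
assert len(insensitive) == 2

# Case-sensitive distinct: all four items survive.
sensitive = set(items)
assert len(sensitive) == 4
```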
Will your dev team still be productive if changes require 4-hour tests every time they hit build or check in the code? To find a one-time issue that won't reoccur now that it's been fixed? Really?
Small, fast tests often find nothing and are worthless because they don't test anything of interest.
Big complicated tests are time consuming to write, still often find nothing, and slow down processes.
4-hour tests are a nonstarter, but ponder this: how can you guarantee that bug will never happen again? You can be pretty sure. You can reason about the code that you’re now familiar with and infer (based on your current understanding) that something like that won’t happen again. But you can’t guarantee it.
Some bugs don’t really deserve tests that run all the time - that I’ll grant you. I frequently have tests for stuff like that as a form of documentation for myself down the road. A year from now, will you remember the specific steps you took to fix your case sensitivity bug? I wouldn’t. But my VCS will. Future me will be very grateful if I write a “verify this annoying case sensitivity thing isn’t a problem” script somewhere so that if I’m seeing weird behavior in the future, I can be reasonably sure that it’s a different problem (or that my fix got reverted or overwritten).
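Something as small as this hypothetical note-to-future-me script would do; it doesn’t need to run in CI to be worth keeping:

```python
# check_case_sensitivity.py - hypothetical one-off regression check.
# Run by hand if dedup counts ever look suspiciously low again.

def distinct(items):
    # The fix: dedupe on the exact string, not its lowercased form.
    return set(items)

if __name__ == "__main__":
    sample = ["Acme", "acme", "ACME"]
    assert len(distinct(sample)) == 3, "case-insensitive dedup crept back in"
    print("ok: distinct is case-sensitive")
```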
I also don’t find a lot of value in unit tests or having 100% code coverage. But that doesn’t invalidate automated testing altogether. I cannot tell you how often tests - even just simple end-to-end, happy-path tests - have saved my ass.
I’m willing to bet that you’re already testing your software manually, right? Like if you change (for the sake of example) a form, you’re probably going to submit that form a few times with different inputs. Somebody else down the road is going to do that too. Why wouldn’t you write down the different inputs you gave so that the next person has them available? Maybe that person doesn’t have the same context that you do. Maybe they just aren’t as thorough as you. Either way though: if you’ve already done the work to figure out how to fully exercise the form, why would you make someone else do that work again?
The next logical step after writing that stuff down is to automate it if for no other reason than to avoid having smart, expensive people sitting around doing data entry when a computer can do it for them.
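As a minimal sketch of what that looks like (pytest here; validate_form and the inputs are hypothetical stand-ins for whatever your form actually does):

```python
import pytest

# The inputs you'd otherwise exercise by hand, written down once.
CASES = [
    ({"email": "a@example.com", "age": "42"}, True),
    ({"email": "not-an-email", "age": "42"}, False),
    ({"email": "a@example.com", "age": ""}, False),
    ({"email": "", "age": "42"}, False),
]

def validate_form(data: dict) -> bool:
    # Stand-in for the real form handler.
    return "@" in data.get("email", "") and data.get("age", "").isdigit()

@pytest.mark.parametrize("data,expected", CASES)
def test_form_inputs(data, expected):
    assert validate_form(data) is expected
```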
That seems extremely uncontroversial to me and I have a hard time understanding why someone wouldn’t be on board with that.
The fundamental logic error in the justification for tests is that I don't need to guarantee that the issue will never reoccur. A reasonable level of confidence is more than sufficient for most projects, most of the time.
As an example of an environment where automated tests are critical: I read a post by someone complaining that when they worked at Oracle the automated test suite ran through hundreds of thousands of individual tests and took hours despite using a huge farm of servers.
That guy was wrong! I would absolutely test something like a commercial RDBMS to death. Those tests are not optional. Similarly, if I was developing a file system such as ZFS, I would also test the heck out of it. Famously, Sun had a test lab where robots would physically pull drives out of servers!
But would I test a typical web form with automated tests? No. It's boring CRUD code. It's going to work, because I use parametrised queries through a strongly-typed ORM to a COTS database platform, using "boring" code paths that are well known to work. If the DB schema changes, the build will fail.
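To be concrete about the "boring" path I mean, a minimal sketch (plain sqlite3 standing in for the strongly-typed ORM, with a made-up schema; the principle is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT NOT NULL)")

# Parametrised query: the driver handles quoting and escaping, so the
# code path exercised here is one that is already well known to work.
conn.execute("INSERT INTO users (name) VALUES (?)", ("O'Brien",))

rows = conn.execute("SELECT name FROM users").fetchall()
assert rows == [("O'Brien",)]
```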
I'm more interested in testing if the form UX layout "looks pretty" and has a "nice layout to help the workflow." Ensuring those will also exercise the boring parts of the back end code anyway as I click through the form a bunch of times.
Once the code has been established as working, and strong typing is used end-to-end, why bother testing it over and over?
Personally, I don't have time to spend on solving the same problem twice. I want a guarantee that something isn't going to break again. Yeah, it probably won't break again in the exact same way, but if you're writing your test to cover one narrow failure mode, perhaps it's time to brush up on your software QA skills.
You're making a set of assumptions when you write a piece of code, right? All the steps between when you actually submit a form and when the data is saved in your database are assumptions that you're implicitly relying on in order for your application to function. Individually, each of those components can work just fine, but no matter how "boring" they are, your application can still be broken.
For instance:
> a distinct sort was case-insensitive but should have been case-sensitive. It was dropping a few hundred items out of hundreds of millions.
You don't know that it's _going_ to work until you've _tested_ that it works. Right now, it sounds like your testing is almost exclusively manual. That's a valid approach, but it doesn't scale very far (in terms of application complexity, team size and composition, or project age).
> Once the code has been established as working, and strong typing is used end-to-end, why bother testing it over and over?
You've established that it's working as expected _now_. What happens if somebody submits incomplete or malformed or outright malicious data? What happens if somebody opens two copies of your form, fills them both out, and then submits each of them in sequence? Which one should take precedence and does the notion of submitting this particular form more than once even make sense? What happens if your database isn't reachable but you have data that you need to save? What happens if another developer on your team updates the ORM and that includes a change to how data is sanitized on its way into the database that breaks some implicit assumption about how your data is going to be stored?
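Take just the double-submit case: even "the second, stale copy should be rejected" is a decision somebody has to make and pin down. A minimal sketch of pinning it down, assuming optimistic locking is the behaviour you want (Record and StaleWriteError are hypothetical):

```python
import pytest

class StaleWriteError(Exception):
    pass

class Record:
    """Toy in-memory record with optimistic locking (illustrative only)."""
    def __init__(self, data):
        self.data = data
        self.version = 0

    def update(self, data, expected_version):
        # Reject writes made from a stale copy of the form.
        if expected_version != self.version:
            raise StaleWriteError("form was submitted from a stale copy")
        self.data = data
        self.version += 1

def test_second_stale_submission_is_rejected():
    record = Record({"name": "old"})
    v = record.version  # both browser tabs load version 0
    record.update({"name": "tab one"}, expected_version=v)
    with pytest.raises(StaleWriteError):
        record.update({"name": "tab two"}, expected_version=v)
```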
I hope you have some way to catch that stuff in code review. Syntactic rigor and being a bottleneck in the code review process are only going to get you so far.
Personally, I don't want to be figuring any of that stuff out at 3am when the pager is going off. I want to be confident that the team has solved _and documented_ all of the potential problems that they can think of (in the form of tests).
Lots of good reasons above. Another is that if code has solid tests, it makes code review a lot easier / faster, especially if somebody else is modifying your code.
I worked on a project with no tests. We had to refactor a specific piece of functionality: changing database entities, augmenting algorithms, and adding behaviour. Hundreds of possible cases, no way to test them manually. I refactored all the code to be "testable" and wrote hundreds of tests. I couldn't have done it without them.
In the same project, a single algorithm had a dozen parameters and dozens of possible scenarios. An issue occurred in prod. Luckily the algorithm had over 60 tests: we added a red test, fixed the bug, ran everything green, and released with a confidence that wouldn't have been possible without tests.
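That red/green loop in miniature (a hypothetical apply_discount standing in for our algorithm):

```python
import pytest

def apply_discount(price: float, percent: float) -> float:
    # The fixed implementation; the buggy version divided by 10, not 100.
    return price * (1 - percent / 100)

def test_discount_regression():
    # Committed red against the buggy code, green after the fix.
    assert apply_discount(200.0, 15) == pytest.approx(170.0)
```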
+1. That's one of the first things I look for if I need to refactor something. I want tests to exist to verify refactoring doesn't change functionality.
It's so obviously self evident that automated tests improve productivity, especially for developers new to a codebase.
A developer makes a non-trivial change. How do they know nothing broke?
If there are no tests, they must spend an inordinate amount of time learning the entire system / product to gain any confidence they didn't break something, only to likely miss something that a senior team member will point out. The back-and-forth will take up multiple developers' time. At worst, it gets punted all the way to a QA team. Could be days.
Or, with tests, that happens far less often and the feedback loop can be more like 10 minutes.
Of course, tests don't solve onboarding, but they provide nice guard rails.
Of course I'm assuming useful tests, why wouldn't I? Note I don't claim tests completely solve all productivity or onboarding problems.
It's just odd to claim it's somehow unclear whether automated tests are good for productivity. Forget the extremes (over-testing, too many unit tests vs integration tests, potential rigidity): tests done right are productivity multipliers.
And that's not a "No True Scotsman" retort, because the original claim had no qualifiers and was suggesting that any version of automated tests is orthogonal to developer productivity. All you must do to refute that is think of any codebase you were able to make progress on faster because a test told you what was wrong faster than it would have taken you to run blame on the source and email a developer you may never have met.
> It's just odd to claim it's somehow unclear whether automated tests are good for productivity.
No it's not, because it is unclear. Where's the actual evidence? How do you even define and measure "productivity" in software development?
My customers define productivity as "Delivered according to spec, on time, and within budget." That's the only measurement they care about. As hard as it is for programmers to understand, beautiful "clean" code and automated tests are generally not stakeholder priorities. Yes, bad code and no testing have a cost that may have to get paid down the road. I think stakeholders understand that software will fall short of perfection.
For a long time the standard practice was unit testing, integration testing, and (if you were lucky) rigorous QA. Plenty of usable software got written that way. There's nothing inherently wrong with TDD and automated testing, and when done right those techniques can reduce the need for more complex testing (integration and QA) and help the team understand what the code is supposed to do. That's not the only way to write good code that works, but it's one way.
The problems of defining, measuring, and improving programmer productivity have been known and discussed since the 1960s. We have a set of fairly vague best practices (also known since the 60s/70s) but there's no silver bullet. The problems with developing non-trivial software in team environments mainly come from communication and team dynamics, not from lack of style guides or automated tests. Every little bit can help, of course, but let's not kid ourselves that TDD and automated testing are magic fairy dust.
How does the developer know whether a failing case is due to a bug or something that needs to be changed? Is the new developer being dropped in with no support and no peer review and asked to make a non-trivial change? If so, then tests are not so useful in a strongly typed environment, since interface changes will be caught at compile time.
The question that matters is not whether automated tests improve code quality and productivity.
The question that needs to be answered is "if I took the time I previously spent on automated testing, and instead spent it on something else like code review or design review, would that result in greater code quality and productivity than automated testing did?"
And phrased that way, you see it's not a dichotomy. It depends on what you would spend the time on instead of automated testing. It depends on whether you already had peer review in your process. It depends on what sort of automated tests you spend time on. It's very hard to say something general without those details.
----
Note that I'm not taking a stance against automated testing. I prefer working in a test-first style myself. I'm just saying it's not that simple.
I think you are agreeing with me. There's anecdotal evidence like you offer. It's not necessarily obvious to everyone that automated tests improve code quality or it would be a settled issue (it's not).
In my own career (40+ years) I have worked with automated testing and TDD, and without, and I can't say that TDD "obviously" improves code quality. I know other professional programmers who have that experience. Like everything else in software development it's probably situational and depends on the individuals and the team.
Generally speaking "empirical evidence" means "published papers", with a little bit of flex for preprints or trustworthy-but-not-peer-reviewed experimental findings.
I was using it to mean actual data rather than anecdotes.
The subject of programmer productivity is both much-studied and discussed and very hard to define, measure, and compare. The studies that do have some empirical evidence seem to show that team dynamics and individual programmer personalities have the most dramatic effects, compared to things like programming language, tests, style guides, etc. It's not that consistent styles and automated testing are bad ideas, it's that they don't clearly make a big dent in team productivity in cases where one or a few incompetent (or merely slow) programmers, or one jerk, can derail everyone's productivity.
This is why I emphasized demonstrating a commitment to the team and making some contributions before criticizing the environment and proposing changes. If you want the project to grind to a halt, tell your new co-workers they are doing everything wrong. No automated test suite is going to fix that problem.