Isn't code review the one area you'd never want to replace the programmer in? Like, that is the most important step of the process; it's where we break down the logic and sanity check the implementation. That is expressly where AI is the weakest.


If you think of the "AI" benefit as more of a linter then the value starts to become clear.

E.g., I've never hesitated to add a linting rule requiring `except Exception:` in lieu of a bare `except:` in Python, since the bare form is very rarely required (so the necessary incantations to silence the linter are cheap on average) and is almost always a bug, prone to making shutdown/iteration/etc. harder than they should be. When I add that rule at new companies, ~90% of the violations are latent bugs.
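
To make that concrete, here's a toy sketch (made-up `process`/`worker_loop` names, not from any real codebase) of the kind of latent bug the rule catches; the bare form also swallows `KeyboardInterrupt`/`SystemExit`, which is where the shutdown weirdness comes from:

```python
def process(job):
    ...  # stand-in for real work that sometimes fails

def worker_loop(jobs):
    for job in jobs:
        try:
            process(job)
        except:            # bare except also catches KeyboardInterrupt/SystemExit,
            pass           # so Ctrl-C or sys.exit() during processing is silently eaten
        # what the lint rule pushes you toward instead:
        # except Exception:
        #     pass         # failures still suppressed/logged, but shutdown still works
```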

AI has the potential (though I haven't seen it work well yet in this capacity) to lint most such problematic patterns. In an ideal world, it'd even have the local context to know that, e.g., since you're handling a DB transaction and immediately re-raising, the raw `except:` is appropriate, and not flag it at all in the "review" (the AI linting), reducing the false positive rate. You'd still want a human review, but you could avoid bugging a human till the code is worth reviewing, or you could let the human focus on things that matter instead of harping on your use of low-level atomic fences yet again. AI has the potential to improve the transition from "code complete" to "shipped and working."
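
For the DB transaction case, I mean something like this (a rough sketch assuming a DB-API-style connection; the names are made up). Catching everything is fine precisely because the handler always re-raises, and ideally an AI "linter" would recognize that and stay quiet:

```python
def run_in_transaction(conn, fn):
    conn.execute("BEGIN")
    try:
        result = fn(conn)
    except:                        # deliberately bare: roll back on *anything*,
        conn.execute("ROLLBACK")   # including KeyboardInterrupt/SystemExit...
        raise                      # ...then re-raise, so nothing is swallowed
    conn.execute("COMMIT")
    return result
```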


Wouldn't it be simpler to just use a linter then?


It would be simpler, but it wouldn't be as effective.


How so? The build fails if linter checks don't pass. What does an AI add to this?


Much more nuanced linting rules.


What exactly is gained from these nuanced linting rules? Why do they need to exist? What are they actually doing in your codebase other than sucking up air?


Are you arguing against linters? These disingenuous arguments are just tiring: either you accept that linters can be beneficial, and thus more nuanced rules are a good thing, or you believe linters are generally useless. This "LLM bad!" rhetoric is exhausting.


No. And speaking of disingenuous arguments, you've made an especially absurd one here.

Linters are useful. But you're arguing that you have rules so complex that they cannot be performed by a linter and thus must be offloaded to an LLM. I think that's wrong, and it's a classic middle-management mistake.

We can all agree that some rules are good. But that does not mean more rules are good, nor that more complex rules are good. Not only are you integrating a whole extra system to support this, you are doing so for the sake of rules so complex that they cannot be automated without creating developer friction.


If you think "this variable name isn't descriptive enough" or "these comments don't explain the 'why' well enough" is constructive feedback, then we won't agree, and I'm very curious as to what your PR reviews look like.


Linters suffer from a false positive/negative tradeoff that AI can improve. If they falsely flag things then developers tend to automatically ignore or silence the linter. If they don't flag a thing then ... well ... they didn't flag it, and that particular burden is pushed to some other human reviewer. Both states are less than ideal, and if you can decrease the rate of them happening then the tool is better in that dimension.

How does AI fit into that picture then? The main benefits IMO are the abilities to (1) use contextual clues, (2) process "intricate" linting rules (implicitly, since it's all just text to the LLM -- this also means you can apply many more rules, since checks too complicated for a linter author to express without an unacceptable false positive rate are unlikely to ever make it into a linter), and (3) give better feedback when rules are broken. Some examples to compare and contrast:

For that `except:` vs `except Exception:` thing I mentioned, all a linter can do is check whether the offending pattern exists, making the ~10% of proper use cases just a little harder to develop. A smarter linter (not that I've seen one with this particular rule yet) could allow a bare `except:` if the exception is always re-raised -- that being both the normal use case in DB transaction handling and the like, where you might legitimately want to catch everything, and also a coding pattern where catching everything is unlikely to cause the bugs it normally does. An AI linter can handle those edge cases automatically, not giving you spurious warnings for properly written DB transaction handling.

Moreover, it can suggest a contextually relevant fix: `except BaseException:` to signal to future readers that you considered the problems and definitely want this behavior, `except Exception:` to indicate that you do want to catch "everything" but without weird shutdown bugs, `except SomeSpecificException:` because the developer was just being lazy and would have accidentally written a new bug by catching `Exception` instead, or perhaps a different API altogether if exceptions weren't a reasonable way to control the flow of execution at that point.
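
To be clear about what the "smarter linter" half looks like, this is roughly the whole rule in classic-linter form (a toy `ast` sketch written purely for illustration): flag a bare `except:` unless the handler ends in a bare `raise`. Everything beyond that -- suggesting which of the fixes above actually fits the surrounding code -- is the part the AI adds:

```python
import ast

class BareExceptChecker(ast.NodeVisitor):
    """Flag bare `except:` handlers unless they end in a bare `raise`."""
    def __init__(self):
        self.findings = []

    def visit_ExceptHandler(self, node):
        if node.type is None:  # a bare `except:`
            reraises = bool(node.body) \
                and isinstance(node.body[-1], ast.Raise) \
                and node.body[-1].exc is None
            if not reraises:
                self.findings.append(node.lineno)
        self.generic_visit(node)

def bare_except_lines(source):
    checker = BareExceptChecker()
    checker.visit(ast.parse(source))
    return checker.findings

# the first handler is flagged (line 3); the re-raising one is allowed
print(bare_except_lines(
    "try:\n    pass\nexcept:\n    pass\n"
    "try:\n    pass\nexcept:\n    raise\n"
))
```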

As another example, you might have a linting rule banning low-level atomics (fences, seq_cst loads, that sort of thing). Sometimes they're useful, though, and an AI linter could handle the majority of cases with advice along the lines of "the thing you're trying to do can easily be handled with a mutex; please remove the low-level atomics." Incorporating context like that is impossible for normal linters.
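
This thread's examples are Python, where you don't really have raw fences, so treat this as a loose analogue I made up: a hand-rolled busy-wait flag standing in for the low-level cleverness, and the mutex version such a reviewer would steer you toward:

```python
import threading

balance = 100
_busy = False            # hand-rolled "lock": the check and the set aren't atomic, and it burns CPU
_lock = threading.Lock()

def withdraw_clever(amount):
    global balance, _busy
    while _busy:         # the kind of low-level synchronization the rule bans
        pass
    _busy = True
    balance -= amount
    _busy = False

def withdraw(amount):
    global balance
    with _lock:          # the advice: "this is just a mutex; use one"
        balance -= amount
```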

My point wasn't that you're replacing a linter with an AI-powered linter; it's that the tool generates the same sort of local, mechanical feedback a linter does -- all the stuff that might bog down a human reviewer and keep them from handling the big-picture items. If the tool is tuned to have a low false-positive rate then almost any advice it gives is, by definition, an important improvement to your codebase. Human reviewers will still be important, both in catching anything that slips through, and with the big-picture code review tasks.


Many code review products are not AI. They are often collections of rules that have been specifically crafted to catch bad patterns. Some of the rules are stylistic in nature and those tend to turn people away the fastest IMHO.

Many of these tools can be integrated into a local workflow so they never ping you on a PR, but many developers prefer to just write some code and let the review process suss things out.


Code review is also one of the last places to get adequate knowledge-sharing into many processes which have gone fully remote / asynchronous.


You probably shouldn't take a code review bot's "LGTM!" at face value, but if it tells you there is an issue with your code and it is indeed an issue with your code, it has successfully subbed in for your coworker who didn't get around to looking at it yet.


> if it tells you there is an issue with your code and it is indeed an issue with your code

That's the crucial bit. If it ends up hallucinating issues as LLMs tend to do[1], then it just adds more work for human developers to confirm its report.

Human devs can create false reports as well, but we're more likely to miss an issue than to misunderstand one and be confidently wrong about it.

[1]: https://www.theregister.com/2024/12/10/ai_slop_bug_reports/


I agree - at least with where technology is today, I would strongly discourage replacing human code review with AI.

It does, however, serve as a good first pass, so by the time the human reviewer gets it, the little things have been addressed and they can focus on broader technical decisions.

Would you want your coworkers to run their PRs through an AI reviewer, resolve those comments, then send it to you?


This feels like the right take.

Once we have an AI starting to do something, there’s an art to gradually adopting it in a way that makes sense.

We don’t lose the ability to have humans review code. We don’t have to all use this on every PR. We don’t have to blindly accept every comment it makes.


No one is blindly merging PRs after an AI code review. Think of the review bot as an extra pair of eyes.


Yet.

The things I've seen juniors try to do because "the AI said so" are staggering. Furthermore, I have little faith that the average junior gets competent enough teaching/mentoring that this attitude won't be a pervasive problem in 5ish years.

Admittedly, maybe I'm getting old and this is just another "darn kids these days" rant.



