This rings similar to a recent post that was on the front page about red team vs. blue team.
Before running LLM-generated code through yet more LLMs, you can run it through traditional static analysis (linters, SAST, auto-formatters). They aren’t flashy but they produce the same results 100% of the time.
Consistency is critical if you want to pass/fail a build on the results. Nobody wants a flaky code reviewer robot, just like flaky tests are the worst.
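For example, here's a minimal sketch (Python, with ruff as an illustrative stand-in for whatever linter/SAST you actually run) of the kind of deterministic gate I mean: same input, same verdict, so it's safe to fail a build on it.

    # Deterministic lint gate: run a linter and pass/fail on its exit code.
    # ruff is just a stand-in; any linter or SAST tool that reports findings
    # via exit status works the same way.
    import subprocess
    import sys

    def lint_gate(paths: list[str]) -> bool:
        """Return True only if static analysis reports no issues."""
        result = subprocess.run(["ruff", "check", *paths])
        return result.returncode == 0

    if __name__ == "__main__":
        sys.exit(0 if lint_gate(["src/"]) else 1)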
I imagine code review will evolve into a three-tier pyramid:
1. Static analysis (instant, consistent) — e.g. using Qlty CLI (https://github.com/qltysh/qlty) as a Claude Code or Git hook (sketch below)
2. LLMs — has the advantage of catching semantic issues
3. Human
We make sure commits pass each level in succession before moving on to the next.
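Roughly, as a sketch (assuming the Qlty CLI's `qlty check` subcommand is installed; the LLM step is a placeholder for whichever review bot you wire in):

    # Three-tier succession: each gate must pass before the next, more
    # expensive one runs. Only changes that clear tiers 1 and 2 reach a human.
    import subprocess
    import sys

    def static_gate() -> bool:
        # Tier 1: deterministic and instant, same verdict on every run.
        # Assumes the Qlty CLI is installed and configured for the repo.
        return subprocess.run(["qlty", "check"]).returncode == 0

    def llm_gate() -> bool:
        # Tier 2: semantic review. Placeholder: call your review bot's API
        # here and return False if it leaves blocking comments.
        return True

    def main() -> int:
        if not static_gate():
            print("Blocked at tier 1: static analysis")
            return 1
        if not llm_gate():
            print("Blocked at tier 2: LLM review")
            return 1
        print("Tiers 1 and 2 passed; hand off to tier 3 (human review)")
        return 0

    if __name__ == "__main__":
        sys.exit(main())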
Reading that post sent me down the path to this one. This stack order makes total sense, although in practice it's possible steps 1 and 2 merge into a single product with two distinct steps.
Step 3 is interesting too - my suspicion is that ~70% of PRs will be too minor to need human review as the models get better, but the top 30% will, because there will be differing opinions on what is and isn't the right way to make a complex change.
I was so hoping that this would not be about AI, and actually talk about how we need to do better as an industry and start using objective measures of software quality backed by government standards.
Nope, it's about AI code reviewing AI, and how that's a good thing.
It's like everyone suddenly forgot the old adage: "code is a liability".
"We write code twice as fast!" just means "we create liability twice as fast!". It's not a good thing, at all.
Improving automated code review does improve software, so idk why you're grinding that axe in this particular thread.
I had thought that putting quotes around the phrase "independent auditor" above would have prevented this sort of misunderstanding, but clearly not, so I've changed the title to something more straightforward now.
(Submitted title was "Software needs an independent auditor")
This is advertising for an AI product. Slightly more interesting background story than most articles doing so, but still an ad for a product that probably won't actually work.
I have no direct experience with Greptile, but I asked about AI code review assistants on a mailing list of engineering leaders I'm on. Several folks suggested Greptile.
So, consider this hearsay that it works.
My employer uses Greptile and I'm pretty happy with it. Sometimes it can be a bit overzealous, but more often than not it catches real issues and gives the author a chance to reply or fix them before another human reviews the PR.
Calling it an ad is just lazy dismissal. Everybody is selling something. If you’re more focused on purity tests than evaluating whether the idea actually works, you’re not critiquing, you’re gatekeeping.