This does not address the issue raised in iLoveOncall's third paragraph: "the same comment can be a nitpick on one CR but crucial on another..." In "attempt 2", you say that "the LLMs judgment of its own output was nearly random", which raises questions that go well beyond nitpicking, up to whether the current state of the art in LLM code review is fit for much more than ticking the box that says "yes, we are doing code review."