Interesting point about #2. I've been doing something similar from a different angle: running the same question through Claude, GPT-4o, and Gemini to see where they disagree. It turns out they give completely different root causes about 30% of the time, which honestly surprised me.
What's your experience with qwen3.5 for debugging tasks? I've mostly stuck with the big models so far.