Visweshyc's comments

Visweshyc · 2026-03-20T07:26:33 1773991593

We see this as different from review. The system generates tests to catch second-order effects and executes them against the live application to expose bugs.

Visweshyc · 2026-03-20T06:32:40 1773988360

Good point. To keep the regression tests reliable as the app evolves, we run a reliability cascade. First, we generate and execute deterministic Playwright tests from the codebase. If execution fails, we fall back to the DOM and ARIA tree. If that still fails, we fall back to vision agents that verify what the user actually sees before flagging a drift in application behavior.
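
A minimal sketch of how such a fallback cascade could be structured. This is not our actual implementation: the tier names (`playwright`, `dom_aria`, `vision`), `run_cascade`, and `CascadeResult` are hypothetical, and each tier is stubbed as a plain callable rather than a real browser or vision check.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class CascadeResult:
    passed: bool             # True if any tier verified the step
    strategy: Optional[str]  # name of the tier that succeeded, None if all failed
    errors: List[str]        # why each earlier tier did not succeed


def run_cascade(tiers: List[Tuple[str, Callable[[], bool]]]) -> CascadeResult:
    """Try each tier in order; stop at the first one that verifies the step."""
    errors: List[str] = []
    for name, check in tiers:
        try:
            if check():
                return CascadeResult(True, name, errors)
            errors.append(f"{name}: check returned False")
        except Exception as exc:  # a crashed tier counts as a failed tier
            errors.append(f"{name}: {exc}")
    # Every tier failed, so this is where a drift would be flagged.
    return CascadeResult(False, None, errors)


# Hypothetical stubs standing in for the real checks.
def stale_selector() -> bool:
    raise RuntimeError("selector not found")


result = run_cascade([
    ("playwright", stale_selector),  # deterministic script breaks on a changed selector
    ("dom_aria", lambda: False),     # DOM/ARIA lookup does not find the element either
    ("vision", lambda: True),        # vision tier confirms the user still sees it
])
print(result.strategy, result.errors)
```

The point of the ordering is that the cheapest, most deterministic check runs first, and a drift is only reported once every tier has failed.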

Visweshyc · 2026-03-19T20:54:55 1773953695

We evaluated test generation using Claude Code and our purpose-built harness, and measured the quality of the tests in catching the unknown unknowns. We noticed Claude Code misses the second-order effects that actually break applications. You also need infrastructure to execute the tests: browser fleets, ephemeral environments, and data seeding all need to be handled.

Visweshyc · 2026-03-19T18:23:21 1773944601

The system focuses on going beyond the happy path and generating edge-case tests that try to break the application. For example, a Grafana PR added visual drag feedback to query cards. The system came up with an edge case like: does drag feedback still work when there's only one card in the list, with nothing to reorder against?
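
To make the single-card case concrete, here is a toy sketch. It is not Grafana's code and not what the system generates: `move_card` and `drag_feedback_visible` are hypothetical stand-ins, and the rule that feedback should appear for any valid dragged card is an assumption made for illustration.

```python
from typing import List


def move_card(cards: List[str], src: int, dst: int) -> List[str]:
    """Return a new list with the card at index src moved to index dst."""
    if not (0 <= src < len(cards)) or not (0 <= dst < len(cards)):
        raise IndexError("card index out of range")
    reordered = list(cards)
    card = reordered.pop(src)
    reordered.insert(dst, card)
    return reordered


def drag_feedback_visible(cards: List[str], dragging_index: int) -> bool:
    """Assumed rule: feedback shows whenever a valid card is being dragged,
    regardless of how many other cards exist."""
    return 0 <= dragging_index < len(cards)


# Happy path: several cards, one is moved to the end.
multi = move_card(["A", "B", "C"], 0, 2)

# Edge case: a single card has nothing to reorder against,
# but dragging it should still show feedback and leave the list intact.
single_feedback = drag_feedback_visible(["A"], 0)
single_order = move_card(["A"], 0, 0)
```

A happy-path test only exercises the multi-card list; the edge case asks whether the feedback logic quietly depends on there being a second card.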

Visweshyc · 2026-03-19T18:10:34 1773943834

Thanks! To execute these tests reliably, you would need custom browser fleets, ephemeral environments, data seeding, and device farms.

mikestorrent · 2026-03-20T05:06:05 1773983165

If that's what you guys are bringing, you should put that more up front; focus on making it clear you're providing ingredients that Claude et al will not be providing on their own without Real Actual Software to do it.

Visweshyc · 2026-03-20T07:21:23 1773991283

Fair feedback. Will make that clearer. Appreciate it.

Visweshyc · 2026-03-19T17:31:05 1773941465

Yes, we currently support web apps, but we plan to extend the foundation to test mobile applications on device emulators.

Visweshyc · 2026-03-19T17:19:13 1773940753

Thanks! We believe executing the scenarios and showing what actually broke closes the loop.

Visweshyc · 2026-03-19T17:13:06 1773940386

Thanks for the feedback!
- Agreed that the form factor can be condensed, with a link to detailed information.
- With the codebase understanding in place, the backend is where we are looking to expand and provide value.
- The intelligence of the models does lay the foundation, but combining their strengths unlocks a system of specialized agents that each reason about the codebase differently to catch the unknown unknowns.