The problem with this approach is that any real task requires a lot of internal knowledge (functional and technical) of the system, which an outside candidate will not have. The pairing employee will have lots of it, though, and the candidate is likely to appear inadequate even if the employee is aware of the knowledge gap. It is not straightforward to be productive on an alien code base from the get-go.
I can't imagine this to be true for the vast majority of web/SaaS companies. Even if there is a lot of internal knowledge, there are so many areas of the code with technical debt that the task could just be "How would you refactor this code? Here is how it works." or "This is a small error that needs to be handled. Here is how it works. How would you handle it?"
> I can't imagine this to be true for the vast majority of web/SaaS companies
I entirely disagree with this, except for the most trivial things you describe. And for those trivial things, does your company just have a never-ending list of them that never gets fixed? Feels like you go through 2 candidates (60 hours) and that backlog is now cleared.
I find it hard to believe that most web/SaaS companies are not doing the same kinds of database queries, consuming the Stripe APIs the same way, or adding a job to a queue in a similar fashion. I think this is true even in the open source world. Any senior dev should be able to pair with someone knowledgeable and provide value, with the exception of very technical code bases. Most web/SaaS companies do not have very technical code bases.
I would give the same bug to all the candidates for baseline purposes.