
I work in infra. I don’t want an incident to be the time that I find out my coworkers don’t actually know how anything works, and are frantically posting questions to ChatGPT.

That said, I also don’t know how to test for this adequately, other than carefully watching people’s eyes during questions; even that isn’t foolproof.



Some of the best tests come from taking something mundane and twisting it just a little off the beaten path, in a way that LLMs can’t handle. The usual example is the river crossing puzzle, which has a standard format and solution but can be modified so that the memorized solution no longer works. For code, an example would be parsing log lines to compile insights, but with a twist: the timestamp means something other than the time the log was collected (see the sketch below).
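
To make that concrete, here is a hypothetical sketch of such a twist (log format and field names invented for illustration): the leading timestamp is when the record was flushed, and an age_ms field says how long before that the event actually happened, so anyone who blindly buckets by the printed timestamp gets the timeline wrong.

    # Hypothetical log format: the leading timestamp is the *flush* time,
    # not the event time; age_ms says how long before the flush the event
    # actually occurred. Naively bucketing by the printed timestamp is wrong.
    from datetime import datetime, timedelta

    LINE = "2024-05-01T12:00:05Z level=error age_ms=4200 msg=timeout"

    def event_time(line):
        parts = line.split()
        fields = dict(f.split("=", 1) for f in parts[1:])
        flushed = datetime.fromisoformat(parts[0].replace("Z", "+00:00"))
        return flushed - timedelta(milliseconds=int(fields["age_ms"]))

    print(event_time(LINE))  # 2024-05-01 12:00:00.800000+00:00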

You can completely change the complexity of a bog-standard leetcode question with the slightest modification, and I’m certain no extant LLM will catch the nuance if your problem reads like a very common one.
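
One illustration (of many possible): “longest subarray with sum exactly k” is a textbook sliding-window problem when all the numbers are non-negative, but allow negatives and the window trick silently breaks; you need prefix sums instead. A rough sketch of the version that survives the twist:

    # Prefix-sum version of "longest subarray with sum == k"; correct even
    # when nums contains negative numbers, where the sliding window fails.
    def longest_subarray_sum_k(nums, k):
        first_seen = {0: -1}      # prefix sum -> earliest index it occurred
        total, best = 0, 0
        for i, x in enumerate(nums):
            total += x
            if total - k in first_seen:
                best = max(best, i - first_seen[total - k])
            first_seen.setdefault(total, i)
        return best

    print(longest_subarray_sum_k([1, -1, 5, -2, 3], 3))  # 4
    print(longest_subarray_sum_k([2, 1, 3], 3))          # 2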

A fun exercise is to ask ChatGPT what the river crossing goat puzzle is and then say “do the same puzzle but with two goats and a cabbage please”. It can’t get it right even with multiple corrections. Might stop working at some point now that I’ve posted this, though!
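
For what it’s worth, the two-goats-and-a-cabbage variant is easy to check mechanically. A throwaway brute-force sketch, assuming the usual rules (the boat holds the farmer plus at most one item; a goat left alone with the cabbage eats it; goats never bother each other):

    from collections import deque

    # Brute-force search over the modified river-crossing puzzle:
    # two goats (G1, G2) and a cabbage (C).
    ITEMS = ("C", "G1", "G2")

    def safe(bank):
        # A bank without the farmer is unsafe if a goat is with the cabbage.
        return not ("C" in bank and ("G1" in bank or "G2" in bank))

    def solve():
        start = (frozenset(ITEMS), "near")   # items on near bank, farmer side
        goal = (frozenset(), "far")
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            (near, side), path = queue.popleft()
            if (near, side) == goal:
                return path
            far = frozenset(ITEMS) - near
            here = near if side == "near" else far
            # The farmer crosses alone or with one item from the current bank.
            for cargo in [None] + sorted(here):
                new_near = set(near)
                if cargo:
                    (new_near.discard if side == "near" else new_near.add)(cargo)
                new_near = frozenset(new_near)
                new_side = "far" if side == "near" else "near"
                # The bank the farmer leaves behind must be safe.
                left = new_near if new_side == "far" else frozenset(ITEMS) - new_near
                if not safe(left):
                    continue
                state = (new_near, new_side)
                if state not in seen:
                    seen.add(state)
                    queue.append((state, path + [cargo or "nothing"]))

    print(solve())  # e.g. ['C', 'nothing', 'G1', 'C', 'G2', 'nothing', 'C']

The shortest solution still takes seven crossings, but it has to start with the cabbage rather than a goat, which is exactly the step a memorized answer to the standard puzzle gets wrong.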


"tell me about potential issues while doing X", then dig into answers? We may get there one day with LLMs, but currently it would be extremely hard to keep the conversation context and process the longer answers while talking to someone. The voice recognition quality and generating speed would be an issue too if you wanted to monitor the whole conversation.

Currently the cheating mostly targets coding tasks, because they’re self-contained and simple to offload; it helps much less with carrying on a good dialogue.



