Yes. You can, with effort, condition it to respond sensibly with phrases like “I’m sorry, I don’t know how to reverse strings,” or “I’m sorry, I can’t do any math calculation that a human couldn’t do in their head.” But in doing so you damage its ability to do some tasks it’s actually capable of, e.g. reciting a memorized answer to “What is the fourth root of 625?” Its memorization abilities are insane: It seems to know, for example, the exact MD5 hashes of all single-character alphanumeric strings. Much of the arithmetic it knows is probably similarly memorized, and it’s hard to clarify for it what aspects of that memory are safe to use.
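To get a sense of the scale of that memorization claim, here's a quick sketch (my own, not from the original discussion) of the reference table it implies, using Python's hashlib to compute the MD5 digest of every single-character alphanumeric string:

```python
import hashlib
import string

# MD5 digests of all single-character alphanumeric strings --
# the table the model would effectively need to have memorized.
digests = {c: hashlib.md5(c.encode()).hexdigest()
           for c in string.ascii_letters + string.digits}

print(len(digests))    # 62 entries: a-z, A-Z, 0-9
print(digests["a"])    # 0cc175b9c0f1b6a831c399e269772661
```

That's only 62 hash values, so verbatim memorization from training data is entirely plausible; the hard part is that the model has no reliable signal for which of its memorized "calculations" are trustworthy.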
The initial problem that got me interested in GPT-3 was suppressing confabulated answers to the Hofstadter-Bender questions published in The Economist. I eventually found an apparent solution, but I have yet to validate it carefully: https://twitter.com/goodside/status/1556459121834168320?s=21...