The alignment problem you've solved was simply moved a few syllables.
How do you simply "not do that" and have a useful model to interact with?
Without reducing complexities to grammatical self terminating cliches, ponder: whatever you tell the model to do, including being honest - it could be lying, and lying dormant awaiting a critical mass of dependency.