>`IO` is a little bit harder because it wraps up the interpreter pattern (distinction between description and action) in a monad
I've always thought that the trick to explaining the IO monad in Haskell is just to realize that IO in Haskell has very little to do with monads.
Fundamentally, three things are going on: Haskell values of type (IO a) may have side effects when evaluated; the evaluation semantics of Haskell are such that expressions are (observably) evaluated at most once†; and the >>= operator forces sequential evaluation by threading through a hidden world parameter and doing some special compiler magic. (It's a common myth that sequential evaluation is forced merely by the type of >>=, but this is just false.) Once you've grasped this underlying mechanism, you can then observe that 'return' and '>>=' form a monad.
† This is subtle and somewhat beyond my pay grade. Strictly speaking the semantics of Haskell only require non-strictness, which doesn't directly entail any limit on the number of times an expression is evaluated. But in practice, it is safe to assume that a Haskell implementation will evaluate any expression with side effects at most once. I'm sure someone else can do a better job of explaining exactly to what extent, if any, this property is entailed by a non-strict semantics.
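To make the "hidden world parameter" idea concrete, here is a minimal sketch of a world-passing action type. The names `World`, `Action`, `returnA`, `bindA` and `putLine` are made up for illustration, and the "world" is just a list of output lines. This is a pure toy model, not GHC's implementation, and it leaves out the compiler magic mentioned above entirely.

```haskell
-- Toy stand-in for the hidden world parameter: a log of "printed" lines.
-- (Assumption for illustration; the real world token carries no data.)
newtype World = World [String]

-- An action is a function from the current world to a result and a new world.
newtype Action a = Action { runAction :: World -> (a, World) }

-- Analogue of 'return': produce a value, leave the world untouched.
returnA :: a -> Action a
returnA x = Action (\w -> (x, w))

-- Analogue of '>>=': the world produced by the first action is fed to the
-- second, so the second action's output depends on the first's.
bindA :: Action a -> (a -> Action b) -> Action b
bindA (Action m) k = Action $ \w0 ->
  let (x, w1) = m w0
  in runAction (k x) w1

-- A pretend side effect: append a line to the world's log.
putLine :: String -> Action ()
putLine s = Action (\(World out) -> ((), World (out ++ [s])))

demo :: Action Int
demo =
  putLine "first" `bindA` \_ ->
  putLine "second" `bindA` \_ ->
  returnA 42

-- Run the demo against an empty world and return the result plus the log.
runDemo :: (Int, [String])
runDemo = let (x, World out) = runAction demo (World []) in (x, out)

main :: IO ()
main = print runDemo  -- prints (42,["first","second"])
```

Note that `returnA` and `bindA` here are just the usual state-monad operations, which is the sense in which the monad structure can be observed after the fact. In this pure model the ordering falls out of data dependency alone; with real side effects that is not enough, which is where the special compiler support comes in.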
The trick with IO was basically putting a monotonically increasing counter in it. Binds increment the counter, creating the implicit ordering dependency needed for the compiler to "sequence" the actions as expected.
It's supposed to be some elegant design handed down from an all-knowing creator, I guess, but it's a useful hack in practice.
Yes. You also need something to force strictness and disable certain potential optimizations, though. The counter isn't actually used, so the compiler ought to be free just never to evaluate it, or delete it altogether.
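A rough sketch of the counter picture, and of why the counter alone isn't enough. `Counted`, `returnC`, `bindC`, `step` and `quiet` are hypothetical names, and this is a pure toy rather than anything GHC actually does; the second half uses ordinary laziness as a stand-in for the optimizer concern raised above.

```haskell
-- An action that receives the current counter and returns a result
-- together with a (possibly updated) counter.
newtype Counted a = Counted { runCounted :: Int -> (a, Int) }

returnC :: a -> Counted a
returnC x = Counted (\n -> (x, n))

-- Each bind bumps the counter, so the second action depends on the first.
bindC :: Counted a -> (a -> Counted b) -> Counted b
bindC (Counted m) k = Counted $ \n0 ->
  let (x, n1) = m n0
  in runCounted (k x) (n1 + 1)

-- A step that actually looks at the counter.
step :: String -> Counted String
step label = Counted (\n -> (label ++ "@" ++ show n, n))

-- A step that never inspects the counter.
quiet :: Counted Int
quiet = Counted (\n -> (7, n))

demo :: Counted [String]
demo =
  step "a" `bindC` \x ->
  step "b" `bindC` \y ->
  returnC [x, y]

-- The counter is 'undefined' here, yet the result is still computed:
-- nothing demands the counter, so lazy evaluation never touches it.
unusedCounter :: Int
unusedCounter = fst (runCounted (quiet `bindC` \x -> returnC (x + 1)) undefined)

main :: IO ()
main = do
  print (runCounted demo 0)  -- prints (["a@0","b@1"],2)
  print unusedCounter        -- prints 8
```

`unusedCounter` evaluates fine even though the counter is `undefined`, which illustrates the point: if nothing consumes the counter, a data dependency on it does not by itself force anything to happen, so a real implementation needs extra guarantees.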