The argument for lexers has nothing to do with machine instructions: it has to do with algorithmic performance. Grammars tend to have time complexity greater than linear with a moderate constant (and a parser combinator library, which is really just a lightweight programming abstraction that makes it easier to develop by-hand recursive descent parsers, normally doesn't try to help with this problem), whereas a good lexer tends to have linear time complexity with a nearly zero constant. If you can separate your compilation phase into "first run a lexer, then run a grammar", you can parse much longer files (not just constantly longer, but asymptotically longer) in the same amount of time. There is no fundamental reason to separate these phases, however, and doing so has minimal effect on the resulting compiler: numerous compilers, and even numerous compiler generators, combine these phases into a single grammar step, with the tokens effectively being individual characters. (There are also engines that try to slide between the two levels, using an algorithm more similar to a lexer at the base of your grammar, but still letting you define these pseudo-tokens in the unified grammar as rules.)
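To make the two-phase structure concrete, here's a minimal sketch in Python for a toy arithmetic language (the token set, grammar, and all names here are invented for illustration): the lexer makes one linear pass over the characters, and the recursive descent parser then works over the much shorter token stream rather than raw text.

```python
import re

# Phase 1: a linear-time lexer. Each character is consumed once, so the
# cost is O(n) in the input length with a small constant factor.
TOKEN_RE = re.compile(r"\s*(?:(?P<NUM>\d+)|(?P<OP>[+*()]))")

def lex(src):
    tokens, pos = [], 0
    src = src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise SyntaxError(f"bad character at offset {pos}")
        pos = m.end()
        if m.group("NUM") is not None:
            tokens.append(("NUM", int(m.group("NUM"))))
        else:
            tokens.append((m.group("OP"), m.group("OP")))
    return tokens

# Phase 2: a by-hand recursive descent parser. The grammar rules see
# tokens, not characters, so the expensive phase runs on a shorter input.
def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def eat(kind):
        nonlocal pos
        tok_kind, value = tokens[pos]
        assert tok_kind == kind, f"expected {kind}, got {tok_kind}"
        pos += 1
        return value

    def expr():  # expr := term ('+' term)*
        value = term()
        while peek() == "+":
            eat("+")
            value += term()
        return value

    def term():  # term := atom ('*' atom)*
        value = atom()
        while peek() == "*":
            eat("*")
            value *= atom()
        return value

    def atom():  # atom := NUM | '(' expr ')'
        if peek() == "(":
            eat("(")
            value = expr()
            eat(")")
            return value
        return eat("NUM")

    result = expr()
    assert pos == len(tokens), "trailing tokens"
    return result

print(parse(lex("2 + 3 * (4 + 1)")))  # → 17
```

A scannerless design would instead write `expr`, `term`, etc. directly over characters, folding the whitespace and digit handling into every rule; this sketch is the separated version the paragraph above is arguing for.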
I've personally found that even when working with parser generators or handwritten parsers that don't need the separation, it still helps to treat the two as separate. Having a stable, solid set of fundamental tokens makes the resulting language easier to understand. Whenever I see a language (usually one made with a PEG generator) that blurs these lines, everything feels very shifty. Yes, I know these are very touchy-feely attributes I'm describing, and you can obviously avoid those problems without the separation too, but these things matter all the same.
It's an interesting accident, to me at least, that this separation turned out to be both optimal and useful.