I doubt it. Using O_DIRECT essentially bypasses the buffer cache, which makes it similar to the “cold” experiments where the file is not cached, and mmap is still faster there. I have also run experiments on an NVRAM machine with a DAX file system (no buffer cache), and mmap is still several times faster.
It does skip the buffer cache, that is true, but if I understand things correctly, it also allows the kernel to use user-provided buffers directly, thus skipping the copy of the data from kernel to user land. That is why buffers used with O_DIRECT have to be properly aligned.
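To make the alignment point concrete, here is a rough sketch of an O_DIRECT read on Linux. The helper name `read_block_direct` and the 4096-byte block size are my own assumptions for illustration; the real alignment requirement depends on the file system and device. Some file systems reject O_DIRECT with EINVAL, so the sketch falls back to a normal buffered open in that case and reports which path was taken.

```c
#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumed alignment and transfer size; the actual requirement is
 * file-system and device dependent. */
enum { BLOCK = 4096 };

/* Hypothetical helper: read the first BLOCK bytes of `path` into a freshly
 * allocated, BLOCK-aligned buffer. On success, returns the number of bytes
 * read and stores the buffer in *buf_out (caller frees it). *used_direct is
 * set to 1 if O_DIRECT was accepted, 0 if we fell back to buffered I/O.
 * Returns -1 on error. */
ssize_t read_block_direct(const char *path, void **buf_out, int *used_direct)
{
    void *buf = NULL;

    /* O_DIRECT needs the user buffer address (and the length and file
     * offset) to be suitably aligned, hence posix_memalign, not malloc. */
    if (posix_memalign(&buf, BLOCK, BLOCK) != 0)
        return -1;

    int direct = 1;
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0 && errno == EINVAL) {
        /* This file system does not support O_DIRECT: use buffered I/O. */
        direct = 0;
        fd = open(path, O_RDONLY);
    }
    if (fd < 0) {
        free(buf);
        return -1;
    }

    /* Offset 0 and length BLOCK are both multiples of the assumed alignment. */
    ssize_t n = read(fd, buf, BLOCK);
    close(fd);
    if (n < 0) {
        free(buf);
        return -1;
    }

    *buf_out = buf;
    if (used_direct)
        *used_direct = direct;
    return n;
}
```

Timing this against a plain read() and an mmap of the same file would be the experiment discussed here; the sketch only shows the setup that O_DIRECT requires, not a benchmark.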
It would be fun to run the experiment nonetheless.
From the open(2) man page on O_DIRECT:
Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special situations, such as when applications do their own caching. File I/O is done directly to/from user-space buffers.
Would using the O_DIRECT flag result in similar timings for mmap and read/write?