This reminds me of when I was trying to do Minecraft-style chunking in Bevy. Instead of finding the not-so-obvious fix, I threw parallelization, compiler optimizations, caching, release flags, etc. at my project, and nothing made it go faster. I could not figure out why it was so slow. Turns out what I was doing was so unoptimized that I might as well have been loading the whole world every frame.
You live and you learn :)