Why is that misleading? It shows Gemini with CoT is the best known combination of prompt and LLM on MMLU.
They simply compare the prompting strategy that works best with each model. Otherwise it would just be a comparison of how each model responds to one specific prompt-engineering technique.