@simon oh vendor performance information...
These benchmarks and their data are also in the training data. LLMs generally perform worse on alternative formulations of the benchmark questions. https://arxiv.org/pdf/2402.19450.pdf
GPT4 is the best, but its lead does not justify the cost/size. GPT3.5 is now the "vanilla LLM".
It's the accepted baseline, a standard you can compare against.