Activity - @simon@simonwillison.net oh vendor performance information......

Zeugs, 1 month ago

@simon oh vendor performance information...
These benchmarks and their data are also in the training data. LLMs generally perform worse on alternative formulations of the benchmark questions.
https://arxiv.org/pdf/2402.19450.pdf
GPT-4 is the best, but its size does not justify the cost. GPT-3.5 is now the "vanilla LLM".
It's the defined baseline, a standard you can talk about.