#leaderboard - Posts - kbin.social

ErikJonker, 1 month ago to ai

The score of Llama3 70B on the LMSYS leaderboard is impressive, although it's also clear that the latest GPT-4 is still a lot better. However, Llama3 is open source and freely available, and a larger version (400B parameters) is on the way and will be closer to GPT-4 in performance on the various benchmarks.
https://chat.lmsys.org/?leaderboard
#AI #GPT4 #LMSYS #Leaderboard #Llama3 #opensource

stefano, 1 month ago

@ErikJonker That's the point: having control over hosting, even if it means sacrificing some capabilities, can be a game changer for privacy and security reasons.

ErikJonker, 1 month ago

@stefano True. Especially for (European) governments, the privacy and security of models is very important; otherwise their use will probably be illegal.

ErikJonker, 2 months ago to ai

Interesting: GPT-4-Turbo is on top again, beating Claude3, and GPT-5 hasn't even arrived. At the same time, a lot of people actually prefer Claude3, so leaderboards probably don't tell the whole story.
#AI #Leaderboard
https://chat.lmsys.org/?leaderboard

ErikJonker, 2 months ago (edited 2 months ago) to ai

Claude 3 is officially at the top of the leaderboard. Although it's just one leaderboard/benchmark, and added value always depends on use and context, it's still the end of GPT-4's total dominance (probably until GPT-5 arrives). Also interesting is the performance of the Claude 3 Haiku model, which is relatively small and cheap.
https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard
#leaderboard #Claude3 #GPT4 #AI
