ErikJonker, to ai
@ErikJonker@mastodon.social

The score of Llama 3 70B on the LMSYS leaderboard is impressive. It's also clear, though, that the latest GPT-4 is still a lot better. However, Llama 3 is open source and freely available, and a larger version (400B parameters) is on the way that will be closer to GPT-4 in performance on the various benchmarks.
https://chat.lmsys.org/?leaderboard

stefano,
@stefano@bsd.cafe

@ErikJonker That's the point: having control over hosting, even if it means sacrificing some capabilities, can be a game changer for privacy and security reasons.

ErikJonker,
@ErikJonker@mastodon.social

@stefano True. Especially for (European) governments, the privacy and security of models is very important; otherwise their use will probably be illegal.

ErikJonker, to ai
@ErikJonker@mastodon.social

Interesting: GPT-4-Turbo is on top again, beating Claude 3, and GPT-5 hasn't even arrived yet. At the same time, a lot of people actually prefer Claude 3, so leaderboards probably don't tell the whole story.

https://chat.lmsys.org/?leaderboard

ErikJonker, (edited) to ai
@ErikJonker@mastodon.social

Claude 3 is officially at the top of the leaderboard. Although it's just one leaderboard/benchmark, and added value always depends on use and context, it's still the end of GPT-4's total dominance (until GPT-5 arrives, probably). Also interesting is the performance of the Claude 3 Haiku model, which is relatively small/cheap.
https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard
