AnnemarieBridy,
@AnnemarieBridy@mastodon.social

I’d be curious to know what effect, if any, this change has on a relatively large LLM’s likelihood of outputting strings of text that are memorized from training data sources.

Meta multi-token prediction makes LLMs up to 3X faster | VentureBeat https://venturebeat.com/ai/metas-new-multi-token-prediction-makes-ai-models-up-to-3x-faster/

kellogh,
@kellogh@hachyderm.io

@AnnemarieBridy the way i understood the paper, it wouldn’t change much, but there are a lot of variables. for example, the increased data efficiency also means there’s less training data to reference, though theoretically without increasing overfitting (per the paper)
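
(rough sketch of what the paper describes, a shared trunk with n independent output heads, each predicting one of the next n tokens. class and variable names here are illustrative, not Meta’s actual code)

```python
import torch
import torch.nn as nn

class MultiTokenPredictor(nn.Module):
    """Shared transformer trunk + n independent heads; head i predicts token t+i+1."""
    def __init__(self, trunk: nn.Module, d_model: int, vocab_size: int, n: int = 4):
        super().__init__()
        self.trunk = trunk  # shared transformer body producing hidden states
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab_size) for _ in range(n))

    def forward(self, x):
        h = self.trunk(x)                        # (batch, seq, d_model)
        return [head(h) for head in self.heads]  # n logit tensors per position

def multi_token_loss(logits_per_head, tokens):
    # head i is trained against the sequence shifted by i+1 positions;
    # the per-head cross-entropy losses are summed
    total = 0.0
    for i, logits in enumerate(logits_per_head):
        shift = i + 1
        pred = logits[:, :-shift, :]   # drop positions with no target
        tgt = tokens[:, shift:]
        total = total + nn.functional.cross_entropy(
            pred.reshape(-1, pred.size(-1)), tgt.reshape(-1)
        )
    return total
```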

paninid,
@paninid@mastodon.world

@kellogh @AnnemarieBridy

I still don’t understand what problem it’s solving.

Which compelling business use case was facing latency issues that this addresses?

kellogh,
@kellogh@hachyderm.io

@paninid @AnnemarieBridy are you serious?

paninid,
@paninid@mastodon.world

@kellogh @AnnemarieBridy

Yes, which product use case that the market has an appetite to pay for had latency and speed as its primary pain point?

I mean one besides creating 1,500 books a year to self-publish on Amazon.

Like, an actual enterprise use case, one involving invoice-generating accounts payable.

kellogh,
@kellogh@hachyderm.io

@paninid @AnnemarieBridy latency touches every part of every business case

  • lower environmental impact
  • lower cost
  • drastically better user experience in interactive apps
  • some applications that weren’t possible before are enabled by lower latency

if you don’t understand the user experience impact, try using Groq at 500 tokens/s https://groq.com/

kellogh,
@kellogh@hachyderm.io

@paninid @AnnemarieBridy Little’s Law is doing the work here: if you cut latency to 1/3 and keep the request rate the same, the number of requests in flight, and with it the number of servers, also drops to 1/3
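
(back-of-envelope with made-up numbers, since the claim is easy to sanity-check)

```python
# Little's Law: L = lambda * W
#   L      = average number of requests in flight
#   lambda = arrival rate (requests/second)
#   W      = time each request spends in the system (latency)
# All numbers below are invented for illustration.
arrival_rate = 300.0                 # req/s, held constant
latency_before = 0.9                 # seconds per request
latency_after = latency_before / 3   # the "up to 3X faster" claim

in_flight_before = arrival_rate * latency_before   # 270 concurrent requests
in_flight_after = arrival_rate * latency_after     # 90 concurrent requests

# if each server holds a fixed number of concurrent requests,
# the server count shrinks by the same factor
per_server_capacity = 30
print(in_flight_before / per_server_capacity)  # 9.0 servers
print(in_flight_after / per_server_capacity)   # 3.0 servers
```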

paninid,
@paninid@mastodon.world

@kellogh @AnnemarieBridy

For enterprise applications, the most valuable data is behind corporate firewalls, not out on the internet.

And if that’s the case, maybe the models don’t need to be large in the first place.

kellogh,
@kellogh@hachyderm.io

@paninid @AnnemarieBridy i don’t think that’s a logical step you can make. the largeness gives it the “general” capabilities, where you don’t have to train it for a specific task. most enterprises are using LLMs via RAG, i.e. they have no need to train their own model. one of the benefits of LLMs in general is that model training is left to the people who are best at it, and everyone else just uses databases
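
(a minimal sketch of that RAG pattern; `search` and `complete` here are hypothetical stand-ins for your vector store and the hosted model API, not real libraries)

```python
# Retrieval-augmented generation: retrieve from the enterprise's own
# document store (behind the firewall), then have a general-purpose LLM
# answer grounded in what was retrieved. No model training involved.
def answer(question: str, search, complete, k: int = 5) -> str:
    docs = search(question, top_k=k)               # hypothetical retriever
    context = "\n\n".join(doc.text for doc in docs)
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return complete(prompt)                        # hypothetical LLM call
```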
