AnnemarieBridy,
@AnnemarieBridy@mastodon.social

I’d be curious to know what effect, if any, this change has on a relatively large LLM’s likelihood of outputting strings of text that are memorized from training data sources.

Meta multi-token prediction makes LLMs up to 3X faster | VentureBeat https://venturebeat.com/ai/metas-new-multi-token-prediction-makes-ai-models-up-to-3x-faster/

kellogh,
@kellogh@hachyderm.io

@AnnemarieBridy the way i understood the paper, it wouldn’t change much, but there are a lot of variables. for example, the increased data efficiency also means there’s less training data to reference, though theoretically without increasing overfitting (per the paper)
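
(rough sketch of what the paper describes, a shared trunk with n independent output heads, each predicting one of the next n tokens. class and variable names here are illustrative, not Meta’s actual code)

```python
import torch
import torch.nn as nn

class MultiTokenPredictor(nn.Module):
    """Shared transformer trunk + n independent heads; head i predicts token t+i+1."""
    def __init__(self, trunk: nn.Module, d_model: int, vocab_size: int, n: int = 4):
        super().__init__()
        self.trunk = trunk  # shared transformer body producing hidden states
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab_size) for _ in range(n))

    def forward(self, x):
        h = self.trunk(x)                        # (batch, seq, d_model)
        return [head(h) for head in self.heads]  # n logit tensors per position

def multi_token_loss(logits_per_head, tokens):
    # head i is trained against the sequence shifted by i+1 positions;
    # the per-head cross-entropy losses are summed
    total = 0.0
    for i, logits in enumerate(logits_per_head):
        shift = i + 1
        pred = logits[:, :-shift, :]   # drop positions with no target
        tgt = tokens[:, shift:]
        total = total + nn.functional.cross_entropy(
            pred.reshape(-1, pred.size(-1)), tgt.reshape(-1)
        )
    return total
```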

paninid,
@paninid@mastodon.world

@kellogh @AnnemarieBridy

I still don’t understand what problem it’s solving.

Which compelling business use case was facing latency issues that this addresses?

kellogh,
@kellogh@hachyderm.io

@paninid @AnnemarieBridy are you serious?

paninid,
@paninid@mastodon.world

@kellogh @AnnemarieBridy

Yes, which product use case that the market has an appetite to pay for had latency and speed as its primary pain point?

I mean one besides creating 1,500 books a year to self-publish on Amazon.

Like, an actual enterprise use case, one involving invoice-generating accounts payable.

kellogh,
@kellogh@hachyderm.io

@paninid @AnnemarieBridy latency touches every part of every business case

  • lower environmental impact
  • lower cost
  • drastically better user experience in interactive apps
  • some applications that weren’t possible before are enabled by lower latency

if you don’t understand the user experience impact, try using Groq at 500 tokens/s https://groq.com/

kellogh,
@kellogh@hachyderm.io

@paninid @AnnemarieBridy Little’s Law is doing the work here: if you cut latency to 1/3 and keep the request rate the same, the number of requests in flight, and with it the number of servers, also drops to 1/3
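
(back-of-envelope with made-up numbers, since the claim is easy to sanity-check)

```python
# Little's Law: L = lambda * W
#   L      = average number of requests in flight
#   lambda = arrival rate (requests/second)
#   W      = time each request spends in the system (latency)
# All numbers below are invented for illustration.
arrival_rate = 300.0                 # req/s, held constant
latency_before = 0.9                 # seconds per request
latency_after = latency_before / 3   # the "up to 3X faster" claim

in_flight_before = arrival_rate * latency_before   # 270 concurrent requests
in_flight_after = arrival_rate * latency_after     # 90 concurrent requests

# if each server holds a fixed number of concurrent requests,
# the server count shrinks by the same factor
per_server_capacity = 30
print(in_flight_before / per_server_capacity)  # 9.0 servers
print(in_flight_after / per_server_capacity)   # 3.0 servers
```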

paninid,
@paninid@mastodon.world

@kellogh @AnnemarieBridy

For enterprise applications, the most valuable data is behind corporate firewalls, not out on the internet.

And if that’s the case, maybe the models don’t need to be large in the first place.

kellogh,
@kellogh@hachyderm.io

@paninid @AnnemarieBridy i don’t think that’s a logical step you can make. the largeness gives it the “general” capabilities, where you don’t have to train it for a specific task. most enterprises are using LLMs via RAG, i.e. they have no need to train their own model. one of the benefits of LLMs in general is that model training is left to the people who are best at it, and everyone else just uses databases
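
(a minimal sketch of that RAG pattern; `search` and `complete` here are hypothetical stand-ins for your vector store and the hosted model API, not real libraries)

```python
# Retrieval-augmented generation: retrieve from the enterprise's own
# document store (behind the firewall), then have a general-purpose LLM
# answer grounded in what was retrieved. No model training involved.
def answer(question: str, search, complete, k: int = 5) -> str:
    docs = search(question, top_k=k)               # hypothetical retriever
    context = "\n\n".join(doc.text for doc in docs)
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return complete(prompt)                        # hypothetical LLM call
```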
