LChoshen

@LChoshen@sigmoid.social

🥇 #NLProc researcher

🥈 Opinionatedly Summarizing new #ML & #NLP papers

🥉 Good science #scientivism

Now: @ibmresearch phd:@nlphuj

LChoshen, 6 days ago to llm

Do LLMs learn foundational concepts required to build world models? (less than expected)

We address this question with 🌐🐨EWoK (Elements of World Knowledge)🐨🌐

a flexible cognition-inspired framework to test knowledge across physical and social domains

https://ewok-core.github.io

#llm #llms #evaluation #ml #machinelearning

LChoshen, 22 days ago to random

Pretrain to predict the future
At each step the model predicts n tokens
Performance: 😃
Inference: up to ✖️3 faster
Training time: same

MetaAI
Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Rozière, David Lopez-Paz, Gabriel Synnaeve

https://arxiv.org/abs/2404.19737
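
To make "predicts n tokens at each step" concrete, here is a minimal sketch of how the training targets and loss could look. It is not the paper's code: `multi_token_targets`, `multi_token_loss`, and the per-head probability dictionaries are hypothetical names for illustration, and averaging the per-head losses is an assumption.

```python
import math

def multi_token_targets(tokens, n):
    """For each position t, return the n tokens that follow it."""
    return [tokens[t + 1 : t + 1 + n] for t in range(len(tokens) - n)]

def multi_token_loss(head_probs, targets):
    """Average cross-entropy over the n prediction heads.

    head_probs[k] is a dict mapping a token to the probability that
    head k (a hypothetical output head) assigns to it as the
    (k+1)-th next token.
    """
    losses = [-math.log(head_probs[k][tok]) for k, tok in enumerate(targets)]
    return sum(losses) / len(losses)

# Toy usage: a 5-token sequence with n = 2 future tokens per position.
targets = multi_token_targets([1, 2, 3, 4, 5], n=2)
loss = multi_token_loss([{2: 0.5}, {3: 0.25}], targets[0])
```

With n = 1 this reduces to ordinary next-token prediction.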

LChoshen, 3 months ago to generativeAI

DoRA decomposes weights into magnitude and direction and
surpasses LoRA quite significantly

This is done with an empirical finding that I can't wrap my head around

@nvidia
https://arxiv.org/abs/2402.09353
#lora #NLProc #ML #machinelearning #nlp #llm #llms
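
A rough sketch of the magnitude/direction split, as I read the paper's formulation: the adapted weight is a learned per-column magnitude `m` times the column-normalized direction of `W0 + BA`. This is a plain-Python illustration, not the authors' implementation, and the helper names are made up for the example.

```python
import math

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def column_norms(W):
    """Euclidean norm of each column of W."""
    return [math.sqrt(sum(x * x for x in col)) for col in zip(*W)]

def dora_weight(W0, B, A, m):
    """Return m * (W0 + B A) / ||W0 + B A||, normalized column by column.

    W0 is the frozen pretrained weight, B and A are the low-rank
    (LoRA-style) factors, and m holds one magnitude per column.
    """
    delta = matmul(B, A)
    V = [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W0, delta)]
    norms = column_norms(V)
    return [[m[j] * v / norms[j] for j, v in enumerate(row)] for row in V]

# With B = 0 and m set to the column norms of W0, the result is W0 itself.
W0 = [[3.0, 0.0], [4.0, 1.0]]
B = [[0.0], [0.0]]   # shape d x r, here r = 1
A = [[1.0, 2.0]]     # shape r x k
W = dora_weight(W0, B, A, column_norms(W0))
```

Scaling `m` changes only how large each column is, while `B` and `A` change only where it points.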

LChoshen, 3 months ago to ArtificialIntelligence

Happy to share our paper:

Genie🧞: Achieving Human Parity
in Content-Grounded Datasets Generation

was accepted to #ICLR24

From your content
Genie creates content-grounded data
of magical quality ✨
Rivaling human-based datasets!

https://arxiv.org/abs/2401.14367
#data #NLP #nlproc #ML #machinelearning #llm #RAG

LChoshen, 4 months ago to random

English code models are better than Chinese models
on Chinese
They hallucinate less
They generalize better

If true, this defies our thoughts on LMs as domain experts
https://arxiv.org/abs/2401.10286

LChoshen, 4 months ago to opensource

Crowd-sourcing human feedback for open-source LLMs? 💬🤖

Let's make it happen together! 💪

https://chromewebstore.google.com/detail/sharelm-share-your-chat-c/nldoebkdaiidhceaphmipeclmlcbljmh

With Shachar Don-Yehiya and Omri Abend
#OpenScience #opensource #chatgpt #llm #llms #data #ml #nlproc #machinelearning

LChoshen, 4 months ago

ShareLM is a Chrome plugin that makes it easy for you to contribute your own human-model interactions.

The goal -> collecting an ever-growing dataset of conversations, for the benefit of the open-source community 💬🥳

And this is so easy, so no excuses!

https://chromewebstore.google.com/detail/sharelm-share-your-chat-c/nldoebkdaiidhceaphmipeclmlcbljmh

#OpenScience #opensource #chatgpt #llm #llms #data #ml #nlproc #machinelearning

LChoshen, 5 months ago to modeltrains

#neurips keynote
(with my live jetlagged interpretation)
from @StableDiffusion creator:
scaling is not the solution
A keynote to restart the debate #scalemodels
#LLMs #MachineLearning #GPTs
#ML #NLP #nlproc #GPT

LChoshen, 5 months ago

@StableDiffusion First, my take: it is not the solution, but remember how many people said it is not a solution, and who (@OpenAI \ @ilyasut) said you need more engineering and more scale
There is a lot to gain even from less appealing ideas
Also from appealing ideas
Extremism is often simplistic

LChoshen, 5 months ago

@StableDiffusion The main problem with scaling: fewer players are able to compete!
Me: I keep telling you that! But we can have other technologies that scale, and also use the community's expertise in a scaled, distributed way, with each contributor evolving the model slightly.

LChoshen, 5 months ago

@StableDiffusion The second issue with scaling is data (@blancheminerva can probably link to her great counterargument, I couldn't find it)
We would require more data.
Me: and compute, and other problems. Scaling is hard too. Don't confuse "easy" with "needs no new algorithm" (scaling).

LChoshen, 5 months ago

@StableDiffusion Also, more data may come with copyright problems.
(@shayneredford what are your thoughts on that? The two must be connected? not sure)

LChoshen, 5 months ago

@StableDiffusion @shayneredford Now we get to the more interesting part for me.
There are things that are hard to find in the data: rare thoughts and ideas that are hard to capture, and those will still be rare when scaled
Me: interesting. Still, we will get more of those with time, unless training ignores rarities

LChoshen, 6 months ago to ArtificialIntelligence

The language people use when they interact with each other changes over the course of the conversation.

🔍 Will we see a systematic language change along the interaction of human users with a text-to-image model?

#EMNLP23
http://arxiv.org/abs/2311.12131

Shachar Don-Yehiya, me & Omri Abend

#NLP #NLProc #language #linguistics #ML #midjourney #machinelearning

LChoshen, 9 months ago to random

What are the strongest/canonical papers that discuss data quality?

LChoshen, 11 months ago to random

Larger models are better😱
But...
Can we train smaller models to be better?
Can we learn about language learning?

Our baby👶, the babyLM challenge, in the @nytimes:
https://www.nytimes.com/2023/05/30/science/ai-chatbots-language-learning-models.html
⭐️🌟
@a_stadt @amuuueller @weGotlieb @jhuclsp @EvaPortelance & @sama

LChoshen, 1 year ago to random

Opposite scaling law: detection of machine-generated text is done better by smaller models

Everyone (outside #NLProc...) is afraid GPT would cheat for them, which pushes for detection methods

https://arxiv.org/abs/2305.09859
#NLProc #ML #machinelearning

LChoshen, 1 year ago

First, the problem: given a text, you want to know whether a human wrote it. If you've been in NLP lately, I am sure a teacher, sister, nephew, etc. called and told you they suspect someone handed them a GPT text.
Problem: how can you tell?
The approach:
Randomly replace words
Then see how much that changes the sentence probability/likelihood

presented by
https://arxiv.org/abs/2301.11305
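
The replace-and-rescore idea can be sketched in a few lines. Everything here is a stand-in: `toy_log_prob` is a made-up unigram scorer in place of a real language model, and uniform random word swaps are a simplification of whatever perturbation the paper actually uses. The intuition, as I understand it, is that a larger drop in log-probability after perturbation points toward machine-generated text.

```python
import math
import random

# Hypothetical toy "language model": unigram probabilities for four words.
TOY_PROBS = {"the": 0.4, "cat": 0.3, "sat": 0.2, "zebra": 0.1}

def toy_log_prob(words):
    """Log-probability of a word list under the toy unigram model."""
    return sum(math.log(TOY_PROBS[w]) for w in words)

def perturb(words, vocab, rate, rng):
    """Replace each word with a random vocabulary word with probability `rate`."""
    return [rng.choice(vocab) if rng.random() < rate else w for w in words]

def perturbation_gap(words, log_prob, vocab, n_perturbations=20, rate=0.3, seed=0):
    """Original log-prob minus the mean log-prob of perturbed copies."""
    rng = random.Random(seed)
    original = log_prob(words)
    perturbed = [log_prob(perturb(words, vocab, rate, rng))
                 for _ in range(n_perturbations)]
    return original - sum(perturbed) / len(perturbed)

# Usage: score a short "sentence" against random perturbations of itself.
gap = perturbation_gap(["the", "cat", "sat"], toy_log_prob, list(TOY_PROBS))
```

In practice `log_prob` would come from the model under suspicion (or, per the post above, a smaller one), and the gap would be compared against a threshold.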

nsaphra, 1 year ago to Stoicism

I accidentally posted something under my 2022 #book thread but it's time to live in the future! So this is officially the beginning of my 2023 book thread!

LChoshen, 8 months ago

@nsaphra hmm, interesting in the life hacks and relationships hacking context
