Activity - The study they link though has that among their conclusions:...

keepthepace, 2 months ago

The study they link though has that among their conclusions:

Finally, we show that existing methods for alleviating racial bias in language models such as human feedback training do not mitigate the dialect prejudice, but can exacerbate the discrepancy between covert and overt stereotypes, by teaching language models to superficially conceal the racism that they maintain on a deeper level.

It feels like they have the same problem as hallucinations: The model learns core knowledge during the bas training and is then thought to ignore/invent some more but does not acquire new knowledge.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...