dumpsterlid,

Ah, so the Venn diagram for cops and LLMs includes more than just “bullshitters” but also now includes “hopelessly racist”.

keepthepace,

I really like MIT Technology Review. It is usually written by people with a deeper understanding of the field than most “scientific” journalists. And this one does not fail either, though I feel it does not make a clear distinction between prompt filtering and actual fine-tuning for alignment. Prompt filtering is rightly described as a “flimsy filter”, but the fine-tuning methods seem not to fix the core racism, only to teach the model to conceal it.

Renegade,

Nothing in the article corroborated the claim in the title that human intervention made things worse, just that the problem goes deeper.

keepthepace,

The study they link though has that among their conclusions:

Finally, we show that existing methods for alleviating racial bias in language models such as human feedback training do not mitigate the dialect prejudice, but can exacerbate the discrepancy between covert and overt stereotypes, by teaching language models to superficially conceal the racism that they maintain on a deeper level.

It feels like they have the same problem as with hallucinations: the model learns core knowledge during base training and is then taught to ignore some of it or invent more, but does not acquire new knowledge.

AbouBenAdhem,

“Feedback training teaches models to consider their racism,” says Valentin Hofmann, a researcher at the Allen Institute for AI and a coauthor on the paper. “But dialect prejudice opens a deeper level.”

Hmm… I think dialect bias is a distinct problem, which may need a separate approach that doesn’t just lump it together with racism and try to eliminate both using the same means.
