dentangle,
@dentangle@chaos.social

All Your Base Are Belong to LLM

"The output from an LLM is a derivative work of the data used to train the LLM.

If we fail to recognise this, or are unable to uphold this in law, copyright (and copyleft on which it depends) is dead. Copyright will still be used against us by corporations, but its utility to FOSS to preserve freedom is gone."

https://blog.brettsheffield.com/all-your-base-are-belong-to-llm

helgztech,
@helgztech@fosstodon.org

@dentangle copyright already only serves those with the wherewithal to pursue breaches - i.e. $$$ to lawyer up in a European court.

dentangle,
@dentangle@chaos.social

@helgztech True. This is why the work of organisations like @conservancy is so important. Copyleft requires some level of enforcement.

mishari,
@mishari@floss.social

@dentangle I don't think it's that simple. I was reading a commentary that says that, given model sizes, it is very unlikely a single byte of the original code is stored in the model in any meaningful way.

I propose we need new thinking about all of this.

dentangle,
@dentangle@chaos.social

@mishari "We show an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT"

https://arxiv.org/abs/2311.17035
