jeridansky,
@jeridansky@sfba.social

"The race to lead A.I. has become a desperate hunt for the digital data needed ..."

"At Meta .... managers, lawyers & engineers last year discussed buying the publishing house Simon & Schuster to procure long works .... They also conferred on gathering copyrighted data from across the internet, even if that meant facing lawsuits. Negotiating licenses with publishers, artists, musicians & the news industry would take too long, they said."

https://www.nytimes.com/2024/04/06/technology/tech-giants-harvest-data-artificial-intelligence.html or https://web.archive.org/web/20240406095041/https://www.nytimes.com/2024/04/06/technology/tech-giants-harvest-data-artificial-intelligence.html

madeindex,
@madeindex@mastodon.social

@jeridansky
this is why we use !

NatureMC,
@NatureMC@mastodon.online

@madeindex All public Fediverse posts can be found by Google, so they can be used, too. They even scrape pirated e-versions of books that exist only in print. There are not many secure places. @jeridansky

madeindex,
@madeindex@mastodon.social

@NatureMC @jeridansky
Yeah, they scrape everything. I think there will be a way for creators to profit at some point, but the internet will change dramatically. Soon all websites will just be databases for AI if it continues like this.

NatureMC,
@NatureMC@mastodon.online

@madeindex Websites already ARE databases for AI.
How should creators profit when AI companies earn money with their content and artists find fewer clients?!?

We already have the problem that the moment your publisher publishes a book, several AI-generated scam books with your title, and even falsified author biographies, are on the market: https://www.npr.org/2024/03/13/1237888126/growing-number-ai-scam-books-amazon
@jeridansky

madeindex,
@madeindex@mastodon.social

@NatureMC @jeridansky
Very interesting! So all content just gets rewritten and republished. can already detect content and the date of publishing will always be first on the original content uploaded to the internet. I believe there will be legislation, e.g. AI content will be automatically marked, and AI companies might have to pay to scrape any content, similar to pay-per-use for music. Also, all websites can already block the main AI crawlers from entering & scraping their content.

NatureMC,
@NatureMC@mastodon.online

@madeindex I admire your optimism. (Do you know an effective blocking mechanism?) @jeridansky

madeindex,
@madeindex@mastodon.social

@NatureMC @jeridansky
You can block the user agents via a website's robots.txt file (for example, OpenAI officially follows "disallow"), or block the IP ranges directly.

https://platform.openai.com/docs/gptbot

The big players (, , ) seem to have agreed to a watermarking proposal for AI generated content:
https://www.theverge.com/2023/7/21/23802274/artificial-intelligence-meta-google-openai-white-house-security-safety
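
For anyone who wants to check what a robots.txt rule like that actually says to a crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the example.com URL are made up for illustration, and `is_allowed` is a hypothetical helper; `GPTBot` is the user agent token from the OpenAI page linked above. This only shows how the rules are interpreted; whether a crawler honors them is up to the crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: block GPTBot, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com is a placeholder domain, not a real site configuration.
gptbot_allowed = is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/post/1")
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/post/1")
print(gptbot_allowed, other_allowed)
```

With these rules the check reports that GPTBot is disallowed while other agents are allowed. Blocking by IP range would have to happen at the server or firewall instead, since robots.txt is only a request.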

NatureMC,
@NatureMC@mastodon.online

@madeindex Thanks! Well, I personally don't believe they respect robots.txt files (and for most websites it comes too late, they are already scraped).

And that article talks only about an "announcement" in the USA.

These big players don't even respect European law! I live in France, where these companies are regularly sentenced to pay millions of euros because they violate EU digital and privacy laws. They pay and shrug. Or try new tricks.
So my optimism is at zero.

@jeridansky

NatureMC,
@NatureMC@mastodon.online

@madeindex They want to mark their content (well, they want to earn money with it). That's no protection against the stealing of creators' content that came before! That stealing and scraping is already done.

When artists tried to fight back with Nightshade and Glaze, the companies worked on disrupting those tools: https://www.technologyreview.com/2023/10/23/1082189/data-poisoning-artists-fight-generative-ai/
They have zero interest in copyright and the protection of creators!
@jeridansky
