liztai,
@liztai@hachyderm.io avatar

Hello
Apparently Google has changed its privacy policy and now says that it'll scrape everything you post online to train its AI tools.
I even post my writing online on my blog and now wonder if this is a bad idea.
They say paywalls could deter the scraping.
What do you think writers can do to protect their content? Or should we just roll over and accept that this is the way things will be from now on?

https://gizmodo.com/google-says-itll-scrape-everything-you-post-online-for-1850601486

ellierenae,

@liztai I may have a workaround for #writers, #Gumroad!

It has a $0+ setting so people still get your work for free, but bots would have to choose their price, enter an email address, click the link inside, and download in order to scrape the text. Like a false paywall! Hopefully this will protect my #writing, and maybe it can protect others too?

I'm not against all uses of #AI, but I'm very much for consent and compensation, and it's hard to fend off the "requests" of a massive tech oligopoly

ShinyBlueThing,
@ShinyBlueThing@dice.camp avatar

@liztai This looks like some new law is going to be made. I suspect the argument is that "anyone can read what's publicly available, and the AI is reading it," but it's going to take some legal wrangling to determine who owns the output if the input is used without consent with the intention of creating derivative works.

fstateaudio,
@fstateaudio@mastodon.sdf.org avatar

@liztai don't roll over, and don't go with the flow. I beg all creative types currently feeling the seeming need to stay with Big Tech for the sake of their creative pursuits to look at what happened with us independent musicians and Spotify etc. Sure, you can find a few examples that have done alright and will defend the system (just like the previous label situation), but the truth is that we've become fodder for the corporate data/money machine. We can do better without them.

liztai,
@liztai@hachyderm.io avatar

@fstateaudio thanks for sharing your thoughts. What can creative types do to resist? What lessons have you learned as a musician?

lispi314,

@liztai We should all stop paying corporations as much as possible.

Fuck their AI, fuck their copyright, fuck their "intellectual property".

Paywalls are just an attempt at enclosing any and all remaining commons & public goods for mass #enshittification and value extraction.

#AbolishCopyright

retrohondajunki,
@retrohondajunki@mstdn.social avatar

@liztai
Just as we can limit people and charge them money for our services, we should unequivocally be able to do the same with AI. Bots should use the door like everyone else. That right should be all-powerful and omnipresent.

mintyfresh,
@mintyfresh@mastodon.social avatar

@liztai It's either paywall or be scraped.

Maybe you could play games with Google by publishing posts full of gibberish, with no truth to them but lots of keywords embedded. Of course, you'd also have to post a disclaimer that the post is gibberish. It's just a tiny protest, but I guess it helps.

JacquelineJannotta,

@liztai Hmmm. Not surprising. Do we think this possibly means anything written in google docs too? I mean it's not like any of us has enough hours in a lifetime to read the fine print of user agreements... grrrr.

sz_duras,

@liztai I think #writingcommunity that it's far too late for this; writers cannot protect their content, and probably should not.

imtheq,
@imtheq@realsocial.life avatar

@liztai This is unconscionable.

liztai,
@liztai@hachyderm.io avatar

@imtheq Yes, I so agree

gabri,

@liztai
Perhaps a CAPTCHA with a ToS checkbox could work

It's annoying for users but it beats having to pay.

timo21,
@timo21@mastodon.sdf.org avatar

@liztai as I read the article my brain decided to recall the phrase "all your base are belong to us". 🙀

schack,

The worst part is that they will eventually sell us back our own stuff with paid access to their AI tool. We always end up with the short end of the stick when dealing with the internet giants. #Enshittification

rcteske,
@rcteske@mastodon.gamedev.place avatar

@liztai @eff That's probably not legal, right?

liztai,
@liztai@hachyderm.io avatar

@rcteske @eff It is in a gray area

rcteske,
@rcteske@mastodon.gamedev.place avatar

@liztai @eff Yeah, if you have a different license stated for your publicly accessible content (plus robots.txt and any other no-bot measure) and G**gle gobbles it, you'd need to get legal advice to enforce the license. They are probably counting on the vast majority of people not being able to do that.

simondassow,

@liztai So basically everyone needs to implement capitalism blockers that return FORBIDDEN to Google et al IP ranges and user agents. Should be doable, especially as a community that keeps the block information updated.
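A minimal sketch of the kind of filter simondassow describes: check each request's IP and user-agent and return 403 FORBIDDEN on a match. The CIDR ranges below are documentation placeholders, not Google's actual published ranges, and the agent substrings are assumptions; a real deployment would pull the published crawler-range lists and keep them updated.

```python
import ipaddress

# Placeholder ranges -- Google publishes its real crawler IP ranges;
# these two (RFC 5737 documentation blocks) are stand-ins for illustration.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Assumed user-agent substrings to reject.
BLOCKED_AGENT_SUBSTRINGS = ("Googlebot", "GPTBot")

def should_forbid(remote_addr: str, user_agent: str) -> bool:
    """Return True if the request should get a 403 FORBIDDEN response."""
    addr = ipaddress.ip_address(remote_addr)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return True
    return any(s in user_agent for s in BLOCKED_AGENT_SUBSTRINGS)
```

A web server or WSGI middleware would call `should_forbid()` before serving content; the community-maintained blocklist would replace the placeholder constants.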

CrackerBarrelGrandma,

@liztai jokes on them because most of my posts are pretty much crap

abhijit,

@liztai this is messed up. Apart from regulations, is there any way one can prevent this from happening?

vgoller,
@vgoller@nrw.social avatar

@liztai I think the danger of general AI is that it will try to learn everything remotely available. Even before ..
.. anyone reads your fiction, AI companies will publish dozens of clones
.. anyone working on a patent with the help of AI will find Microsoft or Google filed it earlier than they did
.. anyone working on scientific papers in online tools such as Office 365 will find publications under different names showing up just before their own paper is finished
..

funkaspuck,
@funkaspuck@mathstodon.xyz avatar

@liztai do people not understand how their existing search and cache worked? There’s nothing in this change that says they will refuse to honor robots.txt settings. If you cared to keep your data from Google before, this changes nothing. If you didn’t care before, ¯_(ツ)_/¯
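For anyone who hasn't set one up, the robots.txt funkaspuck mentions is a plain text file at the site root; a minimal version asking Google's crawler (and everything else) to stay out might look like this. Whether a given bot honors it is entirely up to the bot:

```
# Ask Google's main crawler to skip the whole site.
User-agent: Googlebot
Disallow: /

# Ask all other crawlers to skip it too.
User-agent: *
Disallow: /
```

Note that blocking Googlebot this way also drops the site from Google Search results, which may or may not be an acceptable trade.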

skribe,
@skribe@aus.social avatar

@liztai I had my stories scraped from Wattpad, Royal Road, and others. They were reposted (with ads) within hours. Anything on the internet is fair game to some.

Laurie,

@liztai Try:

Medium @medium

They should be helpful.

geraldew,

@liztai that question gets my nerd side looking to the Bruce Schneier tome on my bookshelf and wondering if there's an encryption protocol that would do that.

Or, another way is that each reader supplies their public key and then gets to pull down a personally encrypted copy that only they can decrypt. That lets all your subscribers read your work but only them. For text, the load should be light.

(Of course, each reader could then on-forward so it's not about stopping that.)

Sounds doable.
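geraldew's scheme can be sketched with a toy Diffie-Hellman exchange: each reader supplies a public key, and the author derives a per-reader symmetric key to encrypt their copy. The parameters and XOR-keystream cipher here are illustrative only and NOT secure; a real implementation would use an established cryptography library.

```python
import hashlib
import secrets

# Toy Diffie-Hellman parameters -- illustration only, NOT secure.
P = 0xFFFFFFFB  # largest 32-bit prime
G = 5

def keypair():
    """Generate a (private, public) pair: public = G^private mod P."""
    sk = secrets.randbelow(P - 3) + 2
    return sk, pow(G, sk, P)

def shared_key(my_sk: int, their_pk: int) -> bytes:
    # g^(a*b) mod p is the same from either side, so the author and
    # one reader derive the same symmetric key from each other's publics.
    return hashlib.sha256(str(pow(their_pk, my_sk, P)).encode()).digest()

def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher; applying it twice with the same key round-trips."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))

# Author encrypts for one reader's public key; only that reader can decrypt.
author_sk, author_pk = keypair()
reader_sk, reader_pk = keypair()
ciphertext = xor_stream(shared_key(author_sk, reader_pk), b"my story")
plaintext = xor_stream(shared_key(reader_sk, author_pk), ciphertext)
```

As geraldew notes, this only gates access per subscriber; it can't stop a reader from forwarding the decrypted text onward.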

techviator, (edited )
@techviator@noc.social avatar

@liztai
Maybe something like the PDA access restriction plugin could be used to add a paywall, but only for Google IPs. That should allow their indexers to find your website and titles but limit how much content they can access. Maybe a new plugin would have to be created, but technically it is possible.

kadin,

@techviator @liztai I think Google already spot-checks pages to ensure they are returning the same content to Google's crawler that a regular user would see—they definitely used to do this, anyway. Mostly to prevent a whole range of shitty SEO techniques (showing content to Google but only showing ads to humans).

But they certainly have the technical capability to detect anything that shows the crawler significantly different content based on IP or user-agent.

techviator,
@techviator@noc.social avatar

@kadin @liztai
Currently a lot of paywalled websites show up in the Google results, and one of the ways to circumvent the paywall is to use Google Translate as a proxy. I'm suggesting doing it the other way around: apply a paywall for every single known Google IP. It will break Google Translate and other G-services for your website, but I don't see other ways to avoid them harvesting the data.

kadin,

@techviator @liztai You could, but that will probably drop you out of Google search results, which could be a deal-breaker for many people (though as they make Google Search crappier, maybe less so); more generally, there's no guarantee that their ML crawler will use predictable IP addresses. They have access to a lot of IPs. You might end up blocking Google Fiber users, Project Fi, mobile VPN, trigger Chrome malware detect… could be messy.

Not sure there's a good technical solution.

techviator,
@techviator@noc.social avatar

@kadin @liztai True, but I am not suggesting blocking them, just paywall them, as in offer the first paragraph or so, and then require the user to perform some kind of interaction. Could even use browser's user-agent string to whitelist browsers so that users do not get the paywall, only bots and crawlers.

kadin,

@techviator @liztai Google is capable of detecting if different content is served to Googlebot vs. normal users, as an anti-SEO thing (anti-cloaking). Though the worst they can do is knock you down or out of search results.

But the user-agent string shouldn't be used that way—it's trivially spoofed. (I have wget pretend to be Chrome all the time.) Reverse DNS is slightly better: https://developers.google.com/search/docs/crawling-indexing/verifying-googlebot

But that depends entirely on Google intentionally making it easy to know when it's them.
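The check kadin links is forward-confirmed reverse DNS: reverse-resolve the client IP, require the hostname to sit under a Google domain, then forward-resolve it back and require the same IP. A sketch, with the resolver functions injectable so the logic can be tested without network access; the accepted domain list is a simplification of what Google documents.

```python
import socket

def is_verified_googlebot(ip: str,
                          reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname) -> bool:
    """Forward-confirmed reverse DNS check for Google's crawlers:
    1) reverse-resolve the IP to a hostname,
    2) require a googlebot.com / google.com hostname,
    3) forward-resolve that hostname back to the same IP."""
    try:
        host = reverse(ip)[0]
    except OSError:
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        return forward(host) == ip
    except OSError:
        return False
```

This only proves a request *is* Googlebot when Google cooperates by publishing matching DNS records; as kadin says, it tells you nothing about crawlers Google chooses not to identify.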

cendawanita,

@liztai
Idk about paywalls but srsly considering making my blog pw-protected. Haih.

liztai,
@liztai@hachyderm.io avatar

@cendawanita IKR. It seems like big tech is determined to make everyone hate them this year. Every platform is going nuts over AI and, as a result, screwing over the users to get that stupid pot of gold.

sombragris,

@liztai Perhaps, on your blog, limiting it via robots.txt?

haikushack,

@liztai Unfortunately, I don't think we have much choice. I'm not too worried about AI, though. People will eventually realize that whatever is created via AI has no soul or beauty.

liztai,
@liztai@hachyderm.io avatar

@haikushack I hope that timeline will be a short one :\

haikushack,

@liztai Everything is cyclical. Human beings need to be affected by something to realize the errors of their ways. ;-)
