Hello #WritingCommunity
Apparently #Google has changed their privacy policy and now says that they'll scrape everything you post online to train their AI tools.
I even post my #Fiction online on #Substack & my #Wordpress blog and now wonder if this is a bad idea.
They say paywalls could deter the scraping.
What do you think writers can do to protect their content? Or should we just roll over and accept that this is the way things will be from now on?
It has a $0+ setting so people still get your work for free, but bots would have to choose their price, enter an email address, click the link inside, and download in order to scrape the text. Like a false paywall! Hopefully this will protect my #writing, and maybe it can protect others too?
I'm not against all uses of #AI, but I'm very much for consent and compensation, and it's hard to fend off the "requests" of a massive tech oligopoly
@liztai It looks like some new law is going to have to be made. I suspect the argument is that "anyone can read what's publicly available, and the AI is just reading it," but it's going to take some legal wrangling to determine who owns the output when the input is used without consent with the intention of creating derivative works.
@liztai don't roll over, and don't go with the flow. I beg all creative types currently feeling the seeming need to stay with Big Tech for the sake of their creative pursuits to look at what happened with us independent musicians and Spotify etc. Sure, you can find a few examples that have done alright and will defend the system (just like the previous label situation), but the truth is that we've become fodder for the corporate data/money machine. We can do better without them.
@liztai
Just as we can limit access to and charge people money for our services, we should unequivocally be able to do the same with AI. Bots should use the door like everyone else, and that rule should be all-powerful and omnipresent.
Maybe you could play games with Google by publishing posts full of gibberish with no truth to them but lots of embedded keywords. Of course you'd also have to post a disclaimer that the post is gibberish. It's just a tiny protest, but I guess it helps.
@liztai Hmmm. Not surprising. Do we think this possibly means anything written in google docs too? I mean it's not like any of us has enough hours in a lifetime to read the fine print of user agreements... grrrr.
The worst part is that they will eventually sell us back our own stuff as paid access to their AI tools. We always end up with the short end of the stick when dealing with the internet giants. #Enshittification
@liztai @eff Yeah, if you have a different license stated for your publicly accessible content (plus robots.txt and any other no-bot measure) and G**gle gobbles it up, you’d need to get legal advice to enforce the license. They are probably counting on the vast majority of people not being able to do that.
@liztai So basically everyone needs to implement capitalism blockers that return FORBIDDEN to Google et al IP ranges and user agents. Should be doable, especially as a community that keeps the block information updated.
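A minimal sketch of that blocker idea, as WSGI-style Python middleware. The crawler names in `BLOCKED_AGENTS` are illustrative, not exhaustive, and a real deployment would also match published IP ranges and pull its blocklist from a community-maintained source:

```python
# Sketch: return 403 Forbidden to known crawler user-agents.
# The agent list is illustrative; keep it updated from a shared source.
BLOCKED_AGENTS = ("googlebot", "google-extended", "gptbot", "ccbot")

def block_crawlers(app):
    """Wrap a WSGI app so that matching crawlers get 403 Forbidden."""
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(bot in agent for bot in BLOCKED_AGENTS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        # Everyone else passes through to the real application.
        return app(environ, start_response)
    return middleware
```

The obvious caveat is that user-agent strings are trivially spoofed, which is why the community-maintained IP block information mentioned above would matter.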
@liztai i think the danger of general AI is that it will try to learn everything remotely available. Even before ..
.. anyone reads your fiction, AI companies will publish dozens of clones
.. anyone working on a patent with the help of AI will find Microsoft or Google filed it earlier than they did
.. anyone working on a scientific paper in online tools such as Office 365 will find publications showing up, under different names, just before their own paper is finished
..
@liztai do people not understand how their existing search and cache worked? There’s nothing in this change that says they will refuse to honor robots.txt settings. If you cared to keep your data from Google before, this changes nothing. If you didn’t care before, ¯\_(ツ)_/¯
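For what it's worth, Google now documents a separate robots.txt token, Google-Extended, for opting out of AI training without leaving search results, and OpenAI documents GPTBot similarly. A robots.txt along these lines (under the assumption the crawlers honor it, which is voluntary) would look like:

```text
# Opt out of AI-training crawlers while staying in search indexes.
User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

# Everything else, including Googlebot for ordinary search, stays allowed.
User-agent: *
Allow: /
```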
@liztai I had my stories scraped from Wattpad, Royal Road, and others. They were reposted (with ads) within hours. Anything on the internet is fair game to some.
@liztai that question gets my nerd side looking at the Bruce Schneier tome on my bookshelf and wondering if there's an encryption protocol that would do that.
Or, another way is that each reader supplies their public key and then gets to pull down a personally encrypted copy that only they can decrypt. That lets all your subscribers read your work but only them. For text, the load should be light.
(Of course, each reader could then on-forward so it's not about stopping that.)
@liztai
Maybe something like the PDA access restriction plugin could be used to add a paywall only for Google IPs. That should allow their indexers to find your website and titles but limit how much content they can access... maybe a new plugin would have to be created, but technically it is possible.
@techviator @liztai I think Google already spot-checks pages to ensure they are returning the same content to Google's crawler that a regular user would see—they definitely used to do this, anyway. Mostly to prevent a whole range of shitty SEO techniques (showing content to Google but only showing ads to humans).
But they certainly have the technical capability to detect anything that shows the crawler significantly different content based on IP or user-agent.
@kadin @liztai
Currently a lot of paywalled websites show up in the Google results, and one of the ways to circumvent the paywall is to use Google Translate as a proxy. I'm suggesting doing it the other way around, apply a paywall for every single known Google IP, it will break Google Translate and other Gservices for your website, but I don't see other ways to avoid them harvesting the data.
@techviator @liztai You could, but that would probably drop you out of Google search results, which could be a deal-breaker for many people (though as they make Google Search crappier, maybe less so). More generally, there's no guarantee that their ML crawler will use predictable IP addresses; they have access to a lot of IPs. You might end up blocking Google Fiber users, Project Fi, mobile VPNs, or triggering Chrome's malware detection… it could be messy.
@kadin @liztai True, but I am not suggesting blocking them, just paywalling them: offer the first paragraph or so, then require the user to perform some kind of interaction. You could even use the browser's user-agent string to whitelist browsers, so that users do not get the paywall, only bots and crawlers.
@techviator @liztai Google is capable of detecting if different content is served to Googlebot vs. normal users, as an anti-SEO thing (anti-cloaking). Though the worst they can do is knock you down or out of search results.
@cendawanita IKR. It seems like big tech is determined to make everyone hate them this year. Every platform is going nuts over AI and, as a result, screwing over the users to get that stupid pot of gold.
@liztai Unfortunately, I don't think we have much choice. I'm not too worried about AI, though. People will eventually realize that whatever is created via AI has no soul or beauty.