Hello #WritingCommunity
Apparently #Google has changed their privacy policy and now says that they'll scrape everything you post online to train their AI tools.
I even post my #Fiction online on #Substack & my #Wordpress blog and now wonder if this is a bad idea.
They say paywalls could deter the scraping.
What do you think writers can do to protect their content? Or should we just roll over and accept that this is the way things will be from now on?
It has a $0+ setting so people still get your work for free, but bots would have to choose their price, enter an email address, click the link inside, and download in order to scrape the text. Like a false paywall! Hopefully this will protect my #writing, and maybe it can protect others too?
I'm not against all uses of #AI, but I'm very much for consent and compensation, and it's hard to fend off the "requests" of a massive tech oligopoly
@liztai It looks like some new law is going to have to be made. I suspect the argument is that "anyone can read what's publicly available, and the AI is just reading it," but it's going to take some legal wrangling to determine who owns the output when the input is used without consent with the intention of creating derivative works.
@liztai don't roll over, and don't go with the flow. I beg all creative types currently feeling the seeming need to stay with Big Tech for the sake of their creative pursuits to look at what happened with us independent musicians and Spotify etc. Sure, you can find a few examples that have done alright and will defend the system (just like the previous label situation), but the truth is that we've become fodder for the corporate data/money machine. We can do better without them.
@liztai
Just as we can limit access to and charge people money for our services, we should unequivocally be able to do the same with AI. Bots should use the door like everyone else, and that rule should be all-powerful and omnipresent.
Maybe you could play games with Google by publishing posts full of gibberish with no truth to them but lots of embedded keywords. Of course you'd also have to post a disclaimer that the post is gibberish. It's just a tiny protest, but I guess it helps.
@liztai Hmmm. Not surprising. Do we think this possibly means anything written in google docs too? I mean it's not like any of us has enough hours in a lifetime to read the fine print of user agreements... grrrr.
The worst part is that they will eventually sell us back our own stuff as paid access to their AI tools. We always end up with the short end of the stick when dealing with the internet giants. #Enshittification
@liztai @eff Yeah, if you have a different license stated for your publicly accessible content (plus robots.txt and any other no-bot measure) and G**gle gobbles it up, you’d need to get legal advice to enforce the license. They are probably counting on the vast majority of people not being able to do that.
@liztai So basically everyone needs to implement capitalism blockers that return FORBIDDEN to Google et al IP ranges and user agents. Should be doable, especially as a community that keeps the block information updated.
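A minimal sketch of that blocker idea, as WSGI-style Python middleware. The crawler names in `BLOCKED_AGENTS` are illustrative, not exhaustive, and a real deployment would also match published IP ranges and pull its blocklist from a community-maintained source:

```python
# Sketch: return 403 Forbidden to known crawler user-agents.
# The agent list is illustrative; keep it updated from a shared source.
BLOCKED_AGENTS = ("googlebot", "google-extended", "gptbot", "ccbot")

def block_crawlers(app):
    """Wrap a WSGI app so that matching crawlers get 403 Forbidden."""
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(bot in agent for bot in BLOCKED_AGENTS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        # Everyone else passes through to the real application.
        return app(environ, start_response)
    return middleware
```

The obvious caveat is that user-agent strings are trivially spoofed, which is why the community-maintained IP block information mentioned above would matter.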
@liztai i think the danger of general AI is that it will try to learn everything remotely available. Even before ..
.. anyone reads your fiction, AI companies will publish dozens of clones
.. anyone working on a patent with the help of AI will find Microsoft or Google filed it earlier than they did
.. anyone working on a scientific paper in online tools such as Office 365 will find publications showing up, under different names, just before their own paper is finished
..
@liztai do people not understand how their existing search and cache worked? There’s nothing in this change that says they will refuse to honor robots.txt settings. If you cared to keep your data from Google before, this changes nothing. If you didn’t care before, ¯\_(ツ)_/¯
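For what it's worth, Google now documents a separate robots.txt token, Google-Extended, for opting out of AI training without leaving search results, and OpenAI documents GPTBot similarly. A robots.txt along these lines (under the assumption the crawlers honor it, which is voluntary) would look like:

```text
# Opt out of AI-training crawlers while staying in search indexes.
User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

# Everything else, including Googlebot for ordinary search, stays allowed.
User-agent: *
Allow: /
```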
@liztai I had my stories scraped from Wattpad, Royal Road, and others. They were reposted (with ads) within hours. Anything on the internet is fair game to some.
@liztai that question gets my nerd side looking at the Bruce Schneier tome on my bookshelf and wondering if there's an encryption protocol that would do that.
Or, another way is that each reader supplies their public key and then gets to pull down a personally encrypted copy that only they can decrypt. That lets all your subscribers read your work but only them. For text, the load should be light.
(Of course, each reader could then on-forward so it's not about stopping that.)
@liztai
Maybe something like the PDA access restriction plugin could be used to add a paywall only for Google IPs. That should allow their indexers to find your website and titles but limit how much content they can access... maybe a new plugin would have to be created, but technically it is possible.
@techviator @liztai I think Google already spot-checks pages to ensure they are returning the same content to Google's crawler that a regular user would see—they definitely used to do this, anyway. Mostly to prevent a whole range of shitty SEO techniques (showing content to Google but only showing ads to humans).
But they certainly have the technical capability to detect anything that shows the crawler significantly different content based on IP or user-agent.
@kadin @liztai
Currently a lot of paywalled websites show up in the Google results, and one of the ways to circumvent the paywall is to use Google Translate as a proxy. I'm suggesting doing it the other way around, apply a paywall for every single known Google IP, it will break Google Translate and other Gservices for your website, but I don't see other ways to avoid them harvesting the data.
@techviator @liztai You could, but that would probably drop you out of Google search results, which could be a deal-breaker for many people (though as they make Google Search crappier, maybe less so). More generally, there's no guarantee that their ML crawler will use predictable IP addresses; they have access to a lot of IPs. You might end up blocking Google Fiber users, Project Fi, mobile VPNs, or triggering Chrome's malware detection… it could be messy.
@kadin @liztai True, but I am not suggesting blocking them, just paywalling them: offer the first paragraph or so, then require the user to perform some kind of interaction. You could even use the browser's user-agent string to whitelist browsers, so that users do not get the paywall, only bots and crawlers.
@techviator @liztai Google is capable of detecting if different content is served to Googlebot vs. normal users, as an anti-SEO thing (anti-cloaking). Though the worst they can do is knock you down or out of search results.
@cendawanita IKR. It seems like big tech is determined to make everyone hate them this year. Every platform is going nuts over AI and, as a result, screwing over the users to get that stupid pot of gold.
@liztai Unfortunately, I don't think we have much choice. I'm not too worried about AI, though. People will eventually realize that whatever is created via AI has no soul or beauty.