So the old robots.txt tool from Google is truly gone; it now redirects to the new report. Quick reminder: if you're missing the ability to test rules & URLs, @jammer_volts & I built this tool, https://tamethebots.com/tools/robotstxt-checker, which uses the official parser and lets you test one or many URLs and export the results.
We recently noticed a fair bit of traffic on www.bbc.co.uk & www.bbc.com from a User Agent which identifies itself as "ByteSpider" (& has a @bytedance.com email address).
Lots of docs on the web state it doesn't obey robots.txt but ByteDance have told us it does:
> ...in the robots.txt files
> user-agent:Bytespider
> Disallow:/
Thought that might be worth documenting as it might be a recent change & several of us searched but found zero docs from ByteDance
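For anyone who wants to sanity-check a rule like the one quoted above before deploying it, here is a minimal sketch using Python's standard `urllib.robotparser`. It only shows what the rules say, not whether a given crawler actually honours them, and Python's parser is not the same code as Google's parser. The helper name `is_allowed`, the `SomeOtherBot` agent and the example.com URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Rules in the spirit of the ByteDance quote: block Bytespider everywhere.
RULES = """\
User-agent: Bytespider
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "SomeOtherBot" is a hypothetical crawler with no matching rule group.
    for agent in ("Bytespider", "SomeOtherBot"):
        print(agent, is_allowed(RULES, agent, "https://example.com/news"))
```

With these rules, Bytespider is refused and an agent with no matching group is allowed by default.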
Not sure if I totally trust that it will be effective, but I did add a robots.txt file and a robots meta tag to my website to try to block bots crawling my website to feed AI models.
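A rough sketch of the robots.txt half of that setup: a few lines of Python that write one blocking group per crawler. The agent list is an assumption, not a complete or authoritative inventory (GPTBot, CCBot, Google-Extended and Bytespider are tokens their operators have published or confirmed), and `build_robots_txt` is a hypothetical helper name. Whether each bot respects the file is up to the bot.

```python
# Assumed list of AI-related crawler tokens; adjust to taste.
AI_CRAWLERS = ["GPTBot", "CCBot", "Google-Extended", "Bytespider"]


def build_robots_txt(blocked_agents):
    """Build robots.txt text that disallows the whole site for each agent."""
    blocks = [f"User-agent: {agent}\nDisallow: /\n" for agent in blocked_agents]
    # Groups are separated by a blank line for readability.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(build_robots_txt(AI_CRAWLERS))
```

Crawlers not named in the file are unaffected, so regular search indexing carries on as before.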
The New York Times has been negotiating with OpenAI over copyright fees for some time. According to reports, OpenAI used New York Times material without authorization to train ChatGPT. That could prove expensive in more ways than one.
Anyone who wants to take action themselves can already stop a handful of #AI crawlers from digesting the content of their own websites in future. Many questions remain open, though.
PSA: If you're running @writefreely, make sure your server is set up to serve a robots.txt so that you can block bots you don't want to gobble up the contents of your website (looking at you, #ChatGPT).
Something like
# Serve a static robots.txt file for requests to /robots.txt
location /robots.txt {
    alias /complete/path/to/your/robots.txt;
}
Perspectives is a tab that will showcase social media posts in Google's search results. This will give -- you guessed it -- "perspectives" on current events and other matters as well.
While Google is positioning this as generating results from all social media, this also has massive implications for the #Fediverse.
I've been saying for a long time that if we Fediverse developers didn't nail search soon, Google will eat our lunch.
Well, it looks like they've just sat down at a table and are studying the menu right now -- because I completely expect that the Fediverse will be present on that Perspectives tab, especially since the Fediverse is now generating 1 billion+ posts each month.
What is the next logical step for Google?
If I were putting on my Google product development hat, I'd push for full text search with near-instant results. This is very easy for Google to do. Their engineers could probably build it fast.
Meanwhile, the Fediverse is practically giving away Fediverse search to Google -- Google is what most people use to find posts on the Fediverse right now.
Are we just going to allow Google to extend their search monopoly into the Fediverse, and without a fight too?
It's time for the Fediverse to learn about robots.txt files and add server-level (and perhaps user-level) features that instruct search engines like Google about which profiles are welcome for external indexing, and which are not.
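To make the user-level idea a bit more concrete, here is one possible sketch, not a description of how any existing Fediverse server works. It assumes profiles live at `/@username` (as on Mastodon-style servers) and that each user has a hypothetical "indexable" flag; `profile_rules` is an invented name. It relies on the `$` end-of-path marker from RFC 9309, which not every crawler or parser supports.

```python
def profile_rules(indexable: dict[str, bool]) -> str:
    """Build robots.txt text that hides opted-out profiles from all crawlers.

    indexable maps a username to True (welcome to be indexed) or False.
    """
    lines = ["User-agent: *"]
    for username in sorted(indexable):
        if not indexable[username]:
            # "$" anchors the end of the path so /@bob does not also
            # match /@bobby; the second rule covers the user's posts.
            lines.append(f"Disallow: /@{username}$")
            lines.append(f"Disallow: /@{username}/")
    if len(lines) == 1:
        # Nobody opted out: an empty Disallow allows everything.
        lines.append("Disallow:")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(profile_rules({"alice": True, "bob": False}))
```

A single robots.txt gets unwieldy on a large server, so a real implementation might prefer per-page robots meta tags or `X-Robots-Tag` headers for the user-level switch and keep robots.txt for server-wide policy.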