All those who doubted me when I said that Bing's index was in the 1 to 2 billion range …
This is where Inktomi was over two decades ago, and it's a fraction of the size of Mojeek's.
#AI#GenerativeAI#Search#SearchEngines: "There seem to be clear indications of a novelty factor at work. And while novelty in and of itself is not a bad thing, if it isn't followed with a consistent behavior change, we can't really call it a trend.
Take the above Bing.com numbers for instance. If we credit the inclusion of AI search tools on the platform as the cause of the unique user bump, it would seemingly serve to solidify the predicted 25% drop. Yet when we consulted our panel data further, we found that only between 4% and 9% of users used Bing Chat (their AI agent) in any given month during 2023. What's more, of those that did use it, only two to four searches were conducted over the ensuing month.
#AI#GenerativeAI#SEO#Reddit#Google#Search#SearchEngines: "For years, people who have found Google search frustrating have been adding “Reddit” to the end of their search queries. This practice is so common that Google even acknowledged the phenomenon in a post announcing that it will be scraping Reddit posts to train its AI. And so, naturally, there are now services that will poison Reddit threads with AI-generated posts designed to promote products.
A service called ReplyGuy advertises itself as “the AI that plugs your product on Reddit” and which automatically “mentions your product in conversations naturally.” Examples on the site show two different Redditors being controlled by AI posting plugs for a text-to-voice product called “AnySpeech” and a bot writing a long comment about a debt consolidation program called Debt Freedom Now." https://www.404media.co/ai-is-poisoning-reddit-to-promote-products-and-game-google-with-parasite-seo/
What is everyone using for a search engine these days? I was bouncing between Duck Duck Go and Kagi - but Kagi is now in cahoots with Brave which I'm not a fan of so that's out. Duck Duck Go is ok, but curious what else is out there.
It scrapes results from Google, Brave, Bing, DuckDuckGo, etc., minus all the frilly bits (like the suggested answers that are pissing folks off), then relevance-ranks those results.
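The post doesn't say how that engine actually merges and re-ranks what it scrapes, but as a hedged sketch, combining ranked lists from several engines is often done with reciprocal rank fusion; the engine names and URLs below are purely illustrative:

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked result lists from several engines into one ranking.

    Each list is an ordered sequence of URLs (best first). A URL scores
    higher the nearer the top it appears in each engine's list; k damps
    the advantage of a single #1 placement.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results from three engines for the same query:
engine_a = ["a.com", "b.com", "c.com"]
engine_b = ["b.com", "a.com", "d.com"]
engine_c = ["b.com", "c.com", "a.com"]

merged = reciprocal_rank_fusion([engine_a, engine_b, engine_c])
# b.com, ranked highly by all three engines, comes out on top.
```

A page that several engines rank highly floats to the top, while any single engine's quirks carry less weight, which is roughly the appeal of a metasearch engine.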
#AI#Search#SearchEngines#Google: "Google is considering charging for new “premium” features powered by generative artificial intelligence, in what would be the biggest ever shake-up of its search business.
The proposed revamp to its cash cow search engine would mark the first time the company has put any of its core product behind a paywall, and shows it is still grappling with a technology that threatens its advertising business, almost a year and a half after the debut of ChatGPT.
Google is looking at options including adding certain AI-powered search features to its premium subscription services, which already offer access to its new Gemini AI assistant in Gmail and Docs, according to three people with knowledge of its plans.
#AI#GenerativeAI#Search#SearchEngines#LLMs#Hallucinations: "A couple of days ago, Wharton professor Ethan Mollick, who studies the effects of AI and often writes about his own uses of it, summarized (on X) something that has become clear over the past year: “To most users, it isn't clear that LLMs don't work like search engines. This can lead to real issues when using them for vital, changing information. Frontier models make less mistakes, but they still make them. Companies need to do more to address users being misled by LLMs.”
In an age of LLMs, is it time to reconsider human-edited web directories?
Back in the early-to-mid '90s, one of the main ways of finding anything on the web was to browse through a web directory.
These directories generally had a list of categories on their front page. News/Sport/Entertainment/Arts/Technology/Fashion/etc.
Each of those categories had subcategories, and sub-subcategories that you clicked through until you got to a list of websites. These lists were maintained by actual humans.
Typically, these directories also had a limited web search that would crawl through the pages of websites listed in the directory.
Lycos, Excite, and of course Yahoo all offered web directories of this sort.
(EDIT: I initially also mentioned AltaVista. It did offer a web directory by the late '90s, but this was something it tacked on much later.)
By the late '90s, the standard narrative goes, the web got too big to index websites manually.
Google promised the world its algorithms would weed out the spam automatically.
And for a time, it worked.
But then SEO and SEM became a multi-billion-dollar industry. The spambots proliferated. Google itself began promoting its own content and advertisers above search results.
And now with LLMs, the industrial-scale spamming of the web is likely to grow exponentially.
My question is, if a lot of the web is turning to crap, do we even want to search the entire web anymore?
Do we really want to search every single website on the web?
Or just those that aren't filled with LLM-generated SEO spam?
Or just those that don't feature 200 tracking scripts, and passive-aggressive privacy warnings, and paywalls, and popovers, and newsletters, and increasingly obnoxious banner ads, and dark patterns to prevent you cancelling your "free trial" subscription?
At some point, does it become more desirable to go back to search engines that only crawl pages on human-curated lists of trustworthy, quality websites?
And is it time to begin considering what a modern version of those early web directories might look like?
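To make the idea concrete, here is a minimal sketch (the allowlisted domains are purely illustrative, not a proposal for actual inclusions) of how a curated search engine's crawler could restrict itself to a human-maintained list of trusted sites:

```python
from urllib.parse import urlparse

# Hypothetical human-curated allowlist, as a directory editor might maintain it.
CURATED = {"example.org", "w3.org"}

def should_crawl(url):
    """Fetch only pages whose host is on the curated list.

    Accepts the listed domain itself and any of its subdomains;
    everything else on the web is simply never crawled.
    """
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in CURATED)
```

The hard part, of course, is not the code but the editorial labour of maintaining the list, which is exactly what the old directories did.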
@ajsadauskas@degoogle I guess the problem though is how you make sure they are actually maintained by a human acting in good faith. The way community Facebook groups meant for this kind of thing get spammed by likely-fake businesses doesn't give me hope.
I used them and contributed to links as well - it was quite a rush to see a contribution accepted because it felt like you were adding to the great summary of the Internet. At least until the size of the Internet made it impossible to create a user-submitted, centrally-approved index of the Net. And so that all went away.
What seemed like a better approach was social bookmarking, like del.icio.us, where everyone added, tagged and shared bookmarks. The tagging basically crowd-sourced the categorisation and meant you could browse, search and follow links by tags or by the users. It created a folksonomy (thanks for the reminder Wikipedia) and, crucially, provided context to Web content (I think we're still talking about the Semantic Web to some degree but perhaps AI is doing this better). Then after a long series of takeovers, it all went away. The spirit lives on in Pinterest and Flipboard to some degree but as this was all about links it was getting at the raw bones of the Internet.
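As a rough illustration of how little machinery a del.icio.us-style folksonomy needs (all user names and URLs below are invented), the core is just two inverted maps, by tag and by user; the categorisation emerges from what people type rather than from an approved tree:

```python
from collections import defaultdict

class Folksonomy:
    """Minimal shared bookmark store: anyone can tag any URL, and
    browsing works by tag or by user rather than by a centrally
    approved category hierarchy."""

    def __init__(self):
        self.by_tag = defaultdict(set)   # tag  -> set of URLs
        self.by_user = defaultdict(set)  # user -> set of URLs

    def bookmark(self, user, url, tags):
        self.by_user[user].add(url)
        for tag in tags:
            self.by_tag[tag.lower()].add(url)

    def links_for_tag(self, tag):
        return self.by_tag[tag.lower()]

    def links_for_user(self, user):
        return self.by_user[user]

f = Folksonomy()
f.bookmark("alice", "https://example.org/css", ["webdev", "css"])
f.bookmark("bob", "https://example.org/css", ["design"])
f.bookmark("bob", "https://example.net/fonts", ["design", "typography"])
```

The same URL accumulates tags from many users, which is the crowd-sourced context the post describes.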
I've been using Postmarks, a single-user social bookmarking tool, but it isn't really the same as del.icio.us, because part of what made that work was the easy discoverability and sharing of other people's links. So what we need is, as I named my implementation of Postmarks, Relicious: pretty much del.icio.us, but done Fediverse-style, so you sign up to instances with other people (possibly organised around shared interests or regions, so you could have a body-modification instance or a German one, for example) and get bookmarking. If it works and people find it useful, a FOSS Fediverse implementation would be very difficult to make go away.
#Media#News#Journalism#SEO#AdTech#Search#SearchEngines: "[F]ew network effects have damaged the news more than Search Engine Optimization, where the allure of traffic from search engines like Google has led publishers to create content not with the goal of serving their audience, but attracting the spurious traffic that one might get from those searching "when does the Super Bowl start."
The result is a media industry in crisis. Desperate executives and disconnected editors twist their reporters' coverage to please Google's algorithms as a means of improving traffic to please advertisers' algorithms, creating content that looks and sounds the same as other outlets, which in turn leads to layoffs as profits fail to increase, which in turn normalizes and weakens the content created by the outlet. This is largely a result of those in power not actually consuming or producing any of the product that makes the outlet money, only understanding the business as a series of symbols that at some point create revenue, ostensibly from the written word and video.
When you make decisions for a website or company that produces words that it sells for money based not on the writing, but on how to twist that writing to make it "more profitable," the conclusion is always inevitable: the creation of identical-looking slop that people only read by accident, and the slow asphyxiation of journalism and culture.
It almost always leads to overstaffing and mismanagement, too. Any form of creative media requires an understanding that building an audience takes time and money, and that one cannot just spend a bunch of money to make that happen. But these craven idiots are as rotten as the rest of the economy (...) The media is being run by people that do not see value in people or the things that they create, but the metrics that come as a result."
#AI#GenerativeAI#Web#Search#SearchEngines#Chatbots: "The Browser Company's new app lets you ask semantic questions to a chatbot, which then summarizes live internet results in a simulation of a conversation. Which is great, in theory, as long as you don't have any concerns about whether what it's saying is accurate, don't care where that information is coming from or who wrote it, and don't think through the long-term feasibility of a product like this even a little bit. Or, as Dash put it, “It's the parasite that kills the host.”
The base logic of something like Arc's AI search doesn't even really make sense. As Engadget recently asked in their excellent teardown of Arc's AI search pivot, “Who makes money when AI reads the internet for us?” But let's take a step even further here. Why even bother making new websites if no one's going to see them? At least with the Web3 hype cycle, there were vague platitudes about ownership and financial freedom for content creators. To even entertain the idea of building AI-powered search engines means, in some sense, that you are comfortable with eventually being the reason those creators no longer exist. It is an undeniably apocalyptic project, not just for the web as we know it, but also for your own product."
"AI search is about summarizing web results so you don't have to click links and read the pages yourself.
If that's the future of the web, who the fuck is going to write those pages that the summarizer summarizes? What is the incentive, the business-model, the rational explanation for predicting a world in which millions of us go on writing web-pages, when the gatekeepers to the web have promised to rig the game so that no one will ever visit those pages, or read what we've written there, or even know it was us who wrote the underlying material the summarizer just summarized?
If we stop writing the web, AIs will have to summarize each other, forming an inhuman centipede of botshit-ingestion. This is bad news, because there's pretty solid mathematical evidence that training a bot on botshit makes it absolutely useless. Or, as the authors of the paper, including the eminent cryptographer Ross Anderson, put it, "using model-generated content in training causes irreversible defects":" https://pluralistic.net/2024/02/23/gazeteer/#out-of-cycle