In an age of LLMs, is it time to reconsider human-edited web directories?
Back in the early-to-mid '90s, one of the main ways of finding anything on the web was to browse through a web directory.
These directories generally had a list of categories on their front page: News, Sport, Entertainment, Arts, Technology, Fashion, and so on.
Each of those categories had subcategories, and sub-subcategories that you clicked through until you got to a list of websites. These lists were maintained by actual humans.
Typically, these directories also had a limited web search that would crawl through the pages of websites listed in the directory.
Lycos, Excite, and of course Yahoo all offered web directories of this sort.
(EDIT: I initially also mentioned AltaVista. It did offer a web directory by the late '90s, but this was something it tacked on much later.)
By the late '90s, the standard narrative goes, the web got too big to index websites manually.
Google promised the world its algorithms would weed out the spam automatically.
And for a time, it worked.
But then SEO and SEM became a multi-billion-dollar industry. The spambots proliferated. Google itself began promoting its own content and advertisers above search results.
And now with LLMs, the industrial-scale spamming of the web is likely to grow exponentially.
My question is, if a lot of the web is turning to crap, do we even want to search the entire web anymore?
Do we really want to search every single website on the web?
Or just those that aren't filled with LLM-generated SEO spam?
Or just those that don't feature 200 tracking scripts, and passive-aggressive privacy warnings, and paywalls, and popovers, and newsletters, and increasingly obnoxious banner ads, and dark patterns to prevent you cancelling your "free trial" subscription?
At some point, does it become more desirable to go back to search engines that only crawl pages on human-curated lists of trustworthy, quality websites?
And is it time to begin considering what a modern version of those early web directories might look like?
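For what it's worth, the "only crawl what's on the curated list" part is mechanically simple. Here is a minimal sketch, assuming a hand-maintained mapping of categories to hostnames. The `DIRECTORY` contents and the function names are made up for illustration and not taken from any real directory or crawler.

```python
from urllib.parse import urlsplit

# Hypothetical curated directory: category -> hostnames picked by human editors.
DIRECTORY = {
    "Technology": {"example.org", "docs.example.net"},
    "News": {"news.example.com"},
}


def allowed_hosts(directory):
    """Flatten the category tree into one set of approved hostnames."""
    hosts = set()
    for sites in directory.values():
        hosts.update(sites)
    return hosts


def is_listed(url, hosts):
    """True if the URL's host is an approved host or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in hosts)


def filter_frontier(urls, directory):
    """Keep only the URLs a directory-bound crawler would be allowed to fetch."""
    hosts = allowed_hosts(directory)
    return [u for u in urls if is_listed(u, hosts)]
```

The hard part is not this filter but the human work of deciding what goes into the list and keeping it current.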
It feels like people are split 50/50 on the search feature in Mastodon and the Fediverse in general. To my knowledge, some Fedi software already has quite extensive search implemented. Mastodon barely has any search if you don't count the open text search patch.
Spirited debate today on Mastodon about the new opt-in search capability. Some people think it should be opt-out. Whatever you think about this, perhaps let's all just step back and appreciate that a complex requested feature has been added which will fundamentally change the utility of Mastodon for academics, scientists, journalists and organizations as well as ordinary users. Thank you Mastodon devs! We spoke and you listened.
Full text search is now live on many instances running version 4.2.0!
It is "opt-in", meaning you need to check the "Include public posts in search results" box to make your posts searchable: Click Preferences (on right near bottom [gear icon]) >>> Click Public Profile (on left near top [person icon]) >>> Click Privacy and reach (near top [lock icon]) >>> Under Search (scroll down to the middle of the page)
Now with more and more #instances upgraded to version 4.2.0 of #Mastodon, you can compare full text #search on different ones: Having multiple accounts, I did – and yes, the differences are rather big.
Once again it seems clear: the bigger your instance, the better for finding content in the #fediverse. Especially for journalists, this also implies that an instance run by your organization might not be the best idea. Something like journa.host could make more sense.
This clearly seems to follow up on VyrCossont's idea of an extended search feature that finds public posts.
Gargron: "It is my decision to unite all discovery features in one setting, because all of this stuff is an expected part of a social network and splitting it up into different settings that everyone has to opt-into one by one just to get the same behaviour they get by default on other social media seems like a bad user experience. Also, discoverable is already a federated attribute." #Mastodon #Search #OpenSource
Every post in the #fediverse that isn't public should be encrypted. This should be the default, so that to post publicly, a person has to know what they're doing and turn it off. That way, anyone who wants to build search and discovery tools for public posts can do so without permission, or fear of retribution.
this enables full-text search for posts you haven't interacted with, as well as full-text search for accounts, and includes several advanced filtering operators and parser fixes.
If you didn't know, on DuckDuckGo you can search for posts or magazines with site:kbin.social just like you would with Reddit. The same applies to any other Fediverse site, whether Kbin, Lemmy, Mastodon, etc. I was really frustrated because it seems like Google was intentionally suppressing them....
Proposing that Lemmy or Kbin could substitute for Reddit while not acknowledging that lack of search makes it impossible to find the appropriate groups in a decentralized maze of servers is very on-brand for the Mastodon crowd. #search #reddit #federated @fediverse
I've heard you can do a Google search and append, say, kbin.social to get results just from kbin, but if I wanted to treat the fediverse's content like reddit's, is there a way to do the Google search on just everything that's on the fediverse?...
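One rough way to approximate "search several Fediverse sites at once" is to chain site: operators. Below is a small sketch that builds such a query and a DuckDuckGo search URL for it. The function names and the lemmy.ml host are illustrative choices, not something from the posts above, and how well engines honour OR between site: filters varies, so treat this as something to try rather than a guaranteed fix.

```python
from urllib.parse import urlencode


def fediverse_query(terms, hosts):
    """Combine search terms with one site: filter per host."""
    site_part = " OR ".join(f"site:{h}" for h in hosts)
    if len(hosts) > 1:
        # Group the alternatives so the OR applies only to the site filters.
        site_part = f"({site_part})"
    return f"{terms} {site_part}"


def duckduckgo_url(query):
    """Build a DuckDuckGo results URL for the given query string."""
    return "https://duckduckgo.com/?" + urlencode({"q": query})


if __name__ == "__main__":
    q = fediverse_query("sourdough starter", ["kbin.social", "lemmy.ml"])
    print(q)
    print(duckduckgo_url(q))
```

This only covers the instances you list by hand, so it is no substitute for a real Fediverse-wide index.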
"Any user interaction data from a system this broken will become increasingly unreliable. So it's no surprise we're seeing a simulacrum of content, a landscape full of mediocre content that might seem tasty but isn't nutritional."
Now that #DuckDuckGo has moved to opt-out advertising without an obvious opt-out (it is there, it is just not obvious), rather than opt-in, I wonder for how much longer I will continue to use it for my www #search needs.
#StartPage, perhaps? I don't fancy running searx again, but I guess I could.
Many of Discord's flaws could be solved with an option to allow full Google indexing & even browsing without an account.
I'd gladly opt in. So much weird crap is only on Discords these days. I use my Discord for, you know, actually chatting, not tech support or GitHub updates, but I'd still happily opt into searchability.
Everything (not private for whatever reason) should be searchable. We went over this in the 90s. Ain't nobody find shit without search.
Just found out that DuckDuckGo can be used to search Fediverse sites easily.
Google search from every aggregator on the fediverse?
What Search Engine or terms do you use?
What search engine do you use? What terms might you use to filter out junk?...