mcc,
@mcc@mastodon.social

Something I've been noticing about search engines (well, Google) recently is how bad they are at respecting word order in cases where it matters. Like, you search "convert x to y" and they match "convert y to x". Or this morning I'm searching "inset diamond in square equation" and I am indeed finding overwhelmingly things that contain both the words "diamond" and "equation", such as the "diamond method" for quadratics. But nothing related to insetting diamonds in squares.
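
A toy illustration of the word-order problem, offered as a sketch and not as a claim about how Google actually ranks results: if a query and a page are each reduced to an unordered bag of words, "convert x to y" and "convert y to x" look identical, whereas comparing adjacent word pairs (bigrams) tells them apart. The function names `bag_of_words`, `bigrams`, and `jaccard` are hypothetical, chosen only for this example.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Unordered multiset of lowercase tokens: word order is thrown away.
    return Counter(text.lower().split())


def bigrams(text: str) -> set[tuple[str, str]]:
    # Adjacent word pairs: a minimal way to keep some word order.
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))


def jaccard(a: set, b: set) -> float:
    # Size of the overlap divided by size of the union; 1.0 means identical sets.
    union = a | b
    return len(a & b) / len(union) if union else 1.0


query = "convert x to y"
doc = "convert y to x"

# Same words with the same counts, so a pure keyword match can't distinguish them.
print(bag_of_words(query) == bag_of_words(doc))
# No bigram in common, so an order-aware comparison scores them as unrelated.
print(jaccard(bigrams(query), bigrams(doc)))
```

Bigrams are about the crudest way to keep order; they still say nothing about grammar, which is where the parse-tree and embedding approaches discussed in the thread come in.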

mcc,
@mcc@mastodon.social

It occurs to me these are the kinds of problems that "old" AI methods could be applied to. You know, the ancient days of 2018, before OpenAI redefined "AI" to mean something worthless and uninteresting. Markov models, neural network models, NLP trees and even word vectors/word embeddings. I could use something that lets me ask "give me sentences like 'insetting a diamond in a square'", alternate phrasings or grammar-ings. We have technologies that can be applied toward that end.

trochee,
@trochee@dair-community.social

@mcc my computational linguistics career from about 2002–2008 was specifically about trying to use syntax — parse trees — to drive stochastic models of word sequences

I used to joke that "parse trees are a solution looking for a problem", but my dissertation is basically "hey I found some problems for which parse trees help with a solution"

And yes, all this thinking is lost in the LLM flood, like tears in the rain.

mcc,
@mcc@mastodon.social

But no, "AI search" now means and is only ever going to mean "a Markov chain bot with some extra layers is going to make up a sentence which plausibly resembles English text and contains one or more of your keywords". Which is not a problem I need or want to solve. I have MegaHAL at home.

irenes,

@mcc no, yeah, it goes back to the "neat" vs. "scruffy" thing, which was pre-AI-winter terminology about approaches that can, in principle, be understood by humans vs. approaches that can't (that's our very loose attempt to summarize the intuition, we're not aware of a widely-known explanation)

mcc,
@mcc@mastodon.social

Oh. Huh. In the hours since I made this post, Google has gone from finding 0 results to finding 1 result

phi1997,

@mcc
Congratulations?

mcc,
@mcc@mastodon.social

@phi1997 I solved the problem

ratsnakegames,
@ratsnakegames@mastodon.social

@mcc Why would anybody ever google the phrase "What is diamond 💎 ?"

mcc,
@mcc@mastodon.social

@ratsnakegames how is diamond 💎 formed

fabiosantoscode,
@fabiosantoscode@mastodon.social

@mcc A couple months ago I actually tried to build a search engine with markov chains, but I didn't get far 😅

I think I would love to use a more intelligent search, and importantly, a search that's curated by humans.

Websites should be briefly seen by a human to confirm whether they are spam garbage before they show up on my results. I don't care if I have to pay for this feature.

lightninhopkins,
@lightninhopkins@mastodon.social

@mcc try chatGPT. It's honestly better.

excess,

@mcc
I now just assume that Google's " " are actual air quotes to denote sarcasm, instead of the old way of preserving the exact sentence.

  • Oh, so you are telling me that "this exact order is important"? Sure, let me fetch you Amazon affiliate links so you can "order" online a bunch of "important" stuff you don't actually need ;)

elliottc,

@mcc true, but does it help to, e.g., put the x to y itself in quotes?

mcc,
@mcc@mastodon.social

@elliottc for some types of sentences but not others (e.g., it depends on whether the language of the thing means people are or aren't likely to all uniformly phrase it the same way)

ravenonthill,
@ravenonthill@mastodon.social

@mcc they're searching keywords; they don't know grammar. That's probably something an LLM would do better; LLMs are all about grammar. In some sense it may be that LLMs actually are grammars.

mcc,
@mcc@mastodon.social

@ravenonthill That's possible, but since LLMs as they developed are designed as copyright laundering mechanisms and not as query systems, I cannot use them to locate where in the source data set something from a particular point in LLM space was being discussed. The only thing it can do is produce randomly generated sentences. Which is not what I want or need.

mcc,
@mcc@mastodon.social

@ravenonthill The thing I want ("where is this from?") has been intentionally and aggressively thrown away, and no one will ever design a system where it isn't thrown away, because if it were not thrown away then things like OpenAI and Gemini would be legally untenable.

ravenonthill,
@ravenonthill@mastodon.social

@mcc sigh. Yes.
