jonny,
@jonny@neuromatch.social avatar

Amazon releases details on its Alexa LLM, which will use its constant surveillance data to "personalize" the model. Like Google, they're moving away from wake words towards being able to trigger Alexa contextually - when the assistant "thinks" it should be responding, which of course requires continual processing of speech for content, not just a wake word.

The consumer page suggests user data is "training" the model, but the developer page describes exactly the augmented LLM, iterative generation process grounded in a personal knowledge graph that Microsoft, Facebook, and Google all describe as the next step in LLM tech.

https://developer.amazon.com/en-US/blogs/alexa/alexa-skills-kit/2023/09/alexa-llm-fall-devices-services-sep-2023

We can no longer think of LLMs on their own when we consider these technologies; that era was brief and has passed. I've been waving my arms up and down about this since ChatGPT was released - criticisms of LLMs that stop short at their current form, arguing about whether the language models themselves can "understand" language, miss the bigger picture of what they are intended for. These are surveillance technologies that act as interfaces to knowledge graphs and external services, putting a human voice on whole-life surveillance.

https://jon-e.net/surveillance-graphs/#the-near-future-of-surveillance-capitalism-knowledge-graphs-get-chatbots

Interest in these multipart systems is widespread, and arguably the norm: A group of Meta researchers described these multipart systems as “Augmented Language Models” and highlight their promise as a way of “moving away from language modeling” [190]. Google’s reimaginations of search also make repeated reference to interactions with knowledge graphs and other systems [184]. A review of knowledge graphs with authors from Meta, JPMorgan Chase, and Microsoft describes a consensus view that knowledge graphs are essential to compositional behavior in AI [5]. Researchers from DeepMind (owned by Google) argue that research focus should move away from simply training larger and larger models towards “inference-time compute,” meaning querying the internet or other information sources [191].
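A minimal sketch of the "augmented LLM" loop described above - iterative generation grounded in a personal knowledge graph, with tool calls at inference time. Every class, function, and interface here is a hypothetical placeholder for illustration, not any vendor's actual API:

```python
# Hypothetical sketch of a KG-grounded, tool-augmented generation loop.
# None of these interfaces correspond to a real vendor API.
from dataclasses import dataclass

@dataclass
class Turn:
    query: str
    context: list[str]

def answer(query: str, kg, llm, tools) -> str:
    # 1. Ground the query in the user's personal knowledge graph
    #    (contacts, purchases, location history, prior utterances).
    facts = kg.lookup(query)                     # assumed KG interface

    # 2. Iterate: the model may emit tool calls (web search, shopping,
    #    calendar) instead of a final answer - "inference-time compute".
    turn = Turn(query=query, context=facts)
    for _ in range(5):                           # bounded iteration
        out = llm.generate(turn.query, turn.context)
        if out.tool_call is None:
            return out.text                      # final, "personalized" answer
        result = tools.run(out.tool_call)        # query an external service
        turn.context.append(result)
    return out.text
```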
The immersive and proactive design of KG-LLM assistants also expands the expectations of surveillance. Current assistant design is based around specific hotwords: unless someone explicitly invokes the assistant, the expectation is that it shouldn’t be listening. Like the shift in algorithmic policing from reactive to predictive systems, these systems are designed to make use of recent context to actively make recommendations without an explicit query. Google demonstrates being able to interact with an assistant by making eye contact with a camera in its 2022 I/O keynote [194]. A 2022 Google patent describes a system for continuously monitoring multiple sensors to estimate the level of intended interaction with the assistant, to calibrate whether it should respond and with what detail. The patent includes examples like observing someone with multiple sensors as they ask aloud “what is making that noise?” and look around the room, indicating an implicit intention of interacting with the assistant so it can volunteer information without explicit invocation [201]. A 2021 Amazon patent describes an assistant listening for infra- and ultrasonic tags in TV ads so that if someone asks how much a new bike costs after seeing an ad for a bike, the assistant knows to provide the cost of that specific bike [202]. These UX changes encourage us to accept truly continual surveillance in the name of convenience — it’s good to be monitored so I can ask Google “what time is the game.”
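A toy illustration of the kind of "directedness" estimation those patents describe - fusing several sensor-derived confidences into a score that decides whether the assistant volunteers a response without a wake word. The signals, weights, and threshold are invented for illustration only:

```python
# Toy model of contextual invocation: no wake word, just a fused
# "intent to interact" score. All weights and thresholds are made up.
def directedness_score(gaze_toward_device: float,
                       speech_is_question: float,
                       recent_interaction: float,
                       ambient_event: float) -> float:
    # Each input is a 0..1 confidence from a separate detector
    # (camera, ASR/NLU, interaction history, acoustic event classifier).
    weights = (0.4, 0.3, 0.2, 0.1)
    signals = (gaze_toward_device, speech_is_question,
               recent_interaction, ambient_event)
    return sum(w * s for w, s in zip(weights, signals))

def should_respond(score: float, threshold: float = 0.5) -> bool:
    # Above the threshold, the assistant answers without being invoked.
    return score >= threshold

# "What is making that noise?" asked aloud while looking around the room:
print(should_respond(directedness_score(0.7, 0.9, 0.2, 0.8)))  # True
```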
This pattern of interaction with assistants is also considerably more intimate. As noted by the Stochastic Parrots authors, the misperception of animacy in assistants that mimic human language is a dangerous invitation to trust them as one would another person — and with details like Google’s assistant “telling you how it is feeling,” these companies seem eager to exploit it. A more violent source of trust prominently exploited by Amazon is insinuating a state of continual threat and selling products to keep you safe: its subsidiary Ring’s advertising material is dripping with fantasies of security and fear, and its doglike robot Astro and literal surveillance drone are advertised as trusted companions who can patrol your home while you are away [203, 204, 205]. Amazon patents describe systems for using the emotional content of speech to personalize recommendations and systems for being able to “target campaigns to users when they are in the most receptive state to targeted advertisements” [206, 207]. The presentation of assistants as always-present across apps, embodied in helpful robots, or as other people, e.g. by being present in a contact list, positions them to take advantage of people in emotionally vulnerable moments. Researchers from the Center for Humane Technology describe an instance where Snapchat’s “My AI,” accessible from its normal chat interface, encouraged a minor to have a sexual encounter with an adult they met on Snapchat (47:10 in [208]).

VE2UWY,
@VE2UWY@mastodon.radio avatar

@jonny

So Alexa (and others, surely) went from Always Listening, hoping to hear a magic word by interpreting every sound it hears at HQ, to Always Listening, planning to insert itself into the conversation when commercial opportunities avail themselves.

When you talk to a family member about this pain you've been having, will the underpaid contract Amazon driver show up with aspirin or do you have to opt out?

sellathechemist,
@sellathechemist@mastodon.social avatar

@jonny The enshittification deepens. It only makes me happier that I have not bought anything from Amazon in several years, though I am forced to use their servers because UCL unwisely chooses to site some of our operations there. And of course AWS has an enormous hold on e-commerce. Let's hope that the anti-trust people finally get their act together.

andy_warb,
@andy_warb@mastodon.social avatar

@jonny after five minutes listening in my house, Amazon will soon roll this back once she learns phrases like “I’m too tired”, “no”, “in a minute”, or just flat-out ignores your requests.

jillholl,

@jonny I’m with you! Have been frantically warning folks too. Even started a business, https://www.humanetechtalk.org/, committed to raising public awareness of the threats posed by surveillance capitalism. It’s been a slow start, but folks are starting to pay attention. Highly recommend Dr. Zuboff’s book, “The Age of Surveillance Capitalism.” A bit verbose, but full of great info!

jonny,
@jonny@neuromatch.social avatar

@jillholl
DW, I definitely have read it lol ❤️

jillholl,

@jonny Do you have any reading recs? I’m more on the humanitarian side of this equation (than technical) though I’m open to learning. 😊

GhostOnTheHalfShell,
@GhostOnTheHalfShell@masto.ai avatar

deleted_by_author

  • jonny,
    @jonny@neuromatch.social avatar

    @GhostOnTheHalfShell
    And the one they are peddling in its place is indescribably sad. "Yes, let's celebrate. Present me with the generic commodity form of celebration whose record label is up to date with their Amazon payola. Automate the boring task that is shared joy by auto-buying the signifiers of a fun party. I don't have any meaningful relationships and language is just a series of tokens rather than a deeply personal expression of my human self, so why don't you also take care of communicating with my loved ones and present me with a summary of what they said later."
    https://youtu.be/UqS3NxJ2L_I

    4NEMOKAZUHA,
    @4NEMOKAZUHA@mastodon.social avatar

    @jonny I agree - LLMs aren’t about “giving accurate answers” anymore; they’re about farming user data now.

    Penguinflight,
    @Penguinflight@mastodon.scot avatar

    @jonny I still have a hard time understanding why people PAY to have corporate spies in their homes.

    ewhac,
    @ewhac@mastodon.social avatar

    @jonny Constant listening is not possible with the current hardware and the current state of broadband in the US. The hockey pucks have barely enough CPU to recognize one wakeword using a highly tuned ML model. There's no way they can do generalized language inference on their own.

    So they will need to upgrade all the hockey pucks ($$$), or stream audio from every microphone in your house continuously to their cloud. AT&T, Comcrap, Verizon, etc. cannot handle that.

    jonny,
    @jonny@neuromatch.social avatar

    @ewhac
    Yes, all that compute happens on cloud servers - check the linked dev docs above.

    Audio streams are extremely tiny, particularly when you can use a voice-tailored codec rather than a general one like MP3 - VoIP streams will regularly use 16 to 32 kbps (that's bits, not bytes) codecs and be perfectly audible. Bandwidth is not a limiting factor here.
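    Back-of-the-envelope, purely to illustrate the scale - the codec bitrate and the hours of detected speech per day are assumptions for the sketch, not anything Amazon has published:

    ```python
    # Rough, illustrative arithmetic only; bitrate and hours/day are assumptions.
    BITRATE_KBPS = 24          # e.g. a narrowband voice codec such as low-rate Opus
    SPEECH_HOURS_PER_DAY = 4   # assumed hours of voice activity detected per household

    bytes_per_day = BITRATE_KBPS * 1000 / 8 * SPEECH_HOURS_PER_DAY * 3600
    gb_per_month = bytes_per_day * 30 / 1e9

    print(f"~{bytes_per_day / 1e6:.0f} MB/day, ~{gb_per_month:.1f} GB/month")
    # ~43 MB/day, ~1.3 GB/month -- noise next to typical household video streaming
    ```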

    lkngrrr,
    @lkngrrr@hachyderm.io avatar

    @jonny @ewhac Alexa needs higher quality sound than VoIP. There’s no constant streaming of data. For one, you’d notice the bandwidth draw, and for two, they can’t afford to store it much less annotate and mine it.

    jonny,
    @jonny@neuromatch.social avatar

    @lkngrrr
    @ewhac
    OK, let's say it uses a ~160 kbps stream, the same as Spotify's "high quality" audio. You're telling me the average Alexa user a) monitors their bandwidth usage at all, b) would notice that usage if it were streaming every time a voice is detected, and c) Amazon, operator of the largest cloud platform in the world, wouldn't be able to store that data as compressed transcribed text and compute that into an ad profile?

    Sounds optimistic
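    For a sense of scale, the transcription-storage claim sketched out; the speech rate, bytes per word, hours per day, and fleet size are all invented for illustration:

    ```python
    # Illustrative only: none of these numbers are Amazon figures.
    WPM = 150                    # typical conversational speech rate
    BYTES_PER_WORD = 6           # rough average for English plain text
    HOURS_PER_DAY = 4            # assumed detected speech per household
    DEVICES = 100_000_000        # hypothetical fleet size

    text_bytes_per_device_day = WPM * 60 * HOURS_PER_DAY * BYTES_PER_WORD
    fleet_tb_per_day = text_bytes_per_device_day * DEVICES / 1e12

    print(f"~{text_bytes_per_device_day / 1e3:.0f} KB/device/day as plain text")
    print(f"~{fleet_tb_per_day:.0f} TB/day across the fleet, before compression")
    # ~216 KB/device/day, ~22 TB/day fleet-wide
    ```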

    lkngrrr,
    @lkngrrr@hachyderm.io avatar

    @jonny @ewhac

    A&B) There are enough nerds out there who do monitor that; TechCrunch would 100% write about it

    C) Nope. Amazon charges market rates to teams internally, and that’s a great way to land in a 10-figure hole on storage and compute, never mind the exponential amount of labor to annotate all that.

    I do have special knowledge as I launched the damn thing, and that doesn’t make the LLM nonsense good by any means, but seriously let’s operate within the boundaries of physics and economics here.

    jonny,
    @jonny@neuromatch.social avatar

    @lkngrrr
    @ewhac
    Well, if you launched it, then you would know! I just wonder what all those patents for detecting directedness, and all those comments in shareholder calls about building compute capital to cash in on exactly this kind of thing, are about.

    Also wait, Amazon manually annotates ad profiles? That's wild as hell.

    lkngrrr,
    @lkngrrr@hachyderm.io avatar

    @jonny @ewhac The patents on beamforming were all about steering the mic arrays. Our brains and ears do a lot of directional filtering that computers just can't do out of the box.

    And yep, manual annotation is the only way to build a golden set that you trust, especially on such a wide-ranging data set!

    And my apologies if I came off heavy-handed; I forgot I'd moved Alexa out of my profile and robbed you of that context.

    fortboise,
    @fortboise@mastodon.social avatar

    @jonny "Whole-life surveillance," criminy. My consolation is that most of my life is already over, I guess.

    aburtch,
    @aburtch@triangletoot.party avatar

    @jonny wow. I want no part of constant surveillance.

    jonny,
    @jonny@neuromatch.social avatar

    @aburtch
    You and me both

    skyfire101,

    @jonny I will be keeping my google home mini. 😃

    jonny,
    @jonny@neuromatch.social avatar

    @skyfire101
    Google is doing exactly the same thing, unfortunately, only theirs is integrated with Android and their more comprehensive surveillance dataset.

    kagan,
    @kagan@wandering.shop avatar

    @jonny Even if they really were trying to move away from wake-words just "to be more convenient and futuristic" (I don't believe that claim; I think it's a pretext for surveillance), it'd still be completely stupid. Consider Star Trek, probably the original, ur-example of a voice-activated computer interface: They always preface it with "Computer..."

    If a wake-word is good enough for the frickin' Federation, it ought to be fine for us now!

    JustinDerrick,
    @JustinDerrick@mstdn.ca avatar

    @jonny So glad to hear I was right to never have one of these in my home, tell everyone who would listen to not have one in their home, and to put a cardboard box or other container on them when visiting friends who didn’t listen to me.

    I suspect this change from “wake words” to constant 24x7 listening was always part of the plan; it just took them time to build the processing infrastructure and software.

    jonny,
    @jonny@neuromatch.social avatar

    @JustinDerrick
    It needed the right sales pitch, and it needed LLMs - Google's and Amazon's marketing is eerily similar: I want to be able to ask it "how is my soccer team doing?" without specifying which team. That kind of query requires bidirectional iteration, text parsing, and generation. Both companies emphasize a new "conversational" mode. To justify total surveillance, they needed to reposition the assistants as an app underlay that spans devices and contexts.

    SpaceLifeForm,

    @jonny

    If you can, feed the LLM/AI nonsense.

    jonny,
    @jonny@neuromatch.social avatar

    @SpaceLifeForm
    I always have had a fondness for fuzzing as a tactic, but I have been told by ppl who understand the internals of these systems that it doesn't really work bc they're built to withstand that, and you need to be all random all the time across more platforms/interfaces than you expect to actually fuzz your profile :(

    SpaceLifeForm,

    @jonny

    Good point.

    Some string of characters (even if it looks like a name or word) concatenated with some apparently random string of numerics tends to stand out even if there is no useful content.

    Bot account handlers are not as invisible as they would like to believe.

    Even if they use VPN or TOR, they will be spotted via traffic analysis.
