Common Voice is a project by Mozilla to build an extensive, ethically sourced dataset of spoken words in various languages, to help push forward open-source voice recognition technology like DeepSpeech (also by Mozilla).
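For anyone who wants to poke at the data, here is a minimal sketch of loading a Common Voice split from the Hugging Face Hub - the pinned dataset version is an assumption, and you need to accept the dataset's terms on the Hub and be logged in first:

```python
# Minimal sketch: pull a Common Voice split via Hugging Face datasets.
# The version (common_voice_11_0) is an assumption; accept the dataset's
# terms on the Hub and authenticate (huggingface-cli login) before running.
from datasets import load_dataset

cv = load_dataset("mozilla-foundation/common_voice_11_0", "en", split="validation")
print(cv[0]["sentence"])  # each clip pairs audio with its transcript
```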
Quick comparison between AWS and Google's speech recognition.
Google has the superior UI: click to upload a file, then pick from a bunch of options.
AWS makes you go to a different site to upload the file to S3, and offers very few options.
But AWS is amazingly accurate, whereas Google is quite dumb.
Take the phrase "Fourteen pounds".
AWS: "£14"
Google: "14 LB"
WTAF?
Both were told to process the audio as en-GB, and both have a few quirks. But AWS is excellent.
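If you'd rather reproduce the comparison from code than the two consoles, something like this should work - the bucket, file, and job names are placeholders, and both clients need credentials configured:

```python
# Hedged sketch: run the same clip through both services, forcing en-GB.
import boto3
from google.cloud import speech

# --- AWS Transcribe: the audio must already be uploaded to S3 ---
transcribe = boto3.client("transcribe")
transcribe.start_transcription_job(
    TranscriptionJobName="fourteen-pounds-test",  # placeholder name
    Media={"MediaFileUri": "s3://my-bucket/fourteen_pounds.wav"},
    MediaFormat="wav",
    LanguageCode="en-GB",
)
# Async: poll transcribe.get_transcription_job(...) for the finished text.

# --- Google Speech-to-Text: short clips can be sent inline ---
client = speech.SpeechClient()
with open("fourteen_pounds.wav", "rb") as f:
    audio = speech.RecognitionAudio(content=f.read())
config = speech.RecognitionConfig(language_code="en-GB")  # WAV header supplies encoding/rate
for result in client.recognize(config=config, audio=audio).results:
    print(result.alternatives[0].transcript)
```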
I also replicated the process of training acoustic models for HK Cantonese in a streamlined Montreal Forced Aligner (MFA) workflow, which is easily applicable to many other languages. Check out the MFA tutorial: 🌟 https://chenzixu.rbind.io/resources/3asr/sr4/
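For reference, the heart of that workflow is just a validate step and a train step; here's a rough sketch driving the MFA 2.x CLI from Python - the paths and dictionary file are placeholders, not taken from the tutorial:

```python
# Hedged sketch of an MFA acoustic-model training run; assumes the MFA 2.x
# CLI is installed and the corpus holds .wav files with matching transcripts.
import subprocess

corpus = "corpus/hk_cantonese"         # placeholder path
dictionary = "dictionaries/yue.dict"   # placeholder pronunciation dictionary
model_out = "models/hk_cantonese.zip"  # trained acoustic model output

# Check that every utterance has a transcript and in-dictionary words.
subprocess.run(["mfa", "validate", corpus, dictionary], check=True)

# Train an acoustic model from scratch on the corpus.
subprocess.run(["mfa", "train", corpus, dictionary, model_out], check=True)
```

Swapping in another language is then just a different corpus and dictionary.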
Today in my web browsing history: the tech sector waxing lyrical about #OpenAI's upgrades to #ChatGPT, which include #SpeechRecognition and #SpeechSynthesis capabilities; meanwhile Wikipedia, which was scraped to train many LLMs, begs for donations.
All of the content we've placed online has been mined, processed, refined, and is being sold back to us.
For many of us, that's a new experience.
But I suspect that for those from the global South, it's a repeat of centuries of colonisation.
For folks who work with #ASR #SpeechRecognition, specifically #Whisper from #OpenAI - I have heard some anecdotal evidence of transcription with the medium.en model returning paragraphs of "junk" content, like weather reports and adverts for golfing supplies.
I have three confirmed reports from interview transcripts on unrelated topics, and am curious whether there are other (as yet unreported) instances like this.
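If you want to screen your own transcripts for this, Whisper's per-segment metadata gives some hooks; a rough filter might look like the following - the thresholds are guesses, not tuned values:

```python
# Rough sketch: flag Whisper segments that look like hallucinated filler.
import whisper

model = whisper.load_model("medium.en")
result = model.transcribe("interview.wav")  # placeholder file

for seg in result["segments"]:
    suspicious = (
        seg["no_speech_prob"] > 0.5        # model thinks nobody is speaking
        or seg["avg_logprob"] < -1.0       # low confidence in the text
        or seg["compression_ratio"] > 2.4  # highly repetitive output
    )
    if suspicious:
        print(f"[{seg['start']:.1f}s-{seg['end']:.1f}s] {seg['text']!r}")
```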
A little piece I spoke to, and which my #ANU #Cybernetics colleague, Lauren Pay, wrangled into coherence - it's about the history of #SpeechRecognition as a #complex #system - and how the ANU School of Cybernetics can help you learn how to interrogate and shape such systems.
Written to promote the school's new short courses.
I have been trying to create a whole new system of #SpeechRecognition shortcuts for myself, and I tell you, it is uncommonly like casting magic spells. Like Diane Duane and Patricia Wrede's magic systems: lots of work beforehand so you can trigger the spell with one word later on.
It makes me think the Harry Potter universe must have an awful lot of freeware floating around. Utterly unacknowledged, because JK Rowling isn't great at thinking through her worldbuilding.
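For a sense of the "lots of work up front, one word at cast time" shape, here's a toy dispatcher - the triggers and actions are invented examples, not anyone's actual shortcut system:

```python
# Invented examples: one-word triggers expanding into prepared actions.
# A real setup would feed on_recognised() from the recogniser's output.
import webbrowser

def type_text(text: str) -> None:
    # Stand-in for sending keystrokes via an OS automation API.
    print(text)

SPELLS = {
    "boilerplate": lambda: type_text("Dear colleague,\n\nThanks for your email."),
    "standup": lambda: webbrowser.open("https://meet.example.com/standup"),
}

def on_recognised(word: str) -> None:
    """Dispatch a recognised trigger word to its prepared action."""
    action = SPELLS.get(word.lower().strip())
    if action:
        action()

on_recognised("boilerplate")
```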
Dictation - Google's Project Relate looks interesting. Google has you train it on your voice by recording 500+ phrase cards, then creates a custom model for your speech. Particularly useful for anyone with unusual speech patterns.
Announced in late 2021, the Android app was released in January 2023. It sounds like the dictation accuracy (once trained) is better than current apps (Siri, Echo, Google, Dragon).
• Generative AI models learn from mass data scraped from web
• Indigenous groups fear losing control over their data
• Some move to protect their information from commercial use
"When U.S. tech firm OpenAI rolled out Whisper, a speech recognition tool offering audio transcription and translation into English for dozens of languages including Māori, it rang alarm bells for many Indigenous New Zealanders.
"Whisper, launched in September by the company behind the ChatGPT chatbot, was trained on 680,000 hours of audio from the web, including 1,381 hours of the Māori language."