Common Voice is a project by Mozilla to build an extensive, ethically sourced dataset of spoken words in various languages, to help push forward open-source voice recognition technology like DeepSpeech (also by Mozilla).
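For anyone who wants to poke at the data, here is a minimal sketch of loading a Common Voice split from the Hugging Face Hub - the pinned dataset version is an assumption, and you need to accept the dataset's terms on the Hub and be logged in first:

```python
# Minimal sketch: pull a Common Voice split via Hugging Face datasets.
# The version (common_voice_11_0) is an assumption; accept the dataset's
# terms on the Hub and authenticate (huggingface-cli login) before running.
from datasets import load_dataset

cv = load_dataset("mozilla-foundation/common_voice_11_0", "en", split="validation")
print(cv[0]["sentence"])  # each clip pairs audio with its transcript
```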
Quick comparison between AWS and Google's speech recognition.
Google has the superior UI: click to upload a file, then pick from a bunch of options.
AWS makes you go to a different site to upload the file to S3, and offers very few options.
But AWS is amazingly accurate, whereas Google is quite dumb.
Take the phrase "Fourteen pounds".
AWS: "£14"
Google: "14 LB"
WTAF?
Both were told to process the audio as en-GB, and both have a few quirks. But AWS is excellent.
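If you'd rather reproduce the comparison from code than the two consoles, something like this should work - the bucket, file, and job names are placeholders, and both clients need credentials configured:

```python
# Hedged sketch: run the same clip through both services, forcing en-GB.
import boto3
from google.cloud import speech

# --- AWS Transcribe: the audio must already be uploaded to S3 ---
transcribe = boto3.client("transcribe")
transcribe.start_transcription_job(
    TranscriptionJobName="fourteen-pounds-test",  # placeholder name
    Media={"MediaFileUri": "s3://my-bucket/fourteen_pounds.wav"},
    MediaFormat="wav",
    LanguageCode="en-GB",
)
# Async: poll transcribe.get_transcription_job(...) for the finished text.

# --- Google Speech-to-Text: short clips can be sent inline ---
client = speech.SpeechClient()
with open("fourteen_pounds.wav", "rb") as f:
    audio = speech.RecognitionAudio(content=f.read())
config = speech.RecognitionConfig(language_code="en-GB")  # WAV header supplies encoding/rate
for result in client.recognize(config=config, audio=audio).results:
    print(result.alternatives[0].transcript)
```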
I also replicated the process of training acoustic models for HK Cantonese in a streamlined Montreal Forced Aligner (MFA) workflow, which is easily applicable to many other languages. Check out the MFA tutorial: 🌟 https://chenzixu.rbind.io/resources/3asr/sr4/
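For reference, the heart of that workflow is just a validate step and a train step; here's a rough sketch driving the MFA 2.x CLI from Python - the paths and dictionary file are placeholders, not taken from the tutorial:

```python
# Hedged sketch of an MFA acoustic-model training run; assumes the MFA 2.x
# CLI is installed and the corpus holds .wav files with matching transcripts.
import subprocess

corpus = "corpus/hk_cantonese"         # placeholder path
dictionary = "dictionaries/yue.dict"   # placeholder pronunciation dictionary
model_out = "models/hk_cantonese.zip"  # trained acoustic model output

# Check that every utterance has a transcript and in-dictionary words.
subprocess.run(["mfa", "validate", corpus, dictionary], check=True)

# Train an acoustic model from scratch on the corpus.
subprocess.run(["mfa", "train", corpus, dictionary, model_out], check=True)
```

Swapping in another language is then just a different corpus and dictionary.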
Today in my web browsing history: the tech sector waxing lyrical about #OpenAI's upgrades to #ChatGPT, which include #SpeechRecognition and #SpeechSynthesis capabilities; meanwhile Wikipedia, which was scraped to train many LLMs, begs for donations.
All of the content we've placed online has been mined, processed, refined, and is being sold back to us.
For many of us, that's a new experience.
But I suspect that for those from the global South, it's a repeat of centuries of colonisation.
For folks who work with #ASR #SpeechRecognition, specifically #Whisper from #OpenAI - I have heard some anecdotal evidence of transcription with the medium.en model returning paragraphs of "junk" content, like weather reports and adverts for golfing supplies.
I have three confirmed reports from interview transcripts on unrelated topics, and am curious whether there are other (as yet unreported) instances like this.
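If you want to screen your own transcripts for this, Whisper's per-segment metadata gives some hooks; a rough filter might look like the following - the thresholds are guesses, not tuned values:

```python
# Rough sketch: flag Whisper segments that look like hallucinated filler.
import whisper

model = whisper.load_model("medium.en")
result = model.transcribe("interview.wav")  # placeholder file

for seg in result["segments"]:
    suspicious = (
        seg["no_speech_prob"] > 0.5        # model thinks nobody is speaking
        or seg["avg_logprob"] < -1.0       # low confidence in the text
        or seg["compression_ratio"] > 2.4  # highly repetitive output
    )
    if suspicious:
        print(f"[{seg['start']:.1f}s-{seg['end']:.1f}s] {seg['text']!r}")
```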
A little piece I spoke to, and which my #ANU #Cybernetics colleague, Lauren Pay, wrangled into coherence - it's about the history of #SpeechRecognition as a #complex #system - and how the ANU School of Cybernetics can help you learn how to interrogate and shape such systems.
Written to promote the school's new short courses.
I have been trying to create a whole new system of #SpeechRecognition shortcuts for myself, and I tell you, it is uncommonly like casting magic spells. Like Diane Duane and Patricia Wrede's magic systems: lots of work beforehand so you can trigger the spell with one word later on.
It makes me think the Harry Potter universe must have an awful lot of freeware floating around. Utterly unacknowledged, because JK Rowling isn't great at thinking through her worldbuilding.
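For a sense of the "lots of work up front, one word at cast time" shape, here's a toy dispatcher - the triggers and actions are invented examples, not anyone's actual shortcut system:

```python
# Invented examples: one-word triggers expanding into prepared actions.
# A real setup would feed on_recognised() from the recogniser's output.
import webbrowser

def type_text(text: str) -> None:
    # Stand-in for sending keystrokes via an OS automation API.
    print(text)

SPELLS = {
    "boilerplate": lambda: type_text("Dear colleague,\n\nThanks for your email."),
    "standup": lambda: webbrowser.open("https://meet.example.com/standup"),
}

def on_recognised(word: str) -> None:
    """Dispatch a recognised trigger word to its prepared action."""
    action = SPELLS.get(word.lower().strip())
    if action:
        action()

on_recognised("boilerplate")
```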
Dictation - Google's Project Relate looks interesting. Google has you train it on your voice by recording 500+ phrase cards, then creates a custom model for your speech. Particularly useful for anyone with unusual speech patterns.
Announced in late 2021, the Android app was released in January 2023. It sounds like the dictation accuracy (once trained) is better than current apps (Siri, Echo, Google, Dragon).
• Generative AI models learn from mass data scraped from web
• Indigenous groups fear losing control over their data
• Some move to protect their information from commercial use
"When U.S. tech firm OpenAI rolled out Whisper, a speech recognition tool offering audio transcription and translation into English for dozens of languages including Māori, it rang alarm bells for many Indigenous New Zealanders.
"Whisper, launched in September by the company behind the ChatGPT chatbot, was trained on 680,000 hours of audio from the web, including 1,381 hours of the Māori language."