#SpeechToText - kbin.social

dusoft, 2 months ago to ai

Somebody asks for a desktop alternative to MacWhisper for Windows (i.e. speech to text transcriber) and somebody else recommends a cloud Google service. Should we call that second person a simpleton with a cloudy mind?

BTW, SpeechNote is the recommended alternative for Linux (a Flatpak package).
#ai #audio #transcription #speechtotext #STT #hugginface #mac #Linux

oblomov, 3 months ago (edited 3 months ago) to linux

Are there any decent #speechToText options for #Linux? Last time I checked the situation seemed pretty dire.

#askFedi #fediHelp

serpicojam, 4 months ago to apple

Serious iPhone question: what's up with the speech-to-text automatic insertion of commas between subjects and verbs?

#autocorrect #SpeechToText #grammar #Apple #iPhone #editing #english

derickr, 5 months ago to ai

📽 New video: "Practical AI: Automated Subtitle Generation", in which I explain how to run OpenAI's WhisperNet locally, to automatically create subtitles for videos, at high accuracy.

https://youtu.be/FZonnnalfYc

#AI #SpeechToText #Subtitles

donwatkins, 5 months ago to opensource

Interview Hack: AI saves the day (and ears) – #OpenSource #AI #Whisper #SpeechToText https://www.both.org/?p=2928

eudaimon, 7 months ago to fdroid

I just discovered an app on @fdroid that is awesome. This message is almost entirely dictated with this voice recognition keyboard, which supports many languages, including Catalan. I really thought voice recognition on a fully free Android was a pipe dream, but I see it isn't. The only things I have to spell out for it are the punctuation and the occasional word that isn't purely Catalan, like "f-droid".

#STT #SpeechToText #ReconeixementDeLaParla #ReconeixementDeVeu #Català

(edit: the app is Sayboard, https://f-droid.org/es/packages/com.elishaazaria.sayboard/ #Sayboard)

Tom23, 7 months ago to linux French

Mastodon, someone close to me needs you!
He is a #dys adult who gave up on French back in middle school.
He has to retrain for a new career and has been working for 6 months to regain confidence in his writing, with great progress.
Real weak spots remain, and they will complicate a training course he is starting soon in a field where writing plays an important role.

Do you know of free software that does #speechtotext on #linux to make his life easier?

If you boost or re-toot, you are helping a good person.

techsinger, 7 months ago to random

I'm sure everyone who wants to know about this already does, but just in case anyone, particularly if #blind or #DeafBlind, has been looking for a local method of converting speech to text: Whisper is an ML model from OpenAI that does exactly that. It can be used accessibly with all screen readers on Windows. Obviously, this is great for those of us with impaired hearing. It is certainly far more accurate than any of the speech to text programs I've seen, needs no training, and can handle background noise quite well. The audio duration limits are set by your hard drive space and the amount of time you're willing to put into transcription. I've transcribed several hours of audio without difficulty; it just takes time. It's available on Windows using https://github.com/Softcatala/whisper-ctranslate2 which just seems to need Python. A GPU makes it faster, but it's usable on an i5 CPU. The model is also available online at https://freesubtitles.ai though that requires payment or waiting for long periods to transcribe limited amounts of audio. Thanks to @Bryn for the pointer to whisper-ctranslate2. #whisper #SpeechToText
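
For anyone who wants to script the local route described above, here is a minimal Python sketch that builds a whisper-ctranslate2 command line and runs it only if the tool is installed. It assumes the CLI accepts the same `--model`, `--output_dir` and `--output_format` options as OpenAI's own `whisper` command; the file name `interview.mp3`, the `transcripts` folder and the `small` model choice are purely illustrative, so check `whisper-ctranslate2 --help` before relying on any of it.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(audio: Path, model: str = "small",
                  output_dir: Path = Path("transcripts")) -> list[str]:
    """Build the argument list for a transcription run (nothing is executed here).

    The flag names are an assumption: they mirror OpenAI's whisper CLI.
    """
    return [
        "whisper-ctranslate2",
        str(audio),
        "--model", model,
        "--output_dir", str(output_dir),
        "--output_format", "txt",
    ]


def transcribe(audio: Path) -> None:
    """Run the transcription if the tool is on PATH, otherwise say what is missing."""
    if shutil.which("whisper-ctranslate2") is None:
        print("whisper-ctranslate2 not found; install it first (for example with pip).")
        return
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(build_command(audio), check=True)
```

Calling `transcribe(Path("interview.mp3"))` would then write a plain-text transcript into the `transcripts` folder, assuming the options behave as described.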

unfa, 7 months ago to opensource

Common Voice is a project by Mozilla to build an extensive, ethically sourced dataset of spoken word in various languages to help push forward open-source voice recognition technology like DeepSpeech (also by Mozilla).

I just recorded a dozen or so sentences :)

https://commonvoice.mozilla.org/

#OpenSource #SpeechRecognition #SpeechToText #Mozilla #DeepSpeech

cs, 7 months ago to random

#Siri, under what circumstances would you pick nonsense word “Hatten” over the commonly used word “hadn’t”?

#SMH #IPone #SpeechToText

gregorni, 7 months ago to fdroid

I just found this speech-to-text keyboard on F-Droid (in the IzzyOnDroid repo) called Sayboard. The app is pretty bad (no dark mode!! 🤬), but the transcribing works pretty well! The app is open source, and everything is run locally on-device.

https://apt.izzysoft.de/fdroid/index/apk/com.elishaazaria.sayboard

https://github.com/ElishaAz/Sayboard

#Sayboard #FDroid #SpeechToText #Android #IME #Keyboard #InputMethod

Mina, 8 months ago to accessibility German

#VoiceRecognition (#SpeechToText) is an important part of making technology more accessible to everyone.

There are solutions, but they belong to corporations and are tied to license agreements that can change at any time, and hence cannot reliably be used by developers outside these corporations.

Mozilla Common Voice is aiming to change that, but they need voices (esp. female ones!) and verification for the models.

With just a few minutes of your time, you can help!

https://commonvoice.mozilla.org/

#a11y

thelinuxcast, 9 months ago to linux

Speech Note - A Cool Note Taking App with #TextToSpeech, #SpeechToText and #Translation - https://youtu.be/zlLVgTB42Bo #Linux #TheLinuxCast

mkiol, 10 months ago to linuxphones

#Papago 2.0 repeats what you say but in a different language!

It uses offline live speech translation.

Watch how I speak Spanish 😎

https://youtu.be/nTCwWwko7lU

#SailfishOS #offline #translator #TextToSpeech #SpeechToText

mkiol, 10 months ago to linux

I've just released Speech Note 4.0!
The new version comes with a shiny new offline machine Translator and many new Text to Speech voices.

To implement the Translator I borrowed some code and models from the amazing #BergamotProject and #FirefoxTranslations.

#SpeechNote is a Linux offline #SpeechToText, #TextToSpeech and #MachineTranslation app. You can download it from #Flathub.

Videos:
#Linux Desktop: https://youtu.be/psRT0UPFb04
#PinePhone: https://youtu.be/kTsM3kUxE2Q
#SailfishOS: https://youtu.be/88cdPpvBmmI

igorwarneck, 10 months ago to linux German

Question to the #Linux and #Foss community:

Is there working, free software for speech recognition under Linux?
I would like to dictate directly into my writing program and would rather not take a detour via Google, i.e., preferably offline.

I would be very grateful for any tips!

doboprobodyne, 10 months ago

@downey
@igorwarneck

@geb has generously published #Numen #SpeechToText under an #OpenSource licence, and it looks very cool. I'm waiting for the day I can talk to an automated #AirTraffic controller on #Flightgear #FlightSimulator !

https://sr.ht/~geb/numen/

sakuramochi55, 11 months ago to random

Daily #protip :
If you have a video file but no subtitles for it, and would like to create them using local speech to text, you can use #kdenlive to generate subtitles for the video.
You have two options for speech to text engines: VOSK and OpenAI Whisper.
Each uses its own type of model, and either CUDA computing or the good old CPU. VOSK requires a separate model for each language and does not produce formatted output, just raw text.
Whisper is slightly more advanced: it uses a multilingual model by default, which you can set to translate into English from any language in the model. The output is formatted normally, and acronyms like GPU are properly capitalized in the final text.
But, as with anything, there's a catch: if you want to use CUDA computing and the large model, you need around 10 GB of VRAM, which isn't very common these days. You can always use the default option of CPU compute, but that can take a couple of hours; for a 24-min video, it will likely take 40-50 mins to create the subtitles, which is a nice waiting game. Once it's done, though, you will have somewhat usable subtitles, exactly tied to the speech in the video. More details and info:
https://docs.kdenlive.org/en/effects_and_compositions/speech_to_text.html
https://kdenlive.org/en/download/
#techtips #subtitles #speechtotext
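
To get a feel for that waiting game, here is a small Python sketch that turns the one data point above (a 24-min video taking 40-50 mins on CPU) into a rough estimate for other video lengths. It assumes processing time scales linearly with video duration, which is only an assumption; real numbers depend heavily on the CPU, the model size and the audio itself. The function names are made up for illustration.

```python
# Reference measurement taken from the post: a 24-minute video
# needed roughly 40 to 50 minutes of CPU time.
REFERENCE_VIDEO_MIN = 24
REFERENCE_LOW_MIN = 40
REFERENCE_HIGH_MIN = 50


def estimate_cpu_minutes(video_minutes: float) -> tuple[float, float]:
    """Return a (low, high) estimate in minutes, assuming linear scaling."""
    low = video_minutes * REFERENCE_LOW_MIN / REFERENCE_VIDEO_MIN
    high = video_minutes * REFERENCE_HIGH_MIN / REFERENCE_VIDEO_MIN
    return low, high


def describe(video_minutes: float) -> str:
    """Format the estimate as a short human-readable sentence."""
    low, high = estimate_cpu_minutes(video_minutes)
    return f"{video_minutes:g}-min video: about {low:.0f}-{high:.0f} mins on CPU"


if __name__ == "__main__":
    for minutes in (10, 24, 60):
        print(describe(minutes))
```

Under that linear assumption, an hour-long video would take somewhere around 100 to 125 minutes on the same machine.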

teachpaperless, 11 months ago to random

Are there any good apps that do speech-to-text and translate at the same time? So if I, for example, spoke in language A, it would write that out as text and, at the same time, in language B?

#apps #translation #speechtotext

josschuurmans, 1 year ago to ai

I'm testing otter.ai

On the free plan one gets 300 minutes of real-time transcription per month.

Otter can also transcribe audio that has been recorded previously. I'm wondering if the 300 minutes count against such previously recorded audio.

I'm also wondering if Otter is the best transcription solution.

#OtterAI #SpeechToText #AI #transcription #notetaking #interviewing #meetings #KnowledgeManagement #KM #productivity #curation

1/3
