Somebody asks for a desktop alternative to MacWhisper for Windows (i.e. a speech-to-text transcriber) and somebody else recommends a Google cloud service. Should we call that second person a simpleton with a cloudy mind?
📽 New video: "Practical AI: Automated Subtitle Generation", in which I explain how to run OpenAI's Whisper locally to automatically create subtitles for videos with high accuracy.
I've just discovered an awesome app on @fdroid. This message is practically all dictated with this voice recognition keyboard, which supports many languages, including Catalan. I really thought voice recognition on a completely free Android was a pipe dream, but I see it isn't. The only things I have to tell it are the punctuation and the odd word that isn't purely Catalan, like "f-droid".
Mastodon, someone close to me needs you!
An adult with a #dys disorder, he gave up on French back in middle school.
He has to retrain for a new career and has been working for 6 months to regain confidence in his writing, with great progress.
Real weaknesses remain, and they will complicate a training course he is starting soon in a field where writing plays an important role.
Do you know of any free software that does #speechtotext on #linux to make his life easier?
If you boost or re-toot, you're helping a good person.
I'm sure everyone who wants to know about this already does, but just in case anyone, particularly if #blind or #DeafBlind, has been looking for a local method of converting speech to text: Whisper is an ML model from OpenAI that does exactly that. It can be used accessibly with all screen readers on Windows. This is obviously great for those of us with impaired hearing: it is far more accurate than any of the speech-to-text programs I've seen, needs no training, and handles background noise quite well. The audio duration is limited only by your hard drive space and the amount of time you're willing to put into transcription. I've transcribed several hours of audio without difficulty; it just takes time. It's available on Windows using https://github.com/Softcatala/whisper-ctranslate2 which just seems to need Python. A GPU makes it faster, but it's usable on an i5 CPU. The model is also available online at https://freesubtitles.ai though that requires payment or waiting for long periods to transcribe limited amounts of audio. Thanks to @Bryn for the pointer to whisper-ctranslate2. #whisper #SpeechToText
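For anyone who wants to try the local route, here is a minimal Python sketch that assembles a whisper-ctranslate2 command line and runs it only if the tool is installed. The file name `interview.mp3`, the helper names, and the `medium` model choice are placeholders of my own, and the flags assume the tool mirrors the original Whisper command line, so check `whisper-ctranslate2 --help` before relying on them.

```python
import shutil
import subprocess


def build_command(audio_path, model="medium", output_format="txt"):
    """Return the argument list for a whisper-ctranslate2 run (hypothetical helper)."""
    return [
        "whisper-ctranslate2",
        audio_path,
        "--model", model,                  # assumed flag, as in the Whisper CLI
        "--output_format", output_format,  # assumed flag, e.g. txt or srt
    ]


def transcribe(audio_path, **options):
    """Run the transcription if the CLI is on PATH; return True if it ran."""
    command = build_command(audio_path, **options)
    if shutil.which(command[0]) is None:
        print("whisper-ctranslate2 not found; try: pip install whisper-ctranslate2")
        return False
    subprocess.run(command, check=True)
    return True


if __name__ == "__main__":
    # Only print the command here; call transcribe() on a real file to run it.
    print(" ".join(build_command("interview.mp3")))
```

Passing `output_format="srt"` to `build_command` should ask for a subtitle file instead of plain text, again assuming the flag behaves as it does in the original Whisper tool.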
Common Voice is a project by Mozilla to build an extensive, ethically sourced dataset of spoken word in various languages, to help push forward open-source voice recognition technology like DeepSpeech (also by Mozilla).
I just found this speech-to-text keyboard on F-Droid (in the IzzyOnDroid repo) called Sayboard. The app is pretty bad (no dark mode!! 🤬), but the transcribing works pretty well! The app is open source, and everything is run locally on-device.
There are solutions, but they belong to corporations and are tied to license agreements, which can change at any time and hence cannot be reliably used by developers outside these corporations.
Mozilla Common Voice is aiming to change that, but they need voices (esp. female ones!) and verification for the models.
With just a few minutes of your time, you can help!
Is there working, free software for speech recognition on Linux?
I would like to dictate directly into my writing program and would rather not take a detour via Google, i.e., preferably offline.
Daily #protip :
If you have a video file without subtitles and would like to create them using local speech to text, you can use #kdenlive to generate subtitles for it.
You have two options for speech to text engines: VOSK and OpenAI Whisper.
Both offer different models to choose from and can run on CUDA or on the good old CPU. VOSK requires a separate model for each language and does not produce formatted output, just raw text.
Whisper is slightly more advanced: it uses a multilingual model by default, which you can set to translate into English from any language in the model. The output is formatted normally, and acronyms like GPU are properly capitalized in the final text.
But, as with anything, there's a catch: if you want to use CUDA with the large model, you need around 10 GB of VRAM, which isn't very common these days. You can always use the default CPU compute, but that can take up to a couple of hours; for a 24-min video, it will likely take 40-50 mins to create the subtitles, which is a nice waiting game. Once it's done, though, you will have somewhat usable subtitles, exactly tied to the speech in the video. More details and info: https://docs.kdenlive.org/en/effects_and_compositions/speech_to_text.html https://kdenlive.org/en/download/ #techtips #subtitles #speechtotext
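To make "subtitles tied to the speech" a bit more concrete, here is a small Python sketch that turns timed text segments into the common SubRip (.srt) layout. It is only an illustration, not how Kdenlive works internally; the function names and the demo segments are made up, and a real engine would supply the timings.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Turn (start, end, text) tuples into the text of an .srt file."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # A blank line separates consecutive subtitle entries.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Made-up segments standing in for speech-to-text output.
    demo = [
        (0.0, 2.5, "Hello and welcome."),
        (2.5, 6.0, "Today we talk about subtitles."),
    ]
    print(to_srt(demo))
```

Each entry is a counter, a start and end time, and the text, which is why an engine that only returns raw text without timings cannot be turned into subtitles this way.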
Are there any good apps that do speech-to-text and translation at the same time? So if I, for example, speak in language A, it would write the text both in language A and, at the same time, in language B?
On the free plan, one gets 300 minutes of real-time transcription per month.
Otter can also transcribe audio that has been recorded previously. I'm wondering whether the 300 minutes also count against such previously recorded audio.
I'm also wondering whether Otter is the best transcription solution.