dusoft, to ai
@dusoft@fosstodon.org avatar

Somebody asks for a desktop alternative to MacWhisper for Windows (i.e. speech to text transcriber) and somebody else recommends a cloud Google service. Should we call that second person a simpleton with a cloudy mind?

BTW, SpeechNote is the recommended alternative for Linux (a Flatpak package).

oblomov, to linux
@oblomov@sociale.network avatar

Are there any decent options for speech-to-text on Linux? Last time I checked the situation seemed pretty dire.

serpicojam, to apple
@serpicojam@mas.to avatar

Serious iPhone question: what's up with the speech-to-text automatic insertion of commas between subjects and verbs?

derickr, to ai
@derickr@phpc.social avatar

📽 New video: "Practical AI: Automated Subtitle Generation", in which I explain how to run OpenAI's WhisperNet locally, to automatically create subtitles for videos, at high accuracy.

https://youtu.be/FZonnnalfYc

donwatkins, to opensource
@donwatkins@fosstodon.org avatar

Interview Hack: AI saves the day (and ears) – https://www.both.org/?p=2928

eudaimon, to fdroid

I just discovered an app on @fdroid that is awesome. This message is dictated almost entirely with this voice-recognition keyboard, which has many languages, including Catalan. I really thought voice recognition on a fully free Android was a pipe dream, but I see it isn't. The only things I have to tell it explicitly are the punctuation and the odd word that isn't purely Catalan, like "f-droid".

(edit: the program is Sayboard, https://f-droid.org/es/packages/com.elishaazaria.sayboard/ )

Tom23, to linux French
@Tom23@pouet.chapril.org avatar

Mastodon, someone close to me needs you!
Now an adult, he dropped French back in middle school.
He has to retrain for a new career and has been working for 6 months to regain confidence in his writing, with great progress.
Real weaknesses remain, and they will complicate a training course he starts soon in a field where writing plays an important part.

Do you know of free software that does speech-to-text on Linux to make his life easier?

If you boost or repost, you are helping a good person.

techsinger, to random

I'm sure everyone who wants to know about this already does but, just in case anyone has, particularly if or , been looking for a local method of converting speech to text ... Whisper is an ML model from OpenAI which allows doing that. It can be used accessibly with all screen readers on Windows. Obviously, this is great for those of us with impaired hearing. It is certainly far more accurate than any of the speech-to-text programs I've seen, needs no training, and can handle background noise quite well. The audio duration limits are set by your hard drive space and the amount of time you're willing to put into transcription. I've transcribed several hours of audio without difficulty; it just takes time. It's available on Windows using https://github.com/Softcatala/whisper-ctranslate2 which just seems to need Python. A GPU makes it faster, but it's usable on an i5 CPU. The model is also available online at https://freesubtitles.ai though that requires payment or waiting for long periods to transcribe limited amounts of audio. Thanks to @Bryn for the pointer at whisper-ctranslate2.
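
For anyone who wants to script the local setup described above, here is a minimal sketch of wrapping the whisper-ctranslate2 command line from Python. It is not from the original post. The flags shown (--model, --output_dir, --output_format) are assumed to mirror the openai-whisper CLI that the tool aims to be compatible with, so check `whisper-ctranslate2 --help` before relying on them. The helper names build_command and transcribe, the "small" model choice, and the "transcripts" folder are all hypothetical.

```python
import shutil
import subprocess


def build_command(audio, model="small", output_dir="transcripts", output_format="txt"):
    """Return the argument list for one whisper-ctranslate2 run.

    Assumption: the flags below match the openai-whisper CLI; verify with --help.
    """
    return [
        "whisper-ctranslate2",
        str(audio),
        "--model", model,
        "--output_dir", str(output_dir),
        "--output_format", output_format,
    ]


def transcribe(audio, **options):
    """Run the CLI if it is installed; return True when it exits successfully."""
    if shutil.which("whisper-ctranslate2") is None:
        print("whisper-ctranslate2 not found; it is usually installed with pip")
        return False
    result = subprocess.run(build_command(audio, **options), check=False)
    return result.returncode == 0
```

Calling transcribe("interview.mp3", model="medium") would then run one file at a time. Building the argument list separately from running it makes it easy to print and inspect the command before committing to a run that may take hours on a CPU.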

unfa, to opensource
@unfa@mastodon.social avatar

Common Voice is a project by Mozilla to build an extensive ethically-sourced dataset of spoken word in various languages to help push forward open-source voice recognition technology like DeepSpeech (also by Mozilla).

I just recorded a dozen or so sentences :)

https://commonvoice.mozilla.org/

cs, to random
@cs@mastodon.sdf.org avatar

, under what circumstances would you pick nonsense word “Hatten” over the commonly used word “hadn’t”?

gregorni, to fdroid
@gregorni@fosstodon.org avatar

I just found this speech-to-text keyboard on F-Droid (in the IzzyOnDroid repo) called Sayboard. The app is pretty bad (no dark mode!! 🤬), but the transcribing works pretty well! The app is open source, and everything is run locally on-device.

https://apt.izzysoft.de/fdroid/index/apk/com.elishaazaria.sayboard

https://github.com/ElishaAz/Sayboard

Mina, to accessibility German

() is an important part of making technology more accessible to everyone.

There are solutions, but they belong to corporations and are tied to license agreements, which can change at any time and hence cannot reliably be used by developers outside these corporations.

Mozilla Common Voice is aiming to change that, but they need voices (esp. female ones!) and verification for the models.

With just a few minutes of your time, you can help!

https://commonvoice.mozilla.org/

thelinuxcast, to linux
@thelinuxcast@fosstodon.org avatar
mkiol, to linuxphones

2.0 repeats what you say but in a different language!

It uses offline live speech translations.

Watch how I speak Spanish 😎

https://youtu.be/nTCwWwko7lU

mkiol, to linux

I've just released Speech Note 4.0!
The new version comes with a shiny new offline machine Translator and many new Text to Speech voices.

To implement the Translator I borrowed some code and models from amazing #BergamotProject and #FirefoxTranslations.

#SpeechNote is a Linux offline #SpeechToText, #TextToSpeech and #MachineTranslation app. You can download it from #Flathub

Videos:
#Linux Desktop: https://youtu.be/psRT0UPFb04
#PinePhone: https://youtu.be/kTsM3kUxE2Q
#SailfishOS: https://youtu.be/88cdPpvBmmI

igorwarneck, to linux German

Question to the and community:

Is there working, free software for speech recognition on Linux?
I would like to dictate directly into my writing program and would rather not take a detour via Google, i.e., preferably offline.

I would be very grateful for any tips!

doboprobodyne,
@doboprobodyne@mathstodon.xyz avatar

@downey
@igorwarneck

@geb has generously published #Numen #SpeechToText under an #OpenSource licence, and it looks very cool. I'm waiting for the day I can talk to an automated #AirTraffic controller on #Flightgear #FlightSimulator !

https://sr.ht/~geb/numen/

sakuramochi55, to random
@sakuramochi55@sakurajima.moe avatar

Daily :
If you have a video file without subtitles and would like to create them with local speech-to-text, you can use Kdenlive to generate subtitles for the video.
You have two options for the speech-to-text engine: VOSK and OpenAI Whisper.
Both offer different models to choose from and can run on CUDA or the good old CPU. VOSK requires a separate model for each language and does not produce formatted output, just raw text.
Whisper is slightly more advanced: it uses a multilingual model by default, can optionally translate into English from any language in the model, and produces normally formatted output, with acronyms such as GPU properly capitalized in the final text.
But, as with anything, there's a catch: to use CUDA with the large model you need around 10 GB of VRAM, which isn't very common these days. You can always fall back to the default CPU option, but that can take a couple of hours; for a 24-minute video, expect roughly 40-50 minutes to create the subtitles, which is a nice waiting game. Once it's done, though, you will have somewhat usable subtitles, timed exactly to the speech in the video. More details and info:
https://docs.kdenlive.org/en/effects_and_compositions/speech_to_text.html
https://kdenlive.org/en/download/
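
To illustrate what "subtitles timed exactly to the speech" means in practice, here is a minimal sketch of turning timed transcript segments into SubRip (.srt) text. It is not Kdenlive's own code. It assumes the speech-to-text engine hands back (start_seconds, end_seconds, text) tuples, and the function names srt_timestamp and to_srt are hypothetical.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Turn (start, end, text) tuples into the body of an .srt file.

    Each cue is a running number, a timing line, and the text,
    with a blank line between cues.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text.strip()}\n")
    return "\n".join(blocks)
```

Passing a list such as [(0.0, 2.5, "Hello"), (2.5, 5.0, "and welcome")] to to_srt gives text that can be saved with an .srt extension and loaded by most video players.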

teachpaperless, to random

Are there any good apps that do speech-to-text and translation at the same time? So if I, for example, spoke in language A, it would write the text both in language A and, at the same time, in language B?

josschuurmans, to ai

I'm testing otter.ai

On the free plan one gets 300 minutes of real-time transcription per month.

Otter can also transcribe audio that has been recorded previously. I'm wondering if the 300 minutes count against such previously recorded audio.

I'm also wondering if Otter is the best transcription solution.

#OtterAI #SpeechToText #AI #transcription #notetaking #interviewing #meetings #KnowledgeManagement #KM #productivity #curation

1/3
