If I want to transcribe every new audio file that appears in a folder on my Mac using (Mac)Whisper and then run the transcription through an LLM via Ollama to clean it up, what are my best options? Do I have to write some Python code myself, or are there ready-made tools for this?
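If no ready-made tool fits, the glue code is fairly small. A minimal Python sketch of such a loop, assuming the openai-whisper and Ollama CLIs are installed; the folder name, model choices, and cleanup prompt here are illustrative, not a recommendation:

```python
import subprocess
import time
from pathlib import Path

WATCH_DIR = Path("~/VoiceMemos").expanduser()  # hypothetical folder
AUDIO_EXTS = {".mp3", ".m4a", ".wav"}

def pending_files(folder: Path) -> list[Path]:
    """Audio files that don't yet have a .txt transcript next to them."""
    return sorted(f for f in folder.iterdir()
                  if f.suffix.lower() in AUDIO_EXTS
                  and not f.with_suffix(".txt").exists())

def transcribe_and_clean(audio: Path) -> None:
    # 1. Transcribe with the whisper CLI; it writes <name>.txt alongside.
    subprocess.run(["whisper", str(audio), "--model", "small",
                    "--output_format", "txt",
                    "--output_dir", str(audio.parent)], check=True)
    raw = audio.with_suffix(".txt").read_text()
    # 2. Clean up the raw transcript with a local model via Ollama.
    prompt = ("Fix punctuation and obvious transcription errors, "
              "changing nothing else:\n\n" + raw)
    result = subprocess.run(["ollama", "run", "llama3", prompt],
                            capture_output=True, text=True, check=True)
    audio.with_suffix(".clean.txt").write_text(result.stdout)

def watch(poll_seconds: int = 30) -> None:
    """Naive polling loop; on macOS, launchd or the watchdog package
    would be the tidier way to react to new files."""
    while True:
        for f in pending_files(WATCH_DIR):
            transcribe_and_clean(f)
        time.sleep(poll_seconds)
```

Calling `watch()` starts the polling loop; skipping files that already have a transcript makes restarts cheap.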
Folks, I'm starting my post-#PhD job search low-key on the side while I write up my #thesis.
I have an odd collection of skills - #Linux, #Python, #Jupyter, #pandas, #DevRel - and I've done a lot of work in team leadership and management, and have led a multi-million-dollar not-for-profit in the past. Keynote speaker.
I'm looking for something that harnesses all of these skills - and it will be a senior role with senior pay, given my experience, qualifications and proven capability. I have time and will be discerning about my next step.
Job titles that might fit here would be Senior Research Engineer, Engineering Lead, Lead AI Engineer or similar.
Looking for fully remote work, with one day a fortnight max in #Melbourne, AU. If you don't believe in #RemoteWork or #WFH, we're not a good fit.
Super keen on something full time rather than splitting my attention over multiple part-time roles.
Looking to start around August, so a fair amount of lead time.
Keen on organisations that have strong values alignment - #FAIR and #CARE data use, #EthicalAI, AI for social good.
I see lots of posts here on Mastodon where people state that today's "AI" (LLMs) have no use, waste energy and are just doing copyright infringement on a vast scale.
I don't get it.
I just put together "summarize.sh" - a bit of glue between some open-source tools and self-hosted models. It takes a YouTube URL as its only parameter and outputs a text summary of the important parts of the spoken words in the video.
That is, I run yt-dlp, Whisper and finally Mixtral 8x7B. And I no longer need to sit through someone yapping for a few minutes to tell me what should've been a short blog post.
Example output from a 4 minute video:
"The text describes a video tutorial on how to reset a Corsair keyboard when it's not working properly. The keyboard in question has three white flashing lights at the top and is experiencing issues with its RGB lighting and key input. To reset the keyboard, the user should unplug the USB cables from the computer, hold down the escape key, and then plug the USB cables back into the computer while still holding down the escape key. After releasing the escape key, the keyboard's lights should flash, indicating that it has been reset. The tutorial notes that this method has worked for other Corsair keyboards as well."
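Not the author's actual script, but the same three-step pipeline can be sketched in Python. This assumes yt-dlp, the openai-whisper CLI, and Ollama are on the PATH; the whisper model size, the `mixtral:8x7b` tag, and the file names are assumptions for illustration:

```python
import subprocess
from pathlib import Path

def build_prompt(transcript: str) -> str:
    """Ask the model for a short summary of the spoken content."""
    return ("Summarize the important points of this video transcript "
            "in one short paragraph:\n\n" + transcript)

def summarize(url: str) -> str:
    # 1. Download just the audio track.
    subprocess.run(["yt-dlp", "-x", "--audio-format", "mp3",
                    "-o", "audio.%(ext)s", url], check=True)
    # 2. Transcribe it; whisper writes audio.txt in the current directory.
    subprocess.run(["whisper", "audio.mp3", "--model", "small",
                    "--output_format", "txt"], check=True)
    transcript = Path("audio.txt").read_text()
    # 3. Summarize the transcript with a local Mixtral via Ollama.
    out = subprocess.run(["ollama", "run", "mixtral:8x7b",
                          build_prompt(transcript)],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()
```

Usage is just `print(summarize(url))`; everything runs locally except the initial download.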
Want to know how great an #LLM can be for accessibility? Here you go!
From now on, all recordings of #podcast@kolonas on #PeerTube (@Tube from @ftdl) and #YouTube have subtitles created by the #Whisper engine (Large model) from #OpenAI, without human intervention*!
For technical reasons, subtitles for future episodes of the podcast will be published with a delay, but at least they'll be there ;)
*except for the removal of a single word in episode 4 :)
Amusing as it is in this form, and while it may well spark discussions about #KI and copyright, the content can turn out incomplete or even outright wrong – that applies especially to transcriptions from Swiss German.
Today I used #Emacs Lisp to parse Deepgram's #speech recognition JSON output with utterances, punctuation, and smart format turned on and the #Whisper Large model selected. I turned the words array into a VTT subtitle file with speaker identification (handy for EmacsConf Q&A) and captions limited to roughly 45 characters with punctuation preferred for splitting. It's way faster than waiting for a CPU-only computer to run Whisper Large on the files. Looking forward to experimenting with this for my personal braindumping too.
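A rough Python equivalent of that Elisp approach, assuming Deepgram-style word objects with `punctuated_word`, `start`, `end`, and `speaker` fields; the 45-character cap and punctuation-preferred splitting follow the post, while the exact grouping rules are a guess:

```python
def fmt_ts(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    h, rest = divmod(ms, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

def words_to_vtt(words: list[dict], max_chars: int = 45) -> str:
    """Group word objects into VTT cues of roughly max_chars characters,
    preferring to break after punctuation, with speaker labels."""
    cues: list[str] = []
    current: list[dict] = []

    def flush() -> None:
        if current:
            text = " ".join(w["punctuated_word"] for w in current)
            cues.append(
                f"{fmt_ts(current[0]['start'])} --> {fmt_ts(current[-1]['end'])}\n"
                f"<v Speaker {current[0]['speaker']}>{text}")
            current.clear()

    for w in words:
        # Length of the cue if we add this word (words plus spaces).
        length = (sum(len(x["punctuated_word"]) for x in current)
                  + len(w["punctuated_word"]) + len(current))
        if current and (length > max_chars
                        or current[0]["speaker"] != w["speaker"]):
            flush()
        current.append(w)
        if w["punctuated_word"][-1] in ".?!":  # break at punctuation
            flush()
    flush()
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"
```

Splitting on speaker change as well as length keeps each cue attributable to one `<v>` voice tag.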
Whisper: I don’t smell meat.
Dexy: No cheese detected either.
Whisper: What is he eating?
Dexy: The container lid says hummus.
Whisper: What’s that, some kind of ham?
Dexy: I’m afraid not.
Whisper: Wait, you can read!?
Dexy: You can’t?
What happens when you have #OpenAI’s #Whisper running for >24 hours on your server, transcribing all your voice memos since 2018.
(I record a lot of audio, but so far, using the large model, it has barely gotten to April 2018. Now rebooting the server and configuring Whisper to run once per audio file inside a for loop invoked by a bash script, rather than handing it the whole batch at once. This should make things better.)
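A minimal sketch of that one-file-at-a-time idea, in Python rather than bash; the folder name and flags are illustrative, and it assumes the whisper CLI writes a .txt transcript next to each input. Skipping files that already have a transcript means a reboot resumes roughly where the last run stopped instead of starting over:

```python
import subprocess
from pathlib import Path

MEMO_DIR = Path("voice-memos")  # hypothetical folder of memos
AUDIO_EXTS = {".m4a", ".mp3", ".wav"}

def unprocessed(folder: Path) -> list[Path]:
    """Oldest-first audio files that don't yet have a transcript."""
    todo = [f for f in folder.iterdir()
            if f.suffix.lower() in AUDIO_EXTS
            and not f.with_suffix(".txt").exists()]
    return sorted(todo, key=lambda f: f.stat().st_mtime)

def run_all(folder: Path) -> None:
    for audio in unprocessed(folder):
        # One whisper invocation per file: a crash or reboot loses
        # only the file in flight, not the progress of the whole batch.
        subprocess.run(["whisper", str(audio), "--model", "large",
                        "--output_format", "txt",
                        "--output_dir", str(folder)], check=True)
```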
'Subtitle' is an open-source AI-powered caption generation project that utilizes machine learning algorithms and natural language processing techniques to generate accurate and natural-sounding captions for videos in various languages.
The tool can run on your own server for enhanced control and privacy and utilizes the open-source #Whisper ASR model for high-quality speech recognition.
How accurate is transcribing with OpenAI's #Whisper technology? I tried it on myself, but now want to try it on something more substantial. Anybody want to share experiences?
I'm sure everyone who wants to know about this already does, but just in case anyone, particularly if #blind or #DeafBlind, has been looking for a local method of converting speech to text: Whisper is an ML model from OpenAI that does exactly that. It can be used accessibly with all screen readers on Windows. Obviously this is great for those of us with impaired hearing; it is certainly far more accurate than any of the speech-to-text programs I've seen, needs no training, and can handle background noise quite well. The audio duration limits are set only by your hard drive space and the amount of time you're willing to put into transcription; I've transcribed several hours of audio without difficulty, it just takes time. It's available on Windows via https://github.com/Softcatala/whisper-ctranslate2, which just seems to need Python. A GPU makes it faster, but it's usable on an i5 CPU. The model is also available online at https://freesubtitles.ai, though that requires payment or waiting for long periods to transcribe limited amounts of audio. Thanks to @Bryn for the pointer at whisper-ctranslate2. #Whisper #SpeechToText
I currently have about 300 minutes of podcast interviews to transcribe. My account is used up. Does anyone have a tip, e.g. for a #Whisper installation I could use?
For folks who work with #ASR #SpeechRecognition, specifically #Whisper from #OpenAI: I have heard some anecdotal evidence of transcription with the medium.en model returning paragraphs of "junk" content, like weather reports and adverts for golfing supplies.
I have three confirmed reports of this from transcripts of interviews on unrelated topics, and am curious whether there are other (as yet unreported) instances of anything similar.
"Though we haven’t even spoken, still I sense there’s a rapport"... Now that @Arotrios has opened the door for Mr Sandman to enter, with the exuberant 'Bueno', here's something for a little later in the evening... you walked a few blocks through the pouring rain to get here tonight because you heard that there would be a...
Hiya mastodonters and good Sunday to you.
Does anyone know of an open-source and free (as in freedom) audio-to-text transcriber? (The audio is from a video recording of a video call.)
If it's offline it's even better for data care, if that makes sense.
Thanks in advance for your toots & answers.
Wherever you are take care of yourself. 🧡💚❤
Morphine - Whisper - (live in Europe 1 studio, c1995) (m.youtube.com)