If I want to transcribe every new audio file that appears in a folder on my Mac using (Mac)Whisper and then run the transcription through an LLM via Ollama to clean it up, what are my best options? Do I have to write some Python code myself, or are there ready-made tools for this?
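If no ready-made tool fits, the glue code is fairly small. A minimal Python sketch of such a loop, assuming the openai-whisper and Ollama CLIs are installed; the folder name, model choices, and cleanup prompt here are illustrative, not a recommendation:

```python
import subprocess
import time
from pathlib import Path

WATCH_DIR = Path("~/VoiceMemos").expanduser()  # hypothetical folder
AUDIO_EXTS = {".mp3", ".m4a", ".wav"}

def pending_files(folder: Path) -> list[Path]:
    """Audio files that don't yet have a .txt transcript next to them."""
    return sorted(f for f in folder.iterdir()
                  if f.suffix.lower() in AUDIO_EXTS
                  and not f.with_suffix(".txt").exists())

def transcribe_and_clean(audio: Path) -> None:
    # 1. Transcribe with the whisper CLI; it writes <name>.txt alongside.
    subprocess.run(["whisper", str(audio), "--model", "small",
                    "--output_format", "txt",
                    "--output_dir", str(audio.parent)], check=True)
    raw = audio.with_suffix(".txt").read_text()
    # 2. Clean up the raw transcript with a local model via Ollama.
    prompt = ("Fix punctuation and obvious transcription errors, "
              "changing nothing else:\n\n" + raw)
    result = subprocess.run(["ollama", "run", "llama3", prompt],
                            capture_output=True, text=True, check=True)
    audio.with_suffix(".clean.txt").write_text(result.stdout)

def watch(poll_seconds: int = 30) -> None:
    """Naive polling loop; on macOS, launchd or the watchdog package
    would be the tidier way to react to new files."""
    while True:
        for f in pending_files(WATCH_DIR):
            transcribe_and_clean(f)
        time.sleep(poll_seconds)
```

Calling `watch()` starts the polling loop; skipping files that already have a transcript makes restarts cheap.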
Folks, I'm starting my post-#PhD job search low-key on the side while I write up my #thesis.
I have an odd collection of skills - #Linux, #Python, #Jupyter, #pandas, #DevRel - and I've done a lot of work in team leadership and management, and have led a multi-million-dollar not-for-profit in the past. Keynote speaker.
I'm looking for something that harnesses all of these skills - and it will be a senior role with senior pay, given my experience, qualifications and proven capability. I have time and will be discerning about my next step.
Job titles that might fit here would be Senior Research Engineer, Engineering Lead, Lead AI Engineer or similar.
Looking for fully remote work, with one day a fortnight max in #Melbourne, AU. If you don't believe in #RemoteWork or #WFH, we're not a good fit.
Super keen on something full time rather than splitting my attention over multiple part-time roles.
Looking to start around August, so a fair amount of lead time.
Keen on organisations that have strong values alignment - #FAIR and #CARE data use, #EthicalAI, AI for social good.
I see lots of posts here on Mastodon where people state that today's "AI" (LLMs) have no use, waste energy and are just doing copyright infringement on a vast scale.
I don't get it.
I just put together "summarize.sh" - a bit of glue between some open-source tools and self-hosted models. It takes a YouTube URL as its only parameter and outputs a text summary of the important parts of the spoken words in the video.
That is, I run yt-dlp, Whisper and finally Mixtral 8x7B. And I no longer need to sit through someone yapping for a few minutes to tell me what should've been a short blog post.
Example output from a 4 minute video:
"The text describes a video tutorial on how to reset a Corsair keyboard when it's not working properly. The keyboard in question has three white flashing lights at the top and is experiencing issues with its RGB lighting and key input. To reset the keyboard, the user should unplug the USB cables from the computer, hold down the escape key, and then plug the USB cables back into the computer while still holding down the escape key. After releasing the escape key, the keyboard's lights should flash, indicating that it has been reset. The tutorial notes that this method has worked for other Corsair keyboards as well."
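Not the author's actual script, but the same three-step pipeline can be sketched in Python. This assumes yt-dlp, the openai-whisper CLI, and Ollama are on the PATH; the whisper model size, the `mixtral:8x7b` tag, and the file names are assumptions for illustration:

```python
import subprocess
from pathlib import Path

def build_prompt(transcript: str) -> str:
    """Ask the model for a short summary of the spoken content."""
    return ("Summarize the important points of this video transcript "
            "in one short paragraph:\n\n" + transcript)

def summarize(url: str) -> str:
    # 1. Download just the audio track.
    subprocess.run(["yt-dlp", "-x", "--audio-format", "mp3",
                    "-o", "audio.%(ext)s", url], check=True)
    # 2. Transcribe it; whisper writes audio.txt in the current directory.
    subprocess.run(["whisper", "audio.mp3", "--model", "small",
                    "--output_format", "txt"], check=True)
    transcript = Path("audio.txt").read_text()
    # 3. Summarize the transcript with a local Mixtral via Ollama.
    out = subprocess.run(["ollama", "run", "mixtral:8x7b",
                          build_prompt(transcript)],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()
```

Usage is just `print(summarize(url))`; everything runs locally except the initial download.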
Want to know how great an #LLM can be for accessibility? Here you go!
From now on, all recordings of #podcast@kolonas on #PeerTube (@Tube from @ftdl) and #YouTube have subtitles created by the #Whisper engine (Large model) from #OpenAI, without human intervention*!
For technical reasons, subtitles for future episodes of the podcast will be published with a delay, but at least they'll be there ;)
*except for the removal of a single word in episode 4 :)
Amusing as it is in this form, and while it may well spark discussions about #KI and copyright, the content can turn out incomplete or even outright wrong – that applies especially to transcriptions from Swiss German.
Today I used #Emacs Lisp to parse Deepgram's #speech recognition JSON output with utterances, punctuation, and smart format turned on and the #Whisper Large model selected. I turned the words array into a VTT subtitle file with speaker identification (handy for EmacsConf Q&A) and captions limited to roughly 45 characters with punctuation preferred for splitting. It's way faster than waiting for a CPU-only computer to run Whisper Large on the files. Looking forward to experimenting with this for my personal braindumping too.
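A rough Python equivalent of that Elisp approach, assuming Deepgram-style word objects with `punctuated_word`, `start`, `end`, and `speaker` fields; the 45-character cap and punctuation-preferred splitting follow the post, while the exact grouping rules are a guess:

```python
def fmt_ts(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    h, rest = divmod(ms, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

def words_to_vtt(words: list[dict], max_chars: int = 45) -> str:
    """Group word objects into VTT cues of roughly max_chars characters,
    preferring to break after punctuation, with speaker labels."""
    cues: list[str] = []
    current: list[dict] = []

    def flush() -> None:
        if current:
            text = " ".join(w["punctuated_word"] for w in current)
            cues.append(
                f"{fmt_ts(current[0]['start'])} --> {fmt_ts(current[-1]['end'])}\n"
                f"<v Speaker {current[0]['speaker']}>{text}")
            current.clear()

    for w in words:
        # Length of the cue if we add this word (words plus spaces).
        length = (sum(len(x["punctuated_word"]) for x in current)
                  + len(w["punctuated_word"]) + len(current))
        if current and (length > max_chars
                        or current[0]["speaker"] != w["speaker"]):
            flush()
        current.append(w)
        if w["punctuated_word"][-1] in ".?!":  # break at punctuation
            flush()
    flush()
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"
```

Splitting on speaker change as well as length keeps each cue attributable to one `<v>` voice tag.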
Whisper: I don’t smell meat.
Dexy: No cheese detected either.
Whisper: What is he eating?
Dexy: The container lid says hummus.
Whisper: What’s that, some kind of ham?
Dexy: I’m afraid not.
Whisper: Wait, you can read!?
Dexy: You can’t?
What happens when you have #OpenAI’s #Whisper running for >24 hours on your server, transcribing all your voice memos since 2018.
(I record a lot of audio, but so far, using the large model, it has barely gotten to April 2018. Now rebooting the server and configuring Whisper to run once per audio file inside a for loop invoked by a bash script, rather than handing it the whole batch at once. This should make things better.)
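A minimal sketch of that one-file-at-a-time idea, in Python rather than bash; the folder name and flags are illustrative, and it assumes the whisper CLI writes a .txt transcript next to each input. Skipping files that already have a transcript means a reboot resumes roughly where the last run stopped instead of starting over:

```python
import subprocess
from pathlib import Path

MEMO_DIR = Path("voice-memos")  # hypothetical folder of memos
AUDIO_EXTS = {".m4a", ".mp3", ".wav"}

def unprocessed(folder: Path) -> list[Path]:
    """Oldest-first audio files that don't yet have a transcript."""
    todo = [f for f in folder.iterdir()
            if f.suffix.lower() in AUDIO_EXTS
            and not f.with_suffix(".txt").exists()]
    return sorted(todo, key=lambda f: f.stat().st_mtime)

def run_all(folder: Path) -> None:
    for audio in unprocessed(folder):
        # One whisper invocation per file: a crash or reboot loses
        # only the file in flight, not the progress of the whole batch.
        subprocess.run(["whisper", str(audio), "--model", "large",
                        "--output_format", "txt",
                        "--output_dir", str(folder)], check=True)
```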
'Subtitle' is an open-source AI-powered caption generation project that utilizes machine learning algorithms and natural language processing techniques to generate accurate and natural-sounding captions for videos in various languages.
The tool can run on your own server for enhanced control and privacy and utilizes the open-source #Whisper ASR model for high-quality speech recognition.
How accurate is transcribing with OpenAI's #Whisper technology? I tried it on myself, but now want to try it on something more substantial. Anybody want to share experiences?
I'm sure everyone who wants to know about this already does, but just in case anyone, particularly if #blind or #DeafBlind, has been looking for a local method of converting speech to text: Whisper is an ML model from OpenAI that does exactly that. It can be used accessibly with all screen readers on Windows. Obviously this is great for those of us with impaired hearing; it is certainly far more accurate than any of the speech-to-text programs I've seen, needs no training, and can handle background noise quite well. The audio duration limits are set only by your hard drive space and the amount of time you're willing to put into transcription; I've transcribed several hours of audio without difficulty, it just takes time. It's available on Windows via https://github.com/Softcatala/whisper-ctranslate2, which just seems to need Python. A GPU makes it faster, but it's usable on an i5 CPU. The model is also available online at https://freesubtitles.ai, though that requires payment or waiting for long periods to transcribe limited amounts of audio. Thanks to @Bryn for the pointer at whisper-ctranslate2. #Whisper #SpeechToText
I currently have about 300 minutes of podcast interviews to transcribe. My account is used up. Does anyone have a tip, e.g. for a #Whisper installation I could use?
For folks who work with #ASR #SpeechRecognition, specifically #Whisper from #OpenAI: I have heard some anecdotal evidence of transcription with the medium.en model returning paragraphs of "junk" content, like weather reports and adverts for golfing supplies.
I have three confirmed reports of this from transcripts of interviews on unrelated topics, and am curious whether there are other (as yet unreported) instances of anything similar.
"Though we haven’t even spoken, still I sense there’s a rapport"... Now that @Arotrios has opened the door for Mr Sandman to enter, with the exuberant 'Bueno', here's something for a little later in the evening... you walked a few blocks through the pouring rain to get here tonight because you heard that there would be a...
Hiya mastodonters and good Sunday to you.
Does anyone know of an open-source and free (as in freedom) audio-to-text transcriber? (The audio is from a video recording of a video call.)
If it's offline it's even better for data care, if that makes sense.
Thanks in advance for your toots & answers.
Wherever you are take care of yourself. 🧡💚❤
Morphine - Whisper - (live in Europe 1 studio, c1995) (m.youtube.com)