anders, to ai
@anders@thoresson.social avatar

If I want to transcribe every new audio file that appears in a folder on my Mac using (Mac)Whisper and then run the transcription through an LLM using Ollama to clean it up, what are my best options? Do I have to write some Python code myself, or are there ready-made tools for this?
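One possible shape for the do-it-yourself route, as a minimal sketch rather than a finished tool. It assumes the open-source `whisper` command-line tool (not the MacWhisper app) is installed and that Ollama is serving on its default local port. The model names, the prompt wording, the list of extensions and the `.clean.txt` naming are all illustrative choices, and recent versions of the `whisper` CLI are assumed to write `<name>.txt` next to the audio.

```python
import json
import subprocess
import time
import urllib.request
from pathlib import Path

AUDIO_EXTENSIONS = {".mp3", ".m4a", ".wav"}          # illustrative list
OLLAMA_URL = "http://localhost:11434/api/generate"   # Ollama's default local endpoint


def find_new_audio(folder, seen):
    """Return audio files in `folder` that are not yet in `seen`, and record them."""
    new_files = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in AUDIO_EXTENSIONS and path not in seen:
            seen.add(path)
            new_files.append(path)
    return new_files


def transcribe(path, model="base"):
    """Run the whisper CLI on one file and return the transcript text."""
    subprocess.run(
        ["whisper", str(path), "--model", model,
         "--output_format", "txt", "--output_dir", str(path.parent)],
        check=True,
    )
    # Assumption: the CLI names its output after the audio file's stem.
    return path.with_suffix(".txt").read_text(encoding="utf-8")


def clean_with_ollama(text, model="llama3"):
    """Ask a local Ollama model to tidy the transcript (model name is an assumption)."""
    payload = json.dumps({
        "model": model,
        "prompt": "Clean up this transcript. Fix punctuation and obvious "
                  "mis-hearings, but do not add content:\n\n" + text,
        "stream": False,  # return one JSON object instead of a stream
    }).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def watch(folder, interval=10):
    """Poll `folder` forever; files already present are processed on the first pass."""
    seen = set()
    while True:
        for audio in find_new_audio(folder, seen):
            cleaned = clean_with_ollama(transcribe(audio))
            audio.with_suffix(".clean.txt").write_text(cleaned, encoding="utf-8")
        time.sleep(interval)
```

Calling `watch("/path/to/folder")` would start the loop. Polling is used here only to keep the sketch dependency-free; a file-system event watcher or a macOS Folder Action could replace it.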

KathyReid, to philosophy
@KathyReid@aus.social avatar

Folks, I'm starting my post- job search low-key on the side while I write up my .

I have an odd collection of skills - , , , , , and I've done a lot of work in team leadership and management, and have led a multi-million $ not for profit in the past. Keynote speaker.

My speciality is and AI, more on the side with models like .

I'm looking for something that harnesses all of these skills - and it will be a senior role with senior pay, given my experience, qualifications and proven capability. I have time and will be discerning about my next step.

Job titles that might fit here would be Senior Research Engineer, Engineering Lead, Lead AI Engineer or similar.

Looking for fully remote work, with one day a fortnight max in , AU. If you don't believe in or , we're not a good fit.

Super keen on something full time rather than splitting my attention over multiple part-time roles.

Looking to start around August, so a fair amount of lead time.

Keen on organisations that have strong values alignment - and data use, , AI for social good.

No crypto, no web3, no deepfake stuff.

Check out my LinkedIn for more info on my background:
https://www.linkedin.com/in/kathyreid/

troed, to llm
@troed@ioc.exchange avatar

I see lots of posts here on Mastodon where people state that today's "AI" (LLMs) have no use, waste energy and are just doing copyright infringement on a vast scale.

I don't get it.

I just put together "summarize.sh" - a bit of glue between some open source and self-hosted LLMs. It takes a YouTube URL as its only parameter, and outputs a text summary of the important parts of what is said in the video.

That is, I run yt-dlp, Whisper and finally Mixtral 8x7b. And I no longer need to sit through someone yapping about for a few minutes to tell me what should've been a short blog post.

Example output from a 4 minute video:

"The text describes a video tutorial on how to reset a Corsair keyboard when it's not working properly. The keyboard in question has three white flashing lights at the top and is experiencing issues with its RGB lighting and key input. To reset the keyboard, the user should unplug the USB cables from the computer, hold down the escape key, and then plug the USB cables back into the computer while still holding down the escape key. After releasing the escape key, the keyboard's lights should flash, indicating that it has been reset. The tutorial notes that this method has worked for other Corsair keyboards as well."

How is this not a great thing to have?
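The yt-dlp, Whisper and Mixtral chain described above could look roughly like this in Python. This is not the author's "summarize.sh", which isn't shown in the post. It is a sketch that assumes `yt-dlp` (with ffmpeg), the `whisper` CLI and Ollama with a `mixtral:8x7b` model are installed. Serving the model through Ollama is an assumption, as are the file names, the Whisper model size and the prompt wording.

```python
import subprocess
from pathlib import Path


def build_commands(url, workdir):
    """Return the yt-dlp and whisper command lines for one video."""
    audio = Path(workdir) / "audio.mp3"
    # -x extracts the audio track; --audio-format converts it to mp3.
    download = ["yt-dlp", "-x", "--audio-format", "mp3",
                "-o", str(Path(workdir) / "audio.%(ext)s"), url]
    transcribe = ["whisper", str(audio), "--model", "small",
                  "--output_format", "txt", "--output_dir", str(workdir)]
    return download, transcribe


def build_prompt(transcript):
    """Wrap the transcript in a summarization instruction (wording is illustrative)."""
    return ("Summarize the important points of this video transcript "
            "in one short paragraph:\n\n" + transcript)


def summarize(url, workdir="."):
    """Download, transcribe and summarize one video; returns the summary text."""
    download, transcribe = build_commands(url, workdir)
    subprocess.run(download, check=True)
    subprocess.run(transcribe, check=True)
    transcript = (Path(workdir) / "audio.txt").read_text(encoding="utf-8")
    # "ollama run MODEL PROMPT" prints the model's reply to stdout.
    result = subprocess.run(
        ["ollama", "run", "mixtral:8x7b", build_prompt(transcript)],
        check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

Passing the whole transcript as a command-line argument keeps the sketch short but can hit argument-length limits on long videos; sending it over Ollama's HTTP API would avoid that.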

mstankiewicz, to llm Polish
@mstankiewicz@pol.social avatar

Want to know how great an #LLM can be for accessibility? Here you go!
From now on, all #podcast recordings by @kolonas on #PeerTube (@Tube from @ftdl) and #YouTube have subtitles created by the #Whisper engine (Large model) from #OpenAI without human intervention*!
For technical reasons, subtitles for future podcast episodes will be published with a delay, but at least they will be there ;)

*except for the removal of one word in episode 4 :)

#AI #OzN #niepelnosprawność

brie, to til

Oh. that @Jami , with and , has plugin, to real-time generate in conversations. Great functionality for people with impaired , or just in noisy environment.
Extensions

ChrisChaffin, to iPhone
@ChrisChaffin@dragonscave.space avatar

New Accessible iPhone AI Memo App. The app is called Whisper Memos. It lets you record a quick memo, and once you're done, it transcribes it and emails the results to you. The app is free to download and is totally accessible. I heard about the app on Jonathan Mosen's Living Blindfully podcast. Here is the app store link. https://apps.apple.com/us/app/whisper-memos-speech-to-text/id6443658039.

martinsteiger, to OpenAI German
@martinsteiger@chaos.social avatar

«Untertitel von Stephanie Geiges» ("Subtitles by Stephanie Geiges") in the AI-generated ? 🤯

Most users of , the -model for , are probably familiar with such typical AI hallucinations:

https://steigerlegal.ch/2024/02/11/transkription-ai-ki-whisper-halluzinationen/

What is amusing in this form, and at most prompts discussions about and copyright, can lead to incomplete or even incorrect results in terms of content; this is especially true for transcriptions from Swiss German.

bdiederik, to NoStupidQuestions
@bdiederik@mastodon.online avatar

Does someone have a 5? I want to know what the response time is for the base-int8 and small-int8 models in Dutch.

donwatkins, to opensource
@donwatkins@fosstodon.org avatar

Interview Hack: AI saves the day (and ears) – https://www.both.org/?p=2928

sachac, to emacs
@sachac@emacs.ch avatar

Today I used Lisp to parse Deepgram's recognition JSON output with utterances, punctuation, and smart format turned on and the Large model selected. I turned the words array into a VTT subtitle file with speaker identification (handy for EmacsConf Q&A) and captions limited to roughly 45 characters with punctuation preferred for splitting. It's way faster than waiting for a CPU-only computer to run Whisper Large on the files. Looking forward to experimenting with this for my personal braindumping too.
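The word-array-to-VTT step can be sketched as follows. This is a Python illustration, not the author's Lisp code. It assumes each word has already been reduced to a dict with hypothetical keys `text`, `start`, `end` and `speaker`; Deepgram's actual JSON would need mapping into that shape first. The rule of breaking after punctuation once a caption is at least half full is one guess at what "punctuation preferred for splitting" could mean.

```python
PUNCTUATION = (".", ",", "?", "!", ";", ":")


def format_timestamp(seconds):
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def split_into_cues(words, max_chars=45):
    """Group words into cues of at most max_chars, breaking early after
    punctuation or whenever the speaker changes."""
    cues, current = [], []
    for word in words:
        text = " ".join(w["text"] for w in current)
        speaker_changed = bool(current) and word.get("speaker") != current[-1].get("speaker")
        too_long = bool(current) and len(text) + 1 + len(word["text"]) > max_chars
        # Prefer a break after punctuation once the cue is at least half full.
        at_punctuation = text.endswith(PUNCTUATION) and len(text) >= max_chars // 2
        if speaker_changed or too_long or at_punctuation:
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)
    return cues


def words_to_vtt(words, max_chars=45):
    """Render the cues as WebVTT text, tagging each cue with its speaker."""
    lines = ["WEBVTT", ""]
    for cue in split_into_cues(words, max_chars):
        text = " ".join(w["text"] for w in cue)
        speaker = cue[0].get("speaker")
        if speaker is not None:
            text = f"<v Speaker {speaker}>{text}"  # WebVTT voice span
        start, end = cue[0]["start"], cue[-1]["end"]
        lines += [f"{format_timestamp(start)} --> {format_timestamp(end)}", text, ""]
    return "\n".join(lines)
```

A single word longer than `max_chars` still gets its own cue, so the limit is approximate, in line with the "roughly 45 characters" in the post.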

thecoffemaker, to bot Spanish
@thecoffemaker@rebel.ar avatar
adtothebone, to animals

Whisper: I don’t smell meat.
Dexy: No cheese detected either.
Whisper: What is he eating?
Dexy: The container lid says hummus.
Whisper: What’s that, some kind of ham?
Dexy: I’m afraid not.
Whisper: Wait, you can read!?
Dexy: You can’t?

tommi, to OpenAI
@tommi@pan.rent avatar

What happens when you have ’s running for >24 hours on your server, transcribing all your voice memos since 2018.

(I record a lot of audio, but so far, using the large model, it has barely got to April 2018. I'm now rebooting the server and configuring Whisper to run once per audio file within a for loop in a bash script, rather than going through all the files itself. This should make things better.)
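The one-process-per-file idea can be sketched like this, in Python rather than the bash loop mentioned in the post. It assumes the `whisper` CLI writes `<name>.txt` next to each audio file, which is what lets an interrupted run pick up where it left off. The extension list and function names are illustrative.

```python
import subprocess
from pathlib import Path


def pending_audio(folder, extensions=(".m4a", ".mp3", ".wav")):
    """List audio files that do not yet have a .txt transcript beside them."""
    return [p for p in sorted(Path(folder).iterdir())
            if p.suffix.lower() in extensions
            and not p.with_suffix(".txt").exists()]


def transcribe_all(folder, model="large"):
    """Transcribe every pending file, one whisper process at a time."""
    for audio in pending_audio(folder):
        # A fresh process per file: memory is released between files, and a
        # crash or reboot only loses the file that was in progress.
        result = subprocess.run(
            ["whisper", str(audio), "--model", model,
             "--output_format", "txt", "--output_dir", str(audio.parent)])
        if result.returncode != 0:
            print(f"failed: {audio}")
```

The trade-off is that the model is reloaded for every file, which adds start-up time per memo in exchange for the resumability.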

ixi, to accessibility
@ixi@mastodon.online avatar

'Subtitle' is an open-source AI-powered caption generation project that utilizes machine learning algorithms and natural language processing techniques to generate accurate and natural-sounding captions for videos in various languages.

The tool can run on your own server for enhanced control and privacy and utilizes the open-source ASR model for high-quality speech recognition.

https://github.com/innovatorved/subtitle

sillyblindharper, to random

How accurate is transcribing with OpenAI's technology? I tried it on myself but now want to try it on something more substantial. Anybody want to share experiences?

pvagner,

@sillyblindharper It's deepl.com. At least I think this is what @Piciok mentioned.
More about . I like the fact it can all be run locally.

bruienne, to random

Holy crap, this is a pretty awesome new feature in 4.3.0: realtime audio transcription using 🤯

eumrz, to random Spanish

I never cease to be amazed by how well MacWhisper https://goodsnooze.gumroad.com/l/macwhisper by @jordibruin works. I'm trying out transcribing several videos in a batch and it runs as smooth as silk. Thank you for this marvel, Jordi, thank you very much.

techsinger, to random

I'm sure everyone who wants to know about this already does but, just in case anyone has, particularly if or , been looking for a local method of converting speech to text ... Whisper is an ML model from OpenAI which allows doing that. It can be used accessibly with all screen readers on Windows. Obviously, this is great for those of us with impaired hearing. It is certainly far more accurate than any of the speech-to-text programs I've seen, needs no training, and can handle background noise quite well. The audio duration limits are set by your hard drive space and the amount of time you're willing to put into transcription. I've transcribed several hours of audio without difficulty; it just takes time. It's available on Windows using https://github.com/Softcatala/whisper-ctranslate2 which just seems to need Python. A GPU makes it faster, but it's usable on an i5 CPU. The model is also available online at https://freesubtitles.ai though that requires payment or waiting for long periods to transcribe limited amounts of audio. Thanks to @Bryn for the pointer at whisper-ctranslate2.
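For anyone who would rather call this from Python than from the command line, here is a small sketch using the faster-whisper package, the CTranslate2-based library that whisper-ctranslate2 builds on. The model size, the int8 setting and the helper names are illustrative choices for a CPU-only machine, not recommendations from the post.

```python
def format_segment(start, end, text):
    """Render one transcript segment as a single timestamped line."""
    return f"[{start:.2f}s -> {end:.2f}s] {text.strip()}"


def transcribe_file(path, model_size="small"):
    """Transcribe one audio file on the CPU and return timestamped lines."""
    # Imported here so the helper above works without the package installed.
    from faster_whisper import WhisperModel

    # int8 quantization keeps memory use low on CPU-only machines.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, info = model.transcribe(path)
    print(f"Detected language: {info.language}")
    # `segments` is a generator; the work happens as it is consumed.
    return [format_segment(s.start, s.end, s.text) for s in segments]
```

Something like `print("\n".join(transcribe_file("interview.mp3")))` would print the transcript line by line; the file name is a placeholder.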

Ruhrnalist, to random German
@Ruhrnalist@mastodon.social avatar

I currently have to transcribe about 300 minutes of podcast interviews. My account is used up. Does anyone have a tip, e.g. for a -installation that I could use?

kenkousen, to Java

Deep dive into audio-to-text transcription with Java using OpenAI's Whisper API
https://youtu.be/ZeH3bBKdqRU

kenkousen, to Java

Tales from the jar side: Live Stream with Craig Walls, Java and Whisper AI, Java 21 released, More theme songs, and the usual silly toots and skeets
https://open.substack.com/pub/kenkousen/p/tales-from-the-jar-side-live-stream-c1b?r=2dwq5&utm_campaign=post&utm_medium=web

KathyReid, to OpenAI
@KathyReid@aus.social avatar

For folks who work with , specifically from - I have heard some anecdotal evidence of transcription with the medium-en model returning paragraphs of "junk" content, like weather reports and adverts for golfing supplies.

I have three confirmed reports from transcripts of interviews of unrelated topics, and am curious if there are other (as yet unreported) instances of similar?

If so, please let me know - DM for email address.

Boosts appreciated.

Morphine - Whisper - (live in Europe 1 studio, c1995) (m.youtube.com)

"Though we haven’t even spoken, still I sense there’s a rapport"... Now that @Arotrios has opened the door for Mr Sandman to enter, with the exuberant 'Bueno', here's something for a little later in the evening... you walked a few blocks through the pouring rain to get here tonight because you heard that there would be a...

globcoco, to random French
@globcoco@mamot.fr avatar

Hiya mastodonters and good Sunday to you.
Does anyone know of an open-source and free (as in freedom) audio-to-text transcriber (the audio is from a video of a video call)?
If it's offline it's even better for data care, if that makes sense.
Thanks in advance for your toots & answers.
Wherever you are take care of yourself. 🧡💚❤

adelgado,
@adelgado@eu.mastodon.green avatar

@globcoco @kallekn Whisper https://github.com/openai/whisper is FOSS, but the data it was trained on is not. It's from OpenAI. I have used this implementation https://github.com/Nikorasu/LiveWhisper to transcribe Finnish phone calls. It's a Python app that runs in a terminal, so it's not super user-friendly yet, but you run it and it listens to your mic, transcribes what it hears and optionally translates it.
