If you saw the example footage of #Sora recently and thought, "Pfft, it's just #AIvideo garbage. Can't do sound, narration, and all the other stuff you need, so there!"
You can do all of that, VERY easily now. In 15 minutes I was able to get #aimusic from Google's music labs. The narration is my voice reading a script #GPT4 wrote for me, then run through Speech2Speech on #ElevenLabs to get some star power.
Folks, give me two hours and I could get you a 10 minute doc.
I have always said that the digital transformation will lead to a new look at our history and culture. But I never thought that this view would be so fictional.
This is Sora by OpenAI: an AI model to create videos from text with the ability to generate realistic scenes and adhere to prompts for up to a minute. https://openai.com/sora #sora #AI #VideoCreation
We have already experienced lifelike AI voice imitation from the likes of #ElevenLabs. Now there's video generation with #Sora (still a silent film, for now). Will we next be able to put text in a person's mouth and have them speak it, lip-synchronised? And then ChatGPT, which already has a voice front end, will become a convincing video assistant? How much longer will it take? #HeyGen #transformermodels @pallenberg
It was one year ago today that I saw a post from @BorrisInABox that introduced me to #ElevenLabs. Of course I played with it a ton that day, as did a bunch of other people.
Two fake-audio experts say that the deepfake robocall of #PresidentBiden received by some #voters last week was likely created with technology from Silicon Valley’s favorite voice cloning startup.
I'm practicing my #German using #ElevenLabs. OK, not really: I dubbed a tutorial I made for the Komplete Kontrol WhatsApp group, in which I demonstrate how to rescan libraries using VOCR. Hearing Kate (the TTS engine I use) speaking German and getting quite excited in places is quite the amusing experience, to say the least.
I made myself say many, many utterly ridiculous things: I took a recording of hailstones hitting a microphone, ran that through #ElevenLabs speech to speech with my voice as the output, then dubbed the almost mumble-language it produced into English, again using my voice as the output, to get... whatever this is. I literally have no idea.
I use Eleven Labs to read my writing out loud to me in a natural voice, and I noticed a new feature today: Speech to Speech.
You can record or upload audio, and it will create new audio of what you said using one of its generated voices. It uses your intonations and even does laughter. Here's an 8-second example that turned me into "Adam."
The only advantage I can think of for this over plain AI voices is that it can do a wider range of emotions. What else?
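For anyone curious how Speech to Speech is exposed programmatically: ElevenLabs offers it through their public API as well as the web UI. Here's a minimal sketch that only builds the request, as an illustration of the endpoint shape; the path and header name are taken from their public docs and should be treated as assumptions, and the audio itself would go in a multipart/form-data body, which is omitted here.

```python
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"

def sts_request(voice_id: str, api_key: str) -> urllib.request.Request:
    """Build (but don't send) a Speech to Speech request.

    Endpoint path and header name follow ElevenLabs' public API docs;
    the source audio goes in a multipart/form-data body, omitted here.
    """
    url = f"{API_BASE}/speech-to-speech/{voice_id}"
    return urllib.request.Request(
        url,
        headers={"xi-api-key": api_key},
        method="POST",
    )

# Placeholders, not real credentials or a real voice ID:
req = sts_request("VOICE_ID", "YOUR_KEY")
```

The voice ID is whatever generated voice (like "Adam") you pick from the voice library; the response body would be the converted audio.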
So yes, it's that time again. For those who know me well, you know what's coming. For those who don't…on Thanksgiving of 2005 I made this awful thing, and no, I'm pretty sure I do not have the multitracks still. It's very short, and not so sweet, and well you'll just have to hear it. What was I thinking? Even I don't know!
In related news, I ran this very file through #ElevenLabs speech to speech yesterday, using the Adam voice, just to see what it'd do. Here's what Adam had to say on the subject.
Never, ever use #elevenlabs dubbing feature to translate ASMR from another language to English. I may never sleep again! Just endless creepy whispery voices telling me "This is Lee Hyuk Young from NBC news", strange demonic moaning, aliens thanking me for watching, and saying sorry. This started as someone whisper reading in another language I don't speak (because I find that relaxing), and I think whatever the heck she was reading just makes it worse somehow. Something about decapitated heads, trains, and raccoons? Then again, I don't think ElevenLabs can translate whispers at all, so this could have been anything.
cursed fact: Adobe Podcast's "Enhance AI", a tool for noise removal and voice boosting, firmly believes that any audio you give it must have human speech.
If you upload, say, vocal-free chiptunes playing on a Game Boy, it will find the speech.
@lazerwalker That's why I gave it the sound of a flushing toilet and of the dishwasher. Ran that through #ElevenLabs dub feature and told it to output English. The result is... Well, embedded in this weirdness I created:
A Very Freaky Dream: https://youtu.be/7h1Ock4XRa4
I recorded an English version of my voice back in February or so. I just ran a bunch of nonsense through the multilingual model, and what came out is essentially a very, very angry person from 't Gooi, the Netherlands. That sounds kinda sorta like this. #Dutch #ElevenLabs #silly
Okay, so apparently last night when I first, thanks to the new #ElevenLabs dubbing feature, heard the Radio Station WWV announcer guy say, "Sneezing time of day," I must have been too sleepy to really take it in. This morning? Couldn't stop laughing. @BorrisInABox @cordova5029
Okay, so #ElevenLabs has a new dubbing feature. Of course I had to try it, and I tried it with a certain German song that a few of my friends know very well, especially this version. I'm not sure, but I think this version is even funnier than the original! @BorrisInABox @cordova5029
The other day, I was configuring a new Windows 11 PC. It made me angry. I shouted at it for a while, which felt great at the time, but which totally screwed up my voice, so I recorded this.
I ran a recording of some kittens talking to their mother through Whisper. The result was in another language, so I ran it through Google Translate. What I got in response was mildly creepy. Here it is in full:
Again? It's you. It's you again. What are you going to do? You?
@bryansmart I’ve been giving thought to that extremely garbled audio problem that I was demonstrating the other day on RVC.
I even tried reformatting the system to see if that would fix it, but sadly no. It wouldn't be a problem with any of the encoders I use on the system, would it? The program needs ffmpeg and a few other things, so I'm not sure.
I know there's probably nothing in it, but I'm just at a complete loss.
I've been playing a bit tonight with #ElevenLabs Projects, the new alpha feature that lets you create and render long-form projects. I've been making documents in Markdown, and using Pandoc to convert them to EPUB for upload into the Projects system. Note that it seems to support only a very limited set of elements, which I can get into more detail on later if anyone's interested. Anyway, here's a story I posted a few months back, Asm's Story. It's about an evil robot who eats the entire population of a small planet, then finds friendship and defends herself from the evil scientists who created her. At the very end, you'll hear an announcement not heard since the early-to-mid '90s, if at all.
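The Markdown-to-EPUB step above is a one-liner with Pandoc, which infers input and output formats from the file extensions. Here's a minimal sketch (the filenames are illustrative); the conversion only actually runs when Pandoc is installed and the input file exists.

```python
import os
import shutil
import subprocess

def epub_command(md_path: str, epub_path: str) -> list[str]:
    """Pandoc command line to turn a Markdown document into an EPUB.

    ElevenLabs Projects accepts EPUB uploads, but only a limited set of
    elements seems to survive, so keep the Markdown simple.
    """
    return ["pandoc", md_path, "-o", epub_path]

cmd = epub_command("asms_story.md", "asms_story.epub")  # illustrative filenames
# Only run the conversion when pandoc is available and the source exists:
if shutil.which("pandoc") and os.path.exists("asms_story.md"):
    subprocess.run(cmd, check=True)
```

The equivalent shell invocation is just `pandoc asms_story.md -o asms_story.epub`.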
So this is almost perfect…but not quite. Here, I have a randomly-generated #ElevenLabs voice read a piece of text with which some of you will no doubt be very familiar. It gets all the pauses correct…except it misses one that really needs to be there. I wish there was a way we could force it to put a pause in a particular place.
@TheQuinbox What the actual? This just happened when I ran that, and some additional text through #ElevenLabs English V2, my voice, sliders at 0, 1 and 1.
Pay particular attention to the ending.
Here's my professionally cloned #ElevenLabs voice, using the English v2 model, reading the first part of a story generated by GPT4. For some reason they only gave me 1000 characters so it cut off, but still, you can hear the difference.
#ElevenLabs is testing a new alpha feature which they're calling 'English V2' and in this demo I'm testing it with my own voice.
I fed the exact clip you hear to Eleven Labs as audio, and also made a new voice from that same clip, so the first recording you hear is the real me, and the second is the English V2 me.
If you want to try this on your own account, go here: https://elevenlabs.io/request-projects-access