ZBennoui

@ZBennoui@dragonscave.space

I'm a blind guy interested in all things music, technology, and Machine Learning. I'm particularly passionate about speech synthesis and music production. College grad, attended Berklee College of Music.


ZBennoui, to random

What the hell is wrong with Microsoft??? Who thought Recall was a good idea? I'm all for AI in computing but this is insane.

ZBennoui, to random

Ok, I totally understand that no one here might have the answers, but does anyone know if Cerence has released any new voices in the past year? Apple disclosed that new voices are coming to VoiceOver, and I'm trying to understand if they mean more stuff from Cerence or new voices Apple has built in house. Personally I hope it's new stuff from Apple, I'd love for them to get back into the TTS game as Alex is still one of the best voices currently available in my opinion.

ZBennoui,

@miki I'm sure the only reason Eloquence got included is because Cerence now owns it and Apple has an ongoing deal with them to provide TTS in their products. Apple has a lot of stuff themselves, mostly the Siri voices but also Alex and some of the older Macintalk stuff as well. I'm going to assume whatever they're doing is based on machine learning, seems they have a new engine optimized for that because the American Siri voices are using it. I guess we'll just have to wait and see.

ZBennoui,

@miki Just speculating here, but several months ago they released a research paper about longform speech synthesis and how a lot of users preferred that style of speech over traditional approaches. I wonder if what they're doing could be based on that concept, could be great for reading books and so on.

ZBennoui,

@miki That would be a pretty big deal if they got that to run on device; traditionally, systems like that require a lot of beefy hardware. Regardless, we don't have that long to wait, only about three weeks, and then we'll see for ourselves. Really exciting stuff.

masukomi, to random
@masukomi@connectified.com

Audible started allowing “AI” generated audiobooks ( > 40,000 so far ) but provides no mechanism to filter them out.

Voice actors bring stories to life. Computer voices destroy emotional moments and massacre the sentiment with which phrases are expressed.

This is absolutely terrible

ZBennoui,

@masukomi I don't think I would go that far. As with everything, there's nuance: the voice/TTS service you pick will have a massive effect on how the book ends up coming across. Overall, though, I agree. Audible is for audiobooks; if I want AI-generated stuff, I will use a TTS service.

ZBennoui, to random

Goodbye Logic Pro X, you gave me a great nine years. Let's hope 11 is much better.

ZBennoui,

@zersiax There were some images floating around on Reddit that Apple published a few days ago and some people are thinking they may have done a redesign. If that's the case, it's likely it will be accessible because the iPad version is currently much better than the desktop in many areas.

ZBennoui, to random

So over the last several months, I've been looking at all of these AI-generated music services: Suno, Udio, and now the new one from Eleven. As someone who generally has a pretty positive outlook on machine learning/AI, I think these tools are really interesting and a great way to help people who don't have any musical ability make personalized tracks. However, I take issue with how these systems are trained. It's pretty much been confirmed that Udio is trained on vast amounts of copyrighted material, very likely without consent considering how new the company is. With Suno it's hard for me to tell, but others have theorized that there's copyrighted stuff in there as well. These companies are telling you that you are allowed to use whatever you generate for commercial purposes, but I fail to see how they have the right to do so. I wonder if artists are even aware that their songs could potentially be included in these models, and honestly, the whole ethos of these companies is disgusting. What gives them the right to scrape massive amounts of copyrighted material these people spent a crazy amount of time on, just to dump it into a model that can generate whatever you want from a simple text prompt? Let me be clear that I have no problem with the technology itself; I think it's really cool. But the only reason it's able to sound as good as it does is that they are training it on a lot of music they don't have the rights to. Take a look at Stable Audio if you want to see what's possible with just licensed royalty-free tracks. Spoiler: it's nowhere near as good. Some of that could of course be due to the architecture, but more likely it's the data they had access to while training. I wonder what Eleven used to train their models, but considering how clean the results are, I suspect they got custom multitracks from whoever they decided to work with.
I'm personally far more excited about what Apple is doing in this space with the new session players in the upcoming Logic updates, and I hope this will be the path forward rather than massive audio generation models trained on unlicensed material.

ZBennoui,

@miki Yes, this makes sense from a logical perspective, and as a musician/producer myself I totally get the sentiment. The difference with AI/ML, which a lot of people who only work in tech don't fully understand, is that these models are actually listening to copyrighted audio files in order to train. While a human musician would be doing a similar thing, they are not then taking whatever information they've learned directly from that audio and "remixing" it, for lack of a better term. These models are trying to replicate the human brain, and technically the output they produce is synthesized. However, the way these diffusion models work, they will output bits of the training data, so they are still technically producing copyrighted material, even if in a different form. Whether that counts as fair use remains to be seen, but personally I don't see whatever arguments they're going to make holding up in court for very long.

ZBennoui,

@miki Yeah I agree. Let me be totally clear that I have zero problem with AI itself, and this audio generation stuff is extremely cool and innovative. Really my only issue is with companies like Udio who think it's OK to train their models on other people's work without consent. I certainly don't think it should be outlawed or anything like that, but companies who get away with scraping vast amounts of copyrighted data should be held accountable in some way. Whether that's monetary or otherwise I have no idea, but a solution will need to be found at some point.

ZBennoui,

@miki But that's the thing: in some cases these models are copying what they've seen in the training data. A human is not going to be able to copy the voice of a famous artist accurately in most cases, but these models have been known to do just that. Udio was almost immediately called out for being able to replicate famous artists' voices, albeit with some clever prompting to nudge the model in the right direction. This is why, when you create a model that generates images or music or whatever, you have to be very careful about what data you train with. They are going on record and saying you're allowed to use this for whatever you want; you can even make songs and release them on Spotify. If it's going to replicate people's voices accurately, I'd say that counts as impersonation and should not be tolerated. If I were included in the dataset and someone made a track using my voice and made money from it, even unintentionally, I'm obviously not going to be happy about that.

ZBennoui,

@miki Yeah, I guess that makes sense. Like you said before there aren't any clear answers yet, and I'll be really curious to see how this is dealt with in the coming years.

simon, to random
ZBennoui,

@simon Yeah, fuck Stack and OpenAI. Sam honestly makes me kind of uncomfortable, and their stance on completely closed-source models is abhorrent to me. Can they change their name already?

talon, to random
@talon@dragonscave.space

I dunno.wav

ZBennoui,

@talon Change name to Epic_Banger.wav because this shit goes hard. Where do you get your samples from, Splice?

zersiax, to accessibility
@zersiax@cupoftea.social

The case for AI in accessibility is a hotly contested one, but I do feel the baby's being tossed out with the bathwater just a tiny bit. Yes, it is bad that AI is being used to phase out hoomans in all sorts of pursuits. And yes, it is also true that, at least at the moment, AI-generated anything is generally lower quality than hooman-generated stuff. And yes, it is also true that we're seeing AI in places we really shouldn't be seeing it (MDN, anyone?) and that people, just like always with a new toy, are going absolutely nuts with it and putting it front and center like it's Cthulhu's new miracle to end all toilet paper shortages. But it CAN, at times, actually be an enabler. It CAN, OCCASIONALLY, actually be used for good, and I don't think people who find this out and do this should be vilified.

ZBennoui,

@zersiax The thing I find a lot of people, especially on here, don't seem to understand is that it will improve. In five years we're not going to have the same tools we do now, and while I'm definitely sick of the hype, I don't think AI/ML is inherently bad. There are a lot of really good use cases.

ZBennoui, to random

Absolutely love this track and everything this girl does. She produces most of her stuff herself. Huge inspiration. https://www.youtube.com/watch?v=HMyJBH0-tjc

objectinspace, to random
@objectinspace@freeradical.zone

Why buy Waves Creative?

I snagged the vocal production bundle a few years ago for around $350. That also let me pick three free plugins and get free updates for a year. Now I need a new plugin, so I bought one, and a second one while I was there, for $65. This gives me another free plugin and another year of updates, which includes plugins that were newly added to the bundle.

I now have all the plugins I will ever need! The subscription would drown me in plugins, and cost more!

ZBennoui,

@objectinspace Honestly, that seems like a scam to me. Arbitrarily enforcing a year of updates on a per-plug-in basis just seems really user-hostile in my opinion. Eventually their plug-ins will stop working, for example if you update macOS to a new version, and then you have to pay possibly a ton of money, depending on how many plug-ins you bought from them, just to use what you purchased. Not OK.

ZBennoui, to random

Reminder that I run a Discord server to talk about all things TTS-related. Join if you like. https://discord.gg/tzXnHkJQeK

ppatel, to apple
@ppatel@mstdn.social

The US DOJ is poised to sue Apple as soon as Thursday, accusing it of violating antitrust laws by blocking rivals from accessing features.

https://www.bloomberg.com/news/articles/2024-03-20/doj-to-sue-apple-for-antitrust-violations-as-soon-as-thursday

ZBennoui,

@ppatel Good, it's about time.

ZBennoui, to random

In macOS 14.4, Apple broke per-voice settings. You can no longer see what parameters you are adjusting, and interacting with them to change the values does nothing. The only way to change them is with the up and down arrow keys while you're focused on the slider you want, and none of the sliders have labels, so the only way to tell what you're changing is to interact with a slider and listen to what VoiceOver says. Good job, guys: you actually broke accessibility for an accessibility feature. I'm starting to think that post on AppleVis about Apple accessibility specialists had more of a point than I realized.

dgar, to random
@dgar@aus.social

You hear about the guy that invented Lifesavers?

He made a mint!

ZBennoui,

@dgar Lmao, I literally had to paste this into Mistral AI's new LLM and have it explain the joke to me because I had no idea why this was funny. Never thought I'd see the day when I'd have to have a computer explain humor to me, but there you go. I'll paste the response so you can tell me if it got it right.
"The humor in this joke comes from a play on words. When we say someone "made a mint," it's an idiomatic expression that means they made a lot of money. In this case, the guy who invented Lifesavers, which are a type of mint-flavored candy, literally made a product that tastes like mint and also figuratively "made a mint" in terms of financial gain. So, the joke is a pun that combines the literal and figurative meanings of "making a mint.""

FreakyFwoof, to random

In 7 live-streams I've already racked up 23 hours 25 minutes.

ZBennoui,

@FreakyFwoof What do you use for streaming? Is it OBS? Want to get into it myself, was thinking of doing some production streams where I produce a track live.

ZBennoui,

@FreakyFwoof Awesome, I'll be sure to check out your streams. Is it easy to use and set up? It definitely looks really powerful from the description on their website; I'll download it and take a look.

ZBennoui, to random

Just in case anyone cares, Microsoft just rolled out a bunch of new voices for Edge's Read Aloud. These are some of the latest they've come out with and are mostly designed for conversational applications like IVR. Some of them also work well for reading articles, which is why they're included, but you will find that some are better optimized for that than others. They cover several languages, but currently my favorites for US English are Ava and Andrew. Check them out.
