phillycodehound, to ai
@phillycodehound@masto.ai

So over on Hacker News they report that Zoom is using user data to train their AI and there is no way to opt out. I mean there is a way... don't use Zoom. Though I'm going to keep using it. It's the best in class and pretty much everyone knows how to use it.

cragsand, (edited) to ai

I learned how to train LoRA models using the open-source Stable Diffusion...

For the purpose of recreating the appearance of my 3D VR roleplaying character, the results I got were amazingly good... almost frighteningly so.

I'll go through the process, results and some thoughts.

🧵 part 1 of 4
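
In outline: LoRA ("low-rank adaptation") freezes the base model and trains only small adapter matrices injected into its attention layers. Here is a minimal sketch of attaching LoRA adapters to a Stable Diffusion UNet with the Hugging Face diffusers and peft libraries; the model ID and hyperparameters are illustrative assumptions, not necessarily the exact setup used in this thread:

```python
from diffusers import StableDiffusionPipeline
from peft import LoraConfig, get_peft_model

# Load a base Stable Diffusion checkpoint (illustrative choice of model).
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# LoRA config: the base weights stay frozen; only small low-rank update
# matrices injected into the UNet's attention projections are trained.
lora_config = LoraConfig(
    r=8,                # rank of the low-rank update (assumed value)
    lora_alpha=8,       # scaling factor for the update (assumed value)
    lora_dropout=0.0,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
unet = get_peft_model(pipe.unet, lora_config)
unet.print_trainable_parameters()  # a tiny fraction of the full model

# A training loop would then noise photos of the subject, have `unet`
# predict the noise, and backpropagate through the adapter weights only.
```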

PrivacyDigest, to ai
@PrivacyDigest@mas.to

It’s a “fake PR stunt”: Artists hate Meta’s data deletion process | Ars Technica

This is a misconception. In reality, there is no functional way to opt out of Meta’s generative AI training.

... In it, Meta says it is “unable to process the request” until the requester submits evidence that their personal info appears in responses from Meta’s generative AI.

https://arstechnica.com/ai/2023/10/its-a-fake-pr-stunt-artists-hate-metas-ai-data-deletion-process/#p3

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org

: "The people who build generative AI have a huge influence on what it is good at, and who does and doesn’t benefit from it. Understanding how generative AI is shaped by the objectives, intentions, and values of its creators demystifies the technology, and helps us to focus on questions of accountability and regulation. In this explainer, we tackle one of the most basic questions: What are some of the key moments of human decision-making in the development of generative AI products? This question forms the basis of our current research investigation at Mozilla to better understand the motivations and values that guide this development process. For simplicity, let’s focus on text-generators like ChatGPT.

We can roughly distinguish between two phases in the production process of generative AI. In the pre-training phase, the goal is usually to create a Large Language Model (LLM) that is good at predicting the next word in a sequence (which can be words in a sentence, whole sentences, or paragraphs) by training it on large amounts of data. The resulting pre-trained model “learns” how to imitate the patterns found in the language(s) it was trained on.

This capability is then utilized by adapting the model to perform different tasks in the fine-tuning phase. This adjusting of pre-trained models for specific tasks is how new products are created. For example, OpenAI’s ChatGPT was created by “teaching” a pre-trained model — called GPT-3 — how to respond to user prompts and instructions. GitHub Copilot, a service for software developers that uses generative AI to make code suggestions, also builds on a version of GPT-3 that was fine-tuned on “billions of lines of code.”"

https://foundation.mozilla.org/en/blog/the-human-decisions-that-shape-generative-ai-who-is-accountable-for-what/
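
The "predicting the next word" objective the explainer describes reduces to a shifted cross-entropy loss. A minimal, self-contained PyTorch sketch follows; the random tensors below are placeholders standing in for a real model's output and a real tokenized corpus:

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len = 50_000, 16
tokens = torch.randint(vocab_size, (1, seq_len))  # one tokenized sequence
logits = torch.randn(1, seq_len, vocab_size)      # stand-in LLM output

# Shift by one: the logits at position t are scored against token t+1,
# so the model is rewarded for predicting each next token.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
print(loss)  # pre-training minimizes this loss over a huge corpus
```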

joycebell, to ai
@joycebell@mas.to

Authors are finding out that their books are being used to train AI without permission. https://authorsguild.org/news/you-just-found-out-your-book-was-used-to-train-ai-now-what/

chris, to random
@chris@social.losno.co

Cool, cool. hCaptcha, on this one specific EveryMac.com page, is now asking me to train AI for military vehicle identification. https://everymac.com/ultimate-mac-lookup/?search_keywords=PowerBook2,1

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org

: "Datasets are the building blocks of every AI generated image and text. Diffusion models break images in these datasets down into noise, learning how the images “diffuse.” From that information, the models can reassemble them. The models then abstract those formulas into categories using related captions, and that memory is applied to random noise, so as not to duplicate the actual content of training data, though it sometimes happens. An AI-generated image of a child is assembled from thousands of abstractions of these genuine photographs of children. In the case of Stable Diffusion and Midjourney, these images come from the LAION-5B dataset, a collection of captions and links to 2.3 billion images. If there are hundreds of images of a single child in that archive of URLs, that child could influence the outcomes of these models.

The presence of child pornography in this training data is obviously disturbing. An additional point of serious concern is the likelihood that images of children who experienced traumatic abuse are influencing the appearance of children in the resulting model’s synthetic images, even when those generated images are not remotely sexual.

The presence of this material in AI training data points to an ongoing negligence of the AI data pipeline. This crisis is partly the result of who policymakers talk with and allow to define AI: too often, it is industry experts who have a vested interest in deterring attention from the role of training data, and the facts of what lies within it. As with Omelas, we each face a decision of what to do now that we know these facts."

https://www.techpolicy.press/laion5b-stable-diffusion-and-the-original-sin-of-generative-ai/
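
The way diffusion models "break images down into noise," as the quoted piece puts it, is the forward process of a denoising diffusion model: each training image is mixed with Gaussian noise according to a fixed schedule, and the network learns to undo the mixing. A minimal sketch of the standard DDPM closed-form noising step; the schedule values and the random tensor standing in for a training image are illustrative:

```python
import torch

def add_noise(x0, t, betas):
    """Noise a clean image x0 directly to timestep t (DDPM closed form)."""
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
    a = alphas_cumprod[t].sqrt()          # how much of the image survives
    b = (1.0 - alphas_cumprod[t]).sqrt()  # how much noise is mixed in
    noise = torch.randn_like(x0)
    return a * x0 + b * noise, noise      # the model learns to predict `noise`

betas = torch.linspace(1e-4, 0.02, 1000)  # common linear schedule
x0 = torch.rand(1, 3, 64, 64)             # stand-in for a training image
xt, eps = add_noise(x0, t=500, betas=betas)
# Generation runs the learned denoiser in reverse, turning noise into an
# image shaped by the training data — which is why dataset contents matter.
```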

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org

: "The University of Michigan is selling hours of audio recordings of study groups, office hours, lectures, and more to outside third-parties for tens of thousands of dollars for the purpose of training large language models (LLMs). 404 Media has downloaded a sample of the data, which includes a one hour and 20 minute long audio recording of what appears to be a lecture.

The news highlights how some LLMs may ultimately be trained on data with an unclear level of consent from the source subjects. The University of Michigan did not immediately respond to a request for comment, and neither did Catalyst Research Alliance, which is part of the sale process.

“The University of Michigan has recorded 65 speech events from a wide range of academic settings, including lectures, discussion sections, interviews, office hours, study groups, seminars and student presentations,” a page on Catalyst’s website about the University of Michigan data reads. “Speakers represent broad demographics, including male and female and native and non-native English speakers from a wide variety of academic disciplines.”"

https://www.404media.co/university-of-michigan-sells-recordings-of-study-groups-and-office-hours-to-train-ai/

remixtures, to Bulgaria Portuguese
@remixtures@tldr.nettime.org

: "As well as the Belgian Data Protection Authority decision I criticised earlier this week, it appears the French DPA has issued similar guidance on the use of personal data to train AI models. My detailed analysis below shows that, in relation to purpose-specific AI systems, it makes no sense: the training of the system cannot be separated from the ultimate purpose of the system. This has a major bearing on the issue of compatibility.

As a matter of principle and law, the creation and training of AI models/profiles for a specific purpose (be that direct marketing or health care) must be based on the legal basis relied on for that ultimate purpose.

The fact that the creation and training of the models/profiles is a “first phase” in a two-phase process (with the deployment of the models/profiles forming the “second phase”) does not alter that.

However, as an exception to this, under the GDPR, the processing can also be authorised by law or by means of an authorisation issued by a DPA under the relevant law (as in France), provided the law or DPA authorisation lays down appropriate safeguards. That is the only qualification I accept to the above principle." https://www.ianbrown.tech/2024/04/16/more-on-french-and-belgian-gdpr-guidance-on-ai-training/

emkingma, to ai
@emkingma@mstdn.social

Go on LinkedIn for a bit this morning and I'm greeted with a message and an ad inviting me to screw over my own future and that of others.

No, I'm not going to teach your generative AI model how to f**king write.

[Image: an ad from Outlier that appeared in my LinkedIn feed, encouraging me to sign up for the role I was messaged about.]

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org

: "This paper is a snapshot of an idea that is as underexplored as it is rooted in decades of existing work. The concept of mass digitization of books, including to support text and data mining, of which AI is a subset, is not new. But AI training is newly of the zeitgeist, and its transformative use makes questions about how we digitize, preserve, and make accessible knowledge and cultural heritage salient in a distinct way.

As such, efforts to build a books data commons need not start from scratch; there is much to glean from studying and engaging existing and previous efforts. Those learnings might inform substantive decisions about how to build a books data commons for AI training. For instance, looking at the design decisions of HathiTrust may inform how the technical infrastructure and data management practices for AI training might be designed, as well as how to address challenges to building a comprehensive, diverse, and useful corpus. In addition, learnings might inform the process by which we get to a books data commons — for example, illustrating ways to attend to the interests of those likely to be impacted by the dataset’s development." https://openfuture.pubpub.org/pub/towards-a-book-data-commons-for-ai-training/release/1

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org

: "Scarcely a day goes by without news of exciting breakthroughs in the world of AI. In the face of disruptive waves of technological change and mounting uncertainty, the law cannot help but take on an “experimental” character, with lawmakers and lawyers often caught on the back foot, struggling to keep up with the sweeping winds of change. But whatever the next steps may be, one thing is certain: litigation surrounding generative AI marks an important crossroads, and whichever path we choose is likely to shape the future of the technology. The rising litigation around generative AI is not targeting image by image or specific excerpts of infringing texts produced by AI models. Rather, the whole technique behind the system is hanging in the balance.

Another key takeaway that merits attention relates to the fragmentary landscape of copyright that seems to be unfolding in the wake of the rapid advances in AI technology. Although the emerging European legal framework offers strict rules yet solid ground for AI technology to flourish on the continent, it’s worth wondering what will happen if the “Brussels effect” fails to reach the shores on the other side of the Atlantic and the use of copyrighted works for training purposes is found to be transformative fair use in common law jurisdictions, while a relevant portion of these works are opted out of AI models on European soil. That would mark a yawning gap between two copyright regimes, opening a new chapter in this old tale and potentially disadvantaging would-be European generative AI providers." https://copyrightblog.kluweriplaw.com/2024/04/08/the-stubborn-memory-of-generative-ai-overfitting-fair-use-and-the-ai-act/

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org

: "Representative Adam Schiff (D-Calif.) introduced new legislation in the U.S. House of Representatives on Tuesday (April 9) which, if passed, would require AI companies to disclose which copyrighted works were used to train their models, or face a financial penalty. Called the Generative AI Copyright Disclosure Act, the new bill would apply to both new models and retroactively to previously released and used generative AI systems.

The bill requires that a full list of copyrighted works in an AI model’s training data set be filed with the Copyright Office no later than 30 days before the model becomes available to consumers. This would also be required when the training data set for an existing model is altered in a significant manner. Financial penalties for non-compliance would be determined on a case-by-case basis by the Copyright Office, based on factors like the company’s history of noncompliance and the company’s size." https://www.billboard.com/business/legal/federal-bill-ai-training-require-disclosure-songs-used-1235651089/

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org

: "A lawsuit is alleging Amazon was so desperate to keep up with the competition in generative AI it was willing to breach its own copyright rules.…

The allegation emerges from a complaint [PDF] accusing the tech and retail mega-corp of demoting, and then dismissing, a former high-flying AI scientist after it discovered she was pregnant.

The lawsuit was filed last week in a Los Angeles state court by Dr Viviane Ghaderi, an AI researcher who says she worked successfully in Amazon's Alexa and LLM teams, and achieved a string of promotions, but claims she was later suddenly demoted and fired following her return to work after giving birth. She is alleging discrimination, retaliation, harassment and wrongful termination, among other claims.

Montana MacLachlan, an Amazon spokesperson, said of the suit: "We do not tolerate discrimination, harassment, or retaliation in our workplace. We investigate any reports of such conduct and take appropriate action against anyone found to have violated our policies.""

https://www.msn.com/en-us/news/crime/ex-amazon-exec-claims-she-was-asked-to-break-copyright-law-in-race-to-ai/ar-AA1nrNEG

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org

: "[A]s the lawsuits and investigations around generative AI and its opaque data practices pile up, there have been small moves to give people more control over what happens to what they post online. Some companies now let individuals and business customers opt out of having their content used in AI training or being sold for training purposes. Here’s what you can—and can’t—do.

Before we get to how you can opt out, it’s worth setting some expectations. Many companies building AI have already scraped the web, so anything you’ve posted is probably already in their systems. Companies are also secretive about what they have actually scraped, purchased, or used to train their systems. “We honestly don't know that much,” says Niloofar Mireshghallah, a researcher who focuses on AI privacy at the University of Washington. “In general, everything is very black-box.”" https://www.wired.com/story/how-to-stop-your-data-from-being-used-to-train-ai/

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org

: "Stack Overflow, a legendary internet forum for programmers and developers, is coming under heavy fire from its users after it announced it was partnering with OpenAI to scrub the site's forum posts to train ChatGPT. Many users are removing or editing their questions and answers to prevent them from being used to train AI — decisions which have been punished with bans from the site's moderators.

Stack Overflow user Ben posted on Mastodon about his experience editing his most successful answers to try to avoid having his work stolen by OpenAI.

@ben on Mastodon posts, "Stack Overflow announced that they are partnering with OpenAI, so I tried to delete my highest-rated answers. Stack Overflow does not let you delete questions that have accepted answers and many upvotes because it would remove knowledge from the community. So instead I changed my highest-rated answers to a protest message. Within an hour mods had changed the questions back and suspended my account for 7 days."

Ben continues in his thread, "[The moderator crackdown is] just a reminder that anything you post on any of these platforms can and will be used for profit. It's just a matter of time until all your messages on Discord, Twitter etc. are scraped, fed into a model and sold back to you."

https://www.tomshardware.com/tech-industry/artificial-intelligence/stack-overflow-bans-users-en-masse-for-rebelling-against-openai-partnership-users-banned-for-deleting-answers-to-prevent-them-being-used-to-train-chatgpt

remixtures, to ArtificialIntelligence Portuguese
@remixtures@tldr.nettime.org

: "Roboticists believe that by using new AI techniques, they will achieve something the field has pined after for decades: more capable robots that can move freely through unfamiliar environments and tackle challenges they’ve never seen before.
(...)
But something is slowing that rocket down: lack of access to the types of data used to train robots so they can interact more smoothly with the physical world. It’s far harder to come by than the data used to train the most advanced AI models like GPT—mostly text, images, and videos scraped off the internet. Simulation programs can help robots learn how to interact with places and objects, but the results still tend to fall prey to what’s known as the “sim-to-real gap,” or failures that arise when robots move from the simulation to the real world.

For now, we still need access to physical, real-world data to train robots. That data is relatively scarce and tends to require a lot more time, effort, and expensive equipment to collect. That scarcity is one of the main things currently holding progress in robotics back."

https://www.technologyreview.com/2024/04/30/1091907/the-robot-race-is-fueling-a-fight-for-training-data/

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org

: "Creating an individual bargainable copyright over training will not improve the material conditions of artists' lives – all it will do is change the relative shares of the value we create, shifting some of that value from tech companies that hate us and want us to starve to entertainment companies that hate us and want us to starve.

As an artist, I'm foursquare against anything that stands in the way of making art. As an artistic worker, I'm entirely committed to things that help workers get a fair share of the money their work creates, feed their families and pay their rent.

I think today's AI art is bad, and I think tomorrow's AI art will probably be bad, but even if you disagree (with either proposition), I hope you'll agree that we should be focused on making sure art is legal to make and that artists get paid for it.

Just because copyright won't fix the creative labor market, it doesn't follow that nothing will. If we're worried about labor issues, we can look to labor law to improve our conditions."

https://pluralistic.net/2024/05/13/spooky-action-at-a-close-up/#invisible-hand

remixtures, to Sony Portuguese
@remixtures@tldr.nettime.org

Sony Music is the archetype of a company that uses artists as mere puppets to get the only thing it really wants: free money extracted through IP rents. It's a parasite that contributes nothing to the promotion of arts and science.

: "Sony Music is sending warning letters to more than 700 artificial intelligence developers and music streaming services globally in the latest salvo in the music industry’s battle against tech groups ripping off artists.

The Sony Music letter, which has been seen by the Financial Times, expressly prohibits AI developers from using its music — which includes artists such as Harry Styles, Adele and Beyoncé — and opts out of any text and data mining of any of its content for any purposes such as training, developing or commercialising any AI system.

Sony Music is sending the letter to companies developing AI systems including OpenAI, Microsoft, Google, Suno and Udio, according to those close to the group.

The world’s second-largest music group is also sending separate letters to streaming platforms, including Spotify and Apple, asking them to adopt “best practice” measures to protect artists and songwriters and their music from scraping, mining and training by AI developers without consent or compensation. It has asked them to update their terms of service, making it clear that mining and training on its content is not permitted.

Sony Music declined to comment further."

https://www.ft.com/content/c5b93b23-9f26-4e6b-9780-a5d3e5e7a409

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org

#AI #GenerativeAI #Slack #AITraining #Copyright: "It all kicked off last night, when a note on Hacker News raised the issue of how Slack trains its AI services, by way of a straight link to its privacy principles — no additional comment was needed. That post kicked off a longer conversation — and what seemed like news to current Slack users — that Slack opts users in by default to its AI training, and that you need to email a specific address to opt out.

That Hacker News thread then spurred multiple conversations and questions on other platforms: There is a newish, generically named product called “Slack AI” that lets users search for answers and summarize conversation threads, among other things, but why is that not once mentioned by name on that privacy principles page in any way, even to make clear if the privacy policy applies to it? And why does Slack reference both “global models” and “AI models?”

Between people being confused about where Slack is applying its AI privacy principles, and people being surprised and annoyed at the idea of emailing to opt out — at a company that makes a big deal of touting that “You control your data” — Slack does not come off well."

https://techcrunch.com/2024/05/17/slack-under-attack-over-sneaky-ai-training-policy/?guccounter=1
