remixtures, to Bulgaria Portuguese
@remixtures@tldr.nettime.org avatar

: "As well as the Belgian Data Protection Authority decision I criticised earlier this week, it appears the French DPA has issued similar guidance on the use of personal data to train AI models. My detailed analysis below shows that, in relation to purpose-specific AI systems, it makes no sense: the training of the system cannot be separated from the ultimate purpose of the system. This has a major bearing on the issue of compatibility.

As a matter of principle and law, the creation and training of AI models/profiles for a specific purpose (be that direct marketing or health care) must be based on the legal basis relied on for that ultimate purpose.

The fact that the creation and training of the models/profiles is a “first phase” in a two-phase process (with the deployment of the models/profiles forming the “second phase”) does not alter that.

However, as an exception to this, under the GDPR, the processing can also be authorised by law or by means of an authorisation issued by a DPA under the relevant law (as in France), provided the law or DPA authorisation lays down appropriate safeguards. That is the only qualification I accept to the above principle." https://www.ianbrown.tech/2024/04/16/more-on-french-and-belgian-gdpr-guidance-on-ai-training/

emkingma, to ai
@emkingma@mstdn.social avatar

Go on LinkedIn for a bit this morning and I'm greeted with a message and an ad inviting me to screw over my own future and that of others.

No, I'm not going to teach your generative AI model how to f**king write.

An ad from Outlier that appeared in my LinkedIn feed, encouraging me to sign up for the role I was messaged about.

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "Magic is often used as a metaphor for complex technological processes and systems, which is why, in the marketing rhetoric of AI systems, magic has been such a powerful metaphor. We are told of its amazing, unending capabilities; its power to both save and ruin the world; and of God-like qualities just around the corner. It is a powerful metaphor that is easy to get swept up in. But a metaphor is all it is. AI is not untethered, immaterial magic. It is structurally reliant on a vast number of people performing a myriad of tasks in not-so-magical working conditions.

Everyone likes to believe in magic. But where AI is concerned, awe should be reserved for the workers performing the tasks behind the curtain. It is only because of them that the systems can do what they do. The least they deserve is basic minimum standards at work.

As Fairwork, we will be continuing our investigation into AI supply chains in the new year with new studies. We will be shifting our attention to business process outsourcing companies in Latin America with further support from the Global Partnership on AI. There is nothing inevitable about poor working conditions in the digital economy. Despite their claims to the contrary, companies have substantial control over the nature of the jobs that they provide. Fairwork’s aim is to hold them to account."

https://futureofwork.fes.de/news-list/e/ai-value-chain

josemurilo, to random
@josemurilo@mato.social avatar

How should Creative Commons respond to the use of CC-licensed work in AI?
"In 2023, the theme of the CC Global Summit was AI and the Commons, focused on supporting better sharing in a world with artificial intelligence."
Three opinion groups emerged from this conversation:
A: Moat Protectors - 16%
B: AI Oversight Maximalists - 36%
C: Equitable Benefit Seekers - 32%
https://creativecommons.org/2024/02/08/what-does-the-cc-community-think-about-regulating-generative-ai/

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "A lot of early AI research was done in an academic setting; the law specifically mentions teaching, scholarship, and research as examples of fair use. As a result, the machine-learning community has traditionally taken a relaxed attitude toward copyright. Early training sets frequently included copyrighted material.

As academic researchers took jobs in the burgeoning commercial AI sector, many assumed they would continue to enjoy wide latitude to train on copyrighted material. Some feel blindsided by copyright holders’ demands for cash.

“We all learn for free,” Daniel Jeffries wrote in his tweet summing up the view of many in the AI community. “We learn from the world around us and so do machines.”

The argument seems to be that if it’s legal for a human being to learn from one copyrighted book, it must also be legal for a large language model to learn from a million copyrighted books—even if the training process requires making copies of the books.

As MP3.com and Texaco learned, this isn't always true. A use that’s fair at a small scale can be unfair when it’s scaled up and commercialized.

But AI advocates like Jeffries are right that sometimes it is true. There are cases where courts have held that bulk technological uses of copyrighted works are fair use. The most important example is almost certainly the Google Books case."

https://www.understandingai.org/p/the-ai-community-needs-to-take-copyright

okpierre, to reddit
@okpierre@mastodon.social avatar

A large AI company is paying about $60 million for access to Reddit (YOUR content) so it can train its AI models.

The Fediverse has open-source alternatives like Lemmy, kbin, and mbin that you can try.

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "The University of Michigan is selling hours of audio recordings of study groups, office hours, lectures, and more to outside third-parties for tens of thousands of dollars for the purpose of training large language models (LLMs). 404 Media has downloaded a sample of the data, which includes a one hour and 20 minute long audio recording of what appears to be a lecture.

The news highlights how some LLMs may ultimately be trained on data with an unclear level of consent from the source subjects. The University of Michigan did not immediately respond to a request for comment, and neither did Catalyst Research Alliance, which is part of the sale process.

“The University of Michigan has recorded 65 speech events from a wide range of academic settings, including lectures, discussion sections, interviews, office hours, study groups, seminars and student presentations,” a page on Catalyst’s website about the University of Michigan data reads. “Speakers represent broad demographics, including male and female and native and non-native English speakers from a wide variety of academic disciplines.”"

https://www.404media.co/university-of-michigan-sells-recordings-of-study-groups-and-office-hours-to-train-ai/

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "Datasets are the building blocks of every AI generated image and text. Diffusion models break images in these datasets down into noise, learning how the images “diffuse.” From that information, the models can reassemble them. The models then abstract those formulas into categories using related captions, and that memory is applied to random noise, so as not to duplicate the actual content of training data, though it sometimes happens. An AI-generated image of a child is assembled from thousands of abstractions of these genuine photographs of children. In the case of Stable Diffusion and Midjourney, these images come from the LAION-5B dataset, a collection of captions and links to 2.3 billion images. If there are hundreds of images of a single child in that archive of URLs, that child could influence the outcomes of these models.

The presence of child pornography in this training data is obviously disturbing. An additional point of serious concern is the likelihood that images of children who experienced traumatic abuse are influencing the appearance of children in the resulting model’s synthetic images, even when those generated images are not remotely sexual.

The presence of this material in AI training data points to an ongoing negligence of the AI data pipeline. This crisis is partly the result of who policymakers talk with and allow to define AI: too often, it is industry experts who have a vested interest in deterring attention from the role of training data, and the facts of what lies within it. As with Omelas, we each face a decision of what to do now that we know these facts."

https://www.techpolicy.press/laion5b-stable-diffusion-and-the-original-sin-of-generative-ai/
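The diffusion process the piece describes (breaking training images down into noise, then learning to reverse that process) can be sketched numerically. This is a minimal illustration of the standard DDPM forward-noising formula with an assumed linear noise schedule, not the actual Stable Diffusion code:

```python
import numpy as np

def forward_diffuse(image, t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Noise an image as in the DDPM forward process:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    betas = np.linspace(beta_start, beta_end, num_steps)  # linear noise schedule
    alpha_bar = np.cumprod(1.0 - betas)[t]                # signal retained after t steps
    eps = np.random.default_rng(0).normal(size=image.shape)
    return np.sqrt(alpha_bar) * image + np.sqrt(1.0 - alpha_bar) * eps

x0 = np.ones((4, 4))                 # toy "image"
x_early = forward_diffuse(x0, t=10)  # still mostly signal
x_late = forward_diffuse(x0, t=999)  # almost pure noise
```

A model is then trained to predict `eps` from the noisy result; generation runs the process in reverse, starting from pure noise, which is why outputs are abstractions of the training data rather than direct copies (though duplication sometimes happens).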

rant.vpalepu.com, to Japan

Tech regulations are going to be an important thing to watch out for in 2024. So, this year, I am going to keep documenting news about tech regulations that show up on my news feeds.

Two things popped up today:

  1. Japan: Copyright and AI Training

A story from last year about Japan’s stance on AI training and data copyright seems to have gained traction on Hacker News today. The original news seems to be from technomancers.ai, but here is the regurgitated version of the story on ACM’s news site (archived link from last year):

https://winterrant.files.wordpress.com/2024/01/screenshot-2024-01-02-at-3.00.32e280afpm.png?w=1024

Japanese publishers already seem to be up in arms about this:

“The Japan Newspaper Publishers & Editors Association and three other industry groups released a joint statement Thursday expressing concern that copyright protection is not being adequately considered in the development of generative artificial intelligence.

The other organizations are the Japan Magazine Publishers Association, the Japan Photographic Copyright Association and the Japan Book Publishers Association.

In the joint statement, the organizations said that current generative AI creates content based on the analysis of large amounts of data collected from the internet without the consent of and payments to copyright holders.”

https://www.japantimes.co.jp/news/2023/08/17/japan/crime-legal/japan-publisher-ai-copyright-concern/ (archived link)

  2. Montana and North Carolina: Internet Identity

New internet identification laws went into effect on January 1, 2024 in Montana and North Carolina.

“[…] laws that went into effect in both states on January 1st. Montana passed a standalone ID verification law in May, and North Carolina’s new law was tacked onto a bill regarding the high school computer curriculum. The laws require sites to either use third-party verification or, in the case of Montana, “digitized identification” to verify a visitor’s age. Both states also leave enforcement as a civil matter, allowing individuals to sue if they think a site violates the law.”

https://www.theverge.com/2024/1/2/24022539/pornhub-blocked-montana-north-carolina-age-verification-law-protest (archived link)

These laws and many others like them are starting to require ID verification before accessing sites on the internet. While they will have an outsized impact on porn-hosting websites, their impact will likely also be felt by other internet services that restrict how children use them.

While the laws are well-intentioned, it is unclear how they can avoid violating user privacy. If the idea is to protect children and affirm every user’s age through a well-established digital (or physical) identity, then such sensitive identity data will need to make its way across the internet and reside on some web server (or data center). If such identity data ever leaks, it will be a major headache for the impacted users.

The UK has passed a similar law that tightens its existing regulations around internet identities and protecting children online. I am unclear on that law’s status: whether it has gone into effect, or whether revisions to it are still to be made.

https://rant.vpalepu.com/2024/01/02/tech-regulations-update-japan-montana-north-carolina/

chris, to random
@chris@social.losno.co avatar

Cool, cool. hCaptcha, on this one specific EveryMac.com page, is now asking me to train AI for military vehicle identification. https://everymac.com/ultimate-mac-lookup/?search_keywords=PowerBook2,1

PrivacyDigest, to ai
@PrivacyDigest@mas.to avatar

It’s a “fake PR stunt”: Artists hate Meta’s data deletion process | Ars Technica

This is a misconception. In reality, there is no functional way to opt out of Meta’s generative AI.

... In it, Meta says it is “unable to process the request” until the requester submits evidence that their personal info appears in responses from Meta’s AI.

https://arstechnica.com/ai/2023/10/its-a-fake-pr-stunt-artists-hate-metas-ai-data-deletion-process/#p3

joycebell, to ai
@joycebell@mas.to avatar

Authors are finding out that their books are being used to train AI without permission. https://authorsguild.org/news/you-just-found-out-your-book-was-used-to-train-ai-now-what/

rexi, to machinelearning
@rexi@mastodon.social avatar

https://www.pbs.org/video/ai-protection-1693683970/

"…you give it a large set of information and you ask it to detect a certain pattern… the program can improve its performance based on the number of trials and the number of times, hence the term machine learning.

The crux of this matter when it comes to the writers’ and actors’ strike is that the large sets of data come from the work that writers and actors have generated, and they have not been compensated for any training that has been done on that data…"

pmj, to ai
@pmj@social.pmj.rocks avatar

stop using Zoom AI immediately!!
they basically steal your personality, your manners, your gestures to make money out of it!
this is waaay beyond text or images!
and you can't opt out!
https://zoomai.info/
