remixtures, to Bulgaria Portuguese
@remixtures@tldr.nettime.org avatar

: "As well as the Belgian Data Protection Authority decision I criticised earlier this week, it appears the French DPA has issued similar guidance on the use of personal data to train AI models. My detailed analysis below shows that, in relation to purpose-specific AI systems, it makes no sense: the training of the system cannot be separated from the ultimate purpose of the system. This has a major bearing on the issue of compatibility.

As a matter of principle and law, the creation and training of AI models/profiles for a specific purpose (be that direct marketing or health care) must be based on the legal basis relied on for that ultimate purpose.

The fact that the creation and training of the models/profiles is a “first phase” in a two-phase process (with the deployment of the models/profiles forming the “second phase”) does not alter that.

However, as an exception to this, under the GDPR, the processing can also be authorised by law or by means of an authorisation issued by a DPA under the relevant law (as in France), provided the law or DPA authorisation lays down appropriate safeguards. That is the only qualification I accept to the above principle." https://www.ianbrown.tech/2024/04/16/more-on-french-and-belgian-gdpr-guidance-on-ai-training/

emkingma, to ai
@emkingma@mstdn.social avatar

Go on LinkedIn for a bit this morning and I'm greeted with a message and an ad inviting me to screw over my own future and that of others.

No, I'm not going to teach your generative AI model how to f**king write.

An ad from Outlier that appeared in my LinkedIn feed, encouraging me to sign up for the role I was messaged about.

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "Magic is often used as a metaphor for complex technological processes and systems, which is why, in the marketing rhetoric of AI systems, magic has been such a powerful metaphor. We are told of its amazing, unending capabilities; its power to both save and ruin the world; and of God-like qualities just around the corner. It is a powerful metaphor that is easy to get swept up in. But a metaphor is all it is. AI is not untethered, immaterial magic. It is structurally reliant on a vast number of people performing a myriad of tasks in not-so-magical working conditions.

Everyone likes to believe in magic. But where AI is concerned, awe should be reserved for the workers performing the tasks behind the curtain. It is only because of them that the systems can do what they do. The least they deserve is basic minimum standards at work.

As Fairwork, we will be continuing our investigation into AI supply chains in the new year with new studies. We will be shifting our attention to business process outsourcing companies in Latin America with further support from the Global Partnership on AI. There is nothing inevitable about poor working conditions in the digital economy. Despite their claims to the contrary, companies have substantial control over the nature of the jobs that they provide. Fairwork’s aim is to hold them to account."

https://futureofwork.fes.de/news-list/e/ai-value-chain

josemurilo, to random
@josemurilo@mato.social avatar

How should Creative Commons respond to the use of CC-licensed work in AI?
"In 2023, the theme of the CC Global Summit was AI and the Commons, focused on supporting better sharing in a world with artificial intelligence."
Three opinion groups emerged from this conversation:
A: Moat Protectors - 16%
B: AI Oversight Maximalists - 36%
C: Equitable Benefit Seekers - 32%
https://creativecommons.org/2024/02/08/what-does-the-cc-community-think-about-regulating-generative-ai/

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "A lot of early AI research was done in an academic setting; the law specifically mentions teaching, scholarship, and research as examples of fair use. As a result, the machine-learning community has traditionally taken a relaxed attitude toward copyright. Early training sets frequently included copyrighted material.

As academic researchers took jobs in the burgeoning commercial AI sector, many assumed they would continue to enjoy wide latitude to train on copyrighted material. Some feel blindsided by copyright holders’ demands for cash.

“We all learn for free,” Daniel Jeffries wrote in his tweet summing up the view of many in the AI community. “We learn from the world around us and so do machines.”

The argument seems to be that if it’s legal for a human being to learn from one copyrighted book, it must also be legal for a large language model to learn from a million copyrighted books—even if the training process requires making copies of the books.

As MP3.com and Texaco learned, this isn't always true. A use that’s fair at a small scale can be unfair when it’s scaled up and commercialized.

But AI advocates like Jeffries are right that sometimes it is true. There are cases where courts have held that bulk technological uses of copyrighted works are fair use. The most important example is almost certainly the Google Books case."

https://www.understandingai.org/p/the-ai-community-needs-to-take-copyright

okpierre, to reddit
@okpierre@mastodon.social avatar

A large AI company is paying about $60 million for access to Reddit (YOUR content) so it can train its AI models.

The Fediverse has open-source alternatives like Lemmy, kbin, and mbin that you can try.

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "The University of Michigan is selling hours of audio recordings of study groups, office hours, lectures, and more to outside third-parties for tens of thousands of dollars for the purpose of training large language models (LLMs). 404 Media has downloaded a sample of the data, which includes a one hour and 20 minute long audio recording of what appears to be a lecture.

The news highlights how some LLMs may ultimately be trained on data with an unclear level of consent from the source subjects. The University of Michigan did not immediately respond to a request for comment, and neither did Catalyst Research Alliance, which is part of the sale process.

“The University of Michigan has recorded 65 speech events from a wide range of academic settings, including lectures, discussion sections, interviews, office hours, study groups, seminars and student presentations,” a page on Catalyst’s website about the University of Michigan data reads. “Speakers represent broad demographics, including male and female and native and non-native English speakers from a wide variety of academic disciplines.”"

https://www.404media.co/university-of-michigan-sells-recordings-of-study-groups-and-office-hours-to-train-ai/

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "Datasets are the building blocks of every AI generated image and text. Diffusion models break images in these datasets down into noise, learning how the images “diffuse.” From that information, the models can reassemble them. The models then abstract those formulas into categories using related captions, and that memory is applied to random noise, so as not to duplicate the actual content of training data, though it sometimes happens. An AI-generated image of a child is assembled from thousands of abstractions of these genuine photographs of children. In the case of Stable Diffusion and Midjourney, these images come from the LAION-5B dataset, a collection of captions and links to 2.3 billion images. If there are hundreds of images of a single child in that archive of URLs, that child could influence the outcomes of these models.

The presence of child pornography in this training data is obviously disturbing. An additional point of serious concern is the likelihood that images of children who experienced traumatic abuse are influencing the appearance of children in the resulting model’s synthetic images, even when those generated images are not remotely sexual.

The presence of this material in AI training data points to an ongoing negligence of the AI data pipeline. This crisis is partly the result of who policymakers talk with and allow to define AI: too often, it is industry experts who have a vested interest in deterring attention from the role of training data, and the facts of what lies within it. As with Omelas, we each face a decision of what to do now that we know these facts."

https://www.techpolicy.press/laion5b-stable-diffusion-and-the-original-sin-of-generative-ai/
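The diffusion process the piece describes (breaking training images down into noise, then learning to reverse that process) can be sketched numerically. This is a minimal illustration of the standard DDPM forward-noising formula with an assumed linear noise schedule, not the actual Stable Diffusion code:

```python
import numpy as np

def forward_diffuse(image, t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Noise an image as in the DDPM forward process:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    betas = np.linspace(beta_start, beta_end, num_steps)  # linear noise schedule
    alpha_bar = np.cumprod(1.0 - betas)[t]                # signal retained after t steps
    eps = np.random.default_rng(0).normal(size=image.shape)
    return np.sqrt(alpha_bar) * image + np.sqrt(1.0 - alpha_bar) * eps

x0 = np.ones((4, 4))                 # toy "image"
x_early = forward_diffuse(x0, t=10)  # still mostly signal
x_late = forward_diffuse(x0, t=999)  # almost pure noise
```

A model is then trained to predict `eps` from the noisy result; generation runs the process in reverse, starting from pure noise, which is why outputs are abstractions of the training data rather than direct copies (though duplication sometimes happens).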

rant.vpalepu.com, to Japan

Tech regulations are going to be an important thing to watch out for in 2024. So, this year, I am going to keep documenting news about tech regulations that show up on my news feeds.

Two things popped up today:

  1. Japan: Copyright and AI Training

A story from last year about Japan’s stance on AI training and data copyright seems to have gained traction on Hacker News today. The original news seems to be from technomancers.ai, but here is the regurgitated version of the story on ACM’s news site (archived link from last year):

https://winterrant.files.wordpress.com/2024/01/screenshot-2024-01-02-at-3.00.32e280afpm.png?w=1024

Japanese publishers already seem to be up in arms about this:

“The Japan Newspaper Publishers & Editors Association and three other industry groups released a joint statement Thursday expressing concern that copyright protection is not being adequately considered in the development of generative artificial intelligence.

The other organizations are the Japan Magazine Publishers Association, the Japan Photographic Copyright Association and the Japan Book Publishers Association.

In the joint statement, the organizations said that current generative AI creates content based on the analysis of large amounts of data collected from the internet without the consent of and payments to copyright holders.”

https://www.japantimes.co.jp/news/2023/08/17/japan/crime-legal/japan-publisher-ai-copyright-concern/ (archived link)

  2. Montana and North Carolina: Internet Identity

New internet identification laws went into effect on January 1, 2024 in Montana and North Carolina.

“[…] laws that went into effect in both states on January 1st. Montana passed a standalone ID verification law in May, and North Carolina’s new law was tacked onto a bill regarding the high school computer curriculum. The laws require sites to either use third-party verification or, in the case of Montana, “digitized identification” to verify a visitor’s age. Both states also leave enforcement as a civil matter, allowing individuals to sue if they think a site violates the law.”

https://www.theverge.com/2024/1/2/24022539/pornhub-blocked-montana-north-carolina-age-verification-law-protest (archived link)

These laws and many others like them are starting to require ID verification before accessing sites on the internet. While they will have an outsized impact on porn-hosting websites, their impact will likely also be felt by other internet services that restrict how children use them.

While the laws are well-intentioned, it is unclear how they can avoid violating user privacy. If the idea is to protect children and affirm every user’s age through a well-established digital (or physical) identity, then such sensitive identity data will need to make its way across the internet and reside on some web server (or data center). If such identity data ever leaks, it will be a major headache for the impacted users.

The UK has passed a similar law that tightens its existing regulations around internet identities and protecting children online. I am unclear on that law’s status: whether it has gone into effect, or whether revisions to it are still to be made.

https://rant.vpalepu.com/2024/01/02/tech-regulations-update-japan-montana-north-carolina/

chris, to random
@chris@social.losno.co avatar

Cool, cool. hCaptcha, on this one specific EveryMac.com page, is now asking me to train AI for military vehicle identification. https://everymac.com/ultimate-mac-lookup/?search_keywords=PowerBook2,1

PrivacyDigest, to ai
@PrivacyDigest@mas.to avatar

It’s a “fake PR stunt”: Artists hate Meta’s data deletion process | Ars Technica

This is a misconception. In reality, there is no functional way to opt out of Meta’s generative AI.

... In it, Meta says it is “unable to process the request” until the requester submits evidence that their personal info appears in responses from Meta’s AI.

https://arstechnica.com/ai/2023/10/its-a-fake-pr-stunt-artists-hate-metas-ai-data-deletion-process/#p3

joycebell, to ai
@joycebell@mas.to avatar

Authors are finding out that their books are being used to train AI without permission. https://authorsguild.org/news/you-just-found-out-your-book-was-used-to-train-ai-now-what/

rexi, to machinelearning
@rexi@mastodon.social avatar

https://www.pbs.org/video/ai-protection-1693683970/

"…you give it a large set of information and you ask it to detect a certain pattern… the program can improve its performance based on the number of trials and the number of times, hence the term machine learning.

The crux of this matter when it comes to the writers’ and actors’ strike is that the large sets of data come from the work that writers and actors have generated, and they have not been compensated for any training that has been done on that data…"

pmj, to ai
@pmj@social.pmj.rocks avatar

stop using Zoom AI immediately!!
they basically steal your personality, your manners, your gestures to make money out of it!
this is waaay beyond text or images!
and you can't opt out!
https://zoomai.info/
