remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "Since the emergence of Midjourney and other image generators, artists have been watching and wondering whether AI is a great opportunity or an existential threat. Now, after a list of 16,000 names emerged of artists whose work Midjourney had allegedly used to train its AI – including Bridget Riley, Damien Hirst, Rachel Whiteread, Tracey Emin, David Hockney and Anish Kapoor – the art world has issued a call to arms against the technologists.

British artists have contacted US lawyers to discuss joining a class action against Midjourney and other AI firms, while others have told the Observer that they may bring their own legal action in the UK.

“What we need to do is come together,” said Tim Flach, president of the Association of Photographers and an internationally acclaimed photographer whose name is on the list.

“This public showing of this list of names is a great catalyst for artists to come together and challenge it. I personally would be up for doing that.”

The 24-page list of names forms Exhibit J in a class action brought by 10 American artists in California against Midjourney, Stability AI, Runway AI and DeviantArt. Matthew Butterick, one of the lawyers representing the artists, said: “We’ve had interest from artists around the world, including the UK.”

The tech firms have until 8 February to respond to the claim. Midjourney did not respond to requests for comment."

https://www.theguardian.com/technology/2024/jan/21/we-need-to-come-together-british-artists-team-up-to-fight-ai-image-generating-software

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "We have previously analysed US class actions against Open AI (here) and Google (here) for unauthorized use of copyright works in the training of generative AI tools, respectively ChatGPT, Google Bard and Gemini. To further develop this excursus on the US case law, in this post we consider two recent class actions against Meta launched by copyright holders (mainly book authors), for alleged infringement of IP in their books and written works through use in training materials for LLaMA (Large Language Model Meta AI). Such case law is interesting for the reconstruction of the technology deployed by Meta and the training methodology (at least from the plaintiff’s perspective) but also because the court has had the chance to preliminarily evaluate the robustness of the claims. Given the similarity of the legal arguments and the same technology being at stake (Meta’s LLaMA), upon the request of the parties, the Court treated the two class actions jointly (here)."

https://copyrightblog.kluweriplaw.com/2024/01/17/generative-ai-admissibility-and-infringement-in-the-two-us-class-actions-against-metas-llama/

remixtures, to meta Portuguese
@remixtures@tldr.nettime.org avatar

: "These are noteworthy developments but not all complaints can be resolved with promises. Several lawsuits against OpenAI and Meta remain ongoing, accusing the companies of using the Books3 dataset to train their models.

While OpenAI and Meta are very cautious about discussing the subject in public, Meta provided more context in a California federal court this week.

Responding to a lawsuit from writer/comedian Sarah Silverman, author Richard Kadrey, and other rights holders, the tech giant admits that “portions of Books3” were used to train the Llama AI model before its public release.

“Meta admits that it used portions of the Books3 dataset, among many other materials, to train Llama 1 and Llama 2,” Meta writes in its answer."

https://torrentfreak.com/meta-admits-use-of-pirated-book-dataset-to-train-ai-240111/

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "Datasets are the building blocks of every AI generated image and text. Diffusion models break images in these datasets down into noise, learning how the images “diffuse.” From that information, the models can reassemble them. The models then abstract those formulas into categories using related captions, and that memory is applied to random noise, so as not to duplicate the actual content of training data, though it sometimes happens. An AI-generated image of a child is assembled from thousands of abstractions of these genuine photographs of children. In the case of Stable Diffusion and Midjourney, these images come from the LAION-5B dataset, a collection of captions and links to 2.3 billion images. If there are hundreds of images of a single child in that archive of URLs, that child could influence the outcomes of these models.

The presence of child pornography in this training data is obviously disturbing. An additional point of serious concern is the likelihood that images of children who experienced traumatic abuse are influencing the appearance of children in the resulting model’s synthetic images, even when those generated images are not remotely sexual.

The presence of this material in AI training data points to an ongoing negligence of the AI data pipeline. This crisis is partly the result of who policymakers talk with and allow to define AI: too often, it is industry experts who have a vested interest in deterring attention from the role of training data, and the facts of what lies within it. As with Omelas, we each face a decision of what to do now that we know these facts."

https://www.techpolicy.press/laion5b-stable-diffusion-and-the-original-sin-of-generative-ai/
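As a rough illustration of the "breaking images down into noise" that the quote describes, here is a minimal sketch of the forward-diffusion step; the noise schedule is a generic textbook choice, not Stable Diffusion's actual configuration.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # generic linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, 0)  # cumulative signal retention

def add_noise(x0, t):
    """Sample x_t ~ q(x_t | x_0): mix the clean image with Gaussian noise."""
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t]
    xt = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    return xt, noise  # a diffusion model is trained to predict `noise` from `xt`

x0 = torch.rand(3, 64, 64)      # stand-in for one training image (C, H, W)
xt, eps = add_noise(x0, t=500)  # halfway through the schedule: mostly noise
```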

rant.vpalepu.com, to Japan

Tech regulations are going to be an important thing to watch out for in 2024. So, this year, I am going to keep documenting news about tech regulations that show up on my news feeds.

Two things popped up today:

  1. Japan: Copyright and AI Training

A story from last year about Japan’s stance on AI training and data copyright seems to have gained traction on Hacker News today. The original report seems to be by technomancers.ai, but here is the regurgitated version of the story on ACM’s news site (archived link from last year):

https://winterrant.files.wordpress.com/2024/01/screenshot-2024-01-02-at-3.00.32e280afpm.png?w=1024

Japanese publishers already seem to be up in arms about this:

“The Japan Newspaper Publishers & Editors Association and three other industry groups released a joint statement Thursday expressing concern that copyright protection is not being adequately considered in the development of generative artificial intelligence.

The other organizations are the Japan Magazine Publishers Association, the Japan Photographic Copyright Association and the Japan Book Publishers Association.

In the joint statement, the organizations said that current generative AI creates content based on the analysis of large amounts of data collected from the internet without the consent of and payments to copyright holders.”

https://www.japantimes.co.jp/news/2023/08/17/japan/crime-legal/japan-publisher-ai-copyright-concern/ (archived link)

  2. Montana and North Carolina: Internet Identity

New internet identification laws went into effect on January 1, 2024, in Montana and North Carolina.

“[…] laws that went into effect in both states on January 1st. Montana passed a standalone ID verification law in May, and North Carolina’s new law was tacked onto a bill regarding the high school computer curriculum. The laws require sites to either use third-party verification or, in the case of Montana, “digitized identification” to verify a visitor’s age. Both states also leave enforcement as a civil matter, allowing individuals to sue if they think a site violates the law.”

https://www.theverge.com/2024/1/2/24022539/pornhub-blocked-montana-north-carolina-age-verification-law-protest (archived link)

These laws and many others like them are starting to require ID verification before accessing sites on the internet. While they will have an outsized impact on porn-hosting websites, their effects will likely be felt by any internet service that restricts how children can use it.

While the laws are well-intentioned, it is unclear how they can avoid violating user privacy. If the idea is to protect children by affirming every user’s age through a well-established digital (or physical) identity, then that sensitive identity data will need to travel across the internet and reside on some web server (or data center). If such identity data ever leaks, it will be a major headache for the affected users.

The UK has passed a similar law that tightens its existing regulations around internet identities and protecting children online. I am unclear on the status of that law: not sure if it has gone into effect, or if revisions to it are still to be made.

https://rant.vpalepu.com/2024/01/02/tech-regulations-update-japan-montana-north-carolina/

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "Creative Commons has been used for around 20 years, and the number of lawsuits involving works released under these licences has been minimal. People choose to share their works with CC licenses for various reasons, some are selfish, some altruistic, and some pragmatic. Personally, I have always enjoyed sharing. Since I don’t anticipate earning money from my writing, I prefer making my works freely available with minimal restrictions. CC licenses facilitate this by signalling to others that they can share my work. However, this philosophy might not resonate with everyone. For those who do not share this view, CC may not be the ideal choice. If you prefer not to have your works widely shared, avoiding open licenses and utilizing technical tools and opt-outs might be a better approach. Respecting individual preferences will be crucial moving forward. I believe we are approaching a landscape similar to what we have seen with open access and open content, where such considerations are increasingly significant."

https://www.technollama.co.uk/creative-commons-and-ai-training

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "The agency cautioned that generative AI could mimic “artists’ faces, voices, and performances without permission,” deceiving consumers about a work’s true authorship. FTC officials also expressed concerns about copyright violations, stating AI systems are trained on “pirated content” scraped “without consent.”

On copyright infringement, the FTC stated that “the use of pirated or misuse of copyrighted materials could be an unfair practice or unfair method of competition under Section 5 of the FTC Act.”

Separately but relatedly, leading AI companies such as OpenAI and Anthropic are facing lawsuits accusing them of violating copyright by using copyrighted content in their training data."

https://venturebeat.com/ai/ftc-takes-shots-at-ai-in-rare-filing-to-us-copyright-office/

chris, to random
@chris@social.losno.co avatar

Cool, cool. hCaptcha, on this one specific EveryMac.com page, is now asking me to train AI for military vehicle identification. https://everymac.com/ultimate-mac-lookup/?search_keywords=PowerBook2,1

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "Web crawlers and scrapers can easily access data from just about anywhere that’s not behind a login page. Social media profiles set to private aren’t included. But data that are viewable in a search engine or without logging into a site, such as a public LinkedIn profile, might still be vacuumed up, Dodge says. Then, he adds, “there’s the kinds of things that absolutely end up in these Web scrapes”—including blogs, personal webpages and company sites. This includes anything on popular photograph-sharing site Flickr, online marketplaces, voter registration databases, government webpages, Wikipedia, Reddit, research repositories, news outlets and academic institutions. Plus, there are pirated content compilations and Web archives, which often contain data that have since been removed from their original location on the Web. And scraped databases do not go away. “If there was text scraped from a public website in 2018, that’s forever going to be available, whether [the site or post has] been taken down or not,” Dodge notes."

https://www-scientificamerican-com.cdn.ampproject.org/c/s/www.scientificamerican.com/article/your-personal-information-is-probably-being-used-to-train-generative-ai-models/?amp=true

PrivacyDigest, to ai
@PrivacyDigest@mas.to avatar

It’s a “fake PR stunt”: Artists hate Meta’s data deletion process | Ars Technica

This is a misconception. In reality, there is no functional way to opt out of Meta’s generative AI training.

... In it, Meta says it is “unable to process the request” until the requester submits evidence that their personal info appears in responses from Meta’s AI.

https://arstechnica.com/ai/2023/10/its-a-fake-pr-stunt-artists-hate-metas-ai-data-deletion-process/#p3

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "Like many data centers, the LLSC has seen a significant uptick in the number of AI jobs running on its hardware. Noticing an increase in energy usage, computer scientists at the LLSC were curious about ways to run jobs more efficiently. Green computing is a principle of the center, which is powered entirely by carbon-free energy.

Training an AI model — the process by which it learns patterns from huge datasets — requires using graphics processing units (GPUs), which are power-hungry hardware. As one example, the GPUs that trained GPT-3 (the precursor to ChatGPT) are estimated to have consumed 1,300 megawatt-hours of electricity, roughly equal to that used by 1,450 average U.S. households per month.

While most people seek out GPUs because of their computational power, manufacturers offer ways to limit the amount of power a GPU is allowed to draw. "We studied the effects of capping power and found that we could reduce energy consumption by about 12 percent to 15 percent, depending on the model," Siddharth Samsi, a researcher within the LLSC, says."

https://news.mit.edu/2023/new-tools-available-reduce-energy-that-ai-models-devour-1005
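For readers who want to try the power-capping idea the researchers describe, here is a minimal sketch using NVIDIA's NVML bindings (the nvidia-ml-py package). The 80% cap is an arbitrary example, setting the limit usually requires root, and the 12 to 15 percent savings is the LLSC's reported figure, not something this snippet measures.

```python
import pynvml

# Query the first GPU's default power limit and cap it below that.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0

default_mw = pynvml.nvmlDeviceGetPowerManagementDefaultLimit(handle)  # milliwatts
capped_mw = int(default_mw * 0.8)  # arbitrary example: 80% of the default

print(f"default limit: {default_mw / 1000:.0f} W, capping to {capped_mw / 1000:.0f} W")
pynvml.nvmlDeviceSetPowerManagementLimit(handle, capped_mw)  # requires root

pynvml.nvmlShutdown()
```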

joycebell, to ai
@joycebell@mas.to avatar

Authors are finding out that their books are being used to train AI without permission. https://authorsguild.org/news/you-just-found-out-your-book-was-used-to-train-ai-now-what/

rexi, to machinelearning
@rexi@mastodon.social avatar

https://www.pbs.org/video/ai-protection-1693683970/

"…you give it a large set of information and you ask it to detect a certain pattern…program can improve its performance based on number of trials and number of times, hence the term .

The crux of this matter when it comes to the and strike is that the large sets of data come from the that writers and actors have generated, and they have not been compensated for any that has been done on that data…"

pmj, to ai
@pmj@social.pmj.rocks avatar

stop using Zoom immediately!!
they basically steal your personality, your manners, your gestures to make money out of it!
this is waaay beyond text or images!
and you can't opt out!
https://zoomai.info/

gianmarcogg03, to ai
@gianmarcogg03@mastodon.uno avatar

One more reason to ditch Zoom: they changed their ToS to pretend they won't take user data to train third-party AIs, even though the rest of the ToS pretty much says they can do that and beyond. Sounds like the usual Google/Apple "backing down but not really" stunt.

https://www.computerworld.com/article/3704489/zoom-goes-for-a-blatant-genai-data-grab-enterprises-beware.html

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "The people who build generative AI have a huge influence on what it is good at, and who does and doesn’t benefit from it. Understanding how generative AI is shaped by the objectives, intentions, and values of its creators demystifies the technology, and helps us to focus on questions of accountability and regulation. In this explainer, we tackle one of the most basic questions: What are some of the key moments of human decision-making in the development of generative AI products? This question forms the basis of our current research investigation at Mozilla to better understand the motivations and values that guide this development process. For simplicity, let’s focus on text-generators like ChatGPT.

We can roughly distinguish between two phases in the production process of generative AI. In the pre-training phase, the goal is usually to create a Large Language Model (LLM) that is good at predicting the next word in a sequence (which can be words in a sentence, whole sentences, or paragraphs) by training it on large amounts of data. The resulting pre-trained model “learns” how to imitate the patterns found in the language(s) it was trained on.

This capability is then utilized by adapting the model to perform different tasks in the fine-tuning phase. This adjusting of pre-trained models for specific tasks is how new products are created. For example, OpenAI’s ChatGPT was created by “teaching” a pre-trained model — called GPT-3 — how to respond to user prompts and instructions. GitHub Copilot, a service for software developers that uses generative AI to make code suggestions, also builds on a version of GPT-3 that was fine-tuned on “billions of lines of code.”"

https://foundation.mozilla.org/en/blog/the-human-decisions-that-shape-generative-ai-who-is-accountable-for-what/
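To make the pre-training phase described above concrete, here is a minimal sketch of the next-word (next-token) prediction objective. The tiny model and random token data are stand-ins for illustration, not OpenAI's actual architecture or pipeline.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 1000, 64

class TinyLM(nn.Module):
    """Toy language model: embeddings -> GRU -> per-position logits."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, embed_dim, batch_first=True)
        self.head = nn.Linear(embed_dim, vocab_size)

    def forward(self, tokens):
        x = self.embed(tokens)
        h, _ = self.rnn(x)
        return self.head(h)  # logits for the next token at each position

model = TinyLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

batch = torch.randint(0, vocab_size, (8, 32))  # stand-in for real token ids
inputs, targets = batch[:, :-1], batch[:, 1:]  # target = input shifted by one

opt.zero_grad()
logits = model(inputs)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()  # the model "learns" to predict the next token
opt.step()
```

Fine-tuning reuses the same loop, but on task-specific pairs (such as prompts and desired responses) instead of raw text.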

deltatux, to privacy
@deltatux@infosec.town avatar

With the uproar on social media over Zoom's recent privacy policy changes, the company tries to reassure users about what these changes mean. They go on to say:

To reiterate: we do not use audio, video, or chat content for training our models without customer consent.

https://blog.zoom.us/zooms-term-service-ai/

phillycodehound, to ai
@phillycodehound@masto.ai avatar

So over on Hacker News they report that Zoom is using user data to train their AI and there is no way to opt out. I mean there is a way... don't use Zoom. Though I'm going to keep using it. It's the best in class and pretty much everyone knows how to use it.

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "ChatGPT and other AI-powered bots may soon be "running out of text in the universe" that trains them to know what to say, an artificial intelligence expert and professor at the University of California, Berkeley says.

Stuart Russell said that the technology that hoovers up mountains of text to train artificial intelligence bots like ChatGPT is "starting to hit a brick wall." In other words, there's only so much digital text for these bots to ingest, he told an interviewer last week from the International Telecommunication Union, a UN communications agency.

This may impact the way generative AI developers collect data and train their technologies in the coming years, but Russell still thinks AI will replace humans in many jobs that he characterized in the interview as "language in, language out.""

https://www.businessinsider.com/ai-could-run-out-text-train-chatbots-chatgpt-llm-2023-7

cragsand, (edited ) to ai

I learned how to train LoRA models using the open source Stable Diffusion...

For the purpose of recreating the appearance of my 3D VR roleplaying character, the results I got were amazingly good... almost frighteningly so.

I'll go through the process, results and some thoughts.

🧵 part 1 of 4

#stablediffusion #loramodel #characterart #ai #aiart #aitraining #sd #lora #tech #technews #vr #3d #vrchat #roleplay #neondivide #cragsand

cragsand, (edited )

The majority of AI art I see is terrible: bad-looking and deformed. But with the right instructions, it doesn't have to be.

To accurately create fictional characters, this is probably the best method I've seen so far. There's work involved in training a model like this; it's not something where you can just throw in a bunch of prompts and expect good results.

I started by gathering 64 screenshots of my 3D VRChat model from Blender in various positions and angles, in different lighting, while wearing select clothing of choice. Then I added proper tags describing each image in a corresponding text file (a minimal sketch of this tagging layout follows the list below).

Based on the training data and the keywords I specified, you can input various clothing alternatives, including:

  • armor
  • jacket
  • shirt
  • barechest

Training took about 30 minutes using an RTX 2080 Ti GPU.
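Here is a minimal sketch of that image-plus-caption layout, pairing each screenshot with a same-named .txt tag file, which is the arrangement most LoRA training scripts expect. The folder, file names, and tags are illustrative placeholders, not my actual dataset.

```python
from pathlib import Path

dataset_dir = Path("training_images")  # placeholder folder of screenshots

# Placeholder per-image tags; in practice each line describes pose,
# lighting, and clothing ("armor", "jacket", "shirt", "barechest", ...).
captions = {
    "shot_001.png": "character, armor, standing, front view, soft lighting",
    "shot_002.png": "character, jacket, sitting, side view, hard shadows",
}

for image_name, tags in captions.items():
    image_path = dataset_dir / image_name
    if image_path.exists():
        # Caption file shares the image's stem: shot_001.png -> shot_001.txt
        image_path.with_suffix(".txt").write_text(tags, encoding="utf-8")
```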

cragsand, (edited )

These guides were really useful for explaining complex concepts without requiring an understanding of the mathematics involved. It tends to get really complicated the deeper you delve into it.
https://rentry.org/lora_train
https://rentry.org/59xed3

The Stable Diffusion GUI I'm using is completely open source, written in Python for everyone to use: https://github.com/AUTOMATIC1111/stable-diffusion-webui

I posted the LoRA training model publicly for download here, for anyone curious: https://civitai.com/models/70408/cragsand-character-lora-sten-berglund

Since I really liked the results, I ended up retouching some parts manually: things like eye color, fingers, and random clutter where details looked weird.

The end results are available in this gallery here: https://www.deviantart.com/cragsand/gallery/87722291/sten-berglund-character-artwork

cragsand, (edited )

End thoughts...

With this technology gaining traction, I certainly sympathize with artists concerned about their profession being in danger. It's a topic worth discussing, along with what its societal effects will be. I can certainly see it ending up badly and requiring proper regulation.

One thought on my mind is that these are "tools" for us to use, and as with any tools, whether they're good or bad ends up being determined by how they're used. It's not the technology itself that is the danger, but rather how corporations and bad interests may exploit it to the detriment of everyone else.

Curious to hear others' thoughts about this and how we can approach it in a way that is beneficial for everyone.

Thank you for reading the thread! 💙
