strypey, (edited) to trustandsafety
@strypey@mastodon.nzoss.nz avatar

"While large platforms with robust trust & safety teams are able to be more discerning in their moderation..."

, , , 2023

https://cyber.fsi.stanford.edu/io/news/common-abuses-mastodon-primer

Are they though?

Centralised moderation teams often lack the context to know what they're looking at. Fediverse admins each take care of a small, well-defined bit of overall moderation; the bit that affects accounts on their server. They know what's acceptable in their community.

(1/3)

strypey,

"... the incentive to over-block in the fediverse is more compelling than the risk of being held liable for CSAM on your server."

, , , 2023

https://cyber.fsi.stanford.edu/io/news/common-abuses-mastodon-primer

Exactly.

(3/3)

strypey,

"Mastodon users probably aren’t aware of CSAM on the platform unless it leaks into their federated timelines. This can happen when a fellow user on their instance follows an account posting CSAM. Ways to handle this problem are few. Though users who follow CSAM-disseminating accounts can be suspended from an instance by administrators, they can easily set up a new account on another..."

, , , 2023

https://cyber.fsi.stanford.edu/io/news/common-abuses-mastodon-primer

(1/2)

strypey,

"It is just much harder for a volunteer-run, distributed system to roll out protections like E2EE than a centralized company."

, , , 2023

https://cyber.fsi.stanford.edu/io/news/common-abuses-mastodon-primer

Explain the logic underlying that conclusion. Counterexample: the Matrix network, a distributed system, much of which is volunteer-run.

alecm, to ArtificialIntelligence

“Suffice it to say that everyone in possession of a copy of the LAION-5B images has hundreds if not thousands of instances of CSAM” | …so that’s 0.0001% of the content, then

So David Thiel at Stanford has posted a much-reported paper/story which tells us that the dataset which drives Stable Diffusion and a bunch of other AI systems contains, as a result of web scraping:

hundreds if not thousands of instances of CSAM (and a much larger number of instances of NCII more broadly)

https://www.threads.net/


…and it struck me to ask “how many images are there in LAION-5B so we can get a percentage?”

It turns out that the number of images in LAION-5B is five billion – hence the 5B:

LAION-5B was released in early 2022 by a German nonprofit that has received funding from several AI startups. The dataset comprises more than 5 billion images scraped from the web and accompanying captions. It’s an upgraded version of earlier AI training dataset, called LAION-400M, that was published by the same nonprofit a few months earlier and includes about 400 million images.

https://siliconangle.com/2023/12/20/researchers-find-csam-images-laion-5b-ai-training-dataset/


So if we generously interpret “…if not thousands…” to mean “five thousand” then some simple maths tells us that this is 0.0001% of the content, or literally “one in a million”.
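For anyone who wants to check that arithmetic, here is a minimal Python sketch. The 5,000 figure is the generous reading assumed above, 5 billion is a rounding of "more than 5 billion", and the function and constant names are purely illustrative.

```python
def share_as_percent(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return count / total * 100


# Assumptions taken from the post, not measured values.
ASSUMED_INSTANCES = 5_000        # generous reading of "if not thousands"
DATASET_SIZE = 5_000_000_000     # "more than 5 billion", rounded down

percent = share_as_percent(ASSUMED_INSTANCES, DATASET_SIZE)
one_in = DATASET_SIZE // ASSUMED_INSTANCES

print(f"{percent:.4f}% of the dataset, i.e. one in {one_in:,}")
```

Swap in your own count to see how the percentage moves; even 50,000 instances would still be 0.001%.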

This is the “needle in a haystack” ballpark – again, literally, if a heavyweight darning needle weighs 1 gram, then one million needles would weigh 1000kg, and the largest 4x4x8 haybales max-out at 2000lb / a little over 900kg.
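The haystack comparison can be checked the same way. This is only a sketch: the 1 gram needle and the 2000 lb bale are the figures assumed in the paragraph above, and the variable names are illustrative.

```python
LB_TO_KG = 0.45359237            # international pound, exact by definition

NEEDLE_GRAMS = 1                 # assumed weight of one heavy darning needle
NEEDLE_COUNT = 1_000_000         # "one in a million"
BALE_LB = 2000                   # assumed upper weight of a large 4x4x8 bale

needles_kg = NEEDLE_COUNT * NEEDLE_GRAMS / 1000
bale_kg = BALE_LB * LB_TO_KG

print(f"A million needles: {needles_kg:.0f} kg; one large bale: {bale_kg:.0f} kg")
```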

The US Food & Drug Administration permits “defects” of up to “[an] Average of 9 mg or more rodent excreta pellets and/or pellet fragments per kilogram” – which works out as:

(9 mg / 1,000,000 mg) * 100 = 0.0009%

So there can be about 9x as much mouse poop, proportionally, in the flour which makes your bread as there generously is CSAM in the LAION-5B dataset.
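A rough Python sketch of that comparison follows. Note the assumptions: the FDA figure is a fraction by mass while the LAION-5B figure is a fraction by image count, so the ratio only compares two proportions, as the post does; the 5,000 count is the generous guess from earlier, and all names are illustrative.

```python
MG_PER_KG = 1_000_000

# FDA defect action level quoted above: 9 mg of pellets per kg (by mass).
fda_fraction = 9 / MG_PER_KG

# Generous guess from the post: 5,000 images out of 5 billion (by count).
laion_fraction = 5_000 / 5_000_000_000

ratio = fda_fraction / laion_fraction

print(f"FDA level:         {fda_fraction * 100:.4f}%")
print(f"LAION-5B estimate: {laion_fraction * 100:.4f}%")
print(f"Ratio:             about {ratio:.0f}x")
```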

“But this is all guesswork on your part / One image is one too many…”

The numbers are all above. Feel free to nitpick. Pick your own percentages. The FDA acknowledges that poop in food is unavoidable, and the unstated goal of “Zero CSAM in a scraped dataset” will probably likewise be unachievable. Thiel himself acknowledges:

While it’s not surprising that a crawl of the public internet will contain some CSAM, there’s no reason to go gather data on that scale without appropriate safeguards. The project that seeded the LAION sets made some efforts to filter content with CLIP, but it didn’t do enough.

https://www.threads.net/


Perhaps some enterprising journalist should ask Thiel “how much would be enough?” and then go ask the FDA the same question?


https://alecmuffett.com/article/108656

ubru, to random German

Allow me to ask again:

The Stanford Internet Observatory found a great deal of paedophilic content on Mastodon within a short time.

https://www.washingtonpost.com/politics/2023/07/24/twitter-rival-mastodon-rife-with-child-abuse-material-study-finds/

Are the problems of decentralised moderation being discussed anywhere here?

ubru, to random German

The Stanford Internet Observatory found a great deal of child sexual abuse content on Mastodon within a short time.
https://www.washingtonpost.com/politics/2023/07/24/twitter-rival-mastodon-rife-with-child-abuse-material-study-finds/
Is this being discussed anywhere here?
