There's a significant #spam wave floating through the #Fediverse right now.... - Fediverse

wholesomedonut, 3 months ago

There's a significant #spam wave floating through the #Fediverse right now.

On behalf of all of #fosstodon 's mods and admins, we thank you for reporting them where they crop up.

It's hard to play whack-a-mole, because these spammers are popping up on random one-off instances and known, well-established ones.

badrihippo, 3 months ago

@wholesomedonut you're welcome. And thanks for the note; it helps to know that my reports are helping, because I wasn't sure if I should do that or just be silent and let it pass! Speaking of which, two new reports coming in 😛

amoroso, 3 months ago

@wholesomedonut I wonder what the spammers think they can accomplish, given that typical Fediverse users are highly unlikely to take the bait.

wholesomedonut, 3 months ago

deleted_by_author

amoroso, 3 months ago

@wholesomedonut Maybe the spammers are just clueless? It wouldn't be the first time.

@urusan

urusan, 3 months ago

@amoroso @wholesomedonut A lot of the spam I'm getting right now is pictures of spam.

Also, none of the spam I've gotten is advertising.

So that points to one of these possibilities:

Malicious attack, they're deliberately taunting us

They think this is funny. As many people have suggested, the sophistication of the attack is low, so it could just be a script kiddie

This is some sort of experiment in AI attack generation, which is why it doesn't make any sense

Or some combination of the above

Mehrad, 3 months ago

@wholesomedonut @mike @fosstodon
Is it possible to collect their links/profiles and share them? I'm thinking that it could be a nice project to use machine learning for spam detection. The only issue right now is the limited data I have.

We can build a tidy dataset out of these and then ask the community, in a hackathon or fun competition fashion, to come up with predictive models. This can later be published as a FLOSS tool for instance administration.

badrihippo, 3 months ago

@Mehrad @wholesomedonut @mike @fosstodon ooh I like this idea!

Mehrad, 3 months ago

@jerry I'm suggesting a project to collect data and attack the spam situation (read this 👆 thread)

Would it also be possible for you and the moderators on your instance to contribute to data collection?

Also, please suggest other admins who might be interested in this project and/or the outcome.

Mehrad, 3 months ago

@jerry
@fosstodon @mike @wholesomedonut

Would it be possible to discuss this matter? Are you folks interested at all?

This is a good opportunity to bring the Fediverse together and also get some media exposure behind it. Unfortunately, so far what we have got is ... well, see for yourself:

https://techcrunch.com/2024/02/20/spam-attack-on-twitter-x-rival-mastodon-highlights-fediverse-vulnerabilities/

I believe we fedizens collectively have enough skills to pull this off and also to turn it into something that benefits everyone.

mike, 3 months ago

@Mehrad Sure. What do you have in mind?

@jerry @fosstodon @wholesomedonut

Mehrad, 3 months ago

@mike @jerry @fosstodon @wholesomedonut

I'm not sure whether it is already too late, since the spam is being deleted/blocked, but my suggestion is simple:

1. Collect spam data (the content, account names, domains, posting rates, etc.)

2. Have a small team curate the data and add real accounts and posts as controls (optional)

3. Retain some part of the data (e.g. 25%) as a scoring test set

4. Create a competition/hackathon event and ask the machine learning community (R, Julia, Python, ...) to train and tune classifiers
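
For illustration, the hold-out step above (retaining e.g. 25% as a scoring test set) could look roughly like this. It is only a sketch under assumptions: the record layout, the `is_spam` field name, and the fixed seed are hypothetical and not something specified in this thread. The split is done per class so that the test set keeps the same spam/control ratio as the full data.

```python
import random
from collections import defaultdict


def stratified_holdout(records, label_key="is_spam", test_fraction=0.25, seed=42):
    """Split records into (train, test), holding out test_fraction of each class."""
    rng = random.Random(seed)  # fixed seed so organizers can reproduce the split
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec[label_key]].append(rec)

    train, test = [], []
    for group in by_label.values():
        shuffled = group[:]  # copy so the caller's list is left untouched
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


# Hypothetical toy data: 20 spam records and 20 control records.
records = [{"id": i, "is_spam": i % 2 == 0} for i in range(40)]
train, test = stratified_holdout(records)
print(len(train), len(test))
```

Only the training part would be published to participants; the test part stays with the organizers.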

🧵👇

Mehrad, 3 months ago

🧵👆
@mike @jerry @fosstodon @wholesomedonut

5. When the deadline comes, we ask the teams to submit their model/method in the form of XXX (Docker, Nix shell, Guix shell, Apptainer, ...; basically whatever the organizers of the event decide)

6. We run the submissions and test them on our internal test set (from step 3)

7. We announce the top 3 winners

8. Use one or a combination of the models to assist instances in detecting spam (posts, accounts, ...) and flagging it to admins.
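
As a rough sketch of the scoring part (testing submissions on the internal test set and announcing the top 3), something like the following would do. The choice of F1 as the ranking metric is an assumption, as are the team names and the toy predictions; the organizers might prefer another metric.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive (spam) class; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rank_submissions(y_true, submissions, top_n=3):
    """Score every submission on the held-out labels and return the best top_n."""
    scored = [(name, f1_score(y_true, preds)) for name, preds in submissions.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Hypothetical held-out labels (True = spam) and per-team predictions.
y_true = [True, True, True, False, False, False, False]
submissions = {
    "team_a": [True, True, True, False, False, False, False],      # perfect
    "team_b": [True, True, False, False, False, False, True],      # one miss, one false alarm
    "team_c": [False, False, False, False, False, False, False],   # flags nothing
    "team_d": [True, True, True, True, True, True, True],          # flags everything
}
leaderboard = rank_submissions(y_true, submissions)
for name, score in leaderboard:
    print(f"{name}: F1={score:.3f}")
```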

🧵👇

Mehrad, 3 months ago

🧵👆
@mike @jerry @fosstodon @wholesomedonut

This can then become a FLOSS project and, with the help of Fediverse platform developers, be fitted into the platforms as a plugin, or provide an API that these platforms can pass info to and get a "spam score" back.
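
To make the "pass info in, get a spam score back" idea concrete, here is a minimal sketch of what such an exchange might look like. Everything in it is hypothetical: the JSON field names (`content`, `mentions`, `spam_score`), the 0 to 1 score range, and the placeholder scorer are assumptions for illustration, not an existing Fediverse API.

```python
import json
from typing import Callable


def handle_score_request(body: str, scorer: Callable[[dict], float]) -> str:
    """Take a JSON request body describing a post, return a JSON response with a score."""
    post = json.loads(body)
    # Clamp to [0, 1] so every model behind the API reports on the same scale.
    score = max(0.0, min(1.0, float(scorer(post))))
    return json.dumps({"spam_score": score})


def mention_heavy_scorer(post: dict) -> float:
    """Placeholder scorer: more mentions means a higher score. A real model goes here."""
    return len(post.get("mentions", [])) / 10


request_body = json.dumps({"content": "hello", "mentions": ["@a", "@b"]})
response = handle_score_request(request_body, mention_heavy_scorer)
print(response)
```

Keeping the scorer as a plain callable means the winning model (or a combination of models) can be swapped in without the platforms having to change how they call it.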

Competition participants might even give SpamAssassin a try (although we are dealing with a slightly different type of data than email).

Mehrad, 3 months ago

🧵👆
@mike @jerry @fosstodon @wholesomedonut

What this project needs:

Contributions from some major instances for data collection and, ultimately, even testing

An organizing team (tech-savvy, data-savvy, ..., maybe even a graphic designer)

A few meetings to iron out the rough edges of the process

Curating the data

After the deadline, a pipeline to run the submissions sequentially and rank their performance.

And hopefully some understanding that
statistics == ML != AI (GPT)

mike, 3 months ago

@Mehrad Well, the data is pretty straightforward actually. All the accounts originated on instances with open registrations. The account names were randomized and exactly 10 characters each. There was no profile picture (/avatars/original/missing.png) and no header (/headers/original/missing.png). They followed nobody and had no followers (usually). They usually posted one of two images. Examples can be found with profile lm9ztxuz4q at instance don.neet.co.jp.
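
The signals listed above translate fairly directly into a rule-based check. The following is a sketch only: the `Account` fields, the regex (which assumes lowercase letters and digits, based on the single example name given), and the "name plus at least two other signals" threshold are all assumptions. Ordinary 10-character names would also match the name rule, so a hit should mean "flag for a moderator", not "ban".

```python
import re
from dataclasses import dataclass

# Assumption: names look like "lm9ztxuz4q", i.e. exactly 10 lowercase letters or digits.
RANDOM_NAME = re.compile(r"[a-z0-9]{10}")


@dataclass
class Account:
    username: str
    avatar_url: str
    header_url: str
    followers: int
    following: int


def wave_signals(acct: Account) -> list[str]:
    """Return the names of the spam-wave signals this account matches."""
    signals = []
    if RANDOM_NAME.fullmatch(acct.username):
        signals.append("ten_char_name")
    if acct.avatar_url.endswith("/avatars/original/missing.png"):
        signals.append("no_avatar")
    if acct.header_url.endswith("/headers/original/missing.png"):
        signals.append("no_header")
    if acct.followers == 0 and acct.following == 0:
        signals.append("no_social_graph")
    return signals


def looks_like_wave_account(acct: Account) -> bool:
    """Require the name pattern plus at least two other signals (followers were only 'usually' zero)."""
    signals = wave_signals(acct)
    return "ten_char_name" in signals and len(signals) >= 3


# example.invalid URLs are placeholders, not real instances.
suspect = Account(
    "lm9ztxuz4q",
    "https://example.invalid/avatars/original/missing.png",
    "https://example.invalid/headers/original/missing.png",
    0,
    0,
)
regular = Account(
    "alice",
    "https://example.invalid/avatars/0001/photo.png",
    "https://example.invalid/headers/0001/banner.png",
    12,
    30,
)
print(looks_like_wave_account(suspect), looks_like_wave_account(regular))
```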

@jerry @wholesomedonut

Mehrad, 3 months ago

@mike @jerry @wholesomedonut

Yes, this is indeed one type of spam (actually a stupid one, because a relatively simple database query and some frequency analysis to confirm randomness could easily detect those accounts).
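
One simple form such a frequency analysis could take is character entropy: pool the usernames returned by the query and check how evenly the characters are distributed. This is a sketch, not a definitive detector. The sample handles and the generated "random" names are made up, and entropy on a single 10-character name is a weak signal, which is why the example pools many names.

```python
import math
import random
import string
from collections import Counter


def char_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def pooled_entropy(usernames) -> float:
    """Entropy over all characters of all names; uniform random names push this toward the maximum."""
    return char_entropy("".join(usernames))


# Upper bound for names drawn uniformly from lowercase letters and digits.
MAX_BITS = math.log2(36)

# Hypothetical data: generated 10-character names versus ordinary-looking handles.
rng = random.Random(0)
alphabet = string.ascii_lowercase + string.digits
random_names = ["".join(rng.choice(alphabet) for _ in range(10)) for _ in range(50)]
ordinary_handles = ["gardener", "bookworm", "teamaker", "sunflower", "moonlight", "catperson"]

print(f"random:   {pooled_entropy(random_names):.2f} of max {MAX_BITS:.2f} bits")
print(f"ordinary: {pooled_entropy(ordinary_handles):.2f} bits")
```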

But:

There is (and for sure will be) actual spam, like accounts that spam hashtags, tag others, etc.

The media coverage portrays the recent issue as problematic and convoluted, and as an illustration of the Fediverse's lack of spam detection

Mehrad, 3 months ago (edited 3 months ago)

@mike @wholesomedonut

Tbh, at some point we will get more advanced spam, and if we don't have an oven-ready (as Boris called it 🤣) system to be deployed, what we get is bad publicity, frustrated admins and moderators, and annoyed users.

@thelinuxcast & @BrodieOnLinux were the first folks I saw talking about the spam as users, along with @jerry and other frustrated admins and moderators.

But do you think there is no need for a "Fedi Spam Assassin"?

mike, 3 months ago

@Mehrad Absolutely there's a need for some way to recognize spam and deal with it. This spam wave was stupid. It wasn't subtle and how it was spreading was obvious, and we still barely had the tools required to keep it in check. It required hours of work from moderation teams. A ridiculous amount actually.

@wholesomedonut @thelinuxcast @BrodieOnLinux @jerry

Mehrad, 3 months ago

@mike
Many years back I submitted a research plan to Twitter to detect fake/spam/bot accounts based on behavioral characteristics. They granted me developer API access for that, and I pulled some data and did some work on it. It never materialized into a tangible result because I was also manually collecting data in the form of lists to compare against, and I never managed to get a large enough cohort (Twitter and people's reports were also shrinking my list).

But here we have a chance if we form teams I guess.

wholesomedonut, 3 months ago

deleted_by_author

Mehrad, 3 months ago

@wholesomedonut @mike @fosstodon
I've participated in multiple international ML competitions and I know a thing or two about the process and practices. Considering that the Fediverse has a high concentration of techy folks from different fields of expertise, I'd say it would be beneficial, fun, and yet achievable to have an ML method for spam detection.

What we need are:

The question, which is already settled

The data (training and test sets)

Planning

Infrastructure for scoring

Mehrad, 3 months ago

@wholesomedonut @mike @fosstodon

For the infrastructure, that would be easy to pull off, as we want the ultimate model to be lightweight, fast, and deployable on a VPS instead of a beefy computer cluster. So we can allocate a few normal computers (I have one spare Core i5) to run these models.

The models can come in form of docker, apptainer, nix, or guix containers (we should settle on one) and they can be run sequentially.
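
Assuming Docker ends up being the format chosen (the thread leaves that open), running the submissions one after another could be sketched like this. The image names, the `/data` mount point, and the one-hour timeout are hypothetical; the sketch also assumes each container reads the test set from `/data` and needs no network access.

```python
import subprocess
from typing import Callable, Optional, Sequence


def build_run_command(image: str, data_dir: str) -> list[str]:
    """Build a `docker run` command that mounts the test set read-only and disables networking."""
    return [
        "docker", "run", "--rm",
        "--network", "none",           # submissions should not phone home
        "-v", f"{data_dir}:/data:ro",  # hypothetical mount point for the test set
        image,
    ]


def run_sequentially(
    images: Sequence[str],
    data_dir: str,
    runner: Callable = subprocess.run,
    timeout_s: int = 3600,
) -> dict[str, Optional[int]]:
    """Run each submission image in turn; map image name to exit code (None on timeout)."""
    results: dict[str, Optional[int]] = {}
    for image in images:
        try:
            completed = runner(
                build_run_command(image, data_dir),
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
            results[image] = completed.returncode
        except subprocess.TimeoutExpired:
            results[image] = None
    return results
```

A call such as `run_sequentially(["team-a:latest", "team-b:latest"], "/srv/testset")` would then process the (hypothetical) images in order on a single machine, which fits the spare-desktop setup described above.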

It's straightforward and doable. 😊
