wholesomedonut,
@wholesomedonut@fosstodon.org avatar

There's a significant spam wave floating through the Fediverse right now.

On behalf of all of Fosstodon's mods and admins, we thank you for reporting them where they crop up.

It's hard to play whack-a-mole, because these spammers are popping up on random one-off instances and known, well-established ones.

badrihippo,
@badrihippo@fosstodon.org avatar

@wholesomedonut you're welcome. And thanks for the note; it helps to know that my reports are helping, because I wasn't sure if I should do that or just be silent and let it pass! Speaking of which, two new reports coming in 😛

amoroso,
@amoroso@fosstodon.org avatar

@wholesomedonut I wonder what the spammers think they can accomplish, given that typical Fediverse users are highly unlikely to take the bait.

wholesomedonut,
@wholesomedonut@fosstodon.org avatar

(deleted by author)

  • amoroso,
    @amoroso@fosstodon.org avatar

    @wholesomedonut Maybe the spammers are just clueless? It wouldn't be the first time.

    @urusan

    urusan,
    @urusan@fosstodon.org avatar

    @amoroso @wholesomedonut A lot of the spam I'm getting right now is pictures of spam.

    Also, none of the spam I've gotten is advertising.

    So that points to one of these possibilities:

    • Malicious attack, they're deliberately taunting us
    • They think this is funny; many people have suggested the attack is unsophisticated, so it could just be a script kiddie
    • This is some sort of experiment in AI attack generation, which would explain why it doesn't make any sense

    Or some combination of the above.

    Mehrad,
    @Mehrad@fosstodon.org avatar

    @wholesomedonut @mike @fosstodon
    Would it be possible to collect their links/profiles and share them? I'm thinking it could be a nice project to use machine learning for spam detection. The only issue right now is the limited data I have.

    We could build a tidy dataset out of these and then invite the community, hackathon- or competition-style, to come up with predictive models. This could later be published as a FLOSS tool for instance administration.
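A minimal sketch of what a "tidy" dataset could look like, assuming Python and one row per account (one observation per row, one variable per column). The `AccountRecord` type, its field names, and the posting-rate definition are hypothetical choices for illustration, not an agreed schema:

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import datetime


@dataclass
class AccountRecord:
    """One row per account; every column is a single variable (hypothetical schema)."""
    handle: str            # e.g. "abcdefghij@example.social"
    domain: str            # instance the account registered on
    username_length: int
    has_avatar: bool
    followers: int
    following: int
    posts_per_hour: float  # average posting rate over the observed window
    label: str             # "spam" or "ham", assigned by human curators


def posting_rate(timestamps: list[datetime]) -> float:
    """Average posts per hour: (n - 1) intervals divided by the hours spanned."""
    if len(timestamps) < 2:
        return 0.0
    ts = sorted(timestamps)
    hours = (ts[-1] - ts[0]).total_seconds() / 3600
    # Several posts sharing one timestamp: treat as an unbounded rate.
    return (len(ts) - 1) / hours if hours > 0 else float("inf")


def to_csv(records: list[AccountRecord]) -> str:
    """Serialize records to CSV with a header row taken from the dataclass fields."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(AccountRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buf.getvalue()
```

Keeping each variable in its own column means the same CSV can be loaded as-is in R, Julia, or Python by competition participants.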

    badrihippo,
    @badrihippo@fosstodon.org avatar

    @Mehrad @wholesomedonut @mike @fosstodon ooh I like this idea!

    Mehrad,
    @Mehrad@fosstodon.org avatar

    @jerry I'm suggesting a project to collect data and attack the spam situation (read this 👆 thread)

    Would it also be possible for you and the moderators on your instance to contribute to data collection?

    Also, please suggest other admins who might be interested in this project and/or its outcome.

    Mehrad,
    @Mehrad@fosstodon.org avatar

    @jerry
    @fosstodon @mike @wholesomedonut

    Would it be possible to discuss this matter? Are you folks interested at all?

    This is a good opportunity to bring the Fediverse together and also get some media exposure behind it. Unfortunately, so far the coverage we've gotten is... well, see for yourself:

    https://techcrunch.com/2024/02/20/spam-attack-on-twitter-x-rival-mastodon-highlights-fediverse-vulnerabilities/

    I believe we fedizens collectively have enough skills to pull this off and also to turn it into something that benefits everyone.

    mike,
    @mike@fosstodon.org avatar

    @Mehrad Sure. What do you have in mind?

    @jerry @fosstodon @wholesomedonut

    Mehrad,
    @Mehrad@fosstodon.org avatar

    @mike @jerry @fosstodon @wholesomedonut

    I'm not sure whether it's already too late, since the spam has been deleted/blocked, but my suggestion is simple:

    1. Collect spam data (the content, account names, domains, posting rates, etc.)
    2. Have a small team curate the data and add real accounts and posts as controls (optional)
    3. Hold back part of the data (e.g. 25%) as the scoring test set
    4. Create a competition/hackathon event and ask the machine learning community (R, Julia, Python, ...) to train and tune classifiers
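Step 3 (holding back a test set) could be sketched as a stratified hold-out split, so the hidden 25% keeps roughly the same spam/ham ratio as the training part. This assumes each row is a dict with a `label` key; `stratified_holdout` is a hypothetical helper, and stratifying by label is a choice the thread doesn't specify:

```python
import random
from collections import defaultdict


def stratified_holdout(rows, label_key="label", test_fraction=0.25, seed=42):
    """Split rows into (train, test), keeping each label's share similar in both parts.

    The test part stays private to the organizers and is only used for scoring.
    """
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):  # sorted so the RNG is consumed in a fixed order
        group = list(by_label[label])
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

With 40 spam and 80 ham rows, for example, this holds back 10 spam and 20 ham rows for scoring.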

    🧵👇

    Mehrad,
    @Mehrad@fosstodon.org avatar

    🧵👆
    @mike @jerry @fosstodon @wholesomedonut

    5. When the deadline comes, we ask the teams to submit their model/method in the form of XXX (Docker, Nix shell, Guix shell, Apptainer, ... basically whatever the event organizers decide)
    6. We run the submissions and test them on our internal test set (from step 3)
    7. We announce the top 3 winners
    8. We use one model, or a combination of them, to help instances detect spam (posts, accounts, ...) and flag it to admins.
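The scoring and ranking steps above could look roughly like this, assuming each team's predictions on the hidden test set have already been collected as a list of labels. F1 on the "spam" class is an assumed metric (the thread doesn't name one), and ties are broken alphabetically purely so the ranking is deterministic:

```python
def f1_score(y_true, y_pred, positive="spam"):
    """F1 for the positive class; returns 0.0 when there are no true positives."""
    if len(y_true) != len(y_pred):
        raise ValueError("prediction count does not match the test set")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rank_submissions(y_true, predictions, top_n=3):
    """predictions maps team name -> predicted labels for the hidden test set."""
    scores = {team: f1_score(y_true, preds) for team, preds in predictions.items()}
    # Highest score first; ties broken by team name so the ranking is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_n]
```

F1 is used here rather than plain accuracy because spam is usually the minority class, where a model that predicts "ham" for everything can still look accurate.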

    🧵👇

    Mehrad,
    @Mehrad@fosstodon.org avatar

    🧵👆
    @mike @jerry @fosstodon @wholesomedonut

    This could then become a FLOSS project and, with the help of Fediverse platform developers, be integrated into those platforms as a plugin, or be offered as an API that the platforms can pass info to in order to get back a "spam score".

    Competition participants might even give SpamAssassin a try (although we are dealing with a slightly different type of data than email).
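The "spam score" API idea could take the shape of a small JSON-in/JSON-out contract. This sketch assumes a platform plugin sends precomputed features per account; the feature names, `WEIGHTS`, and `BIAS` are made-up placeholders standing in for whatever model would actually win, and the logistic combination is just one simple way to turn features into a score between 0 and 1:

```python
import json
import math

# Placeholder weights: in the proposal these would come from the winning model(s).
WEIGHTS = {"no_avatar": 1.5, "no_followers": 1.0, "random_name": 2.0, "mentions_per_post": 0.8}
BIAS = -3.0


def spam_score(features: dict) -> float:
    """Logistic combination of features, giving a score between 0 and 1."""
    z = BIAS + sum(w * float(features.get(name, 0)) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def handle_score_request(body: str) -> str:
    """JSON in, JSON out: the kind of payload a platform plugin might exchange."""
    payload = json.loads(body)
    score = spam_score(payload.get("features", {}))
    return json.dumps({"account": payload.get("account"), "spam_score": round(score, 3)})
```

For example, `handle_score_request('{"account": "a@example.social", "features": {"no_avatar": 1, "random_name": 1}}')` returns `{"account": "a@example.social", "spam_score": 0.622}`. Keeping the contract this narrow means the model behind it can be swapped without changing the platform side.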

    Mehrad,
    @Mehrad@fosstodon.org avatar

    🧵👆
    @mike @jerry @fosstodon @wholesomedonut

    What this project needs:

    1. Contributions from some major instances for data collection, and ultimately even for testing
    2. An organizing team (tech-savvy, data-savvy, ..., maybe even a graphic designer)
    3. A few meetings to iron out the rough edges of the process
    4. Curating the data
    5. After the deadline, a pipeline to run the submissions sequentially and rank their performance.

    And hopefully some understanding that
    statistics == ML != AI (GPT)

    mike,
    @mike@fosstodon.org avatar

    @Mehrad Well, the data is pretty straightforward, actually. All the accounts originated on instances with open registrations. The account names were randomized and exactly 10 characters long. There was no profile picture (/avatars/original/missing.png) and no header (/headers/original/missing.png). They followed nobody and (usually) had no followers. They usually posted one of two images. Examples can be found with the profile lm9ztxuz4q on the instance don.neet.co.jp.

    @jerry @wholesomedonut
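The signature described above translates almost directly into a rule. A sketch, assuming a hypothetical `Account` record (not Mastodon's actual API entity) and that the random names used only lowercase letters and digits. Because the "no followers" signal only held "usually", it flags an account when at least 4 of the 5 signals match; open registrations are an instance-level property and are left out. A rule like this is best used to queue accounts for human review, since a brand-new legitimate user can match too:

```python
import re
from dataclasses import dataclass

# Assumption: the random names used only lowercase ASCII letters and digits.
RANDOM_NAME = re.compile(r"[a-z0-9]{10}")
DEFAULT_AVATAR = "/avatars/original/missing.png"
DEFAULT_HEADER = "/headers/original/missing.png"


@dataclass
class Account:
    """Hypothetical minimal account view, not Mastodon's actual API entity."""
    username: str
    avatar_url: str
    header_url: str
    followers_count: int
    following_count: int


def wave_signals(acct: Account) -> dict[str, bool]:
    """Each signal from the description, evaluated separately so moderators can see why."""
    return {
        "random_10_char_name": RANDOM_NAME.fullmatch(acct.username) is not None,
        "default_avatar": acct.avatar_url.endswith(DEFAULT_AVATAR),
        "default_header": acct.header_url.endswith(DEFAULT_HEADER),
        "follows_nobody": acct.following_count == 0,
        "no_followers": acct.followers_count == 0,
    }


def looks_like_wave_account(acct: Account, min_signals: int = 4) -> bool:
    """Flag for human review when at least `min_signals` of the 5 signals match."""
    return sum(wave_signals(acct).values()) >= min_signals
```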

    Mehrad,
    @Mehrad@fosstodon.org avatar

    @mike @jerry @wholesomedonut

    Yes, this is indeed one type of spam (and actually a stupid one, because a relatively simple database query plus some frequency analysis to confirm the randomness of the names could easily detect it).
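The "frequency analysis to confirm randomness" could be done in several ways; one swapped-in option is a chi-square goodness-of-fit test on the characters pooled from a batch of suspect names, checking whether they look uniformly drawn. This assumes an alphabet of `[a-z0-9]` (36 symbols, so 35 degrees of freedom) and a 5% significance level (critical value ≈ 49.80); `chi_square_uniform` and `consistent_with_random` are hypothetical helpers:

```python
import string
from collections import Counter

# Assumption: names are drawn from lowercase letters and digits (36 symbols).
ALPHABET = string.ascii_lowercase + string.digits
# Chi-square critical value for 35 degrees of freedom at a 5% significance level.
CHI2_CRITICAL_DF35_P05 = 49.80


def chi_square_uniform(names: list[str]) -> float:
    """Chi-square statistic of pooled character counts against a uniform distribution."""
    chars = [c for name in names for c in name.lower() if c in ALPHABET]
    expected = len(chars) / len(ALPHABET)
    if expected < 5:
        # Rule of thumb: the chi-square approximation needs ~5 expected counts per symbol.
        raise ValueError("need at least 180 characters (about 18 ten-character names)")
    counts = Counter(chars)
    return sum((counts[c] - expected) ** 2 / expected for c in ALPHABET)


def consistent_with_random(names: list[str]) -> bool:
    """True if uniform random drawing can't be rejected at the 5% level."""
    return chi_square_uniform(names) < CHI2_CRITICAL_DF35_P05
```

Human-chosen names tend to overuse vowels and rarely contain digits, so a batch of them usually lands far above the threshold, while a batch of generated names should not.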

    But:

    1. There is (and will surely be) actual spam, like accounts that spam hashtags or tag other users, etc.
    2. The media coverage portrayed the recent issue as problematic and convoluted, and as an illustration of the Fediverse's lack of spam detection.

    Mehrad, (edited)
    @Mehrad@fosstodon.org avatar

    @mike @wholesomedonut

    TBH, at some point we will get more advanced spam, and if we don't have an oven-ready (as Boris called it 🤣) system ready to deploy, what we'll get is bad publicity, frustrated admins and moderators, and annoyed users.

    Among users, @thelinuxcast & @BrodieOnLinux were the first folks I saw talking about the spam; on the admin side, it was @jerry and other frustrated admins and moderators.

    But do you think there's no need for a :air_quotes_left: Fedi Spam Assassin :air_quotes_right:?

    mike,
    @mike@fosstodon.org avatar

    @Mehrad Absolutely, there's a need for some way to recognize spam and deal with it. This spam wave was stupid. It wasn't subtle, and how it was spreading was obvious, yet we still barely had the tools required to keep it in check. It required hours of work from moderation teams. A ridiculous amount, actually.

    @wholesomedonut @thelinuxcast @BrodieOnLinux @jerry

    Mehrad,
    @Mehrad@fosstodon.org avatar

    @mike
    Many years back, I submitted a research plan to Twitter to detect fake/spam/bot accounts based on behavioral characteristics. They granted me developer API access for that, and I pulled some data and did some work on it. It never materialized into a tangible result, because I was also manually collecting data in the form of lists to compare against, and I never managed to get a large enough cohort (Twitter, and people reporting the accounts, also kept shrinking my list).

    But here, I guess, we have a chance if we form teams.

    wholesomedonut,
    @wholesomedonut@fosstodon.org avatar

    (deleted by author)

  • Mehrad,
    @Mehrad@fosstodon.org avatar

    @wholesomedonut @mike @fosstodon
    I've participated in multiple international ML competitions, and I know a thing or two about the process and practices. Considering that the Fediverse has a high concentration of techy folks from different fields of expertise, I'd say it would be beneficial, fun, and still achievable to build an ML method for spam detection/classification.

    What we need are:

    1. The question (already done)
    2. The data (training and test sets)
    3. Planning
    4. Infrastructure for scoring

    Mehrad,
    @Mehrad@fosstodon.org avatar

    @wholesomedonut @mike @fosstodon

    The infrastructure would be easy to pull off, since we want the final model to be lightweight, fast, and deployable on a VPS rather than a beefy computer cluster. So we can allocate a few ordinary computers (I have a spare Core i5) to run these models.

    The models can come in the form of Docker, Apptainer, Nix, or Guix containers (we should settle on one), and they can be run sequentially.
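Running the containers one after another could be as simple as the sketch below. It assumes Docker is the format settled on and that each image reads the test set from `/data` and writes its predictions to `/out` (a hypothetical contract); it uses only standard `docker run` flags. `build_docker_command` and `run_all` are illustrative names, and the CPU/memory caps are placeholder values:

```python
import subprocess
from pathlib import Path


def build_docker_command(image: str, name: str, data_dir: Path, out_dir: Path) -> list[str]:
    """`docker run` for one submission: offline, capped, test set mounted read-only."""
    return [
        "docker", "run", "--rm", "--name", name,
        "--network", "none",              # no network access during scoring
        "--cpus", "2", "--memory", "4g",  # placeholder caps for a small spare machine
        "-v", f"{data_dir.resolve()}:/data:ro",
        "-v", f"{out_dir.resolve()}:/out",
        image,
    ]


def run_all(images, data_dir: Path, results_root: Path, timeout_s=3600, runner=subprocess.run):
    """Run submissions one after another; a failure or timeout doesn't stop the queue."""
    status = {}
    for image in images:
        safe = image.replace("/", "_").replace(":", "_")
        out_dir = results_root / safe
        out_dir.mkdir(parents=True, exist_ok=True)
        name = f"submission-{safe}"
        try:
            proc = runner(build_docker_command(image, name, data_dir, out_dir), timeout=timeout_s)
            status[image] = "ok" if proc.returncode == 0 else f"exit {proc.returncode}"
        except subprocess.TimeoutExpired:
            # Killing the docker client doesn't necessarily stop the container, so stop it explicitly.
            runner(["docker", "stop", name], timeout=120)
            status[image] = "timeout"
    return status
```

The `runner` parameter defaults to `subprocess.run` but can be swapped for a stub, which makes it possible to test the pipeline on a machine without Docker.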

    It's straightforward and doable. 😊
