ghalfacree,
@ghalfacree@mastodon.social

Google, following the industry trend to AI-all-the-things, has released Magika - a machine learning model which can identify file types. It claims it can outperform traditional methods by 20 per cent.

I pitted it against BSD File on something I figured Google hadn't included in its million-file-strong corpus: CU Amiga's Mega CD-ROM coverdisc from November 1995.

Magika identified... one file correctly, a plain-text document. File? File got 'em all, and quicker too.

(An unfair test, I know!)

[Image: a screenshot of a terminal session in which BSD File, a tool for identifying file types without machine learning, is run against the root directory of a CU Amiga coverdisc. Every single file is correctly identified.]

feld,
@feld@bikeshed.party

[deleted by author]

  • ghalfacree,
    @ghalfacree@mastodon.social

    @feld That's fair: my experience of libmagic has (thankfully) been purely as a consumer.

    lanodan,
    @lanodan@queer.hacktivis.me

    @feld @ghalfacree That said, if Unix had a standard way to point commands at a shared memory location, then file(1) could be used instead of libmagic, avoiding a lot of issues.

    lhp,
    @lhp@mastodon.social

    @ghalfacree To me it's so weird that anyone would even try to use machine learning for this. Look at the magic number, the extension, the first few lines of the file or the immediate directory it's in, and in most cases you immediately know what kind of file you have. If that fails, sure, feed it to a model, but why skip the proven heuristics? The answer is likely VC money, but still...
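As a rough illustration of the heuristics described above (magic number first, then a look at the content, then the extension), here is a minimal Python sketch. It is not how file(1) or Magika work internally; the signature table is a tiny hand-picked sample of well-known magic numbers, and the names `SIGNATURES`, `EXTENSIONS`, `identify` and `identify_file` are hypothetical, made up for this example.

```python
from pathlib import Path

# Illustrative table only: (offset, magic bytes, label). Real tools such as
# file(1) ship thousands of far more detailed rules.
SIGNATURES = [
    (0, b"\x89PNG\r\n\x1a\n", "PNG image"),
    (0, b"GIF87a", "GIF image"),
    (0, b"GIF89a", "GIF image"),
    (0, b"%PDF-", "PDF document"),
    (0, b"PK\x03\x04", "ZIP archive"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (0, b"\x00\x00\x03\xf3", "AmigaOS executable (hunk format)"),
    (0, b"\xe3\x10", "Amiga Workbench icon"),
    (0, b"FORM", "IFF data"),
    (0x8001, b"CD001", "ISO 9660 filesystem image"),
]

# Last-resort hints, used only when the content tells us nothing.
EXTENSIONS = {".txt": "plain text", ".iso": "ISO 9660 filesystem image"}

# How many bytes we need to see to test every signature above.
READ_SIZE = max(offset + len(magic) for offset, magic, _ in SIGNATURES)


def looks_like_text(sample: bytes) -> bool:
    """Crude text check: no NUL bytes and no control characters except tab, LF, CR."""
    if not sample:
        return False
    return all(b >= 0x20 or b in (0x09, 0x0A, 0x0D) for b in sample)


def identify(data: bytes, name: str = "") -> str:
    """Guess a file type from its leading bytes, then its content, then its name."""
    for offset, magic, label in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return label
    if looks_like_text(data[:512]):
        return "plain text"
    return EXTENSIONS.get(Path(name).suffix.lower(), "unknown data")


def identify_file(path: str) -> str:
    """Read just enough of the file to test every signature, then identify it."""
    with open(path, "rb") as handle:
        return identify(handle.read(READ_SIZE), name=path)
```

Calling `identify_file` on each entry of a directory gives a very poor man's `file *`. Anything that comes back as "unknown data" would be the point where a heavier fallback, a model included, might earn its keep.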

    bluGill,

    @lhp @ghalfacree People have been using AI inappropriately for decades. Expert systems once got abused the same way.

    lhp, (edited)
    @lhp@mastodon.social

    @ghalfacree "What do you mean we can solve this with a few if-else's? No, obviously we have to multiply high-dimensional vectors and matrices!"

    Elucidating,
    @Elucidating@mastodon.social

    @lhp @ghalfacree To be fair, even File is much, much more sophisticated than that in many cases. Stopping at those is "half-assing it."

    ghalfacree,
    @ghalfacree@mastodon.social

    Oh, and how long did each take to run?

    Magika:
    real: 0m0.289s
    user: 0m1.282s
    sys: 0m0.717s

    File:
    real: 0m0.006s
    user: 0m0.003s
    sys: 0m0.003s

    The Future™!
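For anyone wanting to repeat the comparison, a small Python sketch along these lines would do. It measures wall-clock time only (the "real" figure above) and keeps the best of several runs. The helper name `time_command` is hypothetical, and the sketch assumes both `file` and `magika` are on the PATH and accept file paths as plain arguments; tools that are not installed are skipped.

```python
import shutil
import subprocess
import sys
import time


def time_command(argv, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` runs of argv."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # Output is discarded: we only care how long the command takes.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__" and len(sys.argv) > 1:
    targets = sys.argv[1:]  # the files to identify
    for tool in ("file", "magika"):  # assumed command names
        if shutil.which(tool) is None:
            print(f"{tool}: not installed, skipping")
            continue
        print(f"{tool}: {time_command([tool, *targets]):.3f}s")
```

Note that in the Magika figures above the user time is larger than the real time, which suggests the work was spread across more than one CPU core; a wall-clock measurement like this one does not show that.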

    jacqueline,
    @jacqueline@chaos.social

    @ghalfacree losing it at this ai nonsense encountering an icon and being like "oh yeah that's totally an iso file"
