rysiek,
@rysiek@mstdn.social

Dear #AI #Fediverse, there's been some buzz recently about #LanguageModels that are not gigantic black boxes, and #MachineLearning in general, developed as #FLOSS.

There's this internal Google document, for example, that points out the FLOSS community is close to eating Google's and OpenAI's cake:
https://www.semianalysis.com/p/google-we-have-no-moat-and-neither

So here is my question to you:

What are the best examples of useful, small, on-device models already out there?

:boost_requested:

rysiek,
@rysiek@mstdn.social

My examples would be:

  1. Mozilla's in-browser automatic translation:
    https://www.mozilla.org/en-US/firefox/features/translate/

  2. Apple's OCR in iOS is another decent example (not FLOSS, but that's okay for what I need it for)
    https://support.apple.com/en-us/HT212630

  3. Not exactly "on-device", but Mastodon's OCR, available when attaching images:
    https://github.com/mastodon/mastodon/issues/7419

Anything else?

Context: I am writing something about and I want to show examples of useful small models in order to help non-techies understand why they are important.

ljrk,
@ljrk@todon.eu

@rysiek Yess, this so much! There's actually some use in AI (despite me hating almost every AI tool out there). There's a huge aspect of personal empowerment and independence to it. Incidentally, those tools are usually also more ethical, IMHO, when it comes to how they're built or used (self promo:
https://ljrk.codeberg.page/ethical-ai.html)

harshad,
@harshad@sharma.io

@rysiek I've been using Whisper to transcribe voice notes to my journal.

rysiek, (edited)
@rysiek@mstdn.social

@harshad am I missing something or is this neither small, nor FLOSS, nor self-hostable?

Edit: ah no, seems open source even though it's OpenAI
https://github.com/openai/whisper

Thanks!

harshad,
@harshad@sharma.io

@rysiek it is self-hosted, and MIT licensed: https://github.com/openai/whisper/blob/main/LICENSE

As for size, see https://github.com/ggerganov/whisper.cpp , where 75 MB for the smallest model seems reasonable.

rysiek,
@rysiek@mstdn.social

@harshad amazing, thank you!

idlestate,

@rysiek
@harshad

Unless the training data are also free and the production models used to perform the work are substantially reproducible, those models and the code to deploy them propagate power imbalances as fraught as those of proprietary object code: the model acts as a form of unauditable, AI-authored code.

Some folks in Debian did some work years ago to try to parse out these issues, but otherwise these challenges have received scant notice.

As a practical matter, the need to bring enough computing power to bear on running the training data through the training software poses a similar, though quantitative rather than qualitative, challenge to the ability to make forks that assert the typical software freedoms.

idlestate,

to be clear, by "scant notice" I mean in a FOSS-forward framing.

There is, of course, vigorous criticism of the misuses of AI. I do not wish to contribute to how badly these have been disregarded more broadly, both within and beyond that frame.

@rysiek
@harshad

rysiek,
@rysiek@mstdn.social

@idlestate @harshad yeah, and I wrote at length about it in Polish media:
https://oko.press/chatgpt-cala-prawda-o-wielkich-modelach-jezykowych

Now I am trying to find examples of smaller models that do as well as or better than LLMs from Big Tech, to demonstrate the point that we might be able to do just fine without LLMs, regardless of what Big Tech is trying to convince us of.

idlestate,

@rysiek

(incidentally, speaking of language models, free or not: DeepL, accessed through the free frontend to its proprietary backend available on F-Droid, did much better at rendering your abstract into English than LibreTranslate's web form did when I pasted it in.)

@harshad

idlestate,

@rysiek

I am keenly interested in learning of such models. So far as I can see, though, Whisper isn't it, despite the freedom of the code to deploy the model.

Model size may affect ease of deployment, but otherwise it's not clear to me how it relates to the challenges on the build/training side.

It's been a while since I've touched it, but Mozilla Common Voice is the only effort I've seen so far that addresses the freedom of the training side:

https://commonvoice.mozilla.org/en

@harshad

ryanfb,
@ryanfb@digipres.club

@rysiek @harshad people have also been doing amazing things with improving Whisper performance, check out WhisperX https://github.com/m-bain/whisperX and whisper.cpp https://github.com/ggerganov/whisper.cpp

noodlejetski,
@noodlejetski@masto.ai

@rysiek man, the fancy ML stuff in the Photos app is one of the few things I envy iOS users for.

OutOnTheMoors,
@OutOnTheMoors@beige.party

@rysiek The Mastodon OCR is very helpful for encouraging AltText

rysiek,
@rysiek@mstdn.social

@OutOnTheMoors indeed!

ralismark,

@rysiek there are some machine learning models for removing noise from microphone input, at least on Linux:

XavCC,
@XavCC@todon.eu

@rysiek Voice to text: Scribe
French description: https://scribe.cemea.org/pourquoi/
https://gitlab.cemea.org/mallette/scribe
Related: Vosk & Mozilla Common Voice
I'm sure it fits your request

yesitsanna,
@yesitsanna@hachyderm.io

@rysiek being able to search my iOS photo album for, say, "cats" is pretty cool.

rysiek,
@rysiek@mstdn.social

@yesitsanna that's using Visual Look Up, right?

yesitsanna,
@yesitsanna@hachyderm.io

@rysiek um i press the search button and type in "cat" :D

rysiek,
@rysiek@mstdn.social

@yesitsanna haha, fair enough! :D

sv1,
@sv1@mastodon.social

@rysiek Face recognition in the Digikam photo-management suite is quite handy for private collections, based on OpenCV, no networking necessary.

sqrt2,
@sqrt2@chaos.social

@rysiek Mastodon's OCR is on-device, actually. It downloads the weights into IndexedDB once, but from there it's all tesseract.js in your browser.

rysiek,
@rysiek@mstdn.social

@sqrt2 amazing!

sqrt2,
@sqrt2@chaos.social

@rysiek It's pretty neat how much can be done in WASM. Like, lichess.org embeds very strong in-browser chess position analysis with Stockfish.js, which can do both classical evaluation and evaluation with an NNUE.

farshidhakimy,
@farshidhakimy@chaos.social

@rysiek Samsung has OCR and "object cut-out" in their Gallery app too.

mathew,

@rysiek Apple's photo recognition too. It's not great, but it's steadily getting better — it can now do an OK job of recognizing birds and hedgehogs, for example.

neofreko,

@rysiek I'm going to add https://mlc.ai/mlc-llm/. It's not yet at a practical level, though.

blub,
@blub@norden.social

@rysiek opendata.cam uses existing computer-vision ML models on edge devices, like an NVIDIA Jetson or a Coral TPU paired with an SBC, to count cars, people, etc.

richard_merren,
@richard_merren@mastodon.social

@rysiek Are we only talking about the large language models as "AI" or are we following the trend of rebranding everything that was "ML" last year to be included in this year's new "AI" hype category?

blub,
@blub@norden.social

@rysiek Not NLP but voice recognition based on Voice dataset like

rysiek,
@rysiek@mstdn.social

@blub thanks!

Are you aware of it being used anywhere "user-facing" yet?

blub,
@blub@norden.social

@rysiek personal voice assistants like

rysiek,
@rysiek@mstdn.social

@blub fantastic!

pre,

@rysiek As far as I can tell, Facebook's "LLaMA" is the biggest open-sourced language model, and people are running various variants of it on their home GPUs.

They remain black boxes, of course; nobody has any idea what's going on inside any of those multi-billion-parameter confusions of virtual wires and matrices.

But the model is freely redistributable, and you can read the value of every node at every microsecond if you want.

Just that nobody knows what any of them mean.

Sad that it's Facebook, but FB are pretty good at Open-Source software libraries.

https://ai.facebook.com/blog/large-language-model-llama-meta-ai/

rysiek,
@rysiek@mstdn.social

@pre yeah, that's true. I've seen a Rust thing too:
https://github.com/rustformers/llm

But I am guessing that is just some kind of wrapper over Facebook's LLaMa.

pre,

@rysiek People are taking the Llama model and further training it to be specialized at whatever their own itch is.

You can feed it all of your own Fedi posts and have a virtual Rysiek I guess? 🤷

rysiek,
@rysiek@mstdn.social

@pre finally, I can optimize my shitposting!

ErikJonker,
@ErikJonker@mastodon.social

@rysiek GPT4All (on a laptop)? Although you could argue about its usefulness.

rysiek,
@rysiek@mstdn.social

@ErikJonker I'll have a look, thanks!
