changelog, to opensource
@changelog@changelog.social avatar

🗞 New episode of Changelog News!

💰 Rune’s $100k for indie game devs
🤲 The Zed editor is now open source
🦙 Ollama’s new JS & Python libs
🤝 @tekknolagi's Scrapscript story
🗒️ Pooya Parsa's notes from a tired maintainer
🎙 hosted by @jerod

🎧 https://changelog.com/news/79

joe, (edited ) to ai

Around a year ago, I started hearing more and more about OpenAI's ChatGPT. I didn't pay much attention to it until this past summer, when I watched the intern use it where I would normally use Stack Overflow. After that, I started poking at it and created things like the Milwaukee Weather Limerick and a bot that translates my Mastodon toots to Klingon. Those are cool tricks, but eventually I realized that you could ask it for detailed datasets like “the details of every state park,” “a list of three-ingredient cocktails,” or “a CSV of counties in Wisconsin.” People are excited about getting it to write code for you or to do a realistic rendering of a bear riding a unicycle through the snow, but I think that is just the tip of the iceberg in a world where it can do research for you.

The biggest limitation of something like ChatGPT, Copilot, or Bard is that your data leaves your control when you use the AI. I believe that the future of AI is AI that remains in your control. The only issue with running your own local AI is that a large language model (LLM) needs a lot of resources to run. You can’t do it on your old laptop. It can be done, though. Last month, I bought a new MacBook Pro with an M1 Pro CPU and 32GB of unified RAM to test this stuff out.

If you are in a similar situation, Mozilla’s Llamafile project is a good first step. A llamafile can run on multiple CPU microarchitectures and operating systems: it uses Cosmopolitan Libc to provide a single ~4GB executable that runs on macOS, Windows, Linux, FreeBSD, OpenBSD, and NetBSD. It contains a web client, the model file, and the inference engine. You can just download the binary, execute it, and interact with it through your web browser. This has very limited utility, though.
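Even so, you are not limited to the bundled web client: a llamafile also serves a local HTTP API that you can script against. Here is a minimal sketch in Python, assuming the defaults llamafile currently ships, namely a server on localhost:8080 exposing llama.cpp's /completion endpoint; check your llamafile's startup output if yours differs.

```python
# Minimal sketch: query a running llamafile from Python instead of the
# bundled web client. Assumes the default server address (localhost:8080)
# and llama.cpp's /completion endpoint -- adjust if your llamafile differs.
import json
import urllib.request

payload = json.dumps({
    "prompt": "Q: Name three state parks in Wisconsin.\nA:",
    "n_predict": 128,          # cap the number of generated tokens
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# The llama.cpp-style server returns the generated text under "content".
print(body.get("content", body))
```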

So, how do you get from a proof of concept to something closer to ChatGPT or Bard? You are going to need a model, an inference engine, and a client.

The Inference Engine

An inference engine is the piece of software that actually runs the model. You load a model into it, it executes the model to generate responses, and it handles the interface between the model and the client. The two major players in the space are Llama.cpp and Ollama. Getting Ollama is as easy as downloading the software and running ollama run [model] from the terminal.

Screenshot of Ollama running in the terminal on macOS

In the case of Ollama, you can also access the inference engine via a local API.

A screenshot of Postman interacting with Ollama via a local JSON API
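If Postman isn’t your thing, the same request is a few lines of Python. Here is a rough sketch, assuming Ollama’s default port of 11434 and using “llama2” as a stand-in for whatever model you have pulled; by default the endpoint streams its answer back as one JSON object per line.

```python
# Rough sketch: call Ollama's local REST API directly. The server listens on
# localhost:11434 by default, and /api/generate streams one JSON object per
# chunk of generated text, so we read and decode the stream line by line.
import json
import requests

with requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Write a limerick about Milwaukee weather."},
    stream=True,
    timeout=300,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            chunk = json.loads(line)
            print(chunk.get("response", ""), end="", flush=True)
print()
```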

You will notice that the result isn’t easy to parse. Last week, Ollama announced Python and JavaScript libraries to make it much easier.
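As a point of comparison, here is a minimal sketch with the new ollama Python package (pip install ollama); the model name is a placeholder for one you have already pulled.

```python
# Minimal sketch with the `ollama` Python package against a locally running
# Ollama server. The library handles the HTTP plumbing and hands back a
# single structured response instead of a raw stream.
import ollama

reply = ollama.chat(
    model="llama2",
    messages=[{"role": "user", "content": "Write a limerick about Milwaukee weather."}],
)

print(reply["message"]["content"])
```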

The Models

A model consists of numerous parameters that are adjusted during the training process to improve its predictions. Models employ learning algorithms that draw conclusions or predictions from past data. I’m going to be honest with you: this is the bit that I understand the least. The key attributes to be aware of with a model are what it was trained on, how many parameters it has, and how it scores on benchmarks.

If you browse Hugging Face or the Ollama model library, you will see that there are plenty of 7b, 13b, and 70b models. That number tells you how many parameters are in the model: a 7b model has 7 billion parameters, whereas a 70b model has 70 billion. Generally, a 70b model is going to be more competent than a 7b model. To give you a point of comparison, GPT-4 reportedly has 1.76 trillion parameters.

The number of parameters isn’t the be-all and end-all, though. There are leaderboards and benchmarks (like HellaSwag, ARC, and TruthfulQA) for determining comparative model quality.

If you are running Ollama, downloading and running a new model is as easy as browsing the model library, finding the right one for your purposes, and running ollama run [model] from the terminal. You can also manage installed models from the Ollama Web UI, though.

A screenshot from the Ollama Web UI, showing how to manage models
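If you would rather script those steps than click through the Web UI, the same ollama Python package covers model management. A rough sketch (the model name is just an example, and the calls assume the local Ollama server is running):

```python
# Rough sketch: manage models programmatically with the `ollama` package,
# mirroring `ollama pull` / `ollama rm` in the terminal and the model
# management screen in the Ollama Web UI.
import ollama

ollama.pull("mistral")        # download a model from the library

for model in ollama.list()["models"]:
    print(model)              # one entry per installed model (name, size, etc.)

ollama.delete("mistral")      # remove it again when you no longer need it
```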

The Client

The client is what the user of the AI uses to interact with the inference engine. If you are using Ollama, the Ollama Web UI is a great option. It gives you a web interface that looks and behaves a lot like the ChatGPT web interface. There are also desktop clients like Ollamac and MacGPT, but my favorite so far is MindMac. It not only gives you a nice way to switch from model to model but also lets you switch between providers (Ollama, OpenAI, Azure, etc.).

A screenshot of the MindMac settings panel, showing how to add new accounts

The big questions

I have a few big questions right now. How well does Ollama scale from 1 user to 100 users? How do you fine-tune a model? How do you secure Ollama? Most interesting to me: how do you implement something like Stable Diffusion XL with this stack? I ordered a second-hand Xeon workstation off of eBay to try to answer some of these questions. In a workplace setting, I’m also curious what safeguards are needed to insulate the company from liability. These are all things that need addressing over time.

I created a new LLM / ML category here and I suspect that this won’t be my last post on the topic. As a blanket warning, all of these posts are personal opinions and do not reflect the views or ethics of my employer. All of this research is being done off-hours and on my own dime.

Have a question or comment? Please drop a comment below.

https://jws.news/2024/ai-basics/

ramikrispin, to llm
@ramikrispin@mstdn.social avatar

After a long night, a short tutorial for getting started with the Ollama Python library is now available here:

https://github.com/RamiKrispin/ollama-poc

ramikrispin, to llm
@ramikrispin@mstdn.social avatar

Running the Mistral LLM locally with Ollama's 🦙 new Python 🐍 library inside a dockerized 🐳 environment with 4 CPUs and 8 GB of RAM allocated. It took 19 seconds to get a response 🚀. The last time I tried to run an LLM locally, it took 10 minutes to get a response 🤯
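For anyone who wants to reproduce that kind of timing check, here is a rough sketch with the ollama Python package, assuming a mistral model has already been pulled inside the container:

```python
# Rough sketch: time a single round trip to a locally served Mistral model
# through the `ollama` Python package. Assumes `ollama pull mistral` has
# already been run against the local Ollama server.
import time

import ollama

start = time.perf_counter()
reply = ollama.chat(
    model="mistral",
    messages=[{"role": "user", "content": "Explain Docker in one sentence."}],
)
elapsed = time.perf_counter() - start

print(reply["message"]["content"])
print(f"Response took {elapsed:.1f} seconds")
```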

joe, to ai
@joe@toot.works avatar

Anyone out there dabbling with on-prem AI? All of the numbers that I’m seeing for RAM requirements on 7b, 13b, 70b models seem to be correct for a 1-user scenario but I’m curious what folks are seeing for 2, 10, or 50 users.

joe, to random
@joe@toot.works avatar

I really need to work on diagramming out how to deploy at scale this weekend.

joe, to random
@joe@toot.works avatar

I just updated https://jws.dev: I added MacGPT, Ollamac, and Ollama Web UI to the Resources page

joe,
@joe@toot.works avatar

I got Ollama running via a web interface that's running locally in Docker. This ain't half bad.

joe,
@joe@toot.works avatar

My next big goal is to convince my employer to host Ollama on a machine that is within the corporate network. I asked IT if they have a box kicking around with "a fantastic amount of RAM in it" for a proof-of-concept. I'm curious how this thing will run with a load on it.

joe, to ai
@joe@toot.works avatar

I followed https://www.youtube.com/watch?v=Kg588OVYTiw to try to get Llama 2 working locally with Llama.cpp but no luck. 😒

Does anyone know how to fix it? I do have llama-2-13b-chat.ggmlv3.q4_0.bin downloaded into the root of the app.

joe,
@joe@toot.works avatar

I found Ollama (https://ollama.ai/) and used that instead of Llama.cpp to get it running. Cool!

bkoehn, to random
@bkoehn@hachyderm.io avatar

It’s kind of amazing what you can build with and . In a few clicks you can make the mother of all email classifiers.

Mrw, to homelab
@Mrw@hachyderm.io avatar

Saturday.
The goal today was to get Ollama deployed on the cluster. That’s a fun way to run your own models on whatever accelerators you have handy. It’ll run on your CPU, sure, but man is it slow.

Nvidia now ships a GPU operator, which handles annotating nodes and managing the resource type. “All you need to do” — the most dangerous phrase in computers — is smuggle the GPUs through whatever virtualization you’re doing, and expose them to containerd properly.

But I got there! Yay.

greg, to homeassistant
@greg@clar.ke avatar

W̶a̶k̶e̶ ̶o̶n̶ ̶L̶A̶N̶ Wake-on-Zigbee

Maintaining Wake-on-LAN on a dual-boot Windows 10 / Ubuntu 22.04 LTS system is a hassle, so I went with a simple Fingerbot solution. Now I have Wake-on-Zigbee!

By default, the system boots into Ubuntu, which hosts an Ollama server and does some video compression jobs (I wanted to be able to start those remotely). I only use Windows for VR gaming when I'm physically in the room and can therefore select the correct partition at boot.

Using Zigbee Fingerbot to turn on PC
