elevenhsoft, to System76 Polish
@elevenhsoft@mastodon.social avatar

Hello friends! New applet is coming soon....

This time I'm working on Ollama applet for our lovely :)

kevinctofel, to til
@kevinctofel@hachyderm.io avatar

🆕 blog post: May 10, 2024 - #TIL

Personalizing my local #AI, replacing paper towels, and the world's longest (unofficial) ski jump of 291 meters.
#Obsidian #Ollama #LLM

https://myconscious.stream/blog/May-10-2024-TIL

kevinctofel, to ai
@kevinctofel@hachyderm.io avatar

I had been meaning to look at using #Ollama for local #AI & today I caught this excellent video how-to. Still not sold on the current hype of AI but it would be foolish to overlook its place in our future.

Honestly, this little project was the most fun I’ve had in a while: installing/using different AI models, using AI in the #terminal, then in a #browser, even creating a persona for the prompts. And it’s all running locally (no internet needed!), so it’s #private. 🤓

https://youtube.com/watch?v=Wjrdr0NU4Sk&si=x5I0kG2iwZGefc7d

mjaschen, to php German
@mjaschen@digitalcourage.social avatar

Wochenrückblick, Ausgabe 39 (2024-18).

Diesmal mit

  • 🗺️ der Bikerouter Hall of Fame aller Supporterinnen bisher ❤️
  • 🚵‍♂️ neuen Reifen für das Crosser-Gravel-Dings und der ersten Ausfahrt damit – unter Anderem zu Deutschlands größter (!) Wüste (!!)
  • 💻 den Erkenntnissen von @atomicpoet zur Ergonomie beim Arbeiten mit dem Notebook vs. Desktop-Computer und wie sich das mit meinen Erfahrungen deckt
  • 🤖 ollama, welches die Möglichkeit bietet, LLMs bequem auf dem eigenen Rechner laufen zu lassen
  • 🐘 cachetool, einem Werkzeug zum Verwalten des PHP opcache
  • 🍎 dem Blick in mein macOS-Applications-Directory: diesmal gibt's den Blick auf alle Apps, die mit „F“ beginnen
  • 🛠️ noch mal Deployment des Blogs, das läuft jetzt tatsächlich mit einem simplen post-merge Hook in Git
  • 🔊 und wie immer Techno

https://www.marcusjaschen.de/blog/2024/2024-18/

davep, to python
@davep@fosstodon.org avatar

Doing some pre-dinner #Python coding, working some more on my toy #Textual #Ollama client for the #terminal.

https://www.youtube.com/watch?v=7dTwJQn_Ggw

obrhoff, to llm
@obrhoff@chaos.social avatar

The amazing thing about LLMs is how much knowledge they posess in their small size. The llama3-8b model, for instance, weighs only 4.7GB yet can still answer your questions about everything (despite some hallucinations).

pjk, to python
@pjk@www.peterkrupa.lol avatar

One thing you notice right away about LLMs is they bear a striking resemblance to that ubiquitous internet character, the reply-guy: they always have an answer, they are brimming with confidence, and they are often completely wrong.

It seemed only natural, then, to turn one into a full-time online version of itself, RGaaS (Reply-Guy as a Service), always on, always ready to reply to anything when summoned, with an “Actually,” followed by his (it’s always a “he”) opinion on what you should be thinking.

And so I built the Actually Bot.

https://www.peterkrupa.lol/wp-content/uploads/2024/05/actually_bot1.pngBasically, the way it works is that any user can reply to any Mastodon post (or I guess any post on the ActivityPub protocol) and tag the Actually Bot (@actuallybot) and it will reply with its most mediocre take on why everyone else is wrong, or at least not quite right.

The reply-guys can all move on to something else now, I have automated them out of a job.

This was a really fun project and has a lot of separate parts. First, I had to modify an LLM to give it the right “personality” for this job. To do this, I used Ollama, a tool for running open source LLMs locally, on your own machine. I used the “customized prompt feature”: You basically write a little file that Ollama uses to tweak the LLM of your choosing to give it certain characteristics. I went with the new Llama3:8b model from Meta as my base model, then put the following instructions in my modelfile:

FROM llama3PARAMETER temperature 3SYSTEM """You are receiving social media posts as prompts, and your job is to reply to these prompts. Please start all your replies with the word "actually". Reply as if you know more than everyone. You must recontextualize every prompt and correct or amend it, informing the prompter that they are wrong, even if ever so slightly. You write in short, sentences. Your replies must be short. Keep them to about three sentences. This is essential: keep your replies short. They must be under 500 characters."""

Then I ran the following command in the console:

ollama create actually_llama -f ./actually_llama

… and my model was ready to roll. Next, I needed a program to connect to the Ollama API to send the LLM prompts and get responses. Python was great for that, as both Ollama and Mastodon have solid Python libraries. Probably the slowest part was picking through Mastodon.py to figure out how the methods work and what exactly they return. It’s a very robust library with a million options, and fortunately it’s also extremely well documented, so while it was slow going, I was able to whack it together without too much trouble.

I’m not going to get into all the code here, but basically, I wrote a simple method that checks mentions, grabs the text of a post and the post it is replying to, and returns them for feeding into the LLM as the prompt.

Despite my very careful, detailed, and repetitive instructions to be sure replies are no more than 500 characters, LLMs can’t count, and they are very verbose, so I had to add a cleanup method that cuts the reply down to under 500 characters. Then I wrote another method for sending that cleaned-up prompt to Ollama and returning the response.

The main body starts off by getting input for the username and password for login, then it launches a while True loop that calls my two functions, checking every 60 seconds to see if there are any mentions and replying to them if there are.

OK it works! Now came the hard part, which was figuring out how to get to 100% uptime. If I want the Actually Bot to reply every time someone mentions it, I need it to be on a machine that is always on, and I was not going to leave my PC on for this (nor did I want it clobbering my GPU when I was in the middle of a game).

So my solution was this little guy:

https://www.peterkrupa.lol/wp-content/uploads/2024/05/lenovo.jpg… a Lenovo ThinkPad with a 3.3GHz quad-core i7 and 8gb of RAM. We got this refurbished machine when the pandemic was just getting going and it was my son’s constant companion for 18 months. It’s nice to be able to put it to work again. I put Ubuntu Linux on it and connected it to the home LAN.

I actually wasn’t even sure it would be able to run Llama3:8b. My workstation has an Nvidia GPU with 12gb of VRAM and it works fine for running modest LLMs locally, but this little laptop is older and not built for gaming and I wasn’t sure how it would handle such a heavy workload.

Fortunately, it worked with no problems. For running a chatbot, waiting 2 minutes for a reply is unacceptable, but for a bot that posts to social media, it’s well within range of what I was shooting for, and it didn’t seem to have any performance issues as far as the quality of the responses either.

The last thing I had to figure out was how to actually run everything from the Lenovo. I suppose I could have copied the Python files and tried to recreate the virtual environment locally, but I hate messing with virtual environments and dependencies, so I turned to the thing everyone says you should use in this situation: Docker.

This was actually great because I’d been wanting to learn how to use Docker for awhile but never had the need. I’d installed it earlier and used it to run the WebUI front end for Ollama, so I had a little bit of an idea how it worked, but the Actually Bot really made me get into its working parts.

So, I wrote a Docker file for my Python app, grabbed all the dependencies and plopped them into a requirements.txt file, and built the Docker image. Then I scr’d the image over to the Lenovo, spun up the container, and boom! The Actually Bot was running!

Well, OK, it wasn’t that simple. I basically had to learn all this stuff from scratch, including the console commands. And once I had the Docker container running, my app couldn’t connect to Ollama because it turns out, because Ollama is a server, I had to launch the container with a flag indicating that it shared the host’s network settings.

Then once I had the Actually Bot running, it kept crashing when people tagged it in a post that wasn’t a reply to another post. So, went back to the code, squashed bug, redeploy container, bug still there because I didn’t redeploy the container correctly. There was some rm, some prune, some struggling with the difference between “import” and “load” and eventually I got everything working.

Currently, the Actually Bot is sitting on two days of uninterrupted uptime with ~70 successful “Actually,” replies, and its little laptop home isn’t even on fire or anything!

Moving forward, I’m going to tweak a few things so I can get better logging and stats on what it’s actually doing so I don’t have to check its posting history on Mastodon. I just realized you can get all the output that a Python script running in a Docker container prints with the command docker logs [CONTAINER], so that’s cool.

The other thing I’d like to do is build more bots. I’m thinking about spinning up my own Mastodon instance on a cheap hosting space and loading it with all kinds of bots talking to each other. See what transpires. If Dead Internet Theory is real, we might as well have fun with it!

https://www.peterkrupa.lol/2024/05/01/actually-building-a-bot-is-fun/

image/jpeg

davep, to python
@davep@fosstodon.org avatar

Going on stream to tinker some more with an Ollama client I’m building for myself: https://www.youtube.com/watch?v=LzHUdfR4PRg

#Python #Terminal #Ollama #Textual

ramikrispin, to llm
@ramikrispin@mstdn.social avatar

In case you are wondering, the new Microsoft mini LLM - phi3, can handle code generation, in this case, SQL.

I compared the runtime (locally on CPU) with respect to codellama:7B using Ollama, and surprisingly the Phi3 runtime was significantly slower.

davep, to python
@davep@fosstodon.org avatar

TIL Midlothian is in a different timezone from Hampshire.

#Python #Programming #Ollama

joe, to ai

Earlier this year, I started looking at how to run a fully on-prem AI. In February, I bought a machine to run the inference engine on and set up Tailscale (which works similarly to Hamachi) to connect to it remotely. If you want to use it remotely, there are a lot of options for native clients.

MacOS

My favorite client for MacOS is MindMac. You can buy it for under $30, it works with multiple models, servers, and server types, and it is easy to use.

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/Screenshot-2024-04-20-at-2.34.12%E2%80%AFPM.png?resize=1024%2C690&ssl=1

If you want to look further into it, you can check it out at mindmac.app.

Android

My favorite client for Android is Amallo. It is $23 and like MindMac, it works with multiple models, servers, and server types. My only complaint would be that uploading a base64-encoded image to the model doesn’t seem to work well.

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/Screenshot_20240420-143906.png?resize=461%2C1024&ssl=1

If you want to look further into it, you can check it out at doppeltilde.com.

ipadOS

There is a version of Amallo for iPadOS but I have been liking Enchanted LLM more. If you like it, there is a version for macOS as well. It has the added benefit of being free.

https://i0.wp.com/jws.news/wp-content/uploads/2024/04/IMG_0088.jpg?resize=672%2C1024&ssl=1

If you want to look further into it, you can check it out at the project’s GitHub page.

Have any questions, comments, etc? Please feel free to drop a comment, below.

https://jws.news/2024/how-i-use-ai/

#AI #Amallo #Enchanted #LLM #MindMac #Ollama

ramikrispin, to datascience
@ramikrispin@mstdn.social avatar

New release to Ollama 🎉

A major release to Ollama - version 0.1.32 is out. The new version includes:
✅ Improvement of the GPU utilization and memory management to increase performance and reduce error rate
✅ Increase performance on Mac by scheduling large models between GPU and CPU
✅ Introduce native AI support in Supabase edge functions

More details on the release notes 👇🏼
https://github.com/ollama/ollama/releases

Image credit: release notes

#DataScience #MachineLearning #llm #ollama #llama #python

ironicbadger, to selfhosted
@ironicbadger@techhub.social avatar

I hooked the Enchanted LLM app up to my instance tonight. Running on the epyc box with an nvidia a4000.

I can’t notice a difference in speed between this and the real chat-gpt tbh. And I own the whole chain locally. Man this is cool!!

I even shared the ollama api endpoint with a buddy over Tailscale and now they’re whipping the llamas 🦙 api as well. Super fun times.

https://apps.apple.com/us/app/enchanted-llm/id6474268307

DavidMarzalC, to Symfony Spanish
@DavidMarzalC@mastodon.escepticos.es avatar

Encaramos el final de marzo con otro nuevo episodio de "Accesibilidad con Tecnologías libres", para los amigues.

En https://accesibilidadtl.gitlab.io/05 tenéis las notas del programa con los temas tratados y los enlaces.

Si seguís a este alias @6706483 , tendréis automáticamente las publicaciones del podcast dentro del Fediverso

Y por si aún no lo tenéis el feed RSS para vuestras aplicaciones es:
https://accesibilidadtl.gitlab.io/feed

En este episodio 05 participan entre otras personas:

Esperamos que os resulte interesante.

orhun, to rust
@orhun@fosstodon.org avatar

Here is how you can use ChatGPT in your terminal - with an interface! 🔥

🦾 tenere: TUI for LLMs written in Rust.

🚀 Supports ChatGPT, llama.cpp & ollama

🦀 Built with @ratatui_rs

⭐ GitHub: https://github.com/pythops/tenere

#rustlang #tui #ratatui #chatgpt #terminal #llm #ollama #llama

video/mp4

ironicbadger, to NixOS
@ironicbadger@techhub.social avatar

A fully self-hosted, totally offline, and local AI using and open-webui as a front-end for

my_actual_brain, to llm
@my_actual_brain@fosstodon.org avatar

I’m surprised at the performance an can run at on my 8700-cpu.

It’s a bit slow and I don’t think it’s worth getting a gpu just to make it run at a faster speed, but maybe in time I’ll reconsider it.

If I were going to get a , what would you recommend?

dliden, to emacs
@dliden@emacs.ch avatar

Has anyone here worked much with generators in #emacs ?

I am looking for a good solution for streaming outputs in my ollama-elisp-sdk project. I think there's a good angle using generators to make a workflow fairly similar to e.g. the OpenAI API. Not sure yet though.

#elisp #ollama #ai #emacs

joe, to random

This past month, I was talking about how I spent $528 to buy a machine with enough guts to run more demanding AI models in Ollama. That is good and all but if you are not on that machine (or at least on the same network), it has limited utility. So, how do you use it if you are at a library or a friend’s house? I just discovered Tailscale. You install the Tailscale app on the server and all of your client devices and it creates an encrypted VPN connection between them. Each device on your “tailnet” has 4 addresses you can use to reference it:

  • Machine name: my-machine
  • FQDN: my-machine.tailnet.ts.net
  • IPv4: 100.X.Y.Z
  • IPv6: fd7a:115c:a1e0::53

If you remember Hamachi from back in the day, it is kind of the spiritual successor to that.

https://i0.wp.com/jws.news/wp-content/uploads/2024/03/Screenshot-2024-03-04-at-2.37.06%E2%80%AFPM.png?resize=1024%2C592&ssl=1

There is no need to poke holes in your firewall or expose your Ollama install to the public internet. There is even a client for iOS, so you can run it on your iPad. I am looking forward to playing around with it some more.

https://jws.news/2024/joe-discovered-tailscale/

sesivany, (edited ) to ai
@sesivany@floss.social avatar

#Ollama is the easiest way to run local #AI I've tried so far. In 5 minutes you can have a chatbot running on a local model. Dozens of models and UIs to choose from.
Just the speed is not great, but what can I expect on an Intel-only laptop.

bmp, to random
@bmp@mastodon.sdf.org avatar

Completely forgot I had made this #fountainpen database a while ago when I was bored: https://codeberg.org/bmp/flock, it is written in Go, and was generated with #ollama if I remember correctly. Maybe I'll pick it up again, given that newer models seem to be better.

joe, to llm

Back in December, I paid $1,425 to replace my MacBook Pro to make my LLM research at all possible. That had an M1Pro CPU and 32GB of RAM, which (as I said previously) is kind of a bare minimum spec to run a useful local AI. I quickly wished I had enough RAM to run a 70B model, but you can’t upgrade Apple products after the fact and a 70B model needs 64GB of RAM. That led me to start looking for a second-hand Linux desktop that can handle a 70B model.

I ended up finding a 4yr-old HP Z4 G4 workstation with a Xeon® W-2125 Processor,128 GB of DDR4 2666 MHz RAM, a 512GB SAMSUNG nVme SSD, and a NVIDIA Quadro P4000 GPU with 8GB of GDDR5 GPU Memory. I bought it before Ollama released their Windows preview, so I planned to throw the latest Ubuntu LTS on it.

Going into this experiment, I was expecting that Ollama would thrash the GPU and the RAM but would use the CPU sparingly. I was not correct.

This is what the activity monitor looked like when I asked various models to tell me about themselves:

Mixtral

An ubuntu activity monitor while running mixtral

Llama2:70b

An ubuntu activity monitor while running Llama2:70b

Llama2:7b

An ubuntu activity monitor while running llama2:7b

Codellama

An ubuntu activity monitor while running codellama

The Xeon W-2125 has 8 threads and 4 cores, so I think that CPU1-CPU8 are threads. My theory going into this was that the models would go into memory and then the GPU would do all of the processing. The CPU would only be needed to serve the results back to the user. It looks like the full load is going to the CPU. For a moment, I thought that the 8 GB of video RAM was the limitation. That is why I tried running a 7b model for one of the tests. I am still not convinced that Ollama is even trying to use the GPU.

A screenshot of the "additional drivers" screen in ubuntu

I am using a proprietary Nvidia driver for the GPU but maybe I’m missing something?

I was recently playing around with Stability AI’s Stability Cascade. I might need to run those tests on this machine to see what the result is. It may be an Ollama-specific issue.

Have any questions, comments, or concerns? Please feel free to drop a comment, below. As a blanket warning, all of these posts are personal opinions and do not reflect the views or ethics of my employer. All of this research is being done off-hours and on my own dime.

https://jws.news/2024/hp-z4-g4-workstation/

ironicbadger, to selfhosted
@ironicbadger@techhub.social avatar

Did someone say self-hosted LLMs?

FreakyFwoof, to random

A long long time ago, @arfy made a lua script for Dolphin screen-readers that allowed you to type in plus or minus number of days and get the date. I just asked Dolphin Mixtral to do the same as an apple script using #Ollama running locally and it actually did it. It runs and works just as I wanted. Madness.

set numDays to text returned of (display dialog "Enter the number of days:" default answer "")
set targetDate to current date
set newDate to targetDate + numDays * days
display dialog "The future date will be: " & (newDate as string)

ramikrispin, to python
@ramikrispin@mstdn.social avatar

(1/3) Last Friday, I was planning to watch Masters of the Air ✈️, but my ADHD had different plans 🙃, and I ended up running a short POC and creating a tutorial for getting started with Ollama Python 🚀. The settings are available for both Docker 🐳 and locally.

TLDR: It is straightforward to run LLM models locally with the Ollama Python library. Models with up to ~7B parameters run smoothly with low compute resources.

#python #ollama #llama #mistral #llm #DataScience #ai

  • All
  • Subscribed
  • Moderated
  • Favorites
  • JUstTest
  • GTA5RPClips
  • magazineikmin
  • InstantRegret
  • thenastyranch
  • cubers
  • Youngstown
  • ethstaker
  • slotface
  • mdbf
  • rosin
  • Durango
  • kavyap
  • DreamBathrooms
  • megavids
  • khanakhh
  • tacticalgear
  • ngwrru68w68
  • cisconetworking
  • modclub
  • everett
  • osvaldo12
  • normalnudes
  • provamag3
  • anitta
  • tester
  • Leos
  • lostlight
  • All magazines