my_actual_brain, to llm

I’m surprised by the performance an #llm can achieve on my 8700 CPU.

It’s a bit slow, and I don’t think it’s worth getting a gpu just to make it run faster, but maybe in time I’ll reconsider.

If I were going to get a #gpu, what would you recommend?

#gpt #ollama #llama2

AdamBishop, to ai
@AdamBishop@floss.social

😂 ha ha:

Researchers jailbreak AI chatbots with ASCII art - ArtPrompt bypasses safety measures to unlock malicious queries | Tom's Hardware

https://www.tomshardware.com/tech-industry/artificial-intelligence/researchers-jailbreak-ai-chatbots-with-ascii-art-artprompt-bypasses-safety-measures-to-unlock-malicious-queries

#ai #chatBots #ASCII #ChatGPT #Gemini #Claude #Llama2

wagesj45, to ai
@wagesj45@mastodon.jordanwages.com

So #Steeve got a major upgrade recently. He moved from a #gptneo (2.4B) model to a #llama2 (7B) model. Trained on 300k messages from our private chat history, Steeve is way more capable of following the conversation now. He used to have some "favorite phrases" he would say a lot, and I'm seeing less of that. His vision and reading models also got upgraded, so he gets more detail about the links and memes we share. Long live Steeve! :steeve:

#ai #chatbot #llm #llama #gpt #transformers

osi, to StableDiffusion
@osi@opensource.org

A new position paper cites insufficient evidence to effectively characterize the marginal risk of AI models like Llama 2 or Stable Diffusion XL relative to other technologies. Read more: https://opensource.org/blog/new-risk-assessment-framework-offers-clarity-for-open-ai-models

ed, to opensource
@ed@opensource.org

Gemma is not open source!

ed,
@ed@opensource.org

That said, Google's Prohibited Use Policy is an interesting read: the terms of use are not trying to capture exclusive rights to the economic value generated by modifications, the way the #LLama2 license does. Google's policy is all about reducing harm and risk. These raise good questions for the Definition of Open Source AI discussion.
/cc @luis_in_brief
https://ai.google.dev/gemma/prohibited_use_policy

itnewsbot, to cs

Google goes “open AI” with Gemma, a free, open-weights chatbot family

On Wednesday, Google announced a new ... - https://arstechnica.com/?p=2005035

joe, to llm

Back in December, I paid $1,425 to replace my MacBook Pro so that my LLM research would be possible at all. That machine had an M1 Pro CPU and 32GB of RAM, which (as I said previously) is kind of a bare-minimum spec for running a useful local AI. I quickly wished I had enough RAM to run a 70B model, but you can’t upgrade Apple products after the fact, and a 70B model needs 64GB of RAM. That led me to start looking for a second-hand Linux desktop that could handle a 70B model.

I ended up finding a 4-year-old HP Z4 G4 workstation with a Xeon W-2125 processor, 128 GB of DDR4-2666 RAM, a 512 GB Samsung NVMe SSD, and an NVIDIA Quadro P4000 GPU with 8 GB of GDDR5 memory. I bought it before Ollama released their Windows preview, so I planned to throw the latest Ubuntu LTS on it.

Going into this experiment, I was expecting that Ollama would thrash the GPU and the RAM but would use the CPU sparingly. I was not correct.

This is what the activity monitor looked like when I asked various models to tell me about themselves:

Mixtral

An ubuntu activity monitor while running mixtral

Llama2:70b

An ubuntu activity monitor while running Llama2:70b

Llama2:7b

An ubuntu activity monitor while running llama2:7b

Codellama

An ubuntu activity monitor while running codellama

The Xeon W-2125 has 8 threads and 4 cores, so I think that CPU1-CPU8 are threads. My theory going into this was that the models would go into memory and then the GPU would do all of the processing. The CPU would only be needed to serve the results back to the user. It looks like the full load is going to the CPU. For a moment, I thought that the 8 GB of video RAM was the limitation. That is why I tried running a 7b model for one of the tests. I am still not convinced that Ollama is even trying to use the GPU.

A screenshot of the "additional drivers" screen in ubuntu

I am using a proprietary Nvidia driver for the GPU but maybe I’m missing something?
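If you want to check the same thing on a similar box, here is a rough sketch (my own, not from the original post) that polls nvidia-smi while asking the local Ollama API for a completion. It assumes the NVIDIA driver, the Python requests package, and Ollama listening on its default port 11434:

  import subprocess
  import threading
  import requests

  def poll_gpu(stop):
      # Print GPU utilization and memory use once per second via nvidia-smi
      while not stop.is_set():
          out = subprocess.run(
              ["nvidia-smi", "--query-gpu=utilization.gpu,memory.used",
               "--format=csv,noheader"],
              capture_output=True, text=True)
          print("GPU:", out.stdout.strip())
          stop.wait(1)

  stop = threading.Event()
  threading.Thread(target=poll_gpu, args=(stop,), daemon=True).start()

  # Ask the local Ollama server for a completion while the poller runs
  resp = requests.post(
      "http://localhost:11434/api/generate",
      json={"model": "llama2:7b", "prompt": "Tell me about yourself.", "stream": False},
      timeout=600)
  stop.set()
  print(resp.json()["response"][:200])

If the utilization column sits near 0% while the answer is being generated, the work is landing on the CPU rather than the Quadro.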

I was recently playing around with Stability AI’s Stable Cascade. I might need to run those tests on this machine to see what the result is. It may be an Ollama-specific issue.

Have any questions, comments, or concerns? Please feel free to drop a comment, below. As a blanket warning, all of these posts are personal opinions and do not reflect the views or ethics of my employer. All of this research is being done off-hours and on my own dime.

https://jws.news/2024/hp-z4-g4-workstation/

#Llama2 #LLM #Mac #Ollama #Ubuntu

joe, (edited ) to ai

Around a year ago, I started hearing more and more about OpenAI’s ChatGPT. I didn’t pay much attention to it until this past summer, when I watched the intern use it where I would normally use Stack Overflow. After that, I started poking at it and created things like the Milwaukee Weather Limerick and a bot that translates my Mastodon toots to Klingon. Those are cool tricks, but eventually I realized that you could ask it for detailed datasets like “the details of every state park”, “a list of three-ingredient cocktails”, or “a CSV of counties in Wisconsin.” People are excited about getting it to write code for you or to do a realistic rendering of a bear riding a unicycle through the snow, but I think that is just the tip of the iceberg in a world where it can do research for you.

The biggest limitation of something like ChatGPT, Copilot, or Bard is that your data leaves your control when you use the AI. I believe that the future of AI is AI that remains in your control. The only issue with running your own, local AI is that a large language model (LLM) needs a lot of resources to run. You can’t do it on your old laptop. It can be done, though. Last month, I bought a new MacBook Pro with an M1 Pro CPU and 32GB of unified RAM to test this stuff out.

If you are in a similar situation, Mozilla’s Llamafile project is a good first step. A llamafile can run on multiple CPU microarchitectures. It uses Cosmopolitan Libc to provide a single 4GB executable that can run on macOS, Windows, Linux, FreeBSD, OpenBSD, and NetBSD. It contains a web client, the model file, and the rule-based inference engine. You can just download the binary, execute it, and interact with it through your web browser. This has very limited utility, though.

So, how do you get from a proof of concept to something closer to ChatGPT or Bard? You are going to need a model, a rule-based inference engine or reasoning engine, and a client.

The Rule-Based Inference Engine

A rule-based inference engine is a piece of software that derives answers or conclusions based on a set of predefined rules and facts. You load models into it and it handles the interface between the model and the client. The two major players in the space are Llama.cpp and Ollama. Getting Ollama is as easy as downloading the software and running ollama run [model] from the terminal.

Screenshot of Ollama running in the terminal on MacOS

In the case of Ollama, you can even access the inference engine through a local API.

A screenshot of Postman interacting with Ollama via a local JSON API

You will notice that the result isn’t easy to parse. Last week, Ollama announced Python and JavaScript libraries to make it much easier.
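For example, a single chat call through the Python library looks roughly like this. This is a sketch based on the library's published interface, not code from the post; it assumes pip install ollama and an Ollama server already running with the model pulled:

  import ollama

  # One chat turn against a locally available model
  response = ollama.chat(
      model="llama2",
      messages=[{"role": "user", "content": "Tell me about yourself."}],
  )

  # The reply text is nested under message -> content
  print(response["message"]["content"])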

The Models

A model consists of numerous parameters that are adjusted during the learning process to improve its predictions. Models employ learning algorithms that draw conclusions or predictions from past data. I’m going to be honest with you: this is the bit that I understand the least. The key attributes to be aware of with a model are what it was trained on, how many parameters it has, and how it scores on benchmarks.

If you browse Hugging Face or the Ollama model library, you will see that there are plenty of 7b, 13b, and 70b models. That number tells you how many parameters are in the model. Generally, a 70b model is going to be more competent than a 7b model. A 7b model has 7 billion parameters whereas a 70b model has 70 billion parameters. To give you a point of comparison, ChatGPT 4 reportedly has 1.76 trillion parameters.
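As a rough illustration of why that number matters for hardware (my own back-of-the-envelope math, not something from any model card): memory use is roughly parameters times bytes per weight, plus some overhead for the runtime and context.

  # Rough memory estimate: parameters x bytes-per-weight, plus ~20% overhead.
  # Illustrative only; real usage depends on quantization format and context size.
  def estimate_gb(params_billions, bits_per_weight, overhead=1.2):
      total_bytes = params_billions * 1e9 * (bits_per_weight / 8) * overhead
      return total_bytes / 1024**3

  for params in (7, 13, 70):
      for bits in (16, 4):  # full fp16 vs. a 4-bit quantized build
          print(f"{params}B at {bits}-bit: ~{estimate_gb(params, bits):.0f} GB")

That is roughly why a 4-bit 7b model fits comfortably in a laptop’s RAM while a 70b model wants a 64GB machine.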

The number of parameters isn’t the end-all-be-all, though. There are leaderboards and benchmarks (like HellaSwag, ARC, and TruthfulQA) for determining comparative model quality.

If you are running Ollama, downloading and running a new model is as easy as browsing the model library, finding the right one for your purposes, and running ollama run [model] from the terminal. You can also manage the installed models from the Ollama Web UI, though.

A screenshot from the Ollama Web UI, showing how to manage models

The Client

The client is what the user of the AI uses to interact with the rule-based inference engine. If you are using Ollama, the Ollama Web UI is a great option. It gives you a web interface that looks and behaves a lot like the ChatGPT web interface. There are also desktop clients like Ollamac and MacGPT, but my favorite so far is MindMac. It not only gives you a nice way to switch from model to model but also lets you switch between providers (Ollama, OpenAI, Azure, etc.).

A screenshot of the MindMac settings panel, showing how to add new accounts

The big questions

I have a few big questions right now. How well does Ollama scale from 1 user to 100 users? How do you fine-tune a model? How do you secure Ollama? Most interesting to me: how do you implement something like Stable Diffusion XL with this stack? I ordered a second-hand Xeon workstation off of eBay to try to answer some of these questions. In a workplace setting, I’m also curious what safeguards are needed to insulate the company from liability. These are all things that need addressing over time.

I created a new LLM / ML category here and I suspect that this won’t be my last post on the topic. As a blanket warning, all of these posts are personal opinions and do not reflect the views or ethics of my employer. All of this research is being done off-hours and on my own dime.

Have a question or comment? Please drop a comment, below.

https://jws.news/2024/ai-basics/

judeswae, to llm
@judeswae@toot.thoughtworks.com

Prompting Llama2-chat-7B: What is your context window size?

Response: As a responsible AI language model, I don't have a "context window" in the classical sense, as I am not a physical device with a fixed window size.

Good to know that Llama2 has absolutely no self-awareness.

joe, to random

I just updated https://jws.dev: I added MacGPT, Ollamac, and Ollama Web UI to the Resources page

joe,

I got #Ollama running via a web interface that's running locally in docker. This ain't half bad.

#llama2 #ai

joe,

Llama2-uncensored isn't great at describing images, I guess.

joe,

I asked to generate https://gist.github.com/steinbring/737aa8633125e9850916df2e40c7bdec and it mostly worked. I know from experience that Plus wouldn't be able to do that. I pegged everything on my Mac while doing that, though.

kellogh, to random
@kellogh@hachyderm.io

#fossil update: i'm working on getting local models running. first step is making a little UI for managing models. tbqh i'm loving that hamburger menu :)

kellogh,
@kellogh@hachyderm.io

unfortunately, LLM's broad support for many models means that they need to reduce the surface area for API features. for the most part, it doesn't bug me that, e.g., i can't use GBNF grammars with #llama2, and we don't really need them anyway. OTOH i really do miss being able to see how big the context width of a model is. that seems like a very reasonable feature for LLM to support (i may send a pull request, idk)

joe, to ai

I followed https://www.youtube.com/watch?v=Kg588OVYTiw to try to get Llama 2 working locally with Llama.cpp but no luck. 😒

Does anyone know how to fix it? I do have llama-2-13b-chat.ggmlv3.q4_0.bin downloaded into the root of the app.

#Llama #Llamacpp #AI

joe,

I found Ollama (https://ollama.ai/) and used that instead of Llama.cpp to get it running. Cool!

#LLM #AI #Llama2 #Ollama

judeswae, to LLMs
@judeswae@toot.thoughtworks.com

Thank you @bboeckel for pointing me to this wonderful presentation by @simon called "Making Large Language Models work for you".

https://simonwillison.net/2023/Aug/27/wordcamp-llms/

It was a really insightful and condensed presentation on #LLMs. Probably the best presentation I’ve watched so far (although I have not watched many on the topic).

I also got started in no time running #Llama2 locally thx to https://llm.datasette.io/

I’m really liking playing with the tools and not sending my data and prompts to the cloud.

itnewsbot, to machinelearning

Elon Musk’s new AI model doesn’t shy from questions about cocaine and orgies

On Saturday, Elo... - https://arstechnica.com/?p=1981276 #largelanguagemodels #largelanguagemodel #machinelearning #culturewars #anthropic #elonmusk #chatgpt #chatgtp #claude2 #twitter #biz #llama2 #grok #meta #woke #x.ai #ai

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org

: "The argument against releasing model weights relies on the assumption that there will be no malicious corporate actors, says Biderman, which history suggests is misplaced. Encouraging companies to keep the details of their models secret is likely to lead to “serious downstream consequences for transparency, public awareness, and science,” she adds, and will mainly impact independent researchers and hobbyists.

But it’s unclear if Meta’s approach is really open enough to derive the benefits of open source. Open-source software is considered trustworthy and safe because people are able to understand and probe it, says Park. That’s not the case with Meta’s models, because the company has provided few details about its training data or training code.

The concept of open-source AI has yet to be properly defined, says Stefano Maffulli, executive director of the Open Source Initiative (OSI). Different organizations are using the term to refer to different things. “It’s very confusing, because everyone is using it to mean different shades of ‘publicly available something,’ ” he says.

For a piece of software to be open source, says Maffulli, the key question is whether the source code is publicly available and reusable for any purpose. When it comes to making AI freely reproducible, though, you may have to share training data, how you collected that data, training software, model weights, inference code, or all of the above. That raises a host of new challenges, says Maffulli, not least of which are privacy and copyright concerns around the training data."

https://spectrum.ieee.org/meta-ai

mikarv, to Futurology
@mikarv@someone.elses.computer

Meta's #Llama 2 license has an unusual clause whereby they withdraw your right to use the model if you allege #Meta has breached your own IP rights by training their stuff on your intellectual property. #copyright #genai #LLama2

itnewsbot, to machinelearning

Facebook’s new AI stickers can generate Mickey Mouse holding a machine gun - A selection of AI-generated stickers created in Facebook Mess... - https://arstechnica.com/?p=1973122 #largelanguagemodels #facebookmessenger #machinelearning #imagesynthesis #mickeymouse #thesimpsons #aiethics #facebook #thepope #biz #disney #llama2 #metaai #elmo #meta #emu #ai

rml, to LLMs

I would run #LLaMA2 to play Dwarf Fortress for me and post everything that's happening to fedi if #LLMs weren't the only software to make blockchain appear reasonably efficient

jbzfn, to llm
@jbzfn@mastodon.social

🦾 Petals – Run LLMs at home, BitTorrent-style
➥ petals.dev

「 You load a small part of the model, then join a network of people serving the other parts. Single‑batch inference runs at up to 6 tokens/sec for Llama 2 (70B) and up to 4 tokens/sec for Falcon (180B) — enough for chatbots and interactive apps 」

https://petals.dev/
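For reference, the Petals client exposes a transformers-style Python interface; roughly like this (a sketch based on their docs, untested here, and the gated Llama 2 weights need Hugging Face access approval):

  from transformers import AutoTokenizer
  from petals import AutoDistributedModelForCausalLM

  # Only a few layers load locally; the rest are served by peers in the swarm
  model_name = "meta-llama/Llama-2-70b-chat-hf"
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

  inputs = tokenizer("A llama walks into a bar and", return_tensors="pt")["input_ids"]
  outputs = model.generate(inputs, max_new_tokens=20)
  print(tokenizer.decode(outputs[0]))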

#LLM #Llama2 #Falcon

rml, to random

If it weren't so bad for the environment I'd set up #LLaMA2 to play Dwarf Fortress and provide counsel to me, their sovereign

nixCraft, to random
@nixCraft@mastodon.social

Well, actually, yes. 😂

davidak,

@nixCraft seems like the latest llama2-70b-oasst-sft-v10 model is better at this.

But also fails at other very basic logic tasks.

kellogh, to llm
@kellogh@hachyderm.io

a while back i recall there being some tool for exploring a database of embeddings that lets you visualize and locate duplicates, etc. anyone know what it's called? #llm #llms #ai #llama2

fosslife, to ai
@fosslife@fosstodon.org