
chikim

@chikim@mastodon.social

Love music, technology, accessibility! Faculty at Berklee College of Music 👨🏻‍💻🎹🐕‍🦺


chikim, to random

VOLlama v0.1.0-alpha.2 has many new features: set a system message, save and recall chat history, copy and delete models to use as presets, persistent settings for the system message and host address, and bug fixes!
https://github.com/chigkim/VOLlama/releases/tag/v0.1.0-alpha.2
@vick21 @freakyfwoof @tristan @KyleBorah @Bri

chikim, to random

Let's try again! I haven't found any UI for local LLMs that isn't annoying to use with screen readers, so I just made one for myself for Ollama called VOLlama. lol Hope someone finds it useful.
Windows users: follow the instructions on the release page to install Ollama with Docker.
Mac users: install Ollama using the instructions on ollama.ai. Also, the app is not signed.
https://github.com/chigkim/VOLlama/releases/tag/v0.1.0-alpha.1
@vick21 @freakyfwoof @tristan @KyleBorah @Bri

chikim,

@FreakyFwoof You can use Llava, but Llava is designed more for processing images with language. It's not going to be as good as regular LLMs specifically designed for chat like openhermes, solar, neural-chat, zephyr, etc.

chikim,

@FreakyFwoof You can read while it's streaming. You don't have to wait for it to finish. Just shift-tab and read. The only problem is the cursor gets reset every time a new word arrives. You can also command+A, command+C, and paste it somewhere. lol

chikim,

@FreakyFwoof If you want to expose your Ollama to another machine on the network, type this in the terminal, then quit Ollama from the menu extras and open it again.
launchctl setenv OLLAMA_HOST "0.0.0.0"
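As a quick sanity check (assuming Ollama's default port 11434, and with YOUR_MAC_IP as a placeholder for your Mac's address on the network), you should be able to reach it from the other machine with something like:
curl http://YOUR_MAC_IP:11434/api/tags
If that returns your model list as JSON, VOLlama's host address on the other machine would presumably point at the same address and port.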

chikim,

@FreakyFwoof I believe it'll persist, but I'm not 100% sure.

chikim,

@FreakyFwoof Also different models produce different things, so you might want to try different ones. Some are even specifically designed not to avoid engaging in NSFW chat. lol

chikim,

@FreakyFwoof That's interesting. I didn't know Zephyr was not censored. lol

chikim,

@FreakyFwoof Yeah, I thought about it, so it might happen.

chikim,

@FreakyFwoof @vick21 Close VOLlama, copy a model: ollama cp zephyr andre. Open VOLlama, talk to andre first. Then talk to something else. With VOLlama open, delete andre: ollama rm andre. Then if you try to talk to andre in VOLlama, you should get an error.

chikim,

@FreakyFwoof @vick21 Maybe Ollama, or the llama.cpp that Ollama uses, has caching. Unfortunately I don't have control over what's going on behind the scenes.

chikim,

@FreakyFwoof Re caching: did you clear the previous messages before asking the same question to another model? If not, the newly selected model receives all the messages, including responses from the previous model you used before the switch. To the newly selected model, it will seem as though it has already answered the question and you are asking the exact same question again.

chikim,

@FreakyFwoof Ah, you used the new chat. I also discovered that bug in alpha.2 while implementing history saving. lol

chikim,

@FreakyFwoof I think you'd have a lot of fun with alpha.2. You can set a system message like "You're the funniest comedian. Make every response as funny as possible." lol Also you can copy a model and name it Alex. You can also play with adjusting a bunch of other parameters in the modelfile, like temperature, which makes the model more or less creative/wild. https://github.com/ollama/ollama/blob/main/docs/modelfile.md
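As a rough sketch (the base model, preset name, and system message here are just examples), a modelfile for an "Alex" preset could look like:

FROM zephyr
PARAMETER temperature 1
SYSTEM You're the funniest comedian. Make every response as funny as possible.

Then build it with ollama create alex -f ./Modelfile, and alex should show up in VOLlama's model list like any other model.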

chikim,

@FreakyFwoof Tell your wife my apology for the ugly UI. I have no idea how the interface looks visually. I have to ask my wife and fix things. lol

chikim,

@FreakyFwoof There are many web-based clients out there with a variety of accessibility issues. lol

chikim,

@FreakyFwoof If you go to their library, click a model, and click tags, it'll show you the size.
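Side note, just as an example: once a model is downloaded, running ollama list in the terminal also shows the size of each local model.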

chikim,

@FreakyFwoof Also you'll see different quantized versions of a model with different sizes. Higher quantization (more bits) means more accurate but bigger size. However, 7B Q8 is less accurate than 13B Q4. Parameter count, like 7B vs 13B, matters more. Also I wouldn't use less than Q4 unless it's absolutely necessary.
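Rough back-of-the-envelope math, ignoring overhead: size ≈ parameter count × bits per weight ÷ 8. So 7B at Q8 is about 7GB, while 13B at Q4 is roughly 6.5GB, yet the 13B will usually be more accurate.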

chikim,

@FreakyFwoof They have a 34B. Slower but more accurate. You have to decide how much you're willing to tolerate the slower speed.

chikim,

@FreakyFwoof Also that reminds me... they updated their model recently to v1.6, so I would update it with ollama pull llava:13b.

chikim, to random

I haven't found any UI for local LLMs that isn't annoying to use with screen readers, so I just made one for myself for Ollama called VOLlama. lol Hope someone finds it useful.
Unfortunately for Windows users, the easiest way to run Ollama is with Docker, then set the host address in VOLlama. Control+M focuses the model list, and Escape focuses the prompt.
The app is not signed, and it's a Dropbox link for now.
https://www.dropbox.com/scl/fo/prrxp913orq2m9wx44hul/h?rlkey=cce95nuevc3d48e846hd75rc4&dl=1
@vick21 @freakyfwoof @tristan @KyleBorah @Bri
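For reference, the commands below are roughly what Ollama's own Docker instructions give for a CPU-only setup (zephyr is just an example model); after that, VOLlama's host address would typically be http://localhost:11434.
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec -it ollama ollama pull zephyr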

chikim,

@pitermach No, it's cross-platform, Mac and Windows. Mac is easier to get going because Ollama is easier to set up on Mac. Anyway, Dropbox said too many people had already downloaded it, so they disabled the link. It can't be that many people. Ha, what BS! I'll upload it to GitHub momentarily.

FreakyFwoof, to random

In 7 live-streams I've already racked up 23 hours 25 minutes.

chikim,

@FreakyFwoof @ZBennoui You'll be fine since you're used to how QT interfaces behave with VoiceOver. :)

chikim, to random

VOCR v2.0.0-alpha.13: It now supports GPT, Ollama, and Llama.cpp. Use the Models submenu in Settings. @vick21 @talon @pixelate @KyleBorah @FreakyFwoof @Bri @pitermach https://github.com/chigkim/VOCR/releases/tag/v2.0.0-alpha.13

chikim,

@FreakyFwoof @vick21 I don't need to change anything. Also if you have multiple multimodal models, VOCR will ask you to choose which one to use. However, it's interesting that Llava 34B is based on the newer Llava v1.6 architecture, and Llama.cpp doesn't support it yet. There's a PR for partial support.

chikim,

@FreakyFwoof @vick21 Worth trying! It's on their website, so why not. Make sure to update Ollama though, because Llava v1.6 has a new architecture. FYI, the 34B needs over 20GB. Make sure you have enough space in both SSD and memory.
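Assuming the 34b tag on their library, grabbing it should be something like:
ollama pull llava:34b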
