
chikim

@chikim@mastodon.social

Love music, technology, accessibility! Faculty at Berklee College of Music 👨🏻‍💻🎹🐕‍🦺


chikim, to random

VOLlama v0.1.0-alpha.2 has many new features: set a system message, save and recall chat history, copy and delete models to use as presets, persistent settings for the system message and host address, and bug fixes!
https://github.com/chigkim/VOLlama/releases/tag/v0.1.0-alpha.2
@vick21 @freakyfwoof @tristan @KyleBorah @Bri

chikim, to random

Let's try again! I haven't found any UI for local LLMs that isn't annoying to use with screen readers, so I just made one for myself for Ollama called VOLlama. lol Hope someone finds it useful.
Windows users: follow the instructions on the release page to install Ollama with Docker.
Mac users: install Ollama using the instructions on ollama.ai. Also, the app is not signed.
https://github.com/chigkim/VOLlama/releases/tag/v0.1.0-alpha.1
@vick21 @freakyfwoof @tristan @KyleBorah @Bri

chikim,

@FreakyFwoof You can use Llava, but Llava is designed more for processing images with language. It's not going to be as good as regular LLMs specifically designed for chat like openhermes, solar, neural-chat, zephyr, etc.

chikim,

@FreakyFwoof You can read while it's streaming. You don't have to wait for it to finish. Just shift-tab and read. The only problem is the cursor gets reset every time a new word arrives. You can also command+A, command+C, and paste it somewhere. lol

chikim,

@FreakyFwoof If you want to expose your Ollama to another machine on the network, type this in the terminal, then quit Ollama from the menu extras and open it again.
launchctl setenv OLLAMA_HOST "0.0.0.0"
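As a quick sanity check (assuming Ollama's default port 11434, and with YOUR_MAC_IP as a placeholder for your Mac's address on the network), you should be able to reach it from the other machine with something like:
curl http://YOUR_MAC_IP:11434/api/tags
If that returns your model list as JSON, VOLlama's host address on the other machine would presumably point at the same address and port.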

chikim,

@FreakyFwoof I believe it'll persist, but I'm not 100% sure.

chikim,

@FreakyFwoof Also different models produce different things, so you might want to try different ones. Some are even specifically designed not to avoid engaging in NSFW chat. lol

chikim,

@FreakyFwoof That's interesting. I didn't know Zephyr was not censored. lol

chikim,

@FreakyFwoof Yeah, I thought about it, so it might happen.

chikim,

@FreakyFwoof @vick21 Close VOLlama, copy a model: ollama cp zephyr andre. Open VOLlama, talk to andre first. Then talk to something else. With VOLlama open, delete andre: ollama rm andre. Then if you try to talk to andre in VOLlama, you should get an error.

chikim,

@FreakyFwoof @vick21 Maybe Ollama, or the llama.cpp that Ollama uses, has caching. Unfortunately I don't have control over what's going on behind the scenes.

chikim,

@FreakyFwoof Re caching: did you clear the previous messages before asking the same question to another model? If not, the newly selected model receives all the messages, including responses from the previous model you used before the switch. To the newly selected model, it will seem as though it has already answered the question and you are asking the exact same question again.

chikim,

@FreakyFwoof Ah, you used the new chat. I also discovered that bug in alpha.2 while implementing history saving. lol

chikim,

@FreakyFwoof I think you'd have a lot of fun with alpha.2. You can set a system message like "You're the funniest comedian. Make every response as funny as possible." lol Also you can copy a model and name it Alex. You can also play with adjusting a bunch of other parameters in the modelfile, like temperature, which makes the model more or less creative/wild. https://github.com/ollama/ollama/blob/main/docs/modelfile.md
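As a rough sketch (the base model, preset name, and system message here are just examples), a modelfile for an "Alex" preset could look like:

FROM zephyr
PARAMETER temperature 1
SYSTEM You're the funniest comedian. Make every response as funny as possible.

Then build it with ollama create alex -f ./Modelfile, and alex should show up in VOLlama's model list like any other model.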

chikim,

@FreakyFwoof Tell your wife my apology for the ugly UI. I have no idea how the interface looks visually. I have to ask my wife and fix things. lol

chikim,

@FreakyFwoof There are many web-based clients out there with a variety of accessibility issues. lol

chikim,

@FreakyFwoof If you go to their library, click a model, and click tags, it'll show you the size.
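Side note, just as an example: once a model is downloaded, running ollama list in the terminal also shows the size of each local model.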

chikim,

@FreakyFwoof Also you'll see different quantized versions of a model with different sizes. Higher quantization (more bits) means more accurate but bigger size. However, 7B Q8 is less accurate than 13B Q4. Parameter count, like 7B vs 13B, matters more. Also I wouldn't use less than Q4 unless it's absolutely necessary.
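Rough back-of-the-envelope math, ignoring overhead: size ≈ parameter count × bits per weight ÷ 8. So 7B at Q8 is about 7GB, while 13B at Q4 is roughly 6.5GB, yet the 13B will usually be more accurate.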

chikim,

@FreakyFwoof They have a 34B. Slower but more accurate. You have to decide how much you're willing to tolerate the slower speed.

chikim,

@FreakyFwoof Also that reminds me... they updated their model recently to v1.6, so I would update it with ollama pull llava:13b.

chikim, to random

I haven't found any UI for local LLMs that isn't annoying to use with screen readers, so I just made one for myself for Ollama called VOLlama. lol Hope someone finds it useful.
Unfortunately for Windows users, the easiest way to run Ollama is with Docker, then set the host address in VOLlama. Control+M focuses the model list, and Escape focuses the prompt.
The app is not signed, and it's a Dropbox link for now.
https://www.dropbox.com/scl/fo/prrxp913orq2m9wx44hul/h?rlkey=cce95nuevc3d48e846hd75rc4&dl=1
@vick21 @freakyfwoof @tristan @KyleBorah @Bri
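For reference, the commands below are roughly what Ollama's own Docker instructions give for a CPU-only setup (zephyr is just an example model); after that, VOLlama's host address would typically be http://localhost:11434.
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec -it ollama ollama pull zephyr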

chikim,

@pitermach No, it's cross-platform, Mac and Windows. Mac is easier to get going because Ollama is easier to set up on Mac. Anyway, Dropbox said too many people had already downloaded it, so they disabled the link. It can't be that many people. Ha, what BS! I'll upload it to GitHub momentarily.

FreakyFwoof, to random

In 7 live-streams I've already racked up 23 hours 25 minutes.

chikim,

@FreakyFwoof @ZBennoui You'll be fine since you're used to how QT interfaces behave with VoiceOver. :)

chikim, to random

VOCR v2.0.0-alpha.13: It now supports GPT, Ollama, and Llama.cpp. Use the Models submenu in Settings. @vick21 @talon @pixelate @KyleBorah @FreakyFwoof @Bri @pitermach https://github.com/chigkim/VOCR/releases/tag/v2.0.0-alpha.13

chikim,

@FreakyFwoof @vick21 I don't need to change anything. Also if you have multiple multimodal models, VOCR will ask you to choose which one to use. However, it's interesting that Llava 34B is based on the newer Llava v1.6 architecture, and Llama.cpp doesn't support it yet. There's a PR for partial support.

chikim,

@FreakyFwoof @vick21 Worth trying! It's on their website, so why not. Make sure to update Ollama though, because Llava v1.6 has a new architecture. FYI, the 34B needs over 20GB. Make sure you have enough space in both SSD and memory.
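Assuming the 34b tag on their library, grabbing it should be something like:
ollama pull llava:34b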
