chikim

@chikim@mastodon.social

Love music, technology, accessibility! Faculty at Berklee College of Music 👨🏻‍💻🎹🐕‍🦺

This profile is from a federated server and may be incomplete. Browse more on the original instance.

chikim, 5 months ago to random

VOCR v2.0.0-alpha.1 is here. It's an alpha release, so it's likely to be very buggy. If you feel adventurous, read the release notes and download from GitHub. New features include auto-scan, settings in menu extras instead of shortcuts, object detection, and UI exploration with GPT-4V, which is extremely unreliable but can be useful. https://github.com/chigkim/VOCR/releases/tag/v2.0.0-alpha.1

reply

expand (1)

collapse (1)

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ objectinspace, pitermach, Onj, FreakyFwoof +1 more

chikim, 4 months ago to random

Let's try again! I haven't found any UI for local LLMs that isn't annoying to use with screen readers, so I just made one for myself for Ollama called VOLlama. lol Hope someone finds it useful.
Windows users: follow the instruction on the release page to install Ollama with Docker.
Mac user: Install Ollama using the instruction on ollama.ai. Also, the app is not signed.
https://github.com/chigkim/VOLlama/releases/tag/v0.1.0-alpha.1
@vick21 @freakyfwoof @tristan @KyleBorah @Bri

reply

expand (46)

collapse (46)

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ FreakyFwoof, datajake1999, Binder, pitermach

chikim, 4 months ago to accessibility

Can anyone suggest effective methods for teaching a blind person how to repair small, intricate items, a task that involves using razorblades, torches, hammers, and screwdrivers with great precision? #accessibility #teaching

reply

expand (4)

collapse (4)

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ datajake1999, objectinspace, purplepadma, Binder

chikim, 7 months ago to random

This person is using Google Sheets and Web MIDI API as midi sequencer. Why? Because you can. lol https://www.asepbagja.com/programming/making-music-with-google-sheets/

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ objectinspace, datajake1999, rewarp, devinprater

chikim, 11 months ago to random

I was wondering why the letter W in Braille does not follow the same pattern as the rest of the alphabet, and Chat GPT gave me the answer! "Braille was first invented in France in the 1820s by Louis Braille, who was blind. The French language does not have the letter W, so it was not included in the original Braille alphabet. The letter W was added to Braille later, when it was adopted by other languages that use the letter W."

reply

expand (4)

collapse (4)

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ jaybird110127, TheQuinbox, datajake1999, devinprater

chikim, 1 month ago to macos

Cool tip for running LLMs on Apple Silicon! By default, MacOS allows GPU to use up to 2/3 of RAM on machines with <=36GB and 3/4 on machines with >36GB. I used the command sudo sysctl iogpu.wired_limit_mb=57344 to override and allocate 56GB/64GB for GPU. This allowed me to load all layers of larger models for a faster speed! #MacOS #LLM #AI #ML

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ ppatel, jaybird110127, miki

chikim, 10 months ago to random

I am curious about how much the GPT-4 multimodal model confabulates. I asked an open source model to describe my headshot, and it said, "The man is smiling for the camera while wearing glasses and a pink shirt." Another model said that the person in the image is wearing a pink shirt, black slacks, and dress shoes. However, I am not wearing glasses, and the picture does not show below my shoulders. The only correct part of the description is that a man is smiling in a pink shirt!

reply

expand (1)

collapse (1)

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ Binder, datajake1999, devinprater

chikim, 2 months ago to ai

Maybe we have an open source competitor for ElevenLabs? Check out their demo which they switch between original and synthesized. I can't tell. lol Apparently they're going to fully open source codebase and model weights. #TTS #AI #ML https://jasonppy.github.io/VoiceCraft_web/

reply

expand (3)

collapse (3)

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ objectinspace, datajake1999, ppatel

chikim, 4 months ago to random

VOCR v2.0.0-alpha.16
Takes a screenshot before asking for a prompt; Able to select which model for Ollama to use if multiple clip models are found
https://github.com/chigkim/VOCR/releases/tag/v2.0.0-alpha.16
@talon @vick21 @pixelate @KyleBorah @FreakyFwoof @Bri @pitermach

reply

expand (25)

collapse (25)

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ objectinspace, ppatel, FreakyFwoof

chikim, 10 months ago to random

Any open source voice clone model for tts that sounds as good as 11labs yet? If not, what's the best one so far? #voiceclone #tts

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ rewarp, devinprater

chikim, 4 months ago to random

VOCR v2.0.0-alpha.13: It now supports GPT, Ollama, Llama.cpp. Use models submenu in settings. @vick21 @talon @pixelate @KyleBorah @FreakyFwoof @Bri @pitermach https://github.com/chigkim/VOCR/releases/tag/v2.0.0-alpha.13

reply

expand (37)

collapse (37)

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ ppatel, FreakyFwoof

chikim, 4 months ago to random

IMPORTANT VOCR v2.0.0-alpha.12: Everyone who upgraded to any v2 alpha build, you should upgrade to this one. Previous builds has a bug where it uses window size as a resolution instead of original resolution from screenshot. It will improve quality for OCR, object detection, identify object, etc. It also has new workflow. See the change log. @vick21 @pixelate @KyleBorah @FreakyFwoof @Bri @pitermach @talon https://github.com/chigkim/VOCR/releases/tag/v2.0.0-alpha.12

reply

expand (1)

collapse (1)

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ FreakyFwoof, pitermach

chikim, 4 months ago to random

VOCR v2.0.0-alpha.18
Autoupdater, logger FOR DEBUG, Realtime OCR shortcut toggles the feature
Eventually I need to split pre-release vs public release for autoupdater, but let's see how it goes.
https://github.com/chigkim/VOCR/releases/tag/v2.0.0-alpha.18
@vick21 @FreakyFwoof @KyleBorah @talon @pixelate @Bri @pitermach

reply

expand (17)

collapse (17)

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ FreakyFwoof, pitermach

chikim, 9 months ago to llm

Falcon-180B: Biggest and first open source large language model that you can't run on consumer hardware. Even just to run Inference with 4-bit quantized model, yu need 320GB vram like 8xA100s. 180 billion parameters; trained on 3.5 trillion tokens using 4,096 A100 40GB GPUs; spent total of 7 million GPU hours. #llm #ai https://huggingface.co/blog/falcon-180b

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ bryansmart, devinprater

chikim, 19 days ago to random

Finally release VOCR 2.0.0. So many new features since 1.0! You can download and checkout the demo here. https://chigkim.github.io/VOCR/

reply

expand (2)

collapse (2)

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ simon, pitermach

chikim, 1 month ago to llm

Tired of neutral responses from LLMs? Llama-3 seems great at following system prompts, so try this system prompt for an opinionated chatbot.
"You are a helpful, opinionated, decisive assistant. When asked a yes/no question, begin your respond with one word answer: yes or no. For open-ended or complex questions, adopt a firm stance. Justify your views with well-reasoned arguments, robust evidence, and succinct explanations, ensuring clarity and confidence in every response."
#LLM #AI #ML

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ miki, datajake1999

chikim, 6 months ago to accessibility

Ableton Live, Reaper, Pro Tools, Logic, Sibelius, MuseScore, and Komplete Kontrol all have something in common. They are not only mainstream music software but also accessible to screen reader users! Twenty years ago, only Sonar and Sibelius were accessible. Although there is still more work to be done, there's no doubt that significant progress has been made in this space! #accessibility

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ devinprater, bryansmart

chikim, 6 months ago to jazz

One of my students, Ciara Moser, who plays the bass, released her #jazz album “Blind. So What?” It sounds awesome! Check it out!
Spotify:https://open.spotify.com/album/0dwFmVgwpLgz1CwuTDK5JP?si=hFxsJhTURh--krlFhsT0Mw
Apple Music: https://music.apple.com/us/album/blind-so-what/1712158793

reply

expand (8)

collapse (8)

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ objectinspace, devinprater

chikim, 1 month ago to llm

VOLlama v0.1.0, an open-source, accessible chat client for OLlama
Unfortunately, many user interfaces for open source large language models are either inaccessible or annoying to use with screen readers, so I decided to make one for myself and others. Non screen reder users are welcome to use it as well.
I hope that ML UI libraries like Streamlit and Gradio will become more friendly with screen readers in the future, so making apps like this is not necessary!
#LLM #AI #ML
https://chigkim.github.io/VOLlama/

reply

expand (1)

collapse (1)

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ jaybird110127, objectinspace

chikim, 7 months ago to random

Ok, I emailed #HCaptcha for my issues, and they asked me to try their demo, send them the log it generated, and also asked me to provide my IP address.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ datajake1999, devinprater

chikim, 4 months ago to random

VOLlama v0.1.0-alpha.2 has many new features: able to set system message, Save and recall chat history, Copy and delete model to use as presets, system message and host address in persistent settings, and bug fixes!
https://github.com/chigkim/VOLlama/releases/tag/v0.1.0-alpha.2
@vick21 @freakyfwoof @tristan @KyleBorah @Bri

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ pitermach, Binder

chikim, 5 months ago to random

VOCR-v2.0.0-alpha.6: Now you can OCR VOCursor Realtime, and you're able to toggle object detection. Every scan during realtime OCR triggers a sound, but I'll turn it off in next build. @FreakyFwoof @pitermach @vick21 @Bri https://github.com/chigkim/VOCR/releases/tag/v2.0.0-alpha.6

reply

expand (15)

collapse (15)

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ objectinspace, FreakyFwoof

chikim, 7 months ago to accessibility

Besides Microsoft Defender, what are some free antivirus software that are accessible with screen reader? I tried Avast, Bitdefender, AVG, Avira, and all of their GUIs have poor accessibility for screen reader. OUt of those options, Avira was most tolerable, but still it's pretty poor. #accessibility

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ doug, joelanman

chikim, 5 months ago to random

VOCR v2.0.0-alpha.8: dismiss menu with command+z if navigation or realtime ocr is active; press return to ask gpt without editing the prompt; save last screenshot. Few more days before I go back to my regular job. lol @FreakyFwoof @vick21 @Bri @pitermach https://github.com/chigkim/VOCR/releases/tag/v2.0.0-alpha.8

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ objectinspace, FreakyFwoof

chikim, 7 months ago to ai

Fascinating Microsoft AutoGen is a framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks. For example, you can setup a Human Admin, LLM Product Manager, LLM Back End Engineer, LLM UI Designer, LLM Critics, etc. You can then instruct them to collaborate and create a website. You can review their conversations, approve their work, or provide feedback throughout the process as a human admin. #AI #LLM https://github.com/microsoft/autogen

reply

expand (1)

collapse (1)

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ datajake1999, devinprater