@chikim@mastodon.social avatar

chikim

@chikim@mastodon.social

Love music, technology, accessibility! Faculty at Berklee College of Music 👨🏻‍💻🎹🐕‍🦺

This profile is from a federated server and may be incomplete. Browse more on the original instance.

chikim, to random
@chikim@mastodon.social avatar

VOCR v2.0.0-alpha.1 is here. It's an alpha release, so it's likely to be very buggy. If you feel adventurous, read the release notes and download from GitHub. New features include auto-scan, settings in menu extras instead of shortcuts, object detection, and UI exploration with GPT-4V, which is extremely unreliable but can be useful. https://github.com/chigkim/VOCR/releases/tag/v2.0.0-alpha.1

chikim, to random
@chikim@mastodon.social avatar

Let's try again! I haven't found any UI for local LLMs that isn't annoying to use with screen readers, so I just made one for myself for Ollama called VOLlama. lol Hope someone finds it useful.
Windows users: follow the instruction on the release page to install Ollama with Docker.
Mac user: Install Ollama using the instruction on ollama.ai. Also, the app is not signed.
https://github.com/chigkim/VOLlama/releases/tag/v0.1.0-alpha.1
@vick21 @freakyfwoof @tristan @KyleBorah @Bri

chikim, to accessibility
@chikim@mastodon.social avatar

Can anyone suggest effective methods for teaching a blind person how to repair small, intricate items, a task that involves using razorblades, torches, hammers, and screwdrivers with great precision?

chikim, to random
@chikim@mastodon.social avatar

This person is using Google Sheets and Web MIDI API as midi sequencer. Why? Because you can. lol https://www.asepbagja.com/programming/making-music-with-google-sheets/

chikim, to random
@chikim@mastodon.social avatar

I was wondering why the letter W in Braille does not follow the same pattern as the rest of the alphabet, and Chat GPT gave me the answer! "Braille was first invented in France in the 1820s by Louis Braille, who was blind. The French language does not have the letter W, so it was not included in the original Braille alphabet. The letter W was added to Braille later, when it was adopted by other languages that use the letter W."

chikim, to macos
@chikim@mastodon.social avatar

Cool tip for running LLMs on Apple Silicon! By default, MacOS allows GPU to use up to 2/3 of RAM on machines with <=36GB and 3/4 on machines with >36GB. I used the command sudo sysctl iogpu.wired_limit_mb=57344 to override and allocate 56GB/64GB for GPU. This allowed me to load all layers of larger models for a faster speed!

chikim, to random
@chikim@mastodon.social avatar

I am curious about how much the GPT-4 multimodal model confabulates. I asked an open source model to describe my headshot, and it said, "The man is smiling for the camera while wearing glasses and a pink shirt." Another model said that the person in the image is wearing a pink shirt, black slacks, and dress shoes. However, I am not wearing glasses, and the picture does not show below my shoulders. The only correct part of the description is that a man is smiling in a pink shirt!

chikim, to ai
@chikim@mastodon.social avatar

Maybe we have an open source competitor for ElevenLabs? Check out their demo which they switch between original and synthesized. I can't tell. lol Apparently they're going to fully open source codebase and model weights. https://jasonppy.github.io/VoiceCraft_web/

chikim, to random
@chikim@mastodon.social avatar

VOCR v2.0.0-alpha.16
Takes a screenshot before asking for a prompt; Able to select which model for Ollama to use if multiple clip models are found
https://github.com/chigkim/VOCR/releases/tag/v2.0.0-alpha.16
@talon @vick21 @pixelate @KyleBorah @FreakyFwoof @Bri @pitermach

chikim, to random
@chikim@mastodon.social avatar

Any open source voice clone model for tts that sounds as good as 11labs yet? If not, what's the best one so far?

chikim, to random
@chikim@mastodon.social avatar

VOCR v2.0.0-alpha.13: It now supports GPT, Ollama, Llama.cpp. Use models submenu in settings. @vick21 @talon @pixelate @KyleBorah @FreakyFwoof @Bri @pitermach https://github.com/chigkim/VOCR/releases/tag/v2.0.0-alpha.13

chikim, to random
@chikim@mastodon.social avatar

IMPORTANT VOCR v2.0.0-alpha.12: Everyone who upgraded to any v2 alpha build, you should upgrade to this one. Previous builds has a bug where it uses window size as a resolution instead of original resolution from screenshot. It will improve quality for OCR, object detection, identify object, etc. It also has new workflow. See the change log. @vick21 @pixelate @KyleBorah @FreakyFwoof @Bri @pitermach @talon https://github.com/chigkim/VOCR/releases/tag/v2.0.0-alpha.12

chikim, to random
@chikim@mastodon.social avatar

VOCR v2.0.0-alpha.18
Autoupdater, logger FOR DEBUG, Realtime OCR shortcut toggles the feature
Eventually I need to split pre-release vs public release for autoupdater, but let's see how it goes.
https://github.com/chigkim/VOCR/releases/tag/v2.0.0-alpha.18
@vick21 @FreakyFwoof @KyleBorah @talon @pixelate @Bri @pitermach

chikim, to llm
@chikim@mastodon.social avatar

Falcon-180B: Biggest and first open source large language model that you can't run on consumer hardware. Even just to run Inference with 4-bit quantized model, yu need 320GB vram like 8xA100s. 180 billion parameters; trained on 3.5 trillion tokens using 4,096 A100 40GB GPUs; spent total of 7 million GPU hours. https://huggingface.co/blog/falcon-180b

chikim, to random
@chikim@mastodon.social avatar

Finally release VOCR 2.0.0. So many new features since 1.0! You can download and checkout the demo here. https://chigkim.github.io/VOCR/

chikim, to llm
@chikim@mastodon.social avatar

Tired of neutral responses from LLMs? Llama-3 seems great at following system prompts, so try this system prompt for an opinionated chatbot.
"You are a helpful, opinionated, decisive assistant. When asked a yes/no question, begin your respond with one word answer: yes or no. For open-ended or complex questions, adopt a firm stance. Justify your views with well-reasoned arguments, robust evidence, and succinct explanations, ensuring clarity and confidence in every response."

chikim, to accessibility
@chikim@mastodon.social avatar

Ableton Live, Reaper, Pro Tools, Logic, Sibelius, MuseScore, and Komplete Kontrol all have something in common. They are not only mainstream music software but also accessible to screen reader users! Twenty years ago, only Sonar and Sibelius were accessible. Although there is still more work to be done, there's no doubt that significant progress has been made in this space!

chikim, to jazz
@chikim@mastodon.social avatar

One of my students, Ciara Moser, who plays the bass, released her album “Blind. So What?” It sounds awesome! Check it out!
Spotify:https://open.spotify.com/album/0dwFmVgwpLgz1CwuTDK5JP?si=hFxsJhTURh--krlFhsT0Mw
Apple Music: https://music.apple.com/us/album/blind-so-what/1712158793

chikim, to llm
@chikim@mastodon.social avatar

VOLlama v0.1.0, an open-source, accessible chat client for OLlama
Unfortunately, many user interfaces for open source large language models are either inaccessible or annoying to use with screen readers, so I decided to make one for myself and others. Non screen reder users are welcome to use it as well.
I hope that ML UI libraries like Streamlit and Gradio will become more friendly with screen readers in the future, so making apps like this is not necessary!

https://chigkim.github.io/VOLlama/

chikim, to random
@chikim@mastodon.social avatar

Ok, I emailed for my issues, and they asked me to try their demo, send them the log it generated, and also asked me to provide my IP address.

chikim, to random
@chikim@mastodon.social avatar

VOLlama v0.1.0-alpha.2 has many new features: able to set system message, Save and recall chat history, Copy and delete model to use as presets, system message and host address in persistent settings, and bug fixes!
https://github.com/chigkim/VOLlama/releases/tag/v0.1.0-alpha.2
@vick21 @freakyfwoof @tristan @KyleBorah @Bri

chikim, to random
@chikim@mastodon.social avatar

VOCR-v2.0.0-alpha.6: Now you can OCR VOCursor Realtime, and you're able to toggle object detection. Every scan during realtime OCR triggers a sound, but I'll turn it off in next build. @FreakyFwoof @pitermach @vick21 @Bri https://github.com/chigkim/VOCR/releases/tag/v2.0.0-alpha.6

chikim, to accessibility
@chikim@mastodon.social avatar

Besides Microsoft Defender, what are some free antivirus software that are accessible with screen reader? I tried Avast, Bitdefender, AVG, Avira, and all of their GUIs have poor accessibility for screen reader. OUt of those options, Avira was most tolerable, but still it's pretty poor.

chikim, to random
@chikim@mastodon.social avatar

VOCR v2.0.0-alpha.8: dismiss menu with command+z if navigation or realtime ocr is active; press return to ask gpt without editing the prompt; save last screenshot. Few more days before I go back to my regular job. lol @FreakyFwoof @vick21 @Bri @pitermach https://github.com/chigkim/VOCR/releases/tag/v2.0.0-alpha.8

chikim, to ai
@chikim@mastodon.social avatar

Fascinating Microsoft AutoGen is a framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks. For example, you can setup a Human Admin, LLM Product Manager, LLM Back End Engineer, LLM UI Designer, LLM Critics, etc. You can then instruct them to collaborate and create a website. You can review their conversations, approve their work, or provide feedback throughout the process as a human admin. https://github.com/microsoft/autogen

  • All
  • Subscribed
  • Moderated
  • Favorites
  • JUstTest
  • mdbf
  • ngwrru68w68
  • InstantRegret
  • magazineikmin
  • thenastyranch
  • rosin
  • khanakhh
  • tacticalgear
  • Youngstown
  • slotface
  • Durango
  • kavyap
  • DreamBathrooms
  • provamag3
  • ethstaker
  • GTA5RPClips
  • modclub
  • tester
  • Leos
  • osvaldo12
  • cisconetworking
  • everett
  • cubers
  • normalnudes
  • anitta
  • megavids
  • lostlight
  • All magazines