chikim, Cool tip for running LLMs on Apple Silicon! By default, macOS lets the GPU wire up to 2/3 of RAM on machines with <=36GB and 3/4 on machines with >36GB. I used the command
sudo sysctl iogpu.wired_limit_mb=57344
to override that and allocate 56GB of my 64GB to the GPU. This let me load all layers of larger models for faster inference! #MacOS #LLM #AI #ML
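For anyone trying this: the sysctl takes a value in megabytes, so 56GB works out to 56 × 1024 = 57344, and the setting resets on reboot. A minimal sketch, assuming an Apple Silicon Mac on a macOS version that exposes `iogpu.wired_limit_mb` (older releases used a `debug.iogpu.*` name instead):

```shell
# Compute the desired limit in MB (the sysctl takes megabytes): 56GB -> 57344
LIMIT_MB=$((56 * 1024))
echo "$LIMIT_MB"

# Inspect the current limit (0 means the stock macOS default applies)
sysctl iogpu.wired_limit_mb

# Raise it; requires root, and the change does not persist across reboots
sudo sysctl iogpu.wired_limit_mb="$LIMIT_MB"
```

Leave a few GB for the OS and your apps; wiring too much for the GPU can starve the rest of the system.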