A Raspberry Pi seems very underpowered for this. Best case, you will be limited to something like 4-7B models on an 8GB RPi4. You may need to configure it with very long timeouts, and expect it to output something like one token every few minutes.
I ran a 6B model on an i7 without a GPU, and it didn't give good results until I got CUDA up and running, probably because requests were timing out.
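
If it helps, here is a rough back-of-the-envelope sketch of the two numbers that matter: whether a model fits in RAM, and how long a timeout you would need. Everything in it is my own assumption rather than a measurement: the helper names are made up, the 1 GB overhead and 7 GB of usable RAM are guesses, it assumes the weights are quantized to the given bit width, and 120 seconds per token is just one reading of "a token every few minutes".

```python
# Rough estimates only; all constants below are assumptions, not measurements.

def model_memory_gb(params_billion: float, bits_per_weight: float,
                    overhead_gb: float = 1.0) -> float:
    """Approximate RAM needed: weight storage plus a flat allowance
    (assumed 1 GB) for context cache and runtime buffers."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes / 1e9 + overhead_gb


def required_timeout_s(max_tokens: int, seconds_per_token: float,
                       safety_factor: float = 1.5) -> float:
    """Timeout needed to generate max_tokens at a given speed,
    padded by an assumed safety factor."""
    return max_tokens * seconds_per_token * safety_factor


if __name__ == "__main__":
    usable_ram_gb = 7.0  # assumption: an 8GB board minus OS and other processes

    for params in (4, 6, 7, 13):
        for bits in (4, 8, 16):
            need = model_memory_gb(params, bits)
            verdict = "fits" if need <= usable_ram_gb else "too big"
            print(f"{params}B at {bits}-bit: ~{need:.1f} GB ({verdict})")

    # Assumed speed: one token every two minutes, for a 200-token reply.
    timeout = required_timeout_s(max_tokens=200, seconds_per_token=120)
    print(f"Timeout for 200 tokens: {timeout:.0f} s (~{timeout / 3600:.0f} h)")
```

Under those assumptions a 7B model only fits at around 4 bits per weight, and a 200-token reply at that speed needs a timeout measured in hours, not seconds, so a default client timeout would almost certainly cut it off.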