Hacker News

You can try llama.cpp with a small model; I'd suggest a 4-bit 7B model. It runs slowly on my M1 MacBook with 16 GB of RAM, so even if it does work, it will be quite painful.

I run the 30B 4-bit model on my M2 Mac mini with 32 GB and it works okay; the 7B model is blazingly fast on that machine.


