on-device LLMs · apple silicon · the occasional deep debug
What actually runs well on a 16 GB MacBook
Honest local-LLM benchmarks on a base M3 with 16 GB: tokens/sec, peak RAM, and exactly where it hits the wall. The numbers nobody publishes, because everyone else benchmarks on H100s.
Why Mistral and Devstral models drop their spaces on Apple Silicon
Debugging why tekken-v13 models served through mlx-lm's server emit Ġ instead of spaces, and the one-line root cause in MLX's detokenizer routing.
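
For a sense of what that symptom looks like, here is a minimal sketch. It assumes the Ġ is the usual GPT-2-style byte-level BPE marker, where the space byte 0x20 is remapped to U+0120 so every byte has a printable stand-in. This is not mlx-lm's code; `naive_detokenize` and `byte_level_detokenize` are hypothetical names used only to contrast the two behaviours.

```python
def bytes_to_unicode():
    """GPT-2-style table mapping each of the 256 byte values to a printable character."""
    # Bytes that are already printable keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte (control characters, space, ...) is shifted to 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}


def naive_detokenize(pieces):
    # Hypothetical "wrong route": joins token strings without undoing the byte mapping.
    return "".join(pieces)


def byte_level_detokenize(pieces):
    # Hypothetical "right route": maps each character back to its byte, then decodes UTF-8.
    raw = bytes(CHAR_TO_BYTE[ch] for ch in "".join(pieces))
    return raw.decode("utf-8", errors="replace")


pieces = ["Hello", "Ġworld"]
print(naive_detokenize(pieces))       # HelloĠworld
print(byte_level_detokenize(pieces))  # Hello world
```

If a detokenizer takes the first path for a vocabulary that needs the second, every space in the output shows up as Ġ, which matches the symptom described in the post.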