Writing

Articles

on-device LLMs · apple silicon · the occasional deep debug

What actually runs well on a 16 GB MacBook

Honest local-LLM benchmarks on a base M3 with 16 GB: tokens/sec, peak RAM, and exactly where it hits the wall. These are the numbers nobody publishes, because most published benchmarks run on H100s.


Why Mistral and Devstral models drop their spaces on Apple Silicon

Debugging why tekken-v13 models served through mlx-lm's server emit Ġ instead of spaces, and the one-line root cause in MLX's detokenizer routing.

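Some background on the symptom. This is a minimal sketch, not the article's actual fix. In GPT-2-style byte-level BPE, every byte is remapped to a printable character before merging, and the space byte 0x20 becomes Ġ (U+0120). A decoder that reverses this mapping gets the spaces back. A decoder that assumes SentencePiece conventions, where ▁ marks a space, passes Ġ straight through. Two things here are assumptions: that the affected vocabulary uses this exact GPT-2 byte table, and the function names below, which are hypothetical and do not come from MLX or mlx-lm.

```python
# Sketch: why byte-level BPE tokens show "Ġ" instead of spaces.
# Assumes the GPT-2 byte-to-unicode table; function names are hypothetical,
# not taken from MLX or mlx-lm.


def bytes_to_unicode() -> dict[int, str]:
    """GPT-2's reversible map from every byte 0-255 to a printable character."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    chars = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes (including space, 0x20) are moved to 256 + n,
            # so 0x20 lands on 256 + 32 = U+0120, which is "Ġ".
            printable.append(b)
            chars.append(256 + shift)
            shift += 1
    return dict(zip(printable, map(chr, chars)))


BYTE_ENCODER = bytes_to_unicode()
BYTE_DECODER = {c: b for b, c in BYTE_ENCODER.items()}


def byte_level_decode(tokens: list[str]) -> str:
    """Correct path: map each character back to its byte, then decode UTF-8."""
    raw = bytes(BYTE_DECODER[c] for c in "".join(tokens))
    return raw.decode("utf-8", errors="replace")


def sentencepiece_style_decode(tokens: list[str]) -> str:
    """Hypothetical wrong path: only understands SentencePiece's "▁" space marker."""
    return "".join(tokens).replace("\u2581", " ")


tokens = ["Hello", "Ġworld"]  # how a byte-level vocabulary stores "Hello world"
print(byte_level_decode(tokens))           # Hello world
print(sentencepiece_style_decode(tokens))  # HelloĠworld
```

The only difference between the two outputs is which decoder the tokens are routed to. The token IDs and the model are identical in both cases.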