ChessBench is a simple, open-source way to compare how well AI models solve chess puzzles.
I built it for fun, to test a simple question: when models say they can reason, how good are they actually at tactical thinking and spatial awareness? Instead of relying on vague claims, ChessBench runs the same curated set of mate-in-1/2/3 puzzles across multiple models and shows the results side by side.
You can quickly see:
which models get the right moves most often,
where they fail,
and how much each model costs in tokens and money.
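As a rough illustration of the comparison described above, here is a minimal Python sketch of how per-model accuracy, token usage, and cost could be tallied. It is not ChessBench's actual code: the `Attempt` record, its field names, the `summarize` function, and the rule of comparing only a single move string are all assumptions made for this example, and the sample values are placeholders, not real results.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One model's answer to one puzzle (hypothetical record layout)."""
    model: str
    puzzle_id: str
    expected: str    # expected move, e.g. in UCI notation such as "d1h5"
    answer: str      # move string returned by the model
    tokens: int      # tokens consumed by this attempt
    cost_usd: float  # cost of this attempt in US dollars


def summarize(attempts):
    """Aggregate solved count, accuracy, tokens, and cost per model."""
    summary = {}
    for a in attempts:
        row = summary.setdefault(
            a.model,
            {"solved": 0, "total": 0, "tokens": 0, "cost_usd": 0.0},
        )
        row["total"] += 1
        # Assumption: an answer is correct if it matches the expected move
        # string exactly, ignoring case and surrounding whitespace.
        if a.answer.strip().lower() == a.expected.strip().lower():
            row["solved"] += 1
        row["tokens"] += a.tokens
        row["cost_usd"] += a.cost_usd
    for row in summary.values():
        row["accuracy"] = row["solved"] / row["total"]
    return summary


if __name__ == "__main__":
    # Placeholder data for illustration only.
    demo = [
        Attempt("model-a", "p1", "d1h5", "d1h5", 120, 0.001),
        Attempt("model-a", "p2", "f3f7", "f3g4", 150, 0.002),
        Attempt("model-b", "p1", "d1h5", "D1H5 ", 90, 0.004),
    ]
    for model, row in summarize(demo).items():
        print(model, row)
```

Comparing plain move strings keeps the sketch dependency-free. A real harness would more likely validate moves against the board position, and for mate-in-2/3 puzzles it would need to check the full line rather than a single move.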
Built with