
How the engine actually works

No magic, and no claim we’ve beaten anyone yet. This is a reasoning engine — it proposes answers, places them, and retracts them when the crossings prove it wrong — plus an honest account of where it stands against the best crossword AIs ever built.

The pipeline

  1. Propose. For each clue, a language model and a 245k-clue historical index produce a distribution over candidate answers: not one guess, but a ranked field with probabilities.
  2. Place. A deterministic constraint engine owns the grid. It uses forward-checking, MRV (minimum remaining values) to fill the most constrained slot next (the one with the fewest surviving candidates), and conflict-directed retraction. It belongs to the same family as Dr.Fill and the Berkeley Crossword Solver: symbolic reasoning over the grid, not a word-list lookup.
  3. Doubt & retract. When a crossing contradicts a placed answer, the engine un-places it and explains why. Every placement and retraction is recorded along with the reasoning behind it.
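A minimal sketch of the propose, place, and retract loop above, assuming a toy data model: slots are lists of (row, col) cells, candidates are (word, probability) pairs, and `propose`, `solve`, and the move-tuple format are hypothetical names, not the engine's actual code. Two substitutions to flag: the linear interpolation in `propose` is an assumed mixing rule, and `solve` backtracks chronologically instead of doing true conflict-directed retraction. It does keep the MRV ordering and forward-checking described in step 2, and it logs every placement and retraction with a reason.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]
Move = Tuple[str, str, str, str]  # (action, slot, word, reason): hypothetical format


def propose(lm: Dict[str, float], hist: Dict[str, float],
            weight: float = 0.5) -> List[Tuple[str, float]]:
    """Hypothetical mixing rule: interpolate LM and historical-index probabilities."""
    words = set(lm) | set(hist)
    mixed = [(w, weight * lm.get(w, 0.0) + (1 - weight) * hist.get(w, 0.0))
             for w in words]
    return sorted(mixed, key=lambda wp: -wp[1])


def solve(slots: Dict[str, List[Cell]],
          candidates: Dict[str, List[Tuple[str, float]]]
          ) -> Tuple[Optional[Dict[str, str]], List[Move]]:
    """Fill every slot from its ranked candidates; return (fill or None, move log)."""
    log: List[Move] = []
    # Domains: right-length candidates only, most probable first.
    domains = {s: [w for w, _ in sorted(candidates.get(s, []), key=lambda wp: -wp[1])
                   if len(w) == len(cells)]
               for s, cells in slots.items()}
    # crossings[s] lists (other_slot, index_in_s, index_in_other) for each shared cell.
    owners: Dict[Cell, List[Tuple[str, int]]] = {}
    for s, cells in slots.items():
        for i, cell in enumerate(cells):
            owners.setdefault(cell, []).append((s, i))
    crossings: Dict[str, List[Tuple[str, int, int]]] = {s: [] for s in slots}
    for pairs in owners.values():
        for a, i in pairs:
            for b, j in pairs:
                if a != b:
                    crossings[a].append((b, i, j))
    fill: Dict[str, str] = {}

    def search(doms: Dict[str, List[str]]) -> bool:
        open_slots = [s for s in slots if s not in fill]
        if not open_slots:
            return True
        # MRV: branch on the open slot with the fewest surviving candidates.
        slot = min(open_slots, key=lambda s: len(doms[s]))
        for rank, word in enumerate(doms[slot], 1):
            fill[slot] = word
            log.append(("place", slot, word, f"rank {rank} of {len(doms[slot])} remaining"))
            # Forward-check: open crossing slots keep only letter-compatible words.
            pruned = dict(doms)
            wiped = None
            for other, i, j in crossings[slot]:
                if other in fill:
                    continue  # already consistent: doms[slot] was pruned against it
                pruned[other] = [w for w in pruned[other] if w[j] == word[i]]
                if not pruned[other]:
                    wiped = other
                    break
            if wiped is None and search(pruned):
                return True
            reason = (f"no candidate left for {wiped}" if wiped
                      else "every continuation failed deeper in the search")
            log.append(("retract", slot, word, reason))
            del fill[slot]
        return False

    return (dict(fill) if search(domains) else None), log


# A 2x2 toy grid: 1A/2A run across rows 0/1, 1D/2D run down columns 0/1.
grid = {"1A": [(0, 0), (0, 1)], "2A": [(1, 0), (1, 1)],
        "1D": [(0, 0), (1, 0)], "2D": [(0, 1), (1, 1)]}
cands = {"1A": propose({"AT": 0.7, "IT": 0.3}, {"AT": 0.5, "IT": 0.5}),
         "2A": [("NO", 0.7), ("SO", 0.3)],
         "1D": [("IS", 0.6), ("IN", 0.4)],
         "2D": [("TO", 0.8), ("UP", 0.2)]}
result, moves = solve(grid, cands)
for move in moves:
    print(move)
print(result)
```

On this grid, AT is the top proposal for 1A, but placing it leaves no down word for 1D, so the log shows AT placed, then retracted with that reason, before IT goes in and the rest of the grid fills.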

Where we stand — honestly

Published numbers for the field, and ours next to them. We’ll update this as the benchmark moves.

| System | What it does | Published result |
|---|---|---|
| Berkeley Crossword Solver (2021) | Neural QA → loopy belief propagation → local-search correction | 82% of NYT puzzles solved exactly; 99.9% letter accuracy on themeless puzzles. First program to beat every human at the ACPT. The bar. |
| Dr.Fill (Ginsberg) | Heuristic search over a clue database | Near top-human for years; hybridized with Berkeley for the 2021 win. |
| OneLook | Pattern + dictionary/definition lookup | Excellent reference search, but no reasoning, no grid, no explanation. |
| Crossword Genius (Ross) | Cryptic clue explanation | Strong at explaining cryptic wordplay; single-clue only, no full-grid reasoning. |
| Across & Down (ours) | LLM + historical distribution propose → constraint engine places → reasoning recorded per move | Clue-answer recall: ~50% top-1, ~67% top-20 on held-out historical clues. Full 15×15 solve: below Berkeley today; we're building the belief-propagation step we expect to close the gap. |

The honest version: on full, hard grids we are not yet at Berkeley's accuracy. We say so on purpose. What we have that none of them ship is a recorded, replayable chain of reasoning for every move.
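Because every move is recorded, any intermediate grid can be rebuilt by replaying the log from the start. A minimal sketch, assuming a hypothetical move format of (action, slot, word, reason) tuples and slots given as (row, col) cell lists; `replay` and `render` are illustrative names, not the product's API.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
Move = Tuple[str, str, str, str]  # (action, slot, word, reason): hypothetical format


def replay(moves: List[Move], upto: int) -> Dict[str, str]:
    """Rebuild which word sits in each slot after the first `upto` moves."""
    placed: Dict[str, str] = {}
    for action, slot, word, _reason in moves[:upto]:
        if action == "place":
            placed[slot] = word
        elif action == "retract":
            # A retraction must remove exactly the word currently in that slot.
            if placed.get(slot) != word:
                raise ValueError(f"retract of {word!r} but {slot} holds {placed.get(slot)!r}")
            del placed[slot]
        else:
            raise ValueError(f"unknown action {action!r}")
    return placed


def render(slots: Dict[str, List[Cell]], placed: Dict[str, str],
           rows: int, cols: int) -> str:
    """Draw the partial grid; '.' marks an empty cell."""
    grid = [["."] * cols for _ in range(rows)]
    for slot, word in placed.items():
        for (r, c), letter in zip(slots[slot], word):
            grid[r][c] = letter
    return "\n".join("".join(row) for row in grid)


slots = {"1A": [(0, 0), (0, 1)], "2A": [(1, 0), (1, 1)],
         "1D": [(0, 0), (1, 0)], "2D": [(0, 1), (1, 1)]}
moves: List[Move] = [
    ("place", "1A", "AT", "rank 1 of 2"),
    ("retract", "1A", "AT", "no candidate left for 1D"),
    ("place", "1A", "IT", "rank 2 of 2"),
    ("place", "2D", "TO", "only candidate"),
    ("place", "2A", "NO", "rank 1 of 2"),
    ("place", "1D", "IN", "only candidate"),
]
for step in range(len(moves) + 1):
    print(f"after {step} moves:\n{render(slots, replay(moves, step), 2, 2)}\n")
```

Checking that each retraction removes the word actually in the slot makes a corrupted or reordered log fail loudly instead of replaying to a wrong grid.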

What only we do

The roadmap to the bar

  1. ✓ Shipped. Distributional clue-answering: the historical index, with 50× the top-1 accuracy of a length-plus-frequency baseline.
  2. → Building. Belief-propagation grid solving over those distributions, the approach behind Berkeley's 99.9% letter accuracy on themeless puzzles.
  3. ○ Next. Self-correction pass; a modern clue bank for contemporary answers; a public daily benchmark on real published puzzles.
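For the belief-propagation step on the roadmap, a textbook sum-product loopy BP over slot distributions looks roughly like this. It is a generic sketch, not Berkeley's implementation and not the one being built here: slots are variables whose priors are the proposed candidate distributions, each crossing is a pairwise factor that rewards matching letters, and `loopy_bp`, `eps`, and `iters` are assumed names and settings. The `eps` floor makes the crossing factor soft, so normalization stays well defined even when no candidate in a neighboring slot fits.

```python
import math
from typing import Dict, List, Tuple

Crossing = Tuple[str, int, str, int]  # (slot_a, letter index in a, slot_b, letter index in b)


def loopy_bp(priors: Dict[str, Dict[str, float]], crossings: List[Crossing],
             iters: int = 20, eps: float = 1e-6) -> Dict[str, Dict[str, float]]:
    """Sum-product loopy BP: slots are variables, crossings are pairwise factors."""
    nbrs: Dict[str, List[Tuple[str, int, int]]] = {s: [] for s in priors}
    for a, i, b, j in crossings:
        nbrs[a].append((b, i, j))
        nbrs[b].append((a, j, i))
    # msgs[(a, b)][wb]: slot a's current support for word wb in slot b.
    msgs = {(a, b): {wb: 1.0 for wb in priors[b]}
            for a in priors for b, _, _ in nbrs[a]}

    def normalize(d: Dict[str, float]) -> Dict[str, float]:
        z = sum(d.values())
        return {k: v / z for k, v in d.items()}

    for _ in range(iters):
        new = {}
        for a in priors:
            for b, i, j in nbrs[a]:
                # a's prior times every incoming message except the one from b.
                cavity = {wa: pa * math.prod(msgs[(c, a)][wa] for c, _, _ in nbrs[a] if c != b)
                          for wa, pa in priors[a].items()}
                # Soft crossing factor: 1 if the shared letter agrees, eps otherwise.
                new[(a, b)] = normalize({
                    wb: sum(v * (1.0 if wa[i] == wb[j] else eps) for wa, v in cavity.items())
                    for wb in priors[b]})
        msgs = new  # synchronous ("flooding") schedule

    # Belief: prior times all incoming messages, normalized per slot.
    return {a: normalize({wa: pa * math.prod(msgs[(c, a)][wa] for c, _, _ in nbrs[a])
                          for wa, pa in priors[a].items()})
            for a in priors}


priors = {"1A": {"AT": 0.6, "IT": 0.4}, "2A": {"NO": 0.7, "SO": 0.3},
          "1D": {"IS": 0.6, "IN": 0.4}, "2D": {"TO": 0.8, "UP": 0.2}}
crossings = [("1A", 0, "1D", 0), ("1A", 1, "2D", 0),
             ("2A", 0, "1D", 1), ("2A", 1, "2D", 1)]
beliefs = loopy_bp(priors, crossings)
for slot, dist in beliefs.items():
    best = max(dist, key=dist.get)
    print(slot, best, round(dist[best], 3))
```

On this toy grid, 1A's prior prefers AT, but both candidates for 1D start with I, so the messages from the crossing shift 1A's belief onto IT. On grids with loops, these beliefs are approximate marginals rather than exact ones.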

Methodology: recall measured on a puzzle-level held-out split of the public-domain NYT archive (≈450k clue→answer pairs). Berkeley figures from Wallace et al., ACL 2022. Brutal feedback welcome — roger@grubb.net.
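The top-k recall figures described in this methodology note can be computed along these lines. A minimal sketch, assuming a hypothetical (puzzle_id, clue, answer) record format and a predictor that returns a ranked answer list for a clue and answer length. The hash-based split, `recall_at_k`, and `length_frequency_baseline` are illustrative, and the baseline is one plausible reading of "length+frequency", not necessarily the one behind the 50× comparison.

```python
import hashlib
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Record = Tuple[str, str, str]  # (puzzle_id, clue, answer): hypothetical format
Predictor = Callable[[str, int], Sequence[str]]  # (clue, answer length) -> ranked answers


def split_by_puzzle(records: List[Record], test_fraction: float = 0.1
                    ) -> Tuple[List[Record], List[Record]]:
    """Deterministic split that sends whole puzzles, never single clues, to one side."""
    train: List[Record] = []
    test: List[Record] = []
    for rec in records:
        bucket = int(hashlib.sha256(rec[0].encode()).hexdigest(), 16) % 1000
        if bucket < test_fraction * 1000:
            test.append(rec)
        else:
            train.append(rec)
    return train, test


def recall_at_k(test: List[Record], predict: Predictor,
                ks: Tuple[int, ...] = (1, 20)) -> Dict[int, float]:
    """Fraction of test clues whose true answer appears in the top k predictions."""
    hits: Counter = Counter()
    for _pid, clue, answer in test:
        ranked = list(predict(clue, len(answer)))
        for k in ks:
            hits[k] += int(answer in ranked[:k])
    n = len(test)
    return {k: hits[k] / n if n else 0.0 for k in ks}


def length_frequency_baseline(train: List[Record]) -> Predictor:
    """Ignores the clue: ranks training answers of the right length by frequency."""
    by_len: Dict[int, Counter] = defaultdict(Counter)
    for _pid, _clue, answer in train:
        by_len[len(answer)][answer] += 1
    return lambda clue, length: [a for a, _ in by_len[length].most_common()]


train = [("p1", "Age", "ERA"), ("p1", "Mineral", "ORE"), ("p2", "Epoch", "ERA")]
test = [("p3", "Period", "ERA"), ("p3", "Mine find", "ORE")]
baseline = length_frequency_baseline(train)
print(recall_at_k(test, baseline, ks=(1, 2)))  # {1: 0.5, 2: 1.0}
```

Splitting on puzzle ID rather than on individual clues keeps any one puzzle from contributing clues to both the index and the evaluation set.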