# AC-3-based word suggestion algorithm

**Status**: Approved | 2025

**Author**: Victor Ma

**Reviewer**: Jonathan Blandford


## Introduction

The AC-3-based [word suggestion algorithm](word-suggestion.md) generates a
complete and perfect set of word suggestions for every clue in the grid, all at
once. This algorithm is based on the AC-3 arc consistency algorithm from the
field of [constraint satisfaction
problems](https://inst.eecs.berkeley.edu/~cs188/textbook/csp/).

## CSP model for grid fillling

The problem of finding a valid grid fill is a constraint satisfaction problem
(CSP).  What follows is one possible way of modelling the CSP. However, [other
models also exist](https://cs.uwaterloo.ca/~vanbeek/Publications/cai01a.pdf).

### The CSP model

Let $V$ be the set of variables for the CSP, and let

$$
V = \{a_1, a_2, \dots, a_m\} \cup \{d_1, d_2, \dots, d_n\},
$$

where $a_i$ is the word in the $i$-Across clue and $m$ is the number of across
clues in the grid; and $d_j$ is the word in the $j$-Down clue and $n$ is the
number of down clues in the grid. So, there is one variable for each clue, and
each variable is set to the chosen word for its respective clue.

Let $D$ be the set of domains for the CSP, and let

$$
D = \{ D_v \mid v \in V \}
$$

where

$$
D_v = \text{the set of word-list words that match the filter for the clue that $v$ represents}.
$$

Let $I$ be the set of intersection constraints for the CSP, and let

$$
I =
\left\{
  \begin{aligned}
    I_{v_1v_2} \mid v_1, v_2 \in V
    &\land \text{The clues for $v_1$ and $v_2$ intersect each other} \\
    &\land \text{The clue for $v_1$ appears before the clue for $v_2$ in the clue list} \\
  \end{aligned}
\right\}
$$

where

$$
\begin{aligned}
  I_{v_1v_2} = \ &\text{The values of $v_1$ and $v_2$ have the same letter} \\
  &\text{at the offsets where their clues intersect}.
\end{aligned}
$$

Let $Q$ be the set of unique word constraints, and let

$$
Q =
\left\{
  \begin{aligned}
    Q_{v_iv_j} \mid v_i, v_j \in V & \land i \neq j                     \\
    & \land \text{The clues for $v_i$ and $v_j$ have the same length}   \\
    & \land \text{The clue for $v_i$ appears before the clue for $v_j$} \\
    & \hspace{1.2em} \text{in the clue list}                            \\
  \end{aligned}
\right\}.
$$

where

$$
Q_{v_1v_2} = (\text{The value of } v_1 \neq \text{the value of } v_2)
$$

Let $C$ be the set of constraints for the CSP, and let

$$
C = I \cup Q.
$$

### Solution to the CSP

A solution to this CSP assigns a word to each clue. Suppose that the CSP has a
generic Engish word list, and the following grid:
```
+---+---+---+---+
| ■ | ■ | ■ | N |
+---+---+---+---+
| T | I | M |   |
+---+---+---+---+
| ■ | ■ | ■ | X |
+---+---+---+---+
| W | E | S |   |
+---+---+---+---+
```
Then, the unique solution to the CSP is
\begin{align*}
  a_1 &= \text{TIME} \\
  a_2 &= \text{WEST} \\
  d_1 &= \text{NEXT}.\\
\end{align*}

### The AC-3 algorithm

However, our goal is not to find a solution to the CSP; our goal is to find the
set of possible words for a clue. To do this, we use the [AC-3
algorithm](https://inst.eecs.berkeley.edu/~cs188/textbook/csp/filtering.html).
The AC-3 algorithm accepts a CSP model and returns arc-consistent domains for
every variable in the CSP.

For our grid-filling CSP, this means that the AC-3 algorithm returns a set of
possible words for each clue. Furthermore, each set contains every possible word
that the clue can be and no word that the clue cannot be. In other words, there
are no dead-end words in any set, and no words are "missing" from any set.

## Performance
This algorithm must be implemented asynchronously. It is too slow to run
synchronously.

Furthermore, if the grid is too sparse, this algorithm will run indefinitely.
So, we should revert to a simpler word suggestion algorithm when the grid is
sparse.

## Caching

It should be possible to cache results, but the way to do so is not immediately
obvious. The problem is that the AC-3 algorithm can only reduce the domains. It
can never add values back into them.