# Forward-checking word suggestion algorithm

**Status**: Approved | Implemented | 2025

**Author**: Victor Ma

**Reviewer**: Jonathan Blandford


## Introduction

The forward-checking [word suggestion algorithm](word-suggestion.md) generates
word suggestions that are constrained by the current filter and every
intersecting filter of the current clue. The algorithm is implemented by the
function `word_list_find_clue_matches()`.


## The algorithm

The forward-checking word suggestion algorithm is built on top of the
[intersection-based word suggestion algorithm](intersection-algorithm.md). While
the intersection-based algorithm only accounts for a single intersecting filter
(namely, the one that contains the current cell), the forward-checking algorithm
accounts for every intersecting filter.

Here's how the algorithm works:
1. Iterate through the intersecting clues of the current clue:
    * For each intersecting clue, use the intersection-based word suggestion
      algorithm to determine the set of possible letters for the intersecting
      cell (the cell where the current clue and intersecting clue meet).
1. Return the set of words that meet these constraints:
    * The word meets the constraints imposed by the intersecting clues.
    * The word matches the current filter.


## Benefits

Consider this grid, which the [intersection-based algorithm fails to handle
properly](intersection-algorithm.md#limitations):
```
+---+---+---+---+
|   |   |   | Z |
+---+---+---+---+
|   |   |   | E |
+---+---+---+---+
|   |   |   | R |
+---+---+---+---+
| W | O | R |   | < current clue
+---+---+---+---+
  ^ current cell
```
Suppose that the current clue is 4-Across, and the current cell is the
bottom-left cell. Then, the forward-checking algorithm checks the filters for
all the intersecting clues, including 4-Down. The algorithm sees that the filter
for 4-Down constrains the bottom-right cell to the letter *O*. This makes the
current filter `WORO`, which doesn't match any word. So, the algorithm correctly
returns the empty set.

In general, this algorithm has two advantages over the intersection algorithm:
* Word suggestions for a given clue do not change based on the location of the
  cursor.
* Dead-end words are a lot less likely to be generated.


## Limitations

This approach significantly reduces the amount of dead-end words, but it does
not eliminate them altogether. Consider the following grid:
```
+---+---+---+---+
|   |   |   | N |
+---+---+---+---+
| Q | U | I |   |
+---+---+---+---+
|   |   |   | X |
+---+---+---+---+
| W | E | S |   | < current clue
+---+---+---+---+
```

2-Across starts with *QUI*, so the only word it can be is *QUIZ*. This makes the
filter for 4-Down *NZX?*. No word matches this filter, and so, 4-Down is
unfillable. This means that 4-Across is also unfillable, because it intersects
with 4-Down.

Now, suppose that the current clue is 4-Across. Then, the forward-checking
algorithm does not detect that the current clue is unfillable. The algorithm
sees that the filter for 4-Down is *N?X?*. But it does not see that 2-Across
forces 4-Down to be *NZX?*. So, the algorithm thinks that 4-Down can be *NEXT*,
and that 4-Across can be *WEST*---which is incorrect.

This failure is because the forward-checking algorithm checks the intersecting
clues of the current clue---but it does not check the intersecting clues of the
intersecting clues of the current clue. In other words, the algorithm does not
go "deep" enough to find the problematic clue (which is 2-Across).

### Additional levels are impractical

While it is possible to modify the algorithm to go one, two, or more levels
deeper, this is not practical, for a number of reasons.

#### Runtime increases exponentially

First, the number of intersections that the algorithm needs to check is roughly
$s^L$, where $s$ is the average clue size, and $L$ is the number of levels. So,
the algorithm's runtime increases exponentially, with each additional level that
we add to the algorithm.

#### Dead-end words are difficult to prevent entirely

Second, even adding a few levels to the algorithm is not enough to completely
eliminate dead-end words. Consider the following grid, where every clue is
unfillable, because 4-Across (*ZXCVBN?*) is unfillable, and that "unfillability"
propagates to every other clue:
```
+---+---+---+---+---+---+---+
|   | U | A | L | I | T |   |
+---+---+---+---+---+---+---+
| U | ■ | ■ | ■ | ■ | ■ | O |
+---+---+---+---+---+---+---+
| I | ■ | A | X |   | ■ | G |
+---+---+---+---+---+---+---+
| E | ■ | ■ | ■ | B | ■ | U |
+---+---+---+---+---+---+---+
|   | H | U | M |   | ■ | R |
+---+---+---+---+---+---+---+
| ■ | ■ | ■ | ■ | ■ | ■ | T |
+---+---+---+---+---+---+---+
| Z | X | C | V | B | N |   |
+---+---+---+---+---+---+---+
```

Now, suppose that the current clue is 2-Across (*AX?*). Then, our algorithm
needs at least six levels of *lookahead* to properly detect that the grid is
unfillable.  Otherwise, it thinks that the grid can be filled with *AXE*, *EBB*,
*THUMB*, *QUIET*, *QUALITY*, and *YOGURTS* (starting with 2-Across and
spiralling outward).

It may not be possible to prevent dead-end words entirely, with this
algorithm. For example, what happens in a dense grid with few black cells, where
the algorithm constantly loops in on itself? Does simply adding additional
levels of lookahead account for the complex constraint relations between clues?

#### Additional levels have diminishing returns

Third, additional levels of lookahead have diminishing returns, in terms of the
number of dead-end words they catch. This is because the clues closest to the
current clue have the greatest constraining power on the current clue. What
follows is a proof of this fact.

Suppose $i_1$ is an intersecting clue of the current clue. Then, $i_1$ has a big
influence on the current clue, because $i_1$ directly constrains the character
set for one of the current clue's cells (namely, the cell where the two clues
meet).

Now, suppose $i_2$ is an intersecting clue of $i_1$. Then, the only way that
$i_2$ can constrain the current clue is by constraining $i_1$. And the only way
it can constrain $i_1$ is by constraining the character set for one of $i_1$'s
cells. So, $i_2$ must constrain one of $i_1$'s cells enough for $i_1$ to become
constrained enough to add an additional constraint on the current clue.

And for a clue $i_3$ that intersects $i_2$, the constraining effect is another
order of magnitude smaller, and so on, for each additional intersection-removed
that a clue is.

So, the impact that a clue has on the current clue grows weaker, the more
intersections away it is. This means that the lookahead-based algorithm, by
accounting for the immediate intersecting clues of the current clue, eliminates
most of the dead-end words.

#### There is a better solution

And finally, there is a well-established algorithm called the AC-3 algorithm,
which we can use to [generate perfect word suggestions](ac3.md) with zero
dead-end words. So it doesn't make sense to try to extend this forward-checking
algorithm, given all the other drawbacks.


## Implementation

Here's how the forward-checking word suggestion algorithm is implemented:

<!-- How in-depth / low-level should this section be?
     There's a high-level explanation of the algorithm at the top
     of this doc, which I think explains the basics of it already. -->


## Caching

Here is a possible implementation for a caching system:
* Whenever we run the algorithm on a clue, cache those results.
  * Potentially: continuously run the algorithm on all the clues in the grid,
  asynchronously.
* Whenever the user modifies a clue $C$, delete the cache for all the
  intersecting clues of $C$.
* Whenever the user moves the cursor to a cached clue, return the cached
  results.