Intelligent Example: Solving Mini Crosswords with ToT and Backtracking

The objective is to fill a 5×5 grid with ten words (five horizontal and five vertical) that satisfy their clues. The task requires lexical, spatial, and deductive reasoning.

Problem Setup

Task: 5×5 Mini Crossword (10 clues in total: 5 horizontal and 5 vertical).
Goal: Fill the entire grid correctly.
Thought Decomposition: Each "thought" is the placement of a single word for one clue (e.g., h1. TASKS; v5. NALED). Thoughts are ordered by a priority queue, giving up to 10 intermediate steps, one per clue.[1]
Search Algorithm: Depth-First Search (DFS). It follows the most promising path until that path either completes the grid or hits a dead end, and only then tries another.[2, 1]
Heuristic Evaluation (Pruning): At each step, the LLM is prompted to evaluate all remaining unfilled clues based on the current letter constraints. The output is a confidence score or a classification (e.g., "possible," "impossible").[1]
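
The setup above can be made concrete with a minimal sketch of the search state. Everything here is an illustrative assumption rather than the implementation from [1]: the names Grid, empty_grid, and place are hypothetical, "_" marks an empty cell, and h1 is taken to be the top row while v1 is the leftmost column.

```python
from typing import List, Optional

SIZE = 5                  # 5x5 mini crossword
Grid = List[List[str]]    # "_" marks an empty cell (assumed convention)


def empty_grid() -> Grid:
    """Create a grid with no letters placed."""
    return [["_"] * SIZE for _ in range(SIZE)]


def place(grid: Grid, clue: str, word: str) -> Optional[Grid]:
    """Apply one 'thought': write `word` at `clue` (e.g. "h1" or "v5").

    Returns a new grid, or None if the word has the wrong length,
    the clue id is invalid, or a letter clashes with one already placed.
    """
    direction, index = clue[0], int(clue[1:]) - 1
    if direction not in ("h", "v") or not 0 <= index < SIZE:
        return None
    if len(word) != SIZE:
        return None
    new_grid = [row[:] for row in grid]  # copy, so the parent state survives
    for k, ch in enumerate(word.upper()):
        r, c = (index, k) if direction == "h" else (k, index)
        if new_grid[r][c] not in ("_", ch):
            return None  # conflict with an existing letter
        new_grid[r][c] = ch
    return new_grid


# Usage: place h1, then h2, on top of each other's state.
state = place(empty_grid(), "h1", "TASKS")
state = place(state, "h2", "MOTOR")
```

Returning a copy instead of mutating the grid keeps every parent state intact, which is what makes reverting to an earlier node trivial later in the search.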

Step-by-Step ToT Execution (Demonstrating Backtracking)

Let's assume the LLM has already successfully filled h1. TASKS and is now at a search node (State s_{2}).

Step 1: Thought Generation (Prioritization)

The LLM is prompted to generate and prioritize candidates for the next word/clue to fill, considering the existing letter constraints (the 'A' from TASKS constrains one vertical clue, for instance).

Clue/Thought	Proposed Word	LLM Confidence (Heuristic)	Search Action
h2. [Clue]	MOTOR	High	Prioritize. Select for deep exploration.
v3. [Clue]	STRING	Medium	Keep as alternative.
h4. [Clue]	SALON	High	Keep as alternative.

Search Action: DFS commits to the h2. MOTOR path first.
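
The ordering step can be sketched as a sort over the proposals in the table. The numeric ranking of the confidence labels and the function name prioritize are assumptions made for illustration; the source only states that the LLM attaches a confidence to each proposal.

```python
# Lower rank = explored earlier. The mapping itself is an assumption.
CONFIDENCE_RANK = {"high": 0, "medium": 1, "low": 2}


def prioritize(candidates):
    """Order (clue, word, confidence) proposals from most to least confident.

    sorted() is stable, so proposals with equal confidence keep the
    order in which they were proposed.
    """
    return sorted(candidates, key=lambda c: CONFIDENCE_RANK[c[2].lower()])


proposals = [
    ("h2", "MOTOR", "High"),
    ("v3", "STRING", "Medium"),
    ("h4", "SALON", "High"),
]
queue = prioritize(proposals)
first = queue[0]  # ("h2", "MOTOR", "High"): the branch DFS commits to
```

The rest of the queue is what the search falls back on if the first branch is later pruned.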

Step 2: Deep Exploration (Fatal Error)

The system now expands the tree deeply along the chosen path. After placing h2. MOTOR, a new constraint is created (the 'T' from MOTOR constrains a different vertical clue). The LLM proposes and places the next thought, for instance, v1. TENETS.

Thought Generated	Partial Solution State	Search Action
v1. TENETS	Grid now contains TASKS, MOTOR, and TENETS	Continue deep search.

Step 3: State Evaluation and Pruning

The LLM is then asked to evaluate the viability of the entire remaining problem from this new state (s_{3}). It examines all un-filled horizontal and vertical clues against the letters placed so far.
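
As a rough sketch of how the letter constraints handed to the evaluator could be collected, the helper below reads the current pattern of every clue that still has an empty cell. The names clue_pattern and open_patterns are hypothetical, the grid is stored as a list of row strings, and for brevity the example grid holds only h1. TASKS.

```python
SIZE = 5


def clue_pattern(grid, clue):
    """Return the letters currently in a clue's cells, "_" for empty ones."""
    direction, index = clue[0], int(clue[1:]) - 1
    if direction == "h":
        return grid[index]                          # a whole row
    return "".join(row[index] for row in grid)      # one column, top to bottom


def open_patterns(grid):
    """Map each clue that still has an empty cell to its current pattern."""
    clues = [f"{d}{i}" for d in "hv" for i in range(1, SIZE + 1)]
    patterns = {clue: clue_pattern(grid, clue) for clue in clues}
    return {clue: p for clue, p in patterns.items() if "_" in p}


grid = ["TASKS", "_____", "_____", "_____", "_____"]
constraints = open_patterns(grid)
# constraints["v2"] == "A____": the 'A' from TASKS constrains v2.
```

Each pattern, together with its clue text, is the kind of input a value prompt would need to judge whether a clue can still be answered.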

The LLM finds that the letters placed by h1, h2, and v1 force one remaining vertical clue, v5, to match the pattern S_R_D_.

Remaining Clue	Constraint	LLM Value Prompt Result	Pruning Trigger
v5. Desiccator...	S_R_D_	Impossible [1]	Pruning Activated.

The LLM determines that no known word can satisfy the S_R_D_ constraint given the clue, rendering the current path a "dead-end." This is an explicit, language-based heuristic determination.[1]
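
The pruning decision can be sketched with a deliberately simple stand-in for the evaluator. In the source the judgment comes from an LLM prompt that also considers the clue text; the version below is only an assumption for illustration and checks patterns against a small hypothetical word list. The names matches, evaluate, and should_prune are likewise hypothetical.

```python
import re


def matches(pattern: str, word: str) -> bool:
    """True if `word` fits `pattern`, where "_" stands for any single letter."""
    regex = "".join("." if ch == "_" else re.escape(ch) for ch in pattern.upper())
    return re.fullmatch(regex, word.upper()) is not None


def evaluate(pattern: str, vocabulary) -> str:
    """Classify one constraint, mirroring the 'possible'/'impossible' labels."""
    fits = any(matches(pattern, word) for word in vocabulary)
    return "possible" if fits else "impossible"


def should_prune(patterns: dict, vocabulary) -> bool:
    """A state is a dead end as soon as any remaining clue is impossible."""
    return any(evaluate(p, vocabulary) == "impossible" for p in patterns.values())


# Toy word list, purely illustrative.
vocabulary = ["TASKS", "MOTOR", "SALON"]
verdict_ok = evaluate("S_L_N", vocabulary)    # "possible" (SALON fits)
verdict_bad = evaluate("ZQ___", vocabulary)   # "impossible" with this list
prune = should_prune({"h4": "S_L_N", "v1": "ZQ___"}, vocabulary)  # True
```

A single impossible clue is enough to discard the state, because every clue must eventually be filled for the grid to be solved.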

Step 4: Backtracking

Because the current state is deemed "impossible," the DFS algorithm executes the crucial ToT mechanism: Backtracking.[1]

The entire sub-tree stemming from v1. TENETS is pruned and discarded.
The system reverts the search state back to the parent node, where only h1. TASKS and h2. MOTOR were placed.
The search marks v1. TENETS as a failed branch and selects the next-ranked alternative at that level (the candidates generated after placing h2. MOTOR). If no alternatives remain there, it backtracks once more to state s_{2}, where only h1. TASKS is placed, and tries the next candidate from Step 1 (h4. SALON or v3. STRING).
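
The control flow described in Steps 1 to 4 can be sketched as a small recursive DFS. This is a generic illustration under stated assumptions, not the code from [1]: states are tuples of the clues filled so far, and the toy tree, the dead-end set, and the goal below are invented to mirror the walkthrough.

```python
def dfs(state, propose, is_dead_end, is_goal, trace):
    """Depth-first search with pruning and backtracking.

    propose(state) returns child states, best first. Returns the first
    goal state found, or None if this subtree holds no solution.
    """
    trace.append(state)                  # record visit order for inspection
    if is_goal(state):
        return state
    if is_dead_end(state):
        return None                      # prune: children are never expanded
    for child in propose(state):
        found = dfs(child, propose, is_dead_end, is_goal, trace)
        if found is not None:
            return found
    return None                          # alternatives exhausted: backtrack


# Toy search tree (hypothetical), keyed by the clues filled so far.
tree = {
    ("h1",): [("h1", "h2"), ("h1", "h4")],
    ("h1", "h2"): [("h1", "h2", "v1")],
    ("h1", "h2", "v1"): [("h1", "h2", "v1", "v5")],
    ("h1", "h4"): [("h1", "h4", "v3")],
}
dead_ends = {("h1", "h2", "v1")}         # the state judged "impossible"
goal = ("h1", "h4", "v3")                # stands in for a solved grid

trace = []
result = dfs(
    ("h1",),
    propose=lambda s: tree.get(s, []),
    is_dead_end=lambda s: s in dead_ends,
    is_goal=lambda s: s == goal,
    trace=trace,
)
```

In this toy run the pruned state's child is never visited, the search returns through the h2 node because it has no other candidates, and the next alternative from the first level is tried instead.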

Intelligence Demonstrated:

The key advantage here is the LLM's capacity to recognize a long-term failure immediately after a local step, prompting a structural correction to the problem-solving process.[1]

Linear CoT Failure: A linear Chain-of-Thought process would have continued generating tokens sequentially, amplifying the error from the "impossible" constraint until the whole sequence was produced and failed.[1]
ToT Success: ToT uses its deliberate self-evaluation (System 2 reasoning) to trigger a global search-control function (backtracking), which saves computation and lets the search recover from the local error and move to a viable alternative path.[2, 1] The ablation reported in [1] shows how much this matters: removing backtracking cut the word-level success rate on the Mini Crosswords task from 60% to 20%.[1]
