A comparative study of heuristic and machine learning approaches to solving Wordle. Six solvers, from simple letter frequency to deep reinforcement learning, tested against the theoretical optimal.
Justin Hoffman, UIUC MCS / ISU MSCS Graduate Student. This project was developed for IT 448 (Graduate Machine Learning) at Illinois State University, Spring 2026.
jhffmn.myAlt1@gmail.com · jrhoff2@ilstu.edu
In 2022, I built a Python-based Wordle solver ("Wordmaster Master") in response to a challenge from a coworker. That original solver used simple frequency analysis, scoring candidate words by how commonly their letters appeared, and was wrapped in a tkinter GUI. I wrote it before I had any formal introduction to information gain or decision trees, which made it a natural baseline when I later revisited the problem with machine learning.
This project began as an attempt to revisit that original solver with a more formal machine learning perspective. Instead of relying only on hand-designed heuristics, I wanted to see whether a model could learn an effective strategy from experience. This led naturally to reinforcement learning formulations of the problem.
I was heavily inspired by Andrew Ho’s "Wordle Solving with Deep Reinforcement Learning", which frames Wordle as a reinforcement learning task and explores the practical challenges of training deep RL models in this setting. His work highlights several key design decisions, including a structured 417-dimensional state representation, a 130-dimensional output space scored via dot product against word encodings, and a staged warm-start training pipeline.
Ho reports strong results, about 98% win rate and roughly 4.1 average guesses, but on a much larger vocabulary of about 13K words. He also notes that achieving this performance required millions of training games along with curriculum-style training and targeted resampling of difficult words. His strongest results came from an A2C approach, while DQN struggled at full scale.
In this project, I implemented a DQN architecture closely following Ho’s design and also experimented briefly with A2C. Due to time and compute limits, training was restricted to tens of thousands of episodes instead of millions. A2C did not reach competitive performance within this budget, so the focus remained on DQN and planning-based methods.
Because of these constraints, the goal is not to reproduce Ho’s results, but to compare how different approaches behave under limited training and controlled conditions. In particular, the teacher-guided DQN, which uses a rollout-based policy to guide exploration, emerged as the most effective learning-based compromise.
It is also important to distinguish between the learned model and the deployed solver. At inference time, the DQN solver operates within a constrained wrapper. If filtering reduces the candidate set to a single word, that word is selected directly, and repeated guesses are prevented with a hard mask. These constraints improve stability, but they are not learned by the model itself.
Overall, this work should be viewed as an exploratory study. Learned approaches do not outperform strong heuristics under these conditions, but the results suggest that expert-guided reinforcement learning is a promising direction with more training.
This project compares six algorithmic strategies for solving Wordle, spanning heuristics, reinforcement learning, and deep reinforcement learning, against the known optimal solution (3.421 average guesses, 100% win rate). All solvers are evaluated on the official 2,315-word answer list in a closed-vocabulary setting. The goal is to determine whether learned models can outperform strong heuristic and planning-based approaches on a small, structured decision problem, especially under limited training and compute. The results show that simple heuristics and planning methods remain extremely strong, while pure reinforcement learning struggles without additional structure. Adding teacher guidance significantly improves performance.
1. Frequency Heuristic · Heuristic
My original 2022 solver, adapted into a headless evaluator. It scores words using letter frequency and positional frequency
across the remaining candidates, with positional matches weighted 3x more heavily. When more than two words remain, it also
considers exploratory guesses from the full vocabulary to maximize information gain.
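The scoring rule above can be sketched in a few lines. This is an illustrative reimplementation, not the project's actual code; the function names and the exact way the two frequency terms are combined are assumptions, though the 3x positional weight follows the description.

```python
from collections import Counter

def score_words(candidates):
    """Score each candidate by letter frequency plus 3x-weighted positional
    frequency, both computed over the remaining candidate set (sketch)."""
    # How often each letter appears in any candidate (set() avoids double-counting repeats)
    letter_freq = Counter(c for w in candidates for c in set(w))
    # How often each letter appears at each of the five positions
    pos_freq = [Counter(w[i] for w in candidates) for i in range(5)]
    scores = {}
    for w in candidates:
        s = sum(letter_freq[c] for c in set(w))                # coverage term
        s += 3 * sum(pos_freq[i][c] for i, c in enumerate(w))  # positional term, weighted 3x
        scores[w] = s
    return scores

candidates = ["crane", "slate", "crate", "trace"]
scores = score_words(candidates)
best = max(scores, key=scores.get)
```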
2. Information Gain (Minimax) · Heuristic
For each guess, the solver simulates feedback against every remaining word, partitions candidates by feedback pattern,
and selects the guess that minimizes the worst-case remaining set size. The opening guess is computed once and cached.
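A minimal sketch of this minimax step, assuming the standard two-pass Wordle feedback rule (greens first, then yellows consuming remaining letter counts); for brevity it only searches guesses from the candidate set, whereas the full solver may search a wider vocabulary.

```python
from collections import Counter

def feedback(guess, answer):
    """Wordle feedback as a tuple: 2 = green, 1 = yellow, 0 = gray."""
    result, remaining = [0] * 5, Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = 2          # exact position match
        else:
            remaining[a] += 1      # unmatched answer letters available for yellows
    for i, g in enumerate(guess):
        if result[i] == 0 and remaining[g] > 0:
            result[i] = 1          # right letter, wrong position
            remaining[g] -= 1
    return tuple(result)

def minimax_guess(candidates):
    """Pick the guess whose worst-case feedback partition is smallest."""
    def worst_case(guess):
        parts = Counter(feedback(guess, ans) for ans in candidates)
        return max(parts.values())
    return min(candidates, key=worst_case)
```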
3. DQN v1 (Pure) · Deep RL
A Deep Q-Network following Ho (2022): 417-dim state → 512 → 512 → 130 output. The network produces a 130-dimensional
embedding scored against candidate word encodings via dot product. Trained with Double DQN and epsilon decay from
1.0 to 0.05 using random exploration. Result: 67.2% win rate with 4.582 average guesses. Pure DQN struggles with
weak exploration and poor training signal.
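The dot-product scoring head can be illustrated with plain NumPy. This sketch assumes the 130-dim word encoding is a 5-position x 26-letter one-hot vector (consistent with the 130-dim output, but an assumption about Ho's exact encoding), and uses random weights as stand-ins for the trained 417 → 512 → 512 → 130 MLP.

```python
import numpy as np

def encode_word(word):
    """Encode a 5-letter word as a 130-dim vector (5 positions x 26 letters)."""
    v = np.zeros(130)
    for i, c in enumerate(word):
        v[26 * i + (ord(c) - ord("a"))] = 1.0
    return v

rng = np.random.default_rng(0)
# Random stand-ins for the trained MLP weights (417 -> 512 -> 512 -> 130).
W1 = rng.normal(size=(417, 512))
W2 = rng.normal(size=(512, 512))
W3 = rng.normal(size=(512, 130))

def q_values(state, candidates):
    """Q(s, w) = embedding(s) . encode(w): one forward pass scores all words."""
    h = np.maximum(0, state @ W1)   # ReLU hidden layer 1
    h = np.maximum(0, h @ W2)       # ReLU hidden layer 2
    emb = h @ W3                    # 130-dim state embedding
    E = np.stack([encode_word(w) for w in candidates])
    return E @ emb                  # dot-product score per candidate word

state = rng.normal(size=417)        # placeholder for the real game-state features
scores = q_values(state, ["crane", "slate"])
```

The key property is that the output layer is fixed-size regardless of vocabulary: adding words only adds rows to the encoding matrix, not network parameters.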
4. DQN v2 (Teacher-Guided) · Deep RL
Same architecture as v1, but uses the rollout solver as a teacher. During exploration, the agent follows the teacher’s
action and receives a reward bonus for matching it. Trained for 20,000 episodes. At inference time, the solver applies
practical constraints such as single-candidate selection and masking repeated guesses. Result: 97.7% win rate with
3.678 average guesses. Learning improves dramatically when guided by a strong policy.
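The teacher-guided exploration and reward shaping might look like the sketch below. The bonus magnitude and base reward values are assumptions for illustration, not the project's actual hyperparameters.

```python
import random

def select_action(q_best, teacher_action, epsilon):
    """Exploration step: with probability epsilon, follow the teacher's
    action instead of taking a random guess (teacher-guided exploration)."""
    if random.random() < epsilon:
        return teacher_action, True    # exploring via the teacher's policy
    return q_best, False               # exploiting the learned Q-values

def shaped_reward(guess, answer, teacher_action, bonus=0.1):
    """Base win/step reward plus a small bonus for matching the teacher.
    The 0.1 bonus and -0.1 step penalty are illustrative values."""
    r = 1.0 if guess == answer else -0.1
    if guess == teacher_action:
        r += bonus
    return r
```

Because exploratory actions come from a strong policy, the replay buffer is dominated by sensible transitions rather than random noise, which is exactly the failure mode of DQN v1.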
5. Tabular Q-Learning · RL
Learns which strategy to use rather than which word to guess. State is defined by the number of greens and yellows,
and actions correspond to selecting among predefined strategies. Only about 19 states exist, making a full Q-table feasible.
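The entire table fits in a dictionary. Enumerating (greens, yellows) pairs with greens + yellows ≤ 5 gives 21 combinations, close to the roughly 19 states the project reports as reachable. The strategy names and the learning-rate/discount values below are assumptions for illustration.

```python
# State: (greens, yellows) from the latest feedback, with greens + yellows <= 5.
STATES = [(g, y) for g in range(6) for y in range(6 - g)]
# Action: which predefined strategy to follow next (names are illustrative).
STRATEGIES = ["frequency", "minimax", "rollout"]

Q = {(s, a): 0.0 for s in STATES for a in STRATEGIES}

def q_update(state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update on the tiny table."""
    best_next = max(Q[(next_state, a)] for a in STRATEGIES)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# Example: after a guess in state (0 greens, 2 yellows) using the frequency
# strategy, the game was won, so credit that strategy choice.
q_update((0, 2), "frequency", 1.0, (3, 1))
```

With only 21 x 3 entries, the table converges quickly and needs no function approximation at all.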
6. Rollout (POMDP) · RL / Planning
Uses one-step lookahead with a base policy. For each candidate guess, it simulates full games across all remaining words
and selects the one with the lowest expected number of guesses. Memoization dramatically reduces the number of unique states.
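A minimal sketch of memoized rollout. For simplicity the base policy here just guesses the first remaining candidate, which is an assumption; the project's base policy is stronger, but the structure (partition by feedback, recurse on each sub-group, cache on the candidate set) is the same.

```python
from collections import Counter
from functools import lru_cache

def feedback(guess, answer):
    """Wordle feedback as a tuple: 2 = green, 1 = yellow, 0 = gray."""
    result, remaining = [0] * 5, Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = 2
        else:
            remaining[a] += 1
    for i, g in enumerate(guess):
        if result[i] == 0 and remaining[g] > 0:
            result[i] = 1
            remaining[g] -= 1
    return tuple(result)

def partition(guess, candidates):
    """Group candidate answers by the feedback this guess would produce."""
    groups = {}
    for ans in candidates:
        groups.setdefault(feedback(guess, ans), []).append(ans)
    return groups

@lru_cache(maxsize=None)
def expected_guesses(candidates):
    """Expected guesses under the base policy, memoized on the candidate set.
    The sorted-tuple key is what keeps the number of unique states tiny."""
    candidates = list(candidates)
    if len(candidates) == 1:
        return 1.0
    guess = candidates[0]                 # simple base policy: first candidate
    total = float(len(candidates))        # this guess costs 1 for every game
    for fb, group in partition(guess, candidates).items():
        if fb != (2,) * 5:                # all-green group is already solved
            total += expected_guesses(tuple(sorted(group))) * len(group)
    return total / len(candidates)

def rollout_guess(candidates):
    """One-step lookahead: pick the guess whose induced partition has the
    lowest expected number of remaining guesses under the base policy."""
    def cost(guess):
        total = float(len(candidates))
        for fb, group in partition(guess, candidates).items():
            if fb != (2,) * 5:
                total += expected_guesses(tuple(sorted(group))) * len(group)
        return total / len(candidates)
    return min(candidates, key=cost)
```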
All results below are evaluated over the full 2,315-word answer set.
| Solver | Type | Win Rate | Avg Guesses | Speed |
|---|---|---|---|---|
| Optimal (Bertsimas 2024) | Benchmark | 100.0% | 3.421 | — |
| Rollout (POMDP) | RL / Planning | 100.0% | 3.477 | 182.7 it/s |
| Frequency Heuristic | Heuristic | 100.0% | 3.575 | 19.5 it/s |
| Info Gain (Minimax) | Heuristic | 100.0% | 3.644 | 2.6 it/s |
| Tabular Q-Learning | RL | 99.0% | 3.651 | 70.5 it/s |
| DQN v2 (Teacher-Guided) | Deep RL | 97.7% | 3.678 | 135.6 it/s |
| DQN v1 (Pure) | Deep RL | 67.2% | 4.582 | 114.4 it/s |
| Ho (2022) reported | Deep RL | ~98% | ~4.1 | — |
Pure DQN struggles because exploration is weak. Random guesses generate poor training data, and the model ends up learning from its own mistakes. This creates a feedback loop where performance degrades.
DQN v2 improves this by using a strong teacher. The replay buffer is filled with high-quality decisions, which stabilizes learning and leads to significantly better performance.
Simple heuristics remain extremely strong. The frequency solver achieved 100% win rate with 3.575 average guesses, outperforming all learned approaches.
The effective state space is much smaller than it appears. The rollout solver encountered only 331 unique states across the full benchmark. Despite the NP-hard formulation, near-optimal play visits a highly constrained subset of states.
DQN: 417-dim state → 512 → 512 → 130 output. Uses dot product scoring against word encodings.
Rollout solver: Memoized evaluation over candidate sets.
Frequency heuristic: Letter frequency plus positional weighting.
The DQN is not used end-to-end. The solver applies rules at inference time that are not learned, so performance reflects both the model and external logic.
Training scale is limited. Models were trained on tens of thousands of games rather than millions.
The reward function does not directly optimize candidate set reduction, which is the true objective.
The teacher-guided model is limited by its teacher and cannot exceed rollout performance without changes.