ToT
Tree of Thoughts: explore the solution space via search
For complex tasks that require exploration or strategic lookahead, traditional prompting techniques don't cut it. Yao et al. (2023) proposed the Tree of Thoughts (ToT) framework, which builds on chain-of-thought prompting and guides language models to explore thoughts as intermediate steps for general problem solving.
ToT maintains a tree of thoughts, where each thought is a coherent language sequence that serves as an intermediate step toward solving a problem. This lets the LM evaluate the progress of its own intermediate thoughts through a deliberate reasoning process. Combining this ability to generate and evaluate thoughts with search algorithms (such as BFS and DFS) enables systematic exploration of the solution space, with lookahead and backtracking.
Here's how the ToT framework works:

Image source: Yao et al. (2023)
ToT requires defining the number of thoughts/steps and the number of candidates per step, depending on the task. For example, the "Game of 24" in the paper is a math reasoning task that needs 3 thought steps, each requiring an intermediate equation. Each step keeps the top 5 candidates.
For the Game of 24, ToT uses breadth-first search (BFS), where the LM evaluates each thought candidate as "sure/maybe/impossible." The authors explain: "the aim is to promote correct partial solutions that can be verified as sure with few lookahead trials, eliminate impossible partial solutions based on 'too big/too small' commonsense, and keep the rest as maybe." Each step samples 3 evaluations. The process looks like this:
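The BFS procedure above can be sketched in code. The following is a minimal, LLM-free toy: `propose_thoughts` and `evaluate` are stand-ins for the paper's propose and value prompts (here replaced by exhaustive arithmetic and a trivial sure/maybe/impossible check), so the function names and heuristics are illustrative assumptions, not the authors' implementation:

```python
from fractions import Fraction
from itertools import combinations

def propose_thoughts(state):
    """Toy stand-in for the LLM 'propose' prompt: pick two numbers from the
    state and combine them with +, -, *, or / to form the next state."""
    nexts = []
    nums = list(state)
    for i, j in combinations(range(len(nums)), 2):
        a, b = nums[i], nums[j]
        rest = [n for k, n in enumerate(nums) if k not in (i, j)]
        results = {a + b, a - b, b - a, a * b}
        if b != 0:
            results.add(a / b)
        if a != 0:
            results.add(b / a)
        for r in results:
            nexts.append(tuple(sorted(rest + [r])))
    return nexts

def evaluate(state):
    """Toy stand-in for the LLM 'value' prompt, returning the paper's
    sure / maybe / impossible labels."""
    if len(state) == 1:
        return "sure" if state[0] == 24 else "impossible"
    return "maybe"

def tot_bfs(numbers, breadth=5):
    """BFS over thought states, keeping at most `breadth` candidates per step."""
    frontier = [tuple(sorted(Fraction(n) for n in numbers))]
    for _ in range(len(numbers) - 1):  # 3 thought steps for 4 numbers
        candidates = []
        for state in frontier:
            for nxt in propose_thoughts(state):
                label = evaluate(nxt)
                if label == "sure":
                    return True  # a verified solution was reached
                if label == "maybe":
                    candidates.append(nxt)
        # A real ToT ranks candidates by the LLM's value scores before
        # truncating; this toy version just dedupes and truncates in order.
        frontier = list(dict.fromkeys(candidates))[:breadth]
    return False
```

Because the toy evaluator scores every partial state as "maybe", the pruning here is uninformed, so a generous breadth (e.g. `tot_bfs([4, 9, 10, 13], breadth=1000)`) is needed to reliably find a solution such as (10 - 4) × (13 - 9) = 24. In the actual framework, the LLM's value judgments make a small breadth (top 5) effective.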

Image source: Yao et al. (2023)
The results speak for themselves -- ToT significantly outperforms other prompting methods:

Image source: Yao et al. (2023)
Code examples can be found here and here.
Big picture: Yao et al. (2023) and Long (2023) share a similar core idea -- both enhance LLM problem-solving through multi-turn conversation in a tree search format. The main difference is that Yao et al. (2023) uses DFS/BFS/beam search, while Long (2023) proposes a "ToT Controller" trained via reinforcement learning to drive the search strategy (including when to backtrack and how far). DFS/BFS/beam search are generic -- they don't adapt to specific problems. A RL-trained ToT Controller, on the other hand, can learn from new datasets or through self-play (think AlphaGo vs. brute force search). So even with a frozen LLM, an RL-based ToT system can keep evolving and learning.
Hulbert (2023) distilled the main ToT concepts into a simple prompting technique that gets the LLM to evaluate intermediate thoughts within a single prompt. Here's an example:
Imagine three different experts are answering this question.
Each expert writes down the first step of their thinking, then shares it.
Then each expert writes down the next step of their thinking and shares it.
Continue until all experts have completed all their steps.
If any expert makes a mistake, remove that expert from the discussion.
Question: ...
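In practice, this prompt is just a template with the question filled in. A minimal helper might look like the following; the function name, parameter, and commented-out API call are illustrative assumptions, not part of Hulbert's original write-up:

```python
def tot_prompt(question, n_experts=3):
    """Build a single-prompt Tree-of-Thoughts request in the style of
    Hulbert (2023); the resulting string can be sent to any chat LLM."""
    return (
        f"Imagine {n_experts} different experts are answering this question.\n"
        "Each expert writes down the first step of their thinking, then shares it.\n"
        "Then each expert writes down the next step of their thinking and shares it.\n"
        "Continue until all experts have completed all their steps.\n"
        "If any expert makes a mistake, remove that expert from the discussion.\n"
        f"Question: {question}"
    )

# The string would then be sent as the user message of a chat request, e.g.
# (hypothetical usage with the OpenAI client):
# response = client.chat.completions.create(
#     model="gpt-4o",
#     messages=[{"role": "user", "content": tot_prompt("...")}],
# )
```

Unlike full ToT, there is no explicit search or backtracking here; the "tree" unfolds entirely inside one model response, trading rigor for simplicity and cost.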