Prompt Chaining
Break tasks into steps and generate incrementally for more controllable output
Introduction
To make LLMs more reliable, one key prompting technique is breaking tasks into subtasks. Once you've identified the subtasks, you feed each subtask's prompt to the model and use the result as part of the next prompt. That's prompt chaining -- decomposing a task into subtasks and creating a series of prompt operations.
Prompt chaining can handle complex tasks that an LLM might struggle to solve with a single, very detailed prompt. In a chain, each step transforms or processes the previous response until you reach the desired final result.
Beyond performance, prompt chaining also improves transparency, controllability, and reliability. You can pinpoint where the model goes wrong and improve performance at specific stages.
It's especially useful for building LLM-powered conversational assistants and creating personalized user experiences.
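The mechanics of a chain can be sketched in a few lines of code. In this minimal, model-agnostic sketch, `call_llm` is a hypothetical stand-in for whatever model API you use, and each step's output is substituted into the next step's prompt template:

```python
def run_chain(steps, call_llm, initial_input):
    """Run a prompt chain: each step is a template with an {input} slot.

    The model's response to one step becomes the {input} of the next.
    `call_llm` is a placeholder for your actual model call.
    """
    result = initial_input
    for template in steps:
        prompt = template.format(input=result)
        result = call_llm(prompt)
    return result

# A fake model that just echoes its prompt, to make the data flow visible:
fake_llm = lambda prompt: f"RESPONSE({prompt})"
steps = [
    "Extract quotes relevant to the question from: {input}",
    "Answer the question using these quotes: {input}",
]
print(run_chain(steps, fake_llm, "some document text"))
```

In practice, `call_llm` would wrap a real API call, and each template would be one of the full prompts shown in the examples below.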
Prompt Chaining Examples
Prompt Chaining for Document QA
Prompt chains work great for scenarios involving multiple operations or transformations. For example, a common LLM use case is answering questions based on large text documents. You can design two prompts: the first one extracts relevant quotes to answer the question, and the second one uses those quotes plus the original document to generate the actual answer.
The first prompt below extracts relevant quotes from a document based on a question. For simplicity, we've added a placeholder {{document}}. To test this, you could copy-paste a Wikipedia article, like the one on Prompt Engineering. Since this uses a long context, we went with OpenAI's gpt-4-1106-preview. You can also use other long-context LLMs like Claude.
Prompt 1:
You are a helpful assistant. Your task is to answer a question based on a document.
Step 1: Extract quotes that are relevant to the question from the document delimited by ####.
Output the quote list using <quotes></quotes>. If you cannot find any relevant quotes, respond with "No relevant quotes found!".
####
{{document}}
####
Here's a screenshot of the full prompt, including the question passed via the user role.

Prompt 1 output:
<quotes>
- Chain-of-thought (CoT) prompting[27]
- Generated knowledge prompting[37]
- Least-to-most prompting[38]
- Self-consistency decoding[39]
- Complexity-based prompting[41]
- Self-refine[42]
- Tree-of-thought prompting[43]
- Maieutic prompting[45]
- Directional-stimulus prompting[46]
- Textual inversion and embeddings[59]
- Using gradient descent to search for prompts[61][62][63][64]
- Prompt injection[65][66][67]
</quotes>
The quotes returned from the first prompt now become input for the second prompt. You could clean them up -- say, remove citation markers -- using another prompt in the chain. But for now, let's skip that. The second prompt takes the extracted quotes and generates a helpful answer based on both the document and the quotes:
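Wiring the second prompt together in code is plain string substitution. Below is a sketch of how the document and the quotes from step 1 might be filled into the prompt 2 template (the template text follows the prompt shown next; the function name is just an illustration):

```python
PROMPT_2_TEMPLATE = """Using the relevant quotes extracted from the document \
(delimited by <quotes></quotes>) and the original document (delimited by ####), \
write an answer to the question. Make sure the answer is accurate, friendly, \
and helpful.

####
{document}
####

<quotes>
{quotes}
</quotes>"""

def build_prompt_2(document, quotes):
    # Substitute the source document and the step-1 quotes into the template.
    return PROMPT_2_TEMPLATE.format(document=document, quotes=quotes)
```

The resulting string is then sent to the model as the second call in the chain.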
Prompt 2:
Using the relevant quotes extracted from the document (delimited by <quotes></quotes>) and the original document (delimited by ####), write an answer to the question. Make sure the answer is accurate, friendly, and helpful.
####
{{document}}
####
<quotes>
- Chain-of-thought (CoT) prompting[27]
- Generated knowledge prompting[37]
- Least-to-most prompting[38]
- Self-consistency decoding[39]
- Complexity-based prompting[41]
- Self-refine[42]
- Tree-of-thought prompting[43]
- Maieutic prompting[45]
- Directional-stimulus prompting[46]
- Textual inversion and embeddings[59]
- Using gradient descent to search for prompts[61][62][63][64]
- Prompt injection[65][66][67]
</quotes>
Prompt 2 output:
The prompt techniques mentioned in the document include:
1. Chain-of-thought (CoT) prompting[27]
2. Generated knowledge prompting[37]
3. Least-to-most prompting[38]
4. Self-consistency decoding[39]
5. Complexity-based prompting[41]
6. Self-refine[42]
7. Tree-of-thought prompting[43]
8. Maieutic prompting[45]
9. Directional-stimulus prompting[46]
10. Textual inversion and embeddings[59]
11. Using gradient descent to search for prompts[61][62][63][64]
12. Prompt injection[65][66][67]
Each technique applies a distinct strategy to improve or constrain interactions with large language models to produce desired outputs.
As you can see, building prompt chains is a useful approach when a response must go through multiple operations or transformations. As an exercise, try designing a prompt that strips the citation markers (e.g., [27]) from the response before sending it to the end user.
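For that cleanup step, you don't strictly need another LLM call; a small regular expression inserted into the chain works too. A minimal sketch:

```python
import re

def strip_citation_markers(text):
    """Remove Wikipedia-style citation markers like [27] or [61][62][63]."""
    return re.sub(r"\[\d+\]", "", text)

print(strip_citation_markers("Chain-of-thought (CoT) prompting[27]"))
# → Chain-of-thought (CoT) prompting
```

Mixing deterministic steps like this with LLM steps is common in practice: use code where the transformation is mechanical, and the model where it isn't.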
You can also find more prompt chaining examples in this documentation, which uses the Claude LLM. Our examples were inspired by and adapted from theirs.