Few-shot Prompting
Enable in-context learning through demonstrations and examples
Large language models have impressive zero-shot abilities, but they still fall short on more complex tasks when you don't give them examples. Few-shot prompting is the fix -- you provide demonstrations in the prompt to steer the model toward better performance. These demonstrations condition the model for the subsequent example where we want it to generate a response.
According to Touvron et al. 2023, few-shot abilities first appeared when models were scaled to a sufficient size (Kaplan et al., 2020).
Let's walk through an example from Brown et al. 2020. Here the task is using a made-up word correctly in a sentence.
Prompt:
A "whatpu" is a small, furry animal native to Tanzania. An example of a sentence that uses the word whatpu is:
We were traveling in Africa and we saw these very cute whatpus.
To do a "farduddle" means to jump up and down really fast. An example of a sentence that uses the word farduddle is:
Output:
When we won the game, we all started farduddling to celebrate.
The model learned the task from just one example (1-shot). For harder tasks, you can bump that up -- 3-shot, 5-shot, 10-shot, whatever it takes.
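Mechanically, a few-shot prompt is just completed demonstrations concatenated ahead of the unfinished query. A minimal sketch using the example above (the helper name is ours, and sending the prompt to a model is omitted since it depends on your API):

```python
# Sketch: assemble a 1-shot prompt from a completed demonstration plus
# the unfinished query. The texts are the Brown et al. 2020 example above.

def build_few_shot_prompt(demonstrations, query):
    """Concatenate completed demonstrations, then the unfinished query."""
    return "\n".join(demonstrations + [query])

demos = [
    'A "whatpu" is a small, furry animal native to Tanzania. '
    "An example of a sentence that uses the word whatpu is:\n"
    "We were traveling in Africa and we saw these very cute whatpus.",
]
query = (
    'To do a "farduddle" means to jump up and down really fast. '
    "An example of a sentence that uses the word farduddle is:"
)

prompt = build_few_shot_prompt(demos, query)
print(prompt)
```

For 3-shot or 5-shot, you simply append more completed demonstrations to the list before the query.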
Based on findings from Min et al. (2022), here are some extra tips about demonstrations for few-shot learning:
- "The label space and the distribution of the input text specified by the demonstrations are both important (regardless of whether the labels are correct for individual inputs)"
- The format you use matters a lot for performance. Even random labels work way better than no labels at all.
- Selecting random labels from the true label distribution (rather than uniform) also helps.
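To make the randomized-label setup concrete, here is a sketch that keeps the real inputs and the real label space but assigns labels at random, ignoring each input's true sentiment (variable names are ours):

```python
import random

# Sketch of the Min et al. (2022) setup: demonstrations preserve the input
# distribution and the label space, but labels are drawn at random rather
# than matched to each input.

random.seed(0)  # reproducible sketch

inputs = [
    "This is awesome!",
    "This is terrible!",
    "Wow, that movie was great!",
]
label_space = ["Positive", "Negative"]

# Randomized labels: sampled from the label space, ignoring true sentiment.
demos = [f"{text} // {random.choice(label_space)}" for text in inputs]
prompt = "\n".join(demos + ["What a horrible show! //"])
print(prompt)
```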
Let's try something interesting. Here's an example with randomized labels (meaning Negative and Positive are randomly assigned to inputs):
Prompt:
This is awesome! // Negative
This is terrible! // Positive
Wow, that movie was great! // Positive
What a horrible show! //
Output:
Negative
We still got the right answer, even though the labels were randomized. Keeping a consistent format helped too. And newer GPT models appear to be even more robust to inconsistent formats:
Prompt:
Positive This is awesome!
This is bad! Negative
Wow that movie was rad!
Positive
What a horrible show! --
Output:
Negative
Inconsistent format, but the model still nailed it. More thorough testing would be needed to confirm this holds across different and more complex tasks.
Limitations of Few-Shot Prompting
Standard few-shot prompting works well for many tasks, but it's not perfect -- especially for complex reasoning. Let's see why. Recall this task from earlier:
Prompt:
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:
Output:
Yes, the odd numbers in this group add up to 107, which is an even number.
Wrong. The odd numbers (15, 5, 13, 7, 1) actually sum to 41, which is odd -- so the correct answer is False, and the model got both the sum and the parity wrong. This highlights the limitations of standard prompting and the need for more advanced techniques.
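You can verify the arithmetic yourself:

```python
# Check the claim: sum the odd numbers in the group and test parity.
group = [15, 32, 5, 13, 82, 7, 1]
odd_sum = sum(n for n in group if n % 2 == 1)
print(odd_sum)           # 41
print(odd_sum % 2 == 0)  # False: the sum is odd, so the statement is False
```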
Let's see if adding few-shot examples helps.
Prompt:
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: Answer is False.
The odd numbers in this group add up to an even number: 17, 10, 19, 4, 8, 12, 24.
A: Answer is True.
The odd numbers in this group add up to an even number: 16, 11, 14, 4, 8, 13, 24.
A: Answer is True.
The odd numbers in this group add up to an even number: 17, 9, 10, 12, 13, 4, 2.
A: Answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:
Output:
Answer is True.
Still didn't work. Few-shot alone isn't enough for this kind of reasoning problem. The examples above give the model basic task info, but the task itself requires multi-step reasoning. In other words, breaking the problem into steps and showing the model how to reason through them would help. That's exactly what Chain-of-Thought (CoT) prompting was designed for.
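To make that concrete, here is a sketch of what a reasoning-augmented demonstration for this task could look like. The exemplar wording and the helper name are ours, not from the source; CoT prompting proper is covered in the next chapter.

```python
# Sketch: build a demonstration that spells out the intermediate steps
# (find the odd numbers, sum them, check parity) instead of only the label.

def cot_exemplar(group):
    """Return a demonstration with explicit reasoning steps for one group."""
    odds = [n for n in group if n % 2 == 1]
    total = sum(odds)
    parity = "even" if total % 2 == 0 else "odd"
    answer = "True" if parity == "even" else "False"
    return (
        "The odd numbers in this group add up to an even number: "
        f"{', '.join(map(str, group))}.\n"
        f"A: The odd numbers are {', '.join(map(str, odds))}. "
        f"Their sum is {total}, which is {parity}. The answer is {answer}."
    )

print(cot_exemplar([4, 8, 9, 15, 12, 2, 1]))
```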
Bottom line: examples help for many tasks. But when zero-shot and few-shot both fail, the model probably needs more than pattern matching -- it needs to reason. From here, consider fine-tuning or more advanced prompting techniques. Next up, we'll cover chain-of-thought prompting, which has been getting a lot of attention.