Self-Consistency
Sample multiple reasoning paths and aggregate to stabilize CoT outputs
Self-consistency is one of the more advanced prompt engineering techniques. Proposed by Wang et al. (2022), it aims to "replace the naive greedy decoding used in chain-of-thought prompting." The idea is to sample multiple, diverse reasoning paths through few-shot chain-of-thought (CoT) prompting, then select the answer that appears most consistently across the generated results. This improves CoT performance on tasks involving arithmetic and commonsense reasoning.
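As a rough illustration of the sample-then-aggregate idea, here is a minimal Python sketch. It is not the authors' implementation: the names self_consistency and make_canned_sampler are hypothetical, and the sampler simply replays canned final answers in place of a real model call with sampling enabled (for example, temperature above zero). It also assumes each completion has already been reduced to its final answer.

```python
from collections import Counter
from typing import Callable


def self_consistency(sample_answer: Callable[[str], str], prompt: str, n_paths: int = 5) -> str:
    """Sample n_paths final answers for one prompt and return the most common one."""
    answers = [sample_answer(prompt) for _ in range(n_paths)]
    # most_common(1) returns a one-element list: [(answer, count)].
    return Counter(answers).most_common(1)[0][0]


# Hypothetical stand-in for a model sampled with temperature > 0:
# it replays canned final answers instead of calling an API.
def make_canned_sampler(final_answers: list[str]) -> Callable[[str], str]:
    replay = iter(final_answers)
    return lambda prompt: next(replay)


sampler = make_canned_sampler(["67", "35", "67"])
print(self_consistency(sampler, "How old is my sister?", n_paths=3))  # prints 67
```

Passing the sampler in as a function keeps the voting logic independent of any particular model API, so a real client could be swapped in without touching the aggregation step.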
Let's try this arithmetic reasoning example:
Prompt:
When I was 6 years old, my sister was half my age. Now I am 70 years old. How old is my sister?
Output:
35
The output is wrong: the sister was 3 when the narrator was 6, so she is 3 years younger and is now 67. How can self-consistency fix this? Let's use the few-shot exemplars from Wang et al. (2022), Table 17:
Prompt:
Q: There are 15 trees in the grove. The forester will plant trees today. After they are done, there will be 21 trees. How many trees did the forester plant today?
A: We start with 15 trees. We end with 21 trees. The difference is the number of trees planted. So they planted 21 - 15 = 6 trees. The answer is 6.
Q: There are 3 cars in the parking lot and 2 more cars arrive. How many cars are in the parking lot?
A: There are already 3 cars. 2 more arrive. Now there are 3 + 2 = 5 cars. The answer is 5.
Q: Leah has 32 chocolates and her sister has 42. If they eat 35, how many do they have left in total?
A: Leah has 32 and her sister has 42, so initially they have 32 + 42 = 74 chocolates. They ate 35. So they have 74 - 35 = 39 chocolates left. The answer is 39.
Q: Jason has 20 lollipops. He gives some to Denny. Now Jason has 12 lollipops. How many lollipops did Jason give to Denny?
A: Jason had 20 lollipops. Now he has 12, so he gave away 20 - 12 = 8 lollipops. The answer is 8.
Q: Shawn has 5 toys. For Christmas, he got 2 toys each from his mom and dad. How many toys does he have now?
A: He has 5 toys. He got 2 from his mom, so he has 5 + 2 = 7 toys. Then he got 2 more from his dad, so in total he has 7 + 2 = 9 toys. The answer is 9.
Q: There are 9 computers in the server room. From Monday to Thursday, 5 computers are installed each day. How many computers are in the server room now?
A: From Monday to Thursday is 4 days. Each day 5 computers are added, so 4 * 5 = 20 computers were added. Starting with 9 computers, now there are 9 + 20 = 29 computers. The answer is 29.
Q: Michael has 58 golf balls. On Tuesday he loses 23. On Wednesday he loses 2 more. How many golf balls does he have left at the end of Wednesday?
A: Michael started with 58. After losing 23 on Tuesday he has 58 - 23 = 35. After losing 2 more on Wednesday he has 35 - 2 = 33. The answer is 33.
Q: Olivia has 23 dollars. She buys five bagels at 3 dollars each. How much money does she have left?
A: Five bagels at 3 dollars each cost 15 dollars. She has 23 - 15 = 8 dollars left. The answer is 8.
Q: When I was 6 years old, my sister was half my age. Now I am 70 years old. How old is my sister?
A:
Output 1:
When I was 6, my sister was half my age, so she was 3. Now I am 70, so she is 70 - 3 = 67. The answer is 67.
Output 2:
When the narrator was 6, their sister was half their age, so she was 3. Now the narrator is 70, so the sister is 70 - 3 = 67. The answer is 67.
Output 3:
When I was 6, my sister was half my age, so she was 3. Now I am 70, so she is 70 / 2 = 35. The answer is 35.
Computing the final answer involves a few steps (see the paper for details), but in the simplest case it is a majority vote over the sampled answers: two of the three outputs answer 67, so 67 becomes the final answer.