Self-Consistency
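Multi-path sampling plus aggregation to make CoT reasoning more stable
Perhaps one of the more advanced techniques in prompt engineering is self-consistency. Proposed by Wang et al. (2022), self-consistency aims "to replace the naive greedy decoding used in chain-of-thought prompting". The idea is to sample multiple, diverse reasoning paths through few-shot CoT and then use those generations to select the most consistent answer. This helps boost the performance of CoT prompting on tasks involving arithmetic and commonsense reasoning.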
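As a rough illustration of the idea, the sketch below samples several reasoning paths for one prompt and returns the most frequent final answer. It is a minimal sketch under stated assumptions, not the procedure from the paper: `self_consistency`, `fake_sampler`, and `last_token_answer` are hypothetical names, and the sampler is a stand-in that cycles through canned completions instead of calling a real model with temperature above zero.

```python
from collections import Counter
from itertools import cycle
from typing import Callable


def self_consistency(
    sample_path: Callable[[str], str],
    extract_answer: Callable[[str], str],
    prompt: str,
    n_paths: int = 5,
) -> str:
    """Sample n_paths reasoning paths for one prompt and return the majority answer."""
    answers = [extract_answer(sample_path(prompt)) for _ in range(n_paths)]
    # Counter.most_common orders equal counts by first appearance,
    # so a tie resolves to the answer that was sampled first.
    return Counter(answers).most_common(1)[0][0]


# Hypothetical stand-in for a model sampled with temperature > 0:
# it cycles through canned completions instead of calling an API.
_canned = cycle([
    "... The answer is 67.",
    "... The answer is 35.",
    "... The answer is 67.",
])


def fake_sampler(prompt: str) -> str:
    return next(_canned)


def last_token_answer(completion: str) -> str:
    # Simplistic extraction: take the last word and drop the trailing period.
    return completion.split()[-1].rstrip(".")


print(self_consistency(fake_sampler, last_token_answer, "Q: ... A:", n_paths=3))
```

In practice `sample_path` would wrap a call to whichever model is in use, and the extraction step would need to be more robust than taking the last word.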
Let's try the following arithmetic reasoning example:
Prompt:
When I was 6 years old, my sister was half my age. Now I am 70 years old. How old is my sister?
Output:
35
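The output is wrong! The sister was 3 when the speaker was 6, so she is 3 years younger and should now be 67. How can we improve this with self-consistency? Let's try it out. We will use the few-shot exemplars from Wang et al. 2022 (Table 17):
Prompt: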
Q: There are 15 trees in the grove. The forester will plant trees today. After they are done, there will be 21 trees. How many trees did the forester plant today?
A: We start with 15 trees. We end with 21 trees. The difference is the number of trees planted. So they planted 21 - 15 = 6 trees. The answer is 6.

Q: There are 3 cars in the parking lot and 2 more cars arrive. How many cars are in the parking lot?
A: There are already 3 cars. 2 more arrive. Now there are 3 + 2 = 5 cars. The answer is 5.

Q: Leah has 32 chocolates and her sister has 42. If they eat 35, how many do they have left in total?
A: Leah has 32 and her sister has 42, so initially they have 32 + 42 = 74 chocolates. They ate 35. So they have 74 - 35 = 39 chocolates left. The answer is 39.

Q: Jason has 20 lollipops. He gives some to Denny. Now Jason has 12 lollipops. How many lollipops did Jason give to Denny?
A: Jason had 20 lollipops. Now he has 12, so he gave away 20 - 12 = 8 lollipops. The answer is 8.

Q: Shawn has 5 toys. For Christmas, he got 2 toys each from his mom and dad. How many toys does he have now?
A: He has 5 toys. He got 2 from his mom, so he has 5 + 2 = 7 toys. Then he got 2 more from his dad, so in total he has 7 + 2 = 9 toys. The answer is 9.

Q: There are 9 computers in the server room. From Monday to Thursday, 5 computers are installed each day. How many computers are in the server room now?
A: From Monday to Thursday is 4 days. Each day 5 computers are added, so 4 * 5 = 20 computers were added. Starting with 9 computers, now there are 9 + 20 = 29 computers. The answer is 29.

Q: Michael has 58 golf balls. On Tuesday he loses 23. On Wednesday he loses 2 more. How many golf balls does he have left at the end of Wednesday?
A: Michael started with 58. After losing 23 on Tuesday he has 58 - 23 = 35. After losing 2 more on Wednesday he has 35 - 2 = 33. The answer is 33.

Q: Olivia has 23 dollars. She buys five bagels at 3 dollars each. How much money does she have left?
A: Five bagels at 3 dollars each cost 15 dollars. She has 23 - 15 = 8 dollars left. The answer is 8.

Q: When I was 6 years old, my sister was half my age. Now I am 70 years old. How old is my sister?
A:
Output 1:
When I was 6, my sister was half my age, so she was 3. Now I am 70, so she is 70 - 3 = 67. The answer is 67.
Output 2:
When the narrator was 6, their sister was half their age, so she was 3. Now the narrator is 70, so the sister is 70 - 3 = 67. The answer is 67.
Output 3:
When I was 6, my sister was half my age, so she was 3. Now I am 70, so she is 70 / 2 = 35. The answer is 35.
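Computing the final answer involves a few steps (see the paper for details), but for simplicity, we can see that a majority answer has already emerged: two of the three outputs give 67, so 67 would essentially become the final answer.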
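The simplified aggregation described here can be sketched as a plain majority vote over the three sampled outputs. This is only an illustration under stated assumptions: the regular expression relies on each output ending with the phrase "The answer is N." as in the exemplars, and `extract_answer` is a hypothetical helper, not something defined in the paper.

```python
import re
from collections import Counter

# The three sampled outputs from the example above.
outputs = [
    "When I was 6, my sister was half my age, so she was 3. "
    "Now I am 70, so she is 70 - 3 = 67. The answer is 67.",
    "When the narrator was 6, their sister was half their age, so she was 3. "
    "Now the narrator is 70, so the sister is 70 - 3 = 67. The answer is 67.",
    "When I was 6, my sister was half my age, so she was 3. "
    "Now I am 70, so she is 70 / 2 = 35. The answer is 35.",
]

# Assumes the final answer follows the exemplar phrasing "The answer is N."
ANSWER_RE = re.compile(r"The answer is (-?\d+(?:\.\d+)?)")


def extract_answer(text):
    """Return the number after 'The answer is', or None if the phrase is absent."""
    match = ANSWER_RE.search(text)
    return match.group(1) if match else None


# Tally the extracted answers, skipping outputs with no parsable answer.
votes = Counter(a for a in map(extract_answer, outputs) if a is not None)
final, count = votes.most_common(1)[0]

print(f"{final} ({count}/{len(outputs)} votes)")  # 67 (2/3 votes)
```

With only three samples the vote is fragile; drawing more paths would make the majority less sensitive to a single wrong chain of reasoning.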