Physical reasoning
physical reasoning prompt example
TL;DR
- This is a small physical-reasoning test: the model must reason about physical constraints and stability in its head, rather than answer a pure text-knowledge question.
- Useful for verifying whether the model obeys commonsense constraints (center of mass, load-bearing, fragility, friction) and can produce executable steps.
- In production, make the constraints explicit (fragile/heavy/sharp/liquid) and require a structured plan plus risk notes in the output.
Background
This prompt tests an LLM's physical reasoning capabilities by asking it to plan a stable arrangement of a set of everyday objects.
How to Apply
When porting this template to a real task, spell out the "objects" as an explicit set of attributes:
- weight: heavy / light
- fragility: fragile / robust
- shape: flat / cylindrical / sharp
- stability: base area / center of mass
With the attributes stated, the model is more likely to respect the constraints and produce a sensible stacking plan.
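A minimal sketch of rendering such an attribute set into a prompt. The object set, attribute values, and the `build_prompt` helper are illustrative assumptions, not part of the original template:

```python
# Sketch: render explicit object attributes into a stacking prompt.
# The attribute names (weight, fragility, shape) mirror the list above;
# the specific values and the helper function are assumptions.
OBJECTS = {
    "book": {"weight": "light", "fragility": "robust", "shape": "flat"},
    "laptop": {"weight": "heavy", "fragility": "fragile", "shape": "flat"},
    "eggs (x9)": {"weight": "light", "fragility": "fragile", "shape": "rounded"},
    "bottle": {"weight": "heavy", "fragility": "robust", "shape": "cylindrical"},
    "nail": {"weight": "light", "fragility": "robust", "shape": "sharp"},
}

def build_prompt(objects: dict) -> str:
    lines = ["Stack the following objects in a stable manner.", "Objects:"]
    for name, attrs in objects.items():
        attr_str = ", ".join(f"{k}={v}" for k, v in attrs.items())
        lines.append(f"- {name}: {attr_str}")
    lines.append("Respect fragility and center of mass when ordering layers.")
    return "\n".join(lines)

print(build_prompt(OBJECTS))
```

Keeping the attributes machine-generated like this makes it easy to swap in new object sets without rewriting the prompt by hand.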
How to Iterate
- Force an output format: Order (bottom to top) + Justification (a reason per layer) + Risks.
- Add prohibitions, e.g. "Do not place fragile items under heavy items".
- Add a self-check: have the model verify at the end that no constraint was violated.
- Add scenario variables: table size, whether tape is available, whether boxes may be opened or items unpacked, etc.
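The iteration steps above can be sketched as a reusable template that enforces the Order / Justification / Risks format and appends a self-check instruction. The exact wording of the template is an assumption:

```python
# Sketch: a template enforcing the output format and self-check described
# above. The wording of each section is an assumption, not canonical.
TEMPLATE = """{task}

Answer in exactly this format:
Order (bottom to top): <numbered list>
Justification: <one reason per layer>
Risks: <what could go wrong>

Constraints:
{constraints}

Finally, re-read your answer and state whether any constraint is violated."""

def make_prompt(task: str, constraints: list[str]) -> str:
    return TEMPLATE.format(
        task=task,
        constraints="\n".join(f"- {c}" for c in constraints),
    )

prompt = make_prompt(
    "Stack a book, 9 eggs, a laptop, a bottle and a nail in a stable manner.",
    ["Do not place fragile items under heavy items"],
)
print(prompt)
```

Keeping constraints as a list makes it cheap to add or remove prohibitions between iterations.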
Self-check rubric
- Does it give an explicit stacking order (bottom to top)?
- Does it justify stability (base area / center of mass / friction)?
- Does it account for the risks of fragile, liquid, or sharp items?
- Does it offer an alternative plan in case an object is unavailable?
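The rubric can be approximated with a crude keyword check, useful for smoke-testing many model outputs at once. The keyword lists are assumptions; real grading should be done by a human or a judge model:

```python
# Sketch: keyword-based check of an answer against the rubric above.
# Keyword choices are assumptions and will miss paraphrases.
def rubric_check(answer: str) -> dict:
    a = answer.lower()
    return {
        "has_order": "bottom" in a or "order" in a,
        "explains_stability": any(k in a for k in ("base", "center of mass", "friction")),
        "mentions_risks": any(k in a for k in ("fragile", "risk", "crack", "spill")),
        "offers_alternative": any(k in a for k in ("alternative", "instead", "if ")),
    }

sample = ("Order (bottom to top): book, laptop, eggs. "
          "The book's large base area keeps the center of mass low. "
          "Risk: the eggs are fragile; if they crack, use the bottle instead.")
print(rubric_check(sample))
```

A passing answer should satisfy all four checks; a failing key tells you which rubric item the model skipped.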
Practice
Exercise: replace the objects with a combination you actually encounter at home or work, and add constraints such as:
- "Do not damage any item"
- "You may use only one hand"
- "The table surface is only A4-sized"
Observe whether the model reliably produces an executable plan.
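The exercise above can be automated by running the same base task once per constraint, so a failure can be attributed to a single rule. The constraint strings mirror the list above; the helper is an illustrative assumption:

```python
# Sketch: generate one practice prompt per constraint so failures can be
# attributed to a single rule. The helper is an assumption.
BASE = "Stack a book, 9 eggs, a laptop, a bottle and a nail in a stable manner."
CONSTRAINTS = [
    "Do not damage any item.",
    "You may use only one hand.",
    "The table surface is only A4-sized.",
]

def practice_prompts(base: str, constraints: list[str]) -> list[str]:
    return [f"{base}\nConstraint: {c}" for c in constraints]

for p in practice_prompts(BASE, CONSTRAINTS):
    print(p, end="\n---\n")
```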
Prompt
Here we have a book, 9 eggs, a laptop, a bottle and a nail. Please tell me how to stack them onto each other in a stable manner.
Code / API
OpenAI (Python)
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": "Here we have a book, 9 eggs, a laptop, a bottle and a nail. Please tell me how to stack them onto each other in a stable manner.",
        }
    ],
    temperature=1,
    max_tokens=500,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
)
print(response.choices[0].message.content)
```
Fireworks (Python)
```python
import fireworks.client

fireworks.client.api_key = "<FIREWORKS_API_KEY>"

completion = fireworks.client.ChatCompletion.create(
    model="accounts/fireworks/models/mixtral-8x7b-instruct",
    messages=[
        {
            "role": "user",
            "content": "Here we have a book, 9 eggs, a laptop, a bottle and a nail. Please tell me how to stack them onto each other in a stable manner.",
        }
    ],
    stop=["<|im_start|>", "<|im_end|>", "<|endoftext|>"],
    stream=True,
    n=1,
    top_p=1,
    top_k=40,
    presence_penalty=0,
    frequency_penalty=0,
    prompt_truncate_len=1024,
    context_length_exceeded_behavior="truncate",
    temperature=0.9,
    max_tokens=4000,
)

# stream=True returns an iterator of chunks; print tokens as they arrive
for chunk in completion:
    print(chunk.choices[0].delta.content or "", end="")
```