Physical reasoning
physical reasoning prompt example
TL;DR
- This is a small physical-reasoning test: the model must reason about physical constraints and stability in its head, rather than answer a pure text-knowledge question.
- Useful for verifying whether the model obeys commonsense constraints (center of mass, load-bearing, fragility, friction) and can produce executable steps.
- In production, make the constraints explicit (fragile/heavy/sharp/liquid) and require a structured plan plus risk notes in the output.
Background
This prompt tests an LLM's physical reasoning capabilities by asking it to plan a stable arrangement of a set of everyday objects.
How to Apply
When porting this template to a real task, spell out the "objects" as an explicit set of attributes:
- weight: heavy / light
- fragility: fragile / robust
- shape: flat / cylindrical / sharp
- stability: base area / center of mass
With the attributes stated, the model is more likely to respect the constraints and produce a sensible stacking plan.
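A minimal sketch of rendering such an attribute set into a prompt. The object set, attribute values, and the `build_prompt` helper are illustrative assumptions, not part of the original template:

```python
# Sketch: render explicit object attributes into a stacking prompt.
# The attribute names (weight, fragility, shape) mirror the list above;
# the specific values and the helper function are assumptions.
OBJECTS = {
    "book": {"weight": "light", "fragility": "robust", "shape": "flat"},
    "laptop": {"weight": "heavy", "fragility": "fragile", "shape": "flat"},
    "eggs (x9)": {"weight": "light", "fragility": "fragile", "shape": "rounded"},
    "bottle": {"weight": "heavy", "fragility": "robust", "shape": "cylindrical"},
    "nail": {"weight": "light", "fragility": "robust", "shape": "sharp"},
}

def build_prompt(objects: dict) -> str:
    lines = ["Stack the following objects in a stable manner.", "Objects:"]
    for name, attrs in objects.items():
        attr_str = ", ".join(f"{k}={v}" for k, v in attrs.items())
        lines.append(f"- {name}: {attr_str}")
    lines.append("Respect fragility and center of mass when ordering layers.")
    return "\n".join(lines)

print(build_prompt(OBJECTS))
```

Keeping the attributes machine-generated like this makes it easy to swap in new object sets without rewriting the prompt by hand.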
How to Iterate
- Force an output format: Order (bottom to top) + Justification (a reason per layer) + Risks.
- Add prohibitions, e.g. "Do not place fragile items under heavy items".
- Add a self-check: have the model verify at the end that no constraint was violated.
- Add scenario variables: table size, whether tape is available, whether boxes may be opened or items unpacked, etc.
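The iteration steps above can be sketched as a reusable template that enforces the Order / Justification / Risks format and appends a self-check instruction. The exact wording of the template is an assumption:

```python
# Sketch: a template enforcing the output format and self-check described
# above. The wording of each section is an assumption, not canonical.
TEMPLATE = """{task}

Answer in exactly this format:
Order (bottom to top): <numbered list>
Justification: <one reason per layer>
Risks: <what could go wrong>

Constraints:
{constraints}

Finally, re-read your answer and state whether any constraint is violated."""

def make_prompt(task: str, constraints: list[str]) -> str:
    return TEMPLATE.format(
        task=task,
        constraints="\n".join(f"- {c}" for c in constraints),
    )

prompt = make_prompt(
    "Stack a book, 9 eggs, a laptop, a bottle and a nail in a stable manner.",
    ["Do not place fragile items under heavy items"],
)
print(prompt)
```

Keeping constraints as a list makes it cheap to add or remove prohibitions between iterations.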
Self-check rubric
- Does it give an explicit stacking order (bottom to top)?
- Does it justify stability (base area / center of mass / friction)?
- Does it account for the risks of fragile, liquid, or sharp items?
- Does it offer an alternative plan in case an object is unavailable?
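The rubric can be approximated with a crude keyword check, useful for smoke-testing many model outputs at once. The keyword lists are assumptions; real grading should be done by a human or a judge model:

```python
# Sketch: keyword-based check of an answer against the rubric above.
# Keyword choices are assumptions and will miss paraphrases.
def rubric_check(answer: str) -> dict:
    a = answer.lower()
    return {
        "has_order": "bottom" in a or "order" in a,
        "explains_stability": any(k in a for k in ("base", "center of mass", "friction")),
        "mentions_risks": any(k in a for k in ("fragile", "risk", "crack", "spill")),
        "offers_alternative": any(k in a for k in ("alternative", "instead", "if ")),
    }

sample = ("Order (bottom to top): book, laptop, eggs. "
          "The book's large base area keeps the center of mass low. "
          "Risk: the eggs are fragile; if they crack, use the bottle instead.")
print(rubric_check(sample))
```

A passing answer should satisfy all four checks; a failing key tells you which rubric item the model skipped.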
Practice
Exercise: replace the objects with a combination you actually encounter at home or work, and add constraints such as:
- "Do not damage any item"
- "You may use only one hand"
- "The table surface is only A4-sized"
Observe whether the model reliably produces an executable plan.
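The exercise above can be automated by running the same base task once per constraint, so a failure can be attributed to a single rule. The constraint strings mirror the list above; the helper is an illustrative assumption:

```python
# Sketch: generate one practice prompt per constraint so failures can be
# attributed to a single rule. The helper is an assumption.
BASE = "Stack a book, 9 eggs, a laptop, a bottle and a nail in a stable manner."
CONSTRAINTS = [
    "Do not damage any item.",
    "You may use only one hand.",
    "The table surface is only A4-sized.",
]

def practice_prompts(base: str, constraints: list[str]) -> list[str]:
    return [f"{base}\nConstraint: {c}" for c in constraints]

for p in practice_prompts(BASE, CONSTRAINTS):
    print(p, end="\n---\n")
```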
Prompt
Here we have a book, 9 eggs, a laptop, a bottle and a nail. Please tell me how to stack them onto each other in a stable manner.
Code / API
OpenAI (Python)
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": "Here we have a book, 9 eggs, a laptop, a bottle and a nail. Please tell me how to stack them onto each other in a stable manner.",
        }
    ],
    temperature=1,
    max_tokens=500,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
)
print(response.choices[0].message.content)
```
Fireworks (Python)
```python
import fireworks.client

fireworks.client.api_key = "<FIREWORKS_API_KEY>"

completion = fireworks.client.ChatCompletion.create(
    model="accounts/fireworks/models/mixtral-8x7b-instruct",
    messages=[
        {
            "role": "user",
            "content": "Here we have a book, 9 eggs, a laptop, a bottle and a nail. Please tell me how to stack them onto each other in a stable manner.",
        }
    ],
    stop=["<|im_start|>", "<|im_end|>", "<|endoftext|>"],
    stream=True,
    n=1,
    top_p=1,
    top_k=40,
    presence_penalty=0,
    frequency_penalty=0,
    prompt_truncate_len=1024,
    context_length_exceeded_behavior="truncate",
    temperature=0.9,
    max_tokens=4000,
)

# stream=True returns an iterator of chunks; print tokens as they arrive
for chunk in completion:
    print(chunk.choices[0].delta.content or "", end="")
```