logo
P
Prompt Master

Prompt 大师

掌握和 AI 对话的艺术

Jailbreaking

Jailbreaking concepts and defenses (safety-trimmed)

Background

Jailbreaking refers to attempts to bypass an LLM's safety policies and defense mechanisms, tricking the model into outputting content it shouldn't. This is a concept from the security research context.

What You Need to Know

  • In real products, jailbreaking often overlaps with prompt injection and prompt leaking
  • Models and providers keep updating, so any specific jailbreak prompt will quickly become ineffective or get patched

Defense Strategy (High Level)

  • Clearly separate instructions from user input (structured, partitioned, quoted/escaped)
  • Declare a threat model in the instructions: don't execute additional instructions found in user input
  • Do output filtering / policy checks (plus logging and monitoring)
  • Enforce strict allowlists for tool calls and external actions

For security reasons, this site doesn't provide usable jailbreak prompts or copyable attack scripts that could bypass safety policies.