
GraphPrompts

Graph-based prompting framework: inject graph-structured information into LLM context

TL;DR

  • GraphPrompts is for tasks where the data is naturally a graph: knowledge graphs, social graphs, citation graphs, molecule graphs, recommendation graphs, etc.
  • Core idea: inject graph structure information (nodes/edges/neighborhoods/subgraphs) into the LLM context in a more controlled way, so the model can leverage structure instead of just reading flat text.
  • Two common approaches in practice: graph-to-text (linearize the subgraph) and retrieve-then-prompt (retrieve relevant nodes/paths/communities first, then generate answers).
  • Key risks: context explosion, structure loss (linearization drops information), and unverifiable outputs. Pair with evaluation and a self-check rubric.

Core Concepts

GraphPrompts generally refers to a class of "graph-aware prompting" methods: you have a graph (nodes + edges) and you want the LLM to use the graph's structure and attributes for downstream tasks.

In many real business scenarios, graph information already exists:

  • CRM/user relationships: user-user, user-company, user-behavior
  • Content/search: query-document, document-document (citation/similarity), topic graph
  • Courses/learning: lesson-prerequisite, student-lesson, skill graph

If you dump all this structure directly into the context, the common failure modes are: too much information, unclear structure, and the model ignoring key relationships. That's why GraphPrompts emphasizes "how to select the subgraph + how to represent the structure + how to constrain the output."

How to Apply

Step 1: Define the Task and Output (Task-First)

First, pin down your downstream task type:

  • classification (e.g., node label)
  • link prediction (e.g., recommendation/relationship prediction)
  • ranking (e.g., candidate node ranking)
  • question answering (fact-based QA over a graph)

Then define the output format (keep it structured), for example:

  • label (single label)
  • top-k candidates (list + score)
  • answer + evidence (answer + referenced node/edge)
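The three output shapes above can be sketched as concrete JSON payloads. Field names here (`label`, `candidates`, `answer`, `evidence`) are illustrative assumptions, not a fixed standard; the point is that each shape serializes cleanly so downstream code can parse and verify it.

```python
import json

# Hypothetical examples of the three structured output shapes.
label_output = {"label": "fraud_risk"}

topk_output = {
    "candidates": [
        {"node_id": "course_201", "score": 0.91},
        {"node_id": "course_305", "score": 0.74},
    ]
}

qa_output = {
    "answer": "Alice works at Acme.",
    "evidence": [{"edge": ["alice", "works_at", "acme"]}],
}

# All three round-trip through JSON, so they can be validated downstream.
for payload in (label_output, topk_output, qa_output):
    print(json.dumps(payload))
```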

Step 2: Subgraph Selection

Don't stuff the entire graph into the context. Common selection strategies:

  • k-hop neighborhood (1-2 hops around the target node)
  • shortest path / random walk (extract relevant paths)
  • community / cluster summary (community-level summaries)
  • hybrid: use traditional retrieval/graph algorithms first, then let the LLM re-rank
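The k-hop strategy above is simple enough to sketch directly. This is a minimal BFS over an undirected edge list, not a graph-library API; in practice you would likely use a graph database or a library such as NetworkX instead.

```python
from collections import deque

def k_hop_subgraph(edges, start, k):
    """Return the nodes within k hops of `start` and the induced edges.

    `edges` is a list of (source, target) pairs; expansion treats the
    graph as undirected. A minimal sketch, not a library API.
    """
    adj = {}
    for s, t in edges:
        adj.setdefault(s, set()).add(t)
        adj.setdefault(t, set()).add(s)

    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand past the hop limit
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))

    # Keep only edges whose endpoints both fall inside the neighborhood.
    induced = [(s, t) for s, t in edges if s in seen and t in seen]
    return seen, induced

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
nodes, sub = k_hop_subgraph(edges, "a", 2)
print(sorted(nodes))  # ['a', 'b', 'c']
```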

Step 3: Represent the Structure (Graph Representation)

Three common representation methods (balancing controllability vs. information loss):

  1. Adjacency list: Most direct, complete information, but can get long
  2. Triples (subject-predicate-object): Great for knowledge graphs
  3. Natural language summary: Short, but loses structural detail
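The first two representations are mechanical to produce. A small sketch, assuming edges come as (source, edge_type, target) tuples (an illustrative format, not a standard):

```python
def to_triples(edges):
    """Render edges as (subject, predicate, object) lines."""
    return "\n".join(f"({s}, {r}, {t})" for s, r, t in edges)

def to_adjacency_list(edges):
    """Render edges grouped by source node, with edge types in brackets."""
    adj = {}
    for s, r, t in edges:
        adj.setdefault(s, []).append(f"{t} [{r}]")
    return "\n".join(f"{s} -> {', '.join(ts)}" for s, ts in adj.items())

edges = [
    ("alice", "works_at", "acme"),
    ("alice", "knows", "bob"),
    ("bob", "works_at", "beta_corp"),
]
print(to_triples(edges))
print(to_adjacency_list(edges))
```

Triples read well for knowledge graphs; the adjacency list is more compact when one node has many neighbors, at the cost of longer lines.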

Add a "schema" layer to reduce ambiguity: fix fields such as node_id, type, attributes, and edge_type.

Prompt Template

You are a graph-aware assistant.

Task: <describe task>
Output format: <JSON schema or strict format>

Graph schema:
- Node: id, type, attributes
- Edge: source, target, type

Given the subgraph below, solve the task using only the provided graph information.
If the graph lacks sufficient evidence, respond with "Insufficient evidence" and list missing information.

Subgraph:
<nodes>
<edges>
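Filling the template is plain string formatting. The task, output format, and subgraph values below are made-up placeholders for illustration:

```python
# The same template as above, with {task}/{output_format}/{nodes}/{edges} slots.
PROMPT_TEMPLATE = """You are a graph-aware assistant.

Task: {task}
Output format: {output_format}

Graph schema:
- Node: id, type, attributes
- Edge: source, target, type

Given the subgraph below, solve the task using only the provided graph information.
If the graph lacks sufficient evidence, respond with "Insufficient evidence" and list missing information.

Subgraph:
{nodes}
{edges}
"""

prompt = PROMPT_TEMPLATE.format(
    task="Predict whether alice and carol are colleagues.",
    output_format='JSON: {"answer": str, "evidence": [edge]}',
    nodes="alice (person), acme (company), carol (person)",
    edges="(alice, works_at, acme)\n(carol, works_at, acme)",
)
print(prompt)
```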

How to Iterate

  1. Reduce context: Shrink from a 2-hop to a 1-hop neighborhood; recall 50 candidates first, then re-rank down to 10
  2. Add evidence constraints: Require that each conclusion maps to a node_id/edge (for downstream verification)
  3. Structured output: Use a JSON schema with fixed fields so the model does not drift into free-form prose
  4. Add negative signals: Have the model explain "which edges/nodes are missing that would be needed to support this conclusion"
  5. Compare baselines: For the same task, compare "no graph" vs "graph-to-text" vs "retrieve-then-prompt"
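Point 2 (evidence constraints) only pays off if you actually verify the citations. A minimal checker, assuming the illustrative response schema `{"answer": ..., "evidence": [{"edge": [s, r, t]}, ...]}` used earlier:

```python
def verify_evidence(response, subgraph_edges):
    """Check that every cited edge exists in the subgraph given to the model.

    Returns (ok, missing): ok is True when all evidence edges are known;
    missing lists any fabricated or out-of-subgraph citations.
    """
    known = {tuple(e) for e in subgraph_edges}
    missing = [
        ev["edge"]
        for ev in response.get("evidence", [])
        if tuple(ev["edge"]) not in known
    ]
    return len(missing) == 0, missing

edges = [("alice", "works_at", "acme"), ("carol", "works_at", "acme")]
resp = {"answer": "yes", "evidence": [{"edge": ["alice", "works_at", "acme"]}]}
ok, missing = verify_evidence(resp, edges)
print(ok)  # True
```

A failed check is a strong signal the model drew on external knowledge or hallucinated structure, which is exactly what the self-check rubric below asks about.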

Self-Check Rubric

  • Did the model only use the given subgraph (no external knowledge/guessing)?
  • Can each conclusion in the output point to a specific node_id/edge?
  • Does the subgraph selection cover key relationships (any missing critical hops/paths)?
  • Is the context manageable (no token explosion)? Any ambiguity in the structure representation?

Practice

Exercise: Use a "course prerequisite graph" for recommendations.

  • Node: course, skill
  • Edge: requires, teaches
  • Task: Given a student's completed course list, recommend the next 3 courses and explain the graph evidence for each recommendation (which requires edges are satisfied, which skill nodes are missing).
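The exercise's eligibility check can be prototyped before involving an LLM at all. A sketch under the assumption that `requires` edges are collapsed into a course-to-prerequisites map; the alphabetical ordering is a placeholder for an LLM or graph-based re-ranker, and the course names are invented:

```python
def recommend_next(completed, requires, k=3):
    """Recommend up to k courses whose `requires` edges are all satisfied.

    `requires` maps course -> set of prerequisite courses. Courses already
    completed are excluded. Ordering is alphabetical as a stand-in for a
    real scoring step.
    """
    done = set(completed)
    eligible = [
        course
        for course, prereqs in requires.items()
        if course not in done and prereqs <= done
    ]
    return sorted(eligible)[:k]

requires = {
    "algebra": set(),
    "calculus": {"algebra"},
    "linear_algebra": {"algebra"},
    "ml_basics": {"calculus", "linear_algebra"},
}
print(recommend_next(["algebra"], requires))  # ['calculus', 'linear_algebra']
```

The graph evidence for each recommendation is then exactly the set of satisfied `requires` edges, which maps directly onto the "answer + evidence" output format from Step 1.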
