Regular Expressions

⏱️ 40 min

Regular Expressions (Regex): Extracting Text Patterns Efficiently

What might confuse you right now

"Regex looks unreadable. Too many symbols."

Start with common use cases (emails, order numbers, phone numbers). You don't need to memorize the entire syntax at once.

One-line definition

A regex (regular expression) is a pattern that describes a set of strings. You use it to find, extract, and replace text.
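To make the three verbs in that definition concrete, here is a minimal sketch using Python's re module. The sample sentence and the date pattern are assumptions chosen for illustration; re.search, re.findall, and re.sub are the standard-library functions for finding, extracting, and replacing.

```python
import re

text = "Meeting on 2024-05-01, follow-up on 2024-06-15."
date_pattern = r"\d{4}-\d{2}-\d{2}"  # assumed format: YYYY-MM-DD

# Find: locate the first match (re.search returns None if there is none)
first = re.search(date_pattern, text)
first_date = first.group() if first else None

# Extract: collect every match as a list of strings
all_dates = re.findall(date_pattern, text)

# Replace: swap every match for a placeholder
redacted = re.sub(date_pattern, "<date>", text)

print(first_date)
print(all_dates)
print(redacted)
```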

Real-life analogy

Text is like a shelf of products. Regex is the filter label you use to pick out what you need.

Minimal runnable example

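import re

text = "email: hello@example.com"
# Simplified email pattern: name characters, "@", domain, a dot, then a suffix
m = re.search(r"[\w.-]+@[\w.-]+\.\w+", text)
if m:  # re.search returns None when nothing matches
    print(m.group())  # hello@example.com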

Quick quiz (5 min)

  1. Extract all order numbers (e.g., A-123) from a string.
  2. Extract all emails.
  3. Use re.sub() to mask digits.

Quiz answer guidelines & grading criteria

  • Expected answer: working code that handles the core cases and edge inputs described in the prompt.
  • Criterion 1 (Correctness): Main flow produces correct results, key branches execute.
  • Criterion 2 (Readability): Clear variable names, no excessive nesting.
  • Criterion 3 (Robustness): Basic protection against None values, type errors, or unexpected input.
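One possible answer sketch for the three quiz items, to compare against your own attempt. The sample text is made up, and the order-number format (one uppercase letter, a hyphen, then digits) is an assumption based on the A-123 example; adjust the patterns if your data looks different.

```python
import re

text = "Orders A-123 and B-77 shipped to ana@example.com and bob.lee@mail.example.org"

# 1. Order numbers: assumed format is one uppercase letter, a hyphen, digits
orders = re.findall(r"\b[A-Z]-\d+\b", text)

# 2. Emails: the same simplified pattern as the minimal example
emails = re.findall(r"[\w.-]+@[\w.-]+\.\w+", text)

# 3. Mask every digit with "*" (re.sub replaces each match)
masked = re.sub(r"\d", "*", text)

print(orders)
print(emails)
print(masked)
```

Note that re.findall returns an empty list when nothing matches, so these calls are safe on text with no orders or emails.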

Transfer task (homework)

Write a function extract_contacts(text) that returns two lists: the emails and the phone numbers found in text.
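If you get stuck, the following is one possible shape for the function, not the only valid answer. The task does not specify a phone format, so the 3-3-4 hyphenated pattern here is an assumption, as are the names EMAIL_RE and PHONE_RE.

```python
import re

# Assumed formats (not specified by the task):
# emails use the simplified pattern from the minimal example;
# phones look like 555-123-4567 (3-3-4 digits separated by hyphens).
EMAIL_RE = re.compile(r"[\w.-]+@[\w.-]+\.\w+")
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def extract_contacts(text):
    """Return (emails, phones) found in text; both are lists of strings."""
    if not isinstance(text, str):  # guard against None or non-string input
        return [], []
    return EMAIL_RE.findall(text), PHONE_RE.findall(text)


emails, phones = extract_contacts("Reach hello@example.com or call 555-123-4567.")
print(emails)
print(phones)
```

Compiling the patterns once at module level keeps the function body short and gives each pattern a readable name.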

Acceptance criteria

You can independently:

  • Use search/findall/sub
  • Write and debug basic patterns
  • Perform text extraction and masking

Common errors & debugging steps (beginner edition)

  • Can't understand the error: read the last line of the traceback for the error type (e.g., TypeError, NameError), then trace back to the relevant code line.
  • Not sure about a variable's value: temporarily add print(variable, type(variable)) at key points to verify data matches expectations.
  • Code changes aren't taking effect: confirm the file is saved, you're running the right file, and your terminal environment (venv) is correct.
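Applied to regex, the print-and-check step above often reveals that a match object is actually None. A small sketch with a made-up input where the pattern deliberately fails to match:

```python
import re

text = "Order number: a-123"
m = re.search(r"[A-Z]-\d+", text)

# Inspect the value before using it: the pattern expects an uppercase
# letter, the input has a lowercase one, so nothing matches and m is None.
print(m, type(m))

# Calling m.group() at this point would raise
# AttributeError: 'NoneType' object has no attribute 'group'
if m is None:
    print("no match; check the pattern against the input")
else:
    print(m.group())
```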

Common misconceptions

  • Misconception: More complex regex = better.
  • Reality: Readability and maintainability come first.
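One way to keep a pattern readable is the re.VERBOSE flag, which lets you spread a pattern over several lines and comment each piece. The sketch below rewrites the email pattern from the minimal example; the variable names compact and readable are only for illustration.

```python
import re

# Compact form: correct, but hard to scan
compact = re.compile(r"[\w.-]+@[\w.-]+\.\w+")

# Same pattern with re.VERBOSE: whitespace and # comments inside the
# pattern are ignored, so each piece can be labeled
readable = re.compile(
    r"""
    [\w.-]+   # local part: letters, digits, underscore, dot, hyphen
    @         # separator
    [\w.-]+   # domain name
    \.        # literal dot
    \w+       # suffix such as com or org
    """,
    re.VERBOSE,
)

text = "email: hello@example.com"
print(compact.findall(text) == readable.findall(text))
```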