Regular Expressions (Regex): Extracting Text Patterns Efficiently
What might confuse you right now
"Regex looks unreadable. Too many symbols."
Start with common use cases (emails, order numbers, phone numbers). You don't need to memorize the entire syntax at once.
One-line definition
A regular expression (regex) is a pattern that describes a set of strings; you use it to find, extract, and replace text.
Real-life analogy
Text is like a shelf of products. Regex is the filter label you use to pick out what you need.
Minimal runnable example
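import re

text = "email: hello@example.com"
# Simplified email pattern: local part, "@", domain, ".", top-level domain.
m = re.search(r"[\w.-]+@[\w.-]+\.\w+", text)
if m:  # re.search returns None when nothing matches
    print(m.group())  # hello@example.com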
Quick quiz (5 min)
- Extract all order numbers (e.g., A-123) from a text.
- Extract all emails.
- Use re.sub() to mask digits.
Quiz answer guidelines & grading criteria
- Expected answer: working code that handles the core conditions and edge-case inputs from the prompt.
- Criterion 1 (Correctness): Main flow produces correct results, key branches execute.
- Criterion 2 (Readability): Clear variable names, no excessive nesting.
- Criterion 3 (Robustness): Basic protection against None values, type errors, or unexpected input.
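One possible answer to the quiz, as a minimal sketch rather than the only accepted solution. The sample text and the order-number format (one uppercase letter, a hyphen, then digits) are assumptions made for illustration; adjust the patterns to your own data.

```python
import re

# Sample input invented for this sketch.
text = "Orders A-123 and B-4567 shipped. Contact: hello@example.com, sales@example.org"

# Order numbers: one uppercase letter, a hyphen, then one or more digits (assumed format).
order_numbers = re.findall(r"\b[A-Z]-\d+\b", text)

# Emails: the same simplified pattern as the minimal example.
emails = re.findall(r"[\w.-]+@[\w.-]+\.\w+", text)

# Mask every digit with "*".
masked = re.sub(r"\d", "*", text)

print(order_numbers)  # ['A-123', 'B-4567']
print(emails)         # ['hello@example.com', 'sales@example.org']
print(masked)
```

re.findall returns an empty list when nothing matches, so no None check is needed here, unlike re.search.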
Transfer task (homework)
Write extract_contacts(text) that returns two lists: emails and phones.
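A starting point for the homework might look like the sketch below. The task does not specify a phone format, so the 3-3-4 digit pattern is an assumption, as are the names EMAIL_RE and PHONE_RE; real phone numbers vary widely and usually need a broader pattern.

```python
import re

# Simplified email pattern, reused from the minimal example.
EMAIL_RE = re.compile(r"[\w.-]+@[\w.-]+\.\w+")
# Assumed phone format: 3-3-4 digits separated by "-", ".", or a space.
PHONE_RE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")


def extract_contacts(text):
    """Return (emails, phones) found in text; both are lists of strings."""
    if not isinstance(text, str):
        # Basic protection against None or other non-string input.
        return [], []
    emails = EMAIL_RE.findall(text)
    phones = PHONE_RE.findall(text)
    return emails, phones


emails, phones = extract_contacts("Reach me at hello@example.com or 555-123-4567.")
print(emails)  # ['hello@example.com']
print(phones)  # ['555-123-4567']
```

Compiling the patterns once at module level keeps the function body short and avoids repeating the pattern strings.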
Acceptance criteria
You can independently:
- Use
search/findall/sub - Write and debug basic patterns
- Perform text extraction and masking
Common errors & debugging steps (beginner edition)
- Can't understand the error: read the last line for the error type (e.g., TypeError, NameError), then trace back to the relevant code line.
- Not sure about a variable's value: temporarily add print(variable, type(variable)) at key points to verify data matches expectations.
- Code changes aren't taking effect: confirm the file is saved, you're running the right file, and your terminal environment (venv) is correct.
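Applied to regex, the print-and-check step above might look like this sketch. The buggy pattern and the sample text are made up for illustration; the point is that re.search returns None when nothing matches, which is a frequent source of AttributeError.

```python
import re

text = "order: A-123"
pattern = r"[a-z]-\d+"  # bug: the order number starts with an uppercase letter

m = re.search(pattern, text)
print(m, type(m))  # None <class 'NoneType'>, so m.group() would raise AttributeError

# Fix the character class, then check again before calling .group().
fixed = r"[A-Z]-\d+"
m = re.search(fixed, text)
print(m, type(m))
if m is not None:
    print(m.group())  # A-123
```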
Common misconceptions
- Misconception: More complex regex = better.
- Reality: Readability and maintainability come first.
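One way to keep a pattern readable is Python's re.VERBOSE flag, which ignores whitespace in the pattern and allows comments. The sketch below writes the email pattern from the minimal example both ways; the variable names compact and readable are illustrative only.

```python
import re

# Dense one-liner.
compact = re.compile(r"[\w.-]+@[\w.-]+\.\w+")

# Same pattern with re.VERBOSE: whitespace is ignored and comments are allowed.
readable = re.compile(
    r"""
    [\w.-]+   # local part: letters, digits, underscore, dot, hyphen
    @         # literal at sign
    [\w.-]+   # domain name
    \.        # literal dot
    \w+       # top-level domain
    """,
    re.VERBOSE,
)

text = "email: hello@example.com"
print(compact.search(text).group())   # hello@example.com
print(readable.search(text).group())  # hello@example.com
```

Both patterns match the same strings; the verbose form simply documents each piece where it appears.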