Regex Cheat Sheet (2026) — Anchors, Quantifiers & Flags
Regex in one tab — anchors, character classes, quantifiers, groups, lookarounds, flags, and the patterns worth keeping. PCRE, JS, Python, Go, Java.
Regex is unmemorable on purpose. The syntax is dense, the flavors disagree, and the patterns you wrote last quarter look like keysmash today. This is the one tab worth keeping open.
Examples below are PCRE-flavored unless noted. JavaScript, Python, Go, and Rust all match the basics; the Flavor differences section calls out where they diverge.
Anatomy of a pattern
Anchors — where the match has to land
| Pattern | Matches |
|---|---|
| ^ | Start of string (or line, with m flag) |
| $ | End of string (or line, with m flag) |
| \b | Word boundary: the spot between \w and \W |
| \B | Non-word boundary |
| \A | Absolute start of input (ignores m flag); not in JS |
| \z | Absolute end of input (ignores m flag); not in JS, and spelled \Z in Python's re |
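To make the anchor rules above concrete, here is a minimal sketch in Python. The sample text is made up for illustration, and the absolute-end anchor is written \Z because that is the spelling Python's re module has long used.

```python
import re

text = "alpha\nbeta\ngamma"  # hypothetical three-line input

# Without re.M, ^ only matches at the very start of the string.
first_only = re.findall(r"^\w+", text)

# With re.M, ^ also matches right after each newline.
every_line = re.findall(r"^\w+", text, flags=re.M)

# \A ignores re.M and still anchors to the absolute start.
absolute_start = re.findall(r"\A\w+", text, flags=re.M)

# \Z anchors to the absolute end, again regardless of re.M.
absolute_end = re.findall(r"\w+\Z", text, flags=re.M)

print(first_only, every_line, absolute_start, absolute_end)
```

The m flag only changes ^ and $; \A and \Z stay pinned to the ends of the whole input.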
Character classes — what counts as a character
| Pattern | Matches |
|---|---|
| . | Any character except newline (use s flag to include newlines) |
| \d / \D | Digit [0-9] / non-digit |
| \w / \W | Word character [A-Za-z0-9_] / non-word |
| \s / \S | Whitespace / non-whitespace |
| [abc] | One of a, b, c |
| [^abc] | Anything but a, b, c |
| [a-z] | One character in the range |
| [a-zA-Z0-9] | Letters or digits |
| [\d-] | A digit or a literal - (place - first or last to make it literal) |
\w is ASCII-only in many flavors. There, café reads as \w\w\w\W because é isn’t ASCII. In JS, opt in with the u flag and use \p{L} from the Unicode property classes section below. In Python 3, \w on str patterns is Unicode-aware by default; pass re.ASCII to opt out.
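The Python half of that note can be checked directly. This is a small sketch using only the standard re module; the é is written as an escape so the example does not depend on how the source file is encoded.

```python
import re

word = "caf\u00e9"  # "café" with a precomposed é

# str patterns are Unicode-aware by default, so é counts as \w.
unicode_match = re.fullmatch(r"\w+", word)

# re.ASCII restricts \w to [A-Za-z0-9_], so the full match fails...
ascii_match = re.fullmatch(r"\w+", word, flags=re.ASCII)

# ...and a prefix match stops right before the é.
ascii_prefix = re.match(r"\w+", word, flags=re.ASCII).group()

print(bool(unicode_match), ascii_match, ascii_prefix)
```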
Quantifiers — how many times
| Pattern | Matches |
|---|---|
| ? | 0 or 1 (optional) |
| * | 0 or more |
| + | 1 or more |
| {n} | Exactly n |
| {n,} | n or more |
| {n,m} | Between n and m (inclusive) |
| *? +? ?? {n,m}? | Lazy: match as little as possible |
| *+ ++ ?+ {n,m}+ | Possessive: no backtracking (PCRE, Java, Python 3.11+; not JS or Go) |
Default quantifiers are greedy — they match as much as possible, then backtrack. <.*> against <a><b> matches the entire <a><b>, not just <a>. Add ? to go lazy: <.*?> matches <a> first.
Greedy · <.*> matches <a><b>
Lazy · <.*?> matches <a>
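The greedy and lazy behavior described above can be reproduced with a short Python sketch. The negated-class variant at the end is an extra alternative added here for comparison; it is not part of the original example.

```python
import re

html = "<a><b>"

# Greedy: .* runs to the end, then backtracks to the last >.
greedy = re.search(r"<.*>", html).group()

# Lazy: .*? expands one character at a time and stops at the first >.
lazy = re.search(r"<.*?>", html).group()

# findall with the lazy form picks up each tag separately.
all_lazy = re.findall(r"<.*?>", html)

# A negated class gets the same result without relying on laziness.
negated = re.findall(r"<[^>]+>", html)

print(greedy, lazy, all_lazy, negated)
```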
Groups & alternation
| Pattern | Matches |
|---|---|
| (abc) | Capturing group: referenced as \1 or $1 |
| (?:abc) | Non-capturing group: same grouping, no capture overhead |
| (?<name>abc) | Named capture group: referenced as \k<name> or $<name> |
| a|b | a or b |
| (cat|dog)s? | cat, cats, dog, dogs |
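A quick Python sketch of the group types in the table. One assumption to flag: Python's re module spells named groups (?P<name>...) rather than (?<name>...), so the pattern below uses that form. The sample strings are invented for illustration.

```python
import re

# Named groups: Python uses (?P<name>...) instead of (?<name>...).
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2026-03")
year_by_number = m.group(1)
month_by_name = m.group("month")

# With a capturing group, findall returns the group, not the whole match.
capturing = re.findall(r"(cat|dog)s?", "cats and dog")

# With a non-capturing group, findall returns the whole match.
non_capturing = re.findall(r"(?:cat|dog)s?", "cats and dog")

print(year_by_number, month_by_name, capturing, non_capturing)
```

The findall difference is a common reason to reach for (?:...) when the group exists only for alternation.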
Lookarounds — match without consuming
| Pattern | Means |
|---|---|
| (?=...) | Lookahead: followed by ... |
| (?!...) | Negative lookahead: not followed by ... |
| (?<=...) | Lookbehind: preceded by ... |
| (?<!...) | Negative lookbehind: not preceded by ... |
\d+(?=px) - digits that come right before "px" → 12 in "12px"
(?<=\$)\d+ - digits that come right after "$" → 99 in "$99"
\b(?!the\b)\w+ - words that aren't "the" → fox, jumps, ...
Lookbehind support varies more than lookahead. JavaScript (since ES2018) and .NET accept variable-length lookbehind. Python's built-in re module still requires fixed-width lookbehind (the third-party regex module lifts that limit), PCRE2 has traditionally required each alternative to be fixed-length, and Java accepts only lookbehind with a bounded maximum length. Go’s RE2 doesn’t support lookarounds at all.
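The three lookaround examples above run unchanged in Python, since every lookbehind here is fixed-width. The input strings are made up for this sketch.

```python
import re

# Lookahead: digits only when "px" follows; "px" itself is not consumed.
px_values = re.findall(r"\d+(?=px)", "12px 3em 40px")

# Lookbehind: digits only when a literal $ precedes them.
prices = re.findall(r"(?<=\$)\d+", "cost: $99")

# Negative lookahead: every word except a standalone "the".
not_the = re.findall(r"\b(?!the\b)\w+", "the quick fox jumps over the dog")

print(px_values, prices, not_the)
```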
Backreferences — match the same thing twice
| Pattern | Means |
|---|---|
| \1, \2, … | Re-match group 1, 2, … |
| \k<name> | Re-match named group |
| (['"]).+?\1 | Match a quoted string with matching quote types |
| \b(\w+)\s+\1\b | Find duplicated words (“the the”) |
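Both backreference patterns from the table can be tried in Python as written. The sample sentences are invented, and the re.sub call at the end is an extra step added here to show one way to collapse the duplicates.

```python
import re

# The closing quote must be the same character the group captured.
quoted = re.compile(r"""(['"]).+?\1""")
first_quoted = quoted.search('say "it\'s fine" now').group()

# \1 re-matches whatever word group 1 captured.
dupes = re.compile(r"\b(\w+)\s+\1\b")
sentence = "it was the the best of of times"
found = dupes.findall(sentence)

# Replacing the whole match with group 1 removes the repeat.
cleaned = dupes.sub(r"\1", sentence)

print(first_quoted, found, cleaned)
```

The apostrophe inside the double-quoted string is skipped because \1 only accepts another double quote.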
Flags — the modifiers
| Flag | Effect |
|---|---|
| i | Case-insensitive |
| m | ^ and $ match start/end of each line |
| s (a.k.a. dotall) | . matches newlines |
| g (JS) | Global: find all, not just first |
| u (JS) | Unicode mode: enables \p{...}, surrogate pair handling |
| x (PCRE/Python/Ruby) | Free-spacing: whitespace and # comments ignored |
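The free-spacing flag is easier to appreciate with an example. Below is a sketch in Python using re.X on the ISO date pattern; the group names (in Python's (?P<name>...) spelling) and the test strings are assumptions made for illustration.

```python
import re

# re.X ignores pattern whitespace and treats # as a comment marker.
iso_date = re.compile(
    r"""
    ^
    (?P<year>\d{4}) -    # four-digit year
    (?P<month>\d{2}) -   # two-digit month
    (?P<day>\d{2})       # two-digit day
    $
    """,
    re.X,
)

parts = iso_date.match("2026-01-31").groupdict()
rejected = iso_date.match("2026-1-31")

# Inline flags work too: (?i) at the start makes the pattern case-insensitive.
inline = re.fullmatch(r"(?i)foo", "FOO")

print(parts, rejected, bool(inline))
```

To match a literal space or # under re.X, escape it or put it in a character class.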
// JS: flags go after the closing slash
/^foo$/im.test("FOO") // true
"abc abc abc".match(/abc/g) // ["abc", "abc", "abc"]
# Python: flags as the flags argument, or inline (?i)(?m)(?s)
re.match(r"^foo$", "FOO", re.I | re.M)
Unicode property classes (u flag in JS; default in Ruby/PCRE; Python's built-in re lacks \p{...}, the third-party regex module has it)
| Pattern | Matches |
|---|---|
| \p{L} | Any letter (Latin, Greek, CJK, …) |
| \p{N} | Any number |
| \p{P} | Any punctuation |
| \p{Z} | Any separator (whitespace + line/paragraph) |
| \p{Lu} / \p{Ll} | Uppercase / lowercase letter |
| \p{Script=Greek} | Letters from a specific script |
| \P{L} | The negation: anything that isn’t a letter |
Replacement syntax — what goes in the right side of a substitution
| Token | Means |
|---|---|
| $& (or \0) | The full match |
| $1, $2, … | Capture group 1, 2, … |
| $<name> (or \k<name>) | Named group |
| $$ | A literal $ |
| \u, \l (Vim, sed) | Uppercase / lowercase the next character |
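The tokens in the table above follow the $-style used by JS and similar flavors. Python's re.sub spells them differently, so here is a hedged sketch of the rough equivalents: \1 for numbered groups, \g<name> for named groups, and \g<0> for the full match. The input strings are made up.

```python
import re

# Numbered groups: \2 \1 swaps the two captured words.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")

# Named groups: \g<area> in the replacement; count=1 limits it to the first hit.
wrapped = re.sub(r"(?P<area>\d{3})", r"(\g<area>)", "555-1234", count=1)

# Full match: \g<0> plays the role of $&.
bracketed = re.sub(r"\d+", r"[\g<0>]", "a1b22")

# In Python a $ in the replacement is just a literal character.
dollar = re.sub(r"\d+", "$", "cost 5")

print(swapped, wrapped, bracketed, dollar)
```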
s/(\w+) (\w+)/$2 $1/ - swap two words
s/(?<area>\d{3})/(\1)/ - wrap a 3-digit run in parens
Patterns worth keeping
| Goal | Pattern |
|---|---|
| Trim leading/trailing whitespace | ^\s+|\s+$ |
| Collapse internal whitespace | \s+ → " " |
| Match an integer (signed) | ^-?\d+$ |
| Match a decimal | ^-?\d+(\.\d+)?$ |
| Hex color | ^#(?:[0-9a-fA-F]{3}){1,2}$ |
| ISO date (YYYY-MM-DD) | ^\d{4}-\d{2}-\d{2}$ |
| ISO time (HH:MM, 24h) | ^([01]\d|2[0-3]):[0-5]\d$ |
| Slug (kebab-case) | ^[a-z0-9]+(-[a-z0-9]+)*$ |
| URL (loose, good-enough) | https?://[^\s)]+ |
| IPv4 | \b(?:\d{1,3}\.){3}\d{1,3}\b (also matches 999.999.999.999) |
| UUID v4 | ^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$ |
| Strip ANSI escape codes | \x1b\[[0-9;]*m |
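Since the IPv4 pattern in the table also accepts 999.999.999.999, one option is to let the regex check the shape and do the range check in code. This is a minimal sketch in Python; the helper name is_valid_ipv4 is hypothetical, and it deliberately does not address edge cases such as leading zeros.

```python
import re

# Same shape check as the table's IPv4 pattern.
IPV4_SHAPE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def is_valid_ipv4(candidate: str) -> bool:
    """Return True if candidate looks like a.b.c.d with each part in 0..255."""
    if not IPV4_SHAPE.fullmatch(candidate):
        return False
    return all(0 <= int(part) <= 255 for part in candidate.split("."))


print(is_valid_ipv4("192.168.0.1"))
print(is_valid_ipv4("999.999.999.999"))
print(is_valid_ipv4("1.2.3"))
```

Splitting the job this way keeps the pattern readable instead of encoding 0..255 in the regex itself.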
There is no correct regex for email addresses. The official one (RFC 5322) is hundreds of characters and accepts addresses no real system delivers to. Use ^[^@\s]+@[^@\s]+\.[^@\s]+$ as a sanity check, then send a confirmation email — that is how every production system actually validates.
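As a rough illustration of what that sanity check does and does not catch, here is the same pattern in Python. The addresses are invented examples, and the confirmation email step is outside the scope of the sketch.

```python
import re

# Sanity check only: something@something.something, no spaces, one @.
EMAIL_SANITY = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

plausible = bool(EMAIL_SANITY.match("reader@example.com"))
no_dot = bool(EMAIL_SANITY.match("reader@localhost"))
has_space = bool(EMAIL_SANITY.match("rea der@example.com"))
double_at = bool(EMAIL_SANITY.match("reader@@example.com"))

print(plausible, no_dot, has_space, double_at)
```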
Flavor differences
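| Feature | JS | Python (re) | PCRE | Go | Java |
|---|---|---|---|---|---|
| Lookbehind | Yes (ES2018+) | Yes | Yes | No | Yes |
| Variable-length lookbehind | Yes | No | Limited | No | Bounded only |
| Possessive *+ ++ | No | 3.11+ | Yes | No | Yes |
| Atomic groups (?>...) | No | 3.11+ | Yes | No | Yes |
| Backreferences | Yes | Yes | Yes | No | Yes |
| Inline flags (?i) | Newer engines only, as (?i:...) | Yes | Yes | Yes | Yes |
| Unicode \p{L} | With u flag | No | Yes | Yes | Yes |
| Free-spacing x flag | No | Yes | Yes | No | Yes |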
The headline: Go’s RE2 trades expressiveness for guaranteed linear-time matching — no backreferences, no lookarounds. Worth knowing before you copy a PCRE pattern into Go.
The traps
Patterns that look right and aren’t:
- Catastrophic backtracking. `(a+)+$` against `aaaaaaaaaaaaaaaaaaaaaaaaaa!` can hang for seconds. Use possessive quantifiers, atomic groups, or rewrite to avoid nested repetition.
- `.*` across newlines. `.` does not match `\n` by default. Use the `s` (dotall) flag, or `[\s\S]*?` for “any character including newlines.”
- Anchors with `m` vs without. Without the `m` flag, `^` and `$` only match the start and end of the whole string, not each line.
- Greedy `<.*>`. Match an HTML tag with `<[^>]+>`, not `<.*>`: the latter swallows everything between the first `<` and the last `>`.
- Replace strings have their own metacharacters. A `$` in your replacement string is interpolated. Escape with `$$` (most flavors) or `\$` (some).
- Capture groups change replacement numbering. Every new `(...)` shifts the numbers after it. Adding a `(?:...)` non-capturing group instead of `(...)` keeps `$1`, `$2`, … referring to what you expect.
- HTML / JSON / SQL. Don’t parse them with regex. Use a parser. The “good enough” regex always misses the case that breaks production.
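The newline trap from the list above is easy to reproduce. This Python sketch uses made-up BEGIN/END markers and shows the default behavior, the dotall flag, and the [\s\S] workaround side by side.

```python
import re

block = "BEGIN\nline one\nline two\nEND"  # hypothetical multi-line input

# By default . stops at \n, so the match cannot span lines.
default_dot = re.search(r"BEGIN.*END", block)

# re.S (dotall) lets . match newlines as well.
dotall = re.search(r"BEGIN.*END", block, flags=re.S).group()

# [\s\S] matches any character with no flag needed.
class_trick = re.search(r"BEGIN[\s\S]*?END", block).group()

print(default_dot, dotall == block, class_trick == block)
```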
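Why (a+)+$ hangs on aaaa!
The trailing ! makes the overall match impossible, but before giving up the engine tries every way of splitting the run of a's between the inner + and the outer +. For n a's that is 2^(n-1) splits from a single starting position, so each extra a roughly doubles the work.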
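A small experiment can make the blow-up visible without hanging anything. The sketch below, in Python, times the nested pattern on short inputs and compares it with a flattened rewrite, a+$, which accepts the same strings. The input sizes are kept deliberately small, the helper time_search is hypothetical, and the timings will vary by machine, so none are quoted here.

```python
import re
import time

nested = re.compile(r"(a+)+$")  # nested repetition: exponential on failure
flat = re.compile(r"a+$")       # same language, no nested repetition


def time_search(pattern: re.Pattern, text: str) -> float:
    """Return the wall-clock seconds one search() call takes."""
    start = time.perf_counter()
    pattern.search(text)
    return time.perf_counter() - start


# Keep n small: each extra "a" roughly doubles the nested pattern's work.
for n in (12, 15, 18):
    text = "a" * n + "!"
    print(n, f"nested={time_search(nested, text):.4f}s",
          f"flat={time_search(flat, text):.6f}s")

# Both patterns agree on what matches.
nested_fail = nested.search("aaaa!")
flat_fail = flat.search("aaaa!")
nested_ok = nested.search("aaaa").group()
flat_ok = flat.search("aaaa").group()
```

Rewriting to remove the nested quantifier is the portable fix; possessive quantifiers and atomic groups do the same job where the flavor supports them.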
Test your patterns
The pattern almost always means something subtly different from what you typed. Use a tester before committing it:
- regex101.com — flavor switcher, explains your pattern token-by-token, shows backtracking
- regexr.com — lighter, pattern library
- `node -e 'console.log("test".match(/.../))'`: fastest local check for JS regex