Regex Cheat Sheet (2026) — Anchors, Quantifiers & Flags • Daily Upkeep

Regex is unmemorable on purpose. The syntax is dense, the flavors disagree, and the patterns you wrote last quarter look like keysmash today. This is the one tab worth keeping open.

Examples below are PCRE-flavored unless noted. JavaScript, Python, Go, and Rust all match the basics; the Flavor differences section calls out where they diverge.

Anatomy of a pattern

Every regex is a string of these pieces: anchors, escapes, groups, and quantifiers. Read it left to right; the rest is vocabulary.

Anchors — where the match has to land

Pattern	Matches
`^`	Start of string (or line, with `m` flag)
`$`	End of string (or line, with `m` flag)
`\b`	Word boundary — the spot between `\w` and `\W`
`\B`	Non-word boundary
`\A`	Absolute start of input (ignores `m` flag); not in JS
`\z`	Absolute end of input (ignores `m` flag); not in JS, and Python spells it `\Z`
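To see the anchors in action, here is a small Python sketch. The sample strings and variable names are made up for illustration; the behavior shown is that of Python's built-in `re` module.

```python
import re

text = "first line\nsecond line"

# Without re.M, ^ only matches at the very start of the string.
without_m = re.findall(r"^\w+", text)          # ['first']

# With re.M, ^ also matches right after each newline.
with_m = re.findall(r"^\w+", text, re.M)       # ['first', 'second']

# \A ignores re.M and still anchors to the start of the input.
absolute = re.findall(r"\A\w+", text, re.M)    # ['first']

# \b sits between a word character and a non-word character,
# so the "cat" inside "concat" is skipped.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")  # ['cat', 'cat']

print(without_m, with_m, absolute, whole_words)
```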

Character classes — what counts as a character

Pattern	Matches
`.`	Any character except newline (use `s` flag to include newlines)
`\d` / `\D`	Digit `[0-9]` / non-digit
`\w` / `\W`	Word character `[A-Za-z0-9_]` / non-word
`\s` / `\S`	Whitespace / non-whitespace
`[abc]`	One of `a`, `b`, `c`
`[^abc]`	Anything but `a`, `b`, `c`
`[a-z]`	One character in the range
`[a-zA-Z0-9]`	Letters or digits
`[\d-]`	A digit or a literal `-` (place `-` first or last to make it literal)

\w is ASCII-only by default in many flavors, including JS, Go, Java, and PCRE. There, café reads as \w\w\w\W because é isn’t ASCII. In JS, opt in with the u flag and use \p{L} from the Unicode property classes section below. In Python, \w is Unicode-aware by default; pass re.ASCII to opt out.
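The Python side of that note is easy to check. A minimal sketch, with an arbitrary sample word:

```python
import re

word = "café"

# Python 3 str patterns: \w is Unicode-aware, so é counts as a word character.
unicode_words = re.findall(r"\w+", word)            # ['café']

# re.ASCII narrows \w back to [A-Za-z0-9_], so the match stops before é.
ascii_words = re.findall(r"\w+", word, re.ASCII)    # ['caf']

print(unicode_words, ascii_words)
```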

Quantifiers — how many times

Pattern	Matches
`?`	0 or 1 (optional)
`*`	0 or more
`+`	1 or more
`{n}`	Exactly `n`
`{n,}`	`n` or more
`{n,m}`	Between `n` and `m` (inclusive)
`*?` `+?` `??` `{n,m}?`	Lazy — match the shortest possible string
`*+` `++` `?+` `{n,m}+`	Possessive — no backtracking (PCRE, Java, Ruby, Python 3.11+)

Default quantifiers are greedy — they match as much as possible, then backtrack. <.*> against <a><b> matches the entire <a><b>, not just <a>. Add ? to go lazy: <.*?> matches <a> first.
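The same comparison, run in Python as a quick sketch. The third pattern, a negated character class, is an alternative that avoids the greedy/lazy question entirely.

```python
import re

html = "<a><b>"

# Greedy: .* runs to the end, then backs up to the last ">".
greedy = re.findall(r"<.*>", html)       # ['<a><b>']

# Lazy: .*? stops at the first ">" it can.
lazy = re.findall(r"<.*?>", html)        # ['<a>', '<b>']

# Negated class: never crosses a ">", so there is nothing to back out of.
negated = re.findall(r"<[^>]+>", html)   # ['<a>', '<b>']

print(greedy, lazy, negated)
```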

[Figure: greedy <.*> vs. lazy <.*?>]

Groups & alternation

Pattern	Matches
`(abc)`	Capturing group — referenced as `\1` or `$1`
`(?:abc)`	Non-capturing group — same grouping, no capture overhead
`(?<name>abc)`	Named capture group — referenced as `\k<name>` or `$<name>`
`a|b`	`a` or `b`
`(cat|dog)s?`	`cat`, `cats`, `dog`, `dogs`
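A short Python sketch of the three group types. One flavor caveat: Python's built-in `re` spells named groups `(?P<name>...)`, not `(?<name>...)`. The group name `animal` and the sample strings are invented for the example.

```python
import re

# Named capture, Python spelling.
pattern = re.compile(r"(?P<animal>cat|dog)s?")
match = pattern.search("two dogs")

full_match = match.group(0)         # 'dogs'
by_number = match.group(1)          # 'dog'
by_name = match.group("animal")     # 'dog'

# With a capturing group, findall returns the captures...
captured = re.findall(r"(cat|dog)s?", "cats and dog")      # ['cat', 'dog']

# ...with a non-capturing group it returns the whole matches.
uncaptured = re.findall(r"(?:cat|dog)s?", "cats and dog")  # ['cats', 'dog']

print(full_match, by_number, by_name, captured, uncaptured)
```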

Lookarounds — match without consuming

Pattern	Means
`(?=...)`	Lookahead: followed by `...`
`(?!...)`	Negative lookahead: not followed by `...`
`(?<=...)`	Lookbehind: preceded by `...`
`(?<!...)`	Negative lookbehind: not preceded by `...`

\d+(?=px)         — digits that come right before "px"      → 12 in "12px"
(?<=\$)\d+        — digits that come right after "$"        → 99 in "$99"
\b(?!the\b)\w+    — words that aren't "the"                  → fox, jumps, ...

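Lookbehind is often limited to fixed-width patterns. .NET and JavaScript (ES2018 and later) accept variable-length lookbehind. Python’s built-in re module still requires fixed width (the third-party regex module does not), PCRE2 allows alternatives of different fixed lengths, and Java accepts lookbehind only when its maximum length is bounded. Go’s RE2 doesn’t support lookbehind at all.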
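The three lookaround examples above, as a runnable Python sketch. The input strings are made up, and the lookbehind here is fixed-width, which every lookbehind-capable flavor accepts.

```python
import re

# Lookahead: digits directly followed by "px". The "px" is not part of the match.
before_px = re.findall(r"\d+(?=px)", "margin: 12px 3em")    # ['12']

# Lookbehind: digits directly preceded by "$".
after_dollar = re.findall(r"(?<=\$)\d+", "cost $99 or 42")  # ['99']

# Negative lookahead: every word except "the".
sentence = "the quick fox jumps over the dog"
not_the = re.findall(r"\b(?!the\b)\w+", sentence)
# ['quick', 'fox', 'jumps', 'over', 'dog']

print(before_px, after_dollar, not_the)
```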

Backreferences — match the same thing twice

Pattern	Means
`\1`, `\2`, …	Re-match group 1, 2, …
`\k<name>`	Re-match named group
`(['"]).+?\1`	Match a quoted string with matching quote types
`\b(\w+)\s+\1\b`	Find duplicated words (“the the”)
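Both table patterns can be tried in Python as-is. This is a small sketch with invented sample text; note how the apostrophe inside the double-quoted string does not end the match, because `\1` only accepts the quote that opened it.

```python
import re

# Duplicated word: \1 must repeat exactly what group 1 captured.
dupes = re.findall(r"\b(\w+)\s+\1\b", "Paris in the the spring")  # ['the']

# Quoted strings: the closing quote has to match the opening one.
text = """say "it's" or 'ok'."""
quoted = [m.group(0) for m in re.finditer(r"""(['"]).+?\1""", text)]
# ['"it\'s"', "'ok'"]

print(dupes, quoted)
```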

Flags — the modifiers

Flag	Effect
`i`	Case-insensitive
`m`	`^` and `$` match start/end of each line
`s` (a.k.a. `dotall`)	`.` matches newlines
`g` (JS)	Global — find all, not just first
`u` (JS)	Unicode mode — enables `\p{...}`, surrogate pair handling
`x` (PCRE/Python/Ruby)	Free-spacing — whitespace and `# comments` ignored

// JS — flags go after the closing slash
/^foo$/im.test("FOO")    // true
"abc abc abc".match(/abc/g)  // ["abc", "abc", "abc"]

# Python — flags as the last argument, or inline as (?i), (?m), (?s)
import re
re.match(r"^foo$", "FOO", re.I | re.M)   # returns a match object
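The `x` flag is the one that pays off most on long patterns. A minimal Python sketch, reusing the ISO date shape; the group names are illustrative, and the pattern checks only the shape, not whether the date exists.

```python
import re

# re.VERBOSE (the x flag): whitespace and "# comments" in the pattern are ignored.
iso_date = re.compile(
    r"""
    ^
    (?P<year>\d{4}) -    # four-digit year
    (?P<month>\d{2}) -   # two-digit month
    (?P<day>\d{2})       # two-digit day
    $
    """,
    re.VERBOSE,
)

date_parts = iso_date.match("2026-01-31").groupdict()
# {'year': '2026', 'month': '01', 'day': '31'}

# Inline form: (?i) at the start of the pattern is the same as passing re.I.
inline_hit = re.search(r"(?i)foo", "FOO") is not None   # True

print(date_parts, inline_hit)
```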

Unicode property classes (`u` flag in JS, built in to Ruby/PCRE; not in Python’s `re`)

Pattern	Matches
`\p{L}`	Any letter (Latin, Greek, CJK, …)
`\p{N}`	Any number
`\p{P}`	Any punctuation
`\p{Z}`	Any separator (whitespace + line/paragraph)
`\p{Lu}` / `\p{Ll}`	Uppercase / lowercase letter
`\p{Script=Greek}`	Any character from a specific script
`\P{L}`	The negation — anything that isn’t a letter

Replacement syntax — what goes in the right side of a substitution

Token	Means
`$&` (or `\0`)	The full match
`$1`, `$2`, …	Capture group 1, 2, …
`$<name>` (or `\k<name>`)	Named group
`$$`	A literal `$`
`\u`, `\l` (Vim, sed)	Uppercase / lowercase the next character

s/(\w+) (\w+)/$2 $1/   — swap two words
s/(?<area>\d{3})/($<area>)/  — wrap a 3-digit run in parens
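Replacement tokens are where flavors drift the most. As one concrete case, here is how the same substitutions look with Python's `re.sub`, which uses `\1` and `\g<name>` rather than `$1` and `$<name>`. The sample strings are made up.

```python
import re

# Swap two words: \2 \1 in Python, where the table above writes $2 $1.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")    # 'world hello'

# Named group in the replacement: \g<name>.
wrapped = re.sub(r"(?P<area>\d{3})", r"(\g<area>)", "call 555 now")
# 'call (555) now'

# The full match is \g<0> in Python.
tagged = re.sub(r"\d+", r"<\g<0>>", "a1b22")                 # 'a<1>b<22>'

# "$" is not special in a Python replacement; backslash is the character to watch.
priced = re.sub(r"USD (\d+)", r"$\1", "USD 99")              # '$99'

print(swapped, wrapped, tagged, priced)
```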

Patterns worth keeping

Goal	Pattern
Trim leading/trailing whitespace	`^\s+|\s+$`
Collapse internal whitespace	`\s+` → `" "`
Match an integer (signed)	`^-?\d+$`
Match a decimal	`^-?\d+(\.\d+)?$`
Hex color	`^#(?:[0-9a-fA-F]{3}){1,2}$`
ISO date (YYYY-MM-DD)	`^\d{4}-\d{2}-\d{2}$`
ISO time (HH:MM, 24h)	`^([01]\d|2[0-3]):[0-5]\d$`
Slug (kebab-case)	`^[a-z0-9]+(-[a-z0-9]+)*$`
URL (loose, good-enough)	`https?://[^\s)]+`
IPv4	`\b(?:\d{1,3}\.){3}\d{1,3}\b` (also matches `999.999.999.999`)
UUID v4	`^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$`
Strip ANSI escape codes	`\x1b\[[0-9;]*m`

There is no correct regex for email addresses. The official one (RFC 5322) is hundreds of characters and accepts addresses no real system delivers to. Use ^[^@\s]+@[^@\s]+\.[^@\s]+$ as a sanity check, then send a confirmation email — that is how every production system actually validates.
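Two of the patterns above check shape only, so they usually want a follow-up step. A rough Python sketch of that split: the helper names `looks_like_email` and `is_ipv4` are hypothetical, and `fullmatch` stands in for the `^...$` anchors.

```python
import re

EMAIL_SHAPE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
# re.ASCII keeps \d to 0-9 only.
IPV4_SHAPE = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}", re.ASCII)


def looks_like_email(text):
    """Sanity check only; a confirmation email is the real validation."""
    return EMAIL_SHAPE.fullmatch(text) is not None


def is_ipv4(text):
    """Regex for the shape, plain integer comparison for the 0-255 range."""
    if IPV4_SHAPE.fullmatch(text) is None:
        return False
    return all(int(part) <= 255 for part in text.split("."))


print(looks_like_email("ada@example.com"), looks_like_email("ada@localhost"))
print(is_ipv4("192.168.0.1"), is_ipv4("999.999.999.999"))
```

Leaving the range check to ordinary code keeps the pattern short enough to read.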

Flavor differences

Feature	JS	Python	PCRE	Go	Java
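Lookbehind	yes	yes	yes	no	yes
Variable-length lookbehind	yes	no (`regex` module only)	limited	no	bounded only
Possessive `*+` `++`	no	3.11+	yes	no	yes
Atomic groups `(?>...)`	no	3.11+	yes	no	yes
Backreferences	yes	yes	yes	no	yes
Inline flags `(?i)`	no	yes	yes	yes	yes
Unicode `\p{L}`	with `u` flag	no (`regex` module only)	yes	yes	yes
Free-spacing `x` flag	no	yes	yes	no	yes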

The headline: Go’s RE2 trades expressiveness for guaranteed linear-time matching — no backreferences, no lookarounds. Worth knowing before you copy a PCRE pattern into Go.

The traps

Patterns that look right and aren’t:

Catastrophic backtracking. (a+)+$ against aaaaaaaaaaaaaaaaaaaaaaaaaa! can hang for seconds. Use possessive quantifiers, atomic groups, or rewrite to avoid nested repetition.
.* across newlines. . does not match \n by default. Use the s (dotall) flag, or [\s\S]*? for “any character including newlines.”
Anchors with m vs without. Without the m flag, ^ and $ only match the start and end of the whole string, not each line.
Greedy <.*>. Match an HTML tag with <[^>]+>, not <.*> — the latter swallows everything between the first < and the last >.
Replace strings have their own metacharacters. A $ in your replacement string is interpolated. Escape with $$ (most flavors) or \$ (some).
Capture groups change replacement numbering. Every new (...) shifts the numbers of the groups after it. When you only need grouping, use (?:...) so that $1, $2, … keep referring to what you expect.
HTML / JSON / SQL. Don’t parse them with regex. Use a parser. The “good enough” regex always misses the case that breaks production.
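The first trap can be felt on a small input. The sketch below compares the nested pattern with a flat rewrite in Python. `timed_search` is a hypothetical helper, the run of a's is kept short on purpose, and no timings are quoted because they depend on the engine and version.

```python
import re
import time

NESTED = re.compile(r"(a+)+$")  # nested repetition
FLAT = re.compile(r"a+$")       # finds a match in the same strings, one level of repetition


def timed_search(pattern, text):
    """Return (found, seconds) for a single search."""
    start = time.perf_counter()
    found = pattern.search(text) is not None
    return found, time.perf_counter() - start


# In a backtracking engine, each extra "a" roughly doubles the work for
# NESTED when the match fails, so lengthen this with care.
subject = "a" * 18 + "!"
nested_found, nested_seconds = timed_search(NESTED, subject)
flat_found, flat_seconds = timed_search(FLAT, subject)

print(f"nested: found={nested_found} in {nested_seconds:.4f}s")
print(f"flat:   found={flat_found} in {flat_seconds:.4f}s")

# On input that does match, both patterns agree.
agree = NESTED.search("xaaa") is not None and FLAT.search("xaaa") is not None
```

Removing the nesting is the portable fix, since possessive quantifiers and atomic groups are not available in every flavor.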

[Figure: why (a+)+$ hangs on aaaa!]

Test your patterns

The pattern almost always means something subtly different from what you typed. Use a tester before committing it:

regex101.com — flavor switcher, explains your pattern token-by-token, shows backtracking
regexr.com — lighter, pattern library
node -e 'console.log("test".match(/.../))' — fastest local check for JS regex

More cheat sheets

Git Cheat Sheet — branching, undoing, rebasing, the dangerous flags
