JUN 2026 No. 26
Daily Upkeep
An entry 9 min read

Regex Cheat Sheet (2026) — Anchors, Quantifiers & Flags

Regex in one tab — anchors, character classes, quantifiers, groups, lookarounds, flags, and the patterns worth keeping. PCRE, JS, Python, Go, Java.

Regex cheat sheet — letterpress specimen card with anchors, quantifiers, lookarounds, and flag fragments.

Regex is unmemorable on purpose. The syntax is dense, the flavors disagree, and the patterns you wrote last quarter look like keysmash today. This is the one tab worth keeping open.

Examples below are PCRE-flavored unless noted. JavaScript, Python, Go, and Java all match the basics; the Flavor differences section calls out where they diverge.

Anatomy of a pattern

Figure: regex pattern anatomy. The pattern ^(?<area>\d{3})-(\d{4})$ with each token labeled:

  • ^ : anchor, start
  • (?<area>\d{3}) : named capture “area”; reads: exactly 3 digits, captured as “area”
  • - : literal hyphen
  • (\d{4}) : capture group; reads: exactly 4 digits, group #2 (the named group also counts as group #1)
  • $ : anchor, end

Every regex is a string of these: anchors, escapes, groups, quantifiers, literals. Read it left to right; the rest is vocabulary.

Anchors — where the match has to land

Pattern   Matches
^         Start of string (or line, with m flag)
$         End of string (or line, with m flag)
\b        Word boundary: the spot between \w and \W
\B        Non-word boundary
\A        Absolute start of input (ignores m flag); not in JS
\z        Absolute end of input (ignores m flag); not in JS, spelled \Z in Python
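
A minimal Python sketch of the anchor rows above, assuming made-up sample strings. Python's re module spells the absolute-end anchor \Z, so that form is used here.

```python
import re

text = "first line\nsecond line"

# Without the m flag, ^ only matches at the very start of the input.
start_only = re.findall(r"^\w+", text)

# With the m flag (re.M), ^ also matches right after every newline.
each_line = re.findall(r"^\w+", text, re.M)

# \A ignores the m flag: it still matches only at the absolute start.
absolute_start = re.findall(r"\A\w+", text, re.M)

# \Z is Python's spelling of the absolute-end anchor.
absolute_end = re.findall(r"\w+\Z", text, re.M)

# \b keeps "cat" from matching inside "concat" or "cats".
whole_word = re.findall(r"\bcat\b", "cat concat cats")

print(start_only, each_line, absolute_start, absolute_end, whole_word)
```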

Character classes — what counts as a character

Pattern        Matches
.              Any character except newline (use s flag to include newlines)
\d / \D        Digit [0-9] / non-digit
\w / \W        Word character [A-Za-z0-9_] / non-word
\s / \S        Whitespace / non-whitespace
[abc]          One of a, b, c
[^abc]         Anything but a, b, c
[a-z]          One character in the range
[a-zA-Z0-9]    Letters or digits
[\d-]          A digit or a literal - (place - first or last to make it literal)

\w is not Unicode-aware in most flavors. café has \w\w\w\W because é isn’t ASCII. In JS, opt in with the u flag and use \p{L} from the Unicode property classes section below. In Python, \w is Unicode-aware by default; pass re.ASCII to opt out.
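
The Python half of that note can be checked directly. This is a small sketch with illustrative sample strings; the é is written as an escape so it is a single precomposed character.

```python
import re

word = "caf\u00e9"  # "café" with a precomposed é

# Default str patterns are Unicode-aware: é counts as a word character.
unicode_match = re.findall(r"\w+", word)

# re.ASCII restricts \w to [A-Za-z0-9_], so the match stops before é.
ascii_match = re.findall(r"\w+", word, re.ASCII)

# A plain set and a range, as in the table above.
no_vowels = re.sub(r"[aeiou]", "", "regex cheat sheet")
hex_runs = re.findall(r"[0-9a-f]+", "ff 7g 1a2b")

print(unicode_match, ascii_match, no_vowels, hex_runs)
```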

Quantifiers — how many times

Pattern            Matches
?                  0 or 1 (optional)
*                  0 or more
+                  1 or more
{n}                Exactly n
{n,}               n or more
{n,m}              Between n and m (inclusive)
*? +? ?? {n,m}?    Lazy: match the shortest possible string
*+ ++ ?+ {n,m}+    Possessive: no backtracking (PCRE, Java, Python 3.11+)

Default quantifiers are greedy — they match as much as possible, then backtrack. <.*> against <a><b> matches the entire <a><b>, not just <a>. Add ? to go lazy: <.*?> matches <a> first.
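
The greedy and lazy behavior described above can be reproduced in a few lines of Python. The last line is an extra illustration of a bounded quantifier on a made-up string.

```python
import re

html = "<a><b>"

greedy = re.findall(r"<.*>", html)       # one match spanning both tags
lazy = re.findall(r"<.*?>", html)        # stops at the first > each time
negated = re.findall(r"<[^>]+>", html)   # same result, no reliance on laziness

# Bounded quantifier: {2,3} takes as many digits as it can, up to three.
runs = re.findall(r"\d{2,3}", "1 22 333 4444")

print(greedy, lazy, negated, runs)
```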

Greedy, <.*> against <a><b>: 1 match (the whole <a><b>)
Lazy, <.*?> against <a><b>: 2 matches (<a>, then <b>)

Groups & alternation

Pattern         Matches
(abc)           Capturing group, referenced as \1 or $1
(?:abc)         Non-capturing group: same grouping, no capture overhead
(?<name>abc)    Named capture group, referenced as \k<name> or $<name>
a|b             a or b
(cat|dog)s?     cat, cats, dog, dogs
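
A short Python sketch of the group types above, reusing the phone-style pattern from the anatomy figure. Note that Python spells named groups (?P<name>...); the sample inputs are illustrative.

```python
import re

# Python's named-group syntax is (?P<name>...).
phone = re.compile(r"^(?P<area>\d{3})-(\d{4})$")
m = phone.match("555-1234")

area_by_name = m.group("area")
area_by_number = m.group(1)   # a named group still gets a number
line_number = m.group(2)      # so the unnamed group is number 2

# Non-capturing group: groups the alternatives without adding a group number,
# which also means findall returns whole matches.
animals = re.findall(r"\b(?:cat|dog)s?\b", "cats and dog, not catalog")

print(area_by_name, area_by_number, line_number, animals)
```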

Lookarounds — match without consuming

Pattern     Means
(?=...)     Lookahead: followed by ...
(?!...)     Negative lookahead: not followed by ...
(?<=...)    Lookbehind: preceded by ...
(?<!...)    Negative lookbehind: not preceded by ...
\d+(?=px) — digits that come right before "px" → 12 in "12px"
(?<=\$)\d+ — digits that come right after "$" → 99 in "$99"
\b(?!the\b)\w+ — words that aren't "the" → fox, jumps, ...

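Lookbehind support is uneven. .NET and JavaScript (since ES2018) accept variable-length lookbehind. Python’s built-in re module still requires fixed-width lookbehind (the third-party regex module lifts that limit), Java requires a bounded maximum length, and PCRE2 long required fixed-length alternatives, with recent releases accepting bounded variable-length patterns. Go’s RE2 engine doesn’t support lookarounds at all.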
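
The lookaround examples above run unchanged in Python, since each lookbehind here is a single fixed character. The input strings are made up for illustration.

```python
import re

# Lookahead: digits that sit right before "px".
before_px = re.findall(r"\d+(?=px)", "12px 5em 300px")

# Lookbehind: digits that sit right after "$".
after_dollar = re.findall(r"(?<=\$)\d+", "$99 or 45")

# Negative lookahead: whole words other than "the" ("then" still counts).
not_the = re.findall(r"\b(?!the\b)\w+", "the quick fox then jumps")

# Negative lookbehind: numbers that do not follow "$".
no_dollar = re.findall(r"(?<!\$)\b\d+", "$99 or 45")

print(before_px, after_dollar, not_the, no_dollar)
```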

Backreferences — match the same thing twice

Pattern             Means
\1, \2, …           Re-match group 1, 2, …
\k<name>            Re-match named group
(['"]).+?\1         Match a quoted string with matching quote types
\b(\w+)\s+\1\b      Find duplicated words (“the the”)
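
Both backreference patterns from the table can be tried in Python as written. This is a sketch on invented sample text, not a full quote or prose parser.

```python
import re

# Duplicated words: \1 must repeat whatever group 1 captured.
dupes = re.compile(r"\b(\w+)\s+\1\b")
sentence = "Paris in the the spring"
found = dupes.search(sentence).group()
fixed = dupes.sub(r"\1", sentence)   # keep one copy of the repeated word

# Quoted strings: the closing quote must be the same character as the opener.
quoted = re.compile(r"""(['"]).+?\1""")
text = "say \"it's\" or 'fine'"
quotes = [m.group() for m in quoted.finditer(text)]

print(found, fixed, quotes)
```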

Flags — the modifiers

Flag                    Effect
i                       Case-insensitive
m                       ^ and $ match start/end of each line
s (a.k.a. dotall)       . matches newlines
g (JS)                  Global: find all, not just first
u (JS)                  Unicode mode: enables \p{...}, surrogate pair handling
x (PCRE/Python/Ruby)    Free-spacing: whitespace and # comments ignored
// JS: flags go after the closing slash
/^foo$/im.test("FOO") // true
"abc abc abc".match(/abc/g) // ["abc", "abc", "abc"]
# Python: flags go last in the call (or in re.compile), or inline as (?i)(?m)(?s)
re.match(r"^foo$", "FOO", re.I | re.M)
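
A slightly fuller Python sketch of the same flags, including the inline form and free-spacing mode. The phone-style pattern and sample strings are only illustrative.

```python
import re

# Flags passed as the last argument.
both_lines = re.findall(r"^foo$", "FOO\nfoo", re.I | re.M)

# The same flags written inline at the start of the pattern.
inline = re.findall(r"(?im)^foo$", "FOO\nfoo")

# s flag (re.S): . also matches newlines.
without_s = re.search(r"a.b", "a\nb")
with_s = re.search(r"a.b", "a\nb", re.S)

# x flag (re.X): whitespace and # comments inside the pattern are ignored.
phone = re.compile(
    r"""
    ^(\d{3})   # area code
    -          # literal hyphen
    (\d{4})$   # line number
    """,
    re.X,
)
parts = phone.match("555-1234").groups()

print(both_lines, inline, without_s, with_s is not None, parts)
```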

Unicode property classes (u flag in JS; built into PCRE and Ruby; Python needs the third-party regex module)

Pattern             Matches
\p{L}               Any letter (Latin, Greek, CJK, …)
\p{N}               Any number
\p{P}               Any punctuation
\p{Z}               Any separator (whitespace + line/paragraph)
\p{Lu} / \p{Ll}     Uppercase / lowercase letter
\p{Script=Greek}    Characters from a specific script
\P{L}               The negation: anything that isn’t a letter
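
Python's built-in re module does not accept \p{...} (the third-party regex module does). As a stdlib-only approximation, the general categories behind \p{L}, \p{Lu}, \p{N}, and \p{P} can be read from unicodedata. The helper name has_property is hypothetical, and this sketch does not cover script properties such as \p{Script=Greek}.

```python
import unicodedata

def has_property(ch: str, prop: str) -> bool:
    """Hypothetical helper: True if ch's general category starts with prop ("L", "Lu", ...)."""
    return unicodedata.category(ch).startswith(prop)

# Latin A and b, Greek alpha and capital omega, a digit, and punctuation.
text = "Ab \u03b1\u03a9 7!"

letters = [ch for ch in text if has_property(ch, "L")]       # roughly \p{L}
uppercase = [ch for ch in text if has_property(ch, "Lu")]    # roughly \p{Lu}
numbers = [ch for ch in text if has_property(ch, "N")]       # roughly \p{N}
punctuation = [ch for ch in text if has_property(ch, "P")]   # roughly \p{P}

print(letters, uppercase, numbers, punctuation)
```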

Replacement syntax — what goes in the right side of a substitution

Token                     Means
$& (or \0)                The full match
$1, $2, …                 Capture group 1, 2, …
$<name> (or \k<name>)     Named group
$$                        A literal $
\u, \l (Vim, sed)         Uppercase / lowercase the next character
s/(\w+) (\w+)/$2 $1/ — swap two words
s/(?<area>\d{3})/(\1)/ — wrap a 3-digit run in parens
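
Python's re.sub uses backslash tokens rather than $ tokens in the replacement, so the two substitutions above look like this. It is a sketch on sample strings; the callable replacement at the end is one way to get the effect of \u, which Python's replacement syntax lacks.

```python
import re

# \1 and \2 play the role of $1 and $2: swap two words.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")

# \g<name> refers to a named group; count=1 limits it to the first 3-digit run.
wrapped = re.sub(r"(?P<area>\d{3})", r"(\g<area>)", "555-1234", count=1)

# \g<0> is the full match, like $& elsewhere.
marked = re.sub(r"\d+", r"<\g<0>>", "call 555 now")

# A function as the replacement: uppercase the first letter of each word.
titled = re.sub(r"\b\w", lambda m: m.group().upper(), "daily upkeep")

print(swapped, wrapped, marked, titled)
```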

Patterns worth keeping

Goal                                Pattern
Trim leading/trailing whitespace    ^\s+|\s+$
Collapse internal whitespace        \s+ (replace with a single space " ")
Match an integer (signed)           ^-?\d+$
Match a decimal                     ^-?\d+(\.\d+)?$
Hex color                           ^#(?:[0-9a-fA-F]{3}){1,2}$
ISO date (YYYY-MM-DD)               ^\d{4}-\d{2}-\d{2}$
ISO time (HH:MM, 24h)               ^([01]\d|2[0-3]):[0-5]\d$
Slug (kebab-case)                   ^[a-z0-9]+(-[a-z0-9]+)*$
URL (loose, good-enough)            https?://[^\s)]+
IPv4                                \b(?:\d{1,3}\.){3}\d{1,3}\b (also matches 999.999.999.999)
UUID v4                             ^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$
Strip ANSI escape codes             \x1b\[[0-9;]*m
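
A few of these patterns wrapped as Python helpers, as a sketch. The function names are hypothetical, and the IPv4 check adds a numeric range test after the regex because the pattern alone accepts 999.999.999.999.

```python
import re

def normalize_space(s: str) -> str:
    """Trim the ends, then collapse internal whitespace runs to one space."""
    trimmed = re.sub(r"^\s+|\s+$", "", s)
    return re.sub(r"\s+", " ", trimmed)

# re.ASCII keeps \d to 0-9; fullmatch requires the whole string to fit.
IPV4_SHAPE = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}", re.ASCII)

def is_ipv4(s: str) -> bool:
    """Regex checks the shape; the range check rejects octets above 255."""
    if IPV4_SHAPE.fullmatch(s) is None:
        return False
    return all(int(octet) <= 255 for octet in s.split("."))

HEX_COLOR = re.compile(r"^#(?:[0-9a-fA-F]{3}){1,2}$")
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")

def strip_ansi(s: str) -> str:
    """Remove color/style escape sequences of the form ESC [ ... m."""
    return ANSI_SGR.sub("", s)

print(normalize_space("  a \t b\n\n c  "), is_ipv4("192.168.0.1"),
      is_ipv4("999.999.999.999"), strip_ansi("\x1b[31mred\x1b[0m"))
```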

There is no correct regex for email addresses. The official one (RFC 5322) is hundreds of characters and accepts addresses no real system delivers to. Use ^[^@\s]+@[^@\s]+\.[^@\s]+$ as a sanity check, then send a confirmation email — that is how every production system actually validates.
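
The sanity-check pattern above can be wrapped in a small Python function. The name looks_like_email is hypothetical, and passing this check only means the string has a plausible shape; the confirmation email is still the real validation.

```python
import re

EMAIL_SHAPE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def looks_like_email(address: str) -> bool:
    """Shape check only: something@something.something, no spaces, one @."""
    return EMAIL_SHAPE.fullmatch(address) is not None

for candidate in ["reader@example.com", "no-at-sign.example.com", "a@b", "two@@example.com"]:
    print(candidate, looks_like_email(candidate))
```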

Flavor differences

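Feature                       JS              Python            PCRE                      Go     Java
Lookbehind                    yes             yes               yes                       no     yes
Variable-length lookbehind    yes             no (re module)    bounded (recent PCRE2)    no     bounded only
Possessive *+ ++              no              yes (3.11+)       yes                       no     yes
Atomic groups (?>...)         no              yes (3.11+)       yes                       no     yes
Backreferences                yes             yes               yes                       no     yes
Inline flags (?i)             no              yes               yes                       yes    yes
Unicode \p{L}                 yes (u flag)    no (re module)    yes                       yes    yes
Free-spacing x flag           no              yes               yes                       no     yes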

The headline: Go’s RE2 trades expressiveness for guaranteed linear-time matching — no backreferences, no lookarounds. Worth knowing before you copy a PCRE pattern into Go.

The traps

Patterns that look right and aren’t:

  • Catastrophic backtracking. (a+)+$ against aaaaaaaaaaaaaaaaaaaaaaaaaa! can hang for seconds. Use possessive quantifiers, atomic groups, or rewrite to avoid nested repetition.
  • .* across newlines. . does not match \n by default. Use the s (dotall) flag, or [\s\S]*? for “any character including newlines.”
  • Anchors with m vs without. Without the m flag, ^ and $ only match the start and end of the whole string, not each line.
  • Greedy <.*>. Match an HTML tag with <[^>]+>, not <.*> — the latter swallows everything between the first < and the last >.
  • Replace strings have their own metacharacters. A $ in your replacement string is interpolated. Escape with $$ (most flavors) or \$ (some).
  • Capture groups change replacement numbering. Every new (...) shifts the numbers of the groups after it; use a non-capturing (?:...) when you only need grouping, so $1, $2, … keep referring to what you expect.
  • HTML / JSON / SQL. Don’t parse them with regex. Use a parser. The “good enough” regex always misses the case that breaks production.
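
Three of these traps can be shown in a short Python sketch. The inputs are kept tiny so the nested pattern stays fast; in Python the replacement metacharacter is the backslash rather than $, and the sample path string is invented.

```python
import re

# Catastrophic backtracking: a+$ matches the same inputs as (a+)+$
# with only one level of repetition.
nested = re.compile(r"(a+)+$")
flat = re.compile(r"a+$")
samples = ["aaaa", "aaaa!", "baaa", ""]
agree = all(bool(nested.search(s)) == bool(flat.search(s)) for s in samples)

# . stops at newlines unless the s flag is set; [\s\S] works without a flag.
block = "<p>\nhello\n</p>"
no_flag = re.search(r"<p>.*</p>", block)
with_flag = re.search(r"<p>.*</p>", block, re.S)
portable = re.search(r"<p>[\s\S]*?</p>", block)

# Replacement strings are interpreted: here \n turns into a real newline.
literal = r"C:\new"
interpreted = re.sub(r"PATH", literal, "PATH")
# A function replacement is inserted as-is, with no escape processing.
verbatim = re.sub(r"PATH", lambda m: literal, "PATH")

print(agree, no_flag, with_flag is not None, portable is not None)
print(repr(interpreted), repr(verbatim))
```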

Figure: why (a+)+$ hangs on aaaa!. The engine tries 2⁴ = 16 paths; none reach the trailing !, so all of them backtrack.

Test your patterns

The pattern almost always means something subtly different from what you typed. Use a tester before committing it:

  • regex101.com — flavor switcher, explains your pattern token-by-token, shows backtracking
  • regexr.com — lighter, pattern library
  • node -e 'console.log("test".match(/.../))' — fastest local check for JS regex
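
For patterns that live in a codebase, a table of expected results is a cheap complement to an online tester. This is a minimal Python sketch using the slug pattern from earlier; the helper name failing_cases and the sample cases are assumptions for illustration.

```python
import re

SLUG = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")

# (input, should it match?)
CASES = [
    ("regex-cheat-sheet", True),
    ("regex", True),
    ("Regex-Cheat-Sheet", False),
    ("trailing-", False),
    ("double--hyphen", False),
    ("", False),
]

def failing_cases(pattern, cases):
    """Return the (text, expected) pairs where the pattern disagrees."""
    return [
        (text, expected)
        for text, expected in cases
        if (pattern.fullmatch(text) is not None) != expected
    ]

failures = failing_cases(SLUG, CASES)
print("all cases pass" if not failures else f"failing: {failures}")
```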
