Regular Expressions: Unicode Property Escapes


As we saw above, in a regular expression pattern you can use \d to match any digit, \s to match any whitespace character, \w to match any alphanumeric character or underscore, and so on.

Unicode property escapes extend this concept to all Unicode characters, introducing \p{} and its negation \P{}. They only work when the regex uses the u flag.

Every Unicode character has a set of properties. For example, Script identifies the writing system a character belongs to, ASCII is a boolean that's true for ASCII characters, and so on. You can put a property name in the curly braces, and the regex will match the characters for which that property is true:

/^\p{ASCII}+$/u.test('abc')   //✅
/^\p{ASCII}+$/u.test('ABC@')  //✅
/^\p{ASCII}+$/u.test('ABC🙃') //❌
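
The negated form \P{} and the role of the u flag can be illustrated with a short sketch. The helper name hasNonAscii is hypothetical, introduced only for this example:

```javascript
// \P{...} (uppercase P) matches any character that does NOT have the property.
// hasNonAscii is a hypothetical helper name used only for illustration.
const hasNonAscii = (text) => /\P{ASCII}/u.test(text);

console.log(hasNonAscii('abc'));   // false
console.log(hasNonAscii('ABC🙃')); // true

// Without the u flag, \p is not treated as a property escape:
// the pattern below looks for the literal text "p{ASCII}".
const withoutUnicodeFlag = /\p{ASCII}/;

console.log(withoutUnicodeFlag.test('abc'));      // false
console.log(withoutUnicodeFlag.test('p{ASCII}')); // true
```

Forgetting the u flag does not throw an error, so a pattern can silently stop matching what you expect.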

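ASCII_Hex_Digit is another boolean property, which is true for the characters that are valid hexadecimal digits (0-9, a-f and A-F). Anchored with ^ and $, the pattern checks that the string only contains hexadecimal digits: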

/^\p{ASCII_Hex_Digit}+$/u.test('0123456789ABCDEF') //✅
/^\p{ASCII_Hex_Digit}+$/u.test('h')                //❌
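
A property escape can be combined with quantifiers and literal characters like any other character class. Here is a minimal sketch that validates a six-digit color string. The function name isHexColor and the "#rrggbb" format are assumptions made for this example:

```javascript
// isHexColor is a hypothetical helper: it accepts "#" followed by
// exactly six ASCII hexadecimal digits, and nothing else.
function isHexColor(value) {
  return /^#\p{ASCII_Hex_Digit}{6}$/u.test(value);
}

console.log(isHexColor('#ff8800')); // true
console.log(isHexColor('#FF8800')); // true
console.log(isHexColor('#ff88zz')); // false
console.log(isHexColor('#ff88'));   // false
```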

There are many other boolean properties, which you check by adding their name in the curly braces, including Uppercase, Lowercase, White_Space, Alphabetic, Emoji and more:

/^\p{Lowercase}$/u.test('h') //✅
/^\p{Uppercase}$/u.test('H') //✅

/^\p{Emoji}+$/u.test('H')   //❌
/^\p{Emoji}+$/u.test('🙃🙃') //✅
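
One thing to watch for: the Emoji property is also true for the ASCII digits, # and *, because they can be the base of keycap emoji. If that is not what you want, one possible stricter check is the Emoji_Presentation property, sketched below. The variable names are only illustrative, and this property excludes characters that are displayed as text by default:

```javascript
// Digits have the Emoji property, so this pattern accepts '123'.
const emojiProperty = /^\p{Emoji}+$/u;
console.log(emojiProperty.test('123'));  // true
console.log(emojiProperty.test('🙃🙃')); // true

// Emoji_Presentation is true only for characters shown as emoji by default.
const emojiPresentation = /^\p{Emoji_Presentation}+$/u;
console.log(emojiPresentation.test('123'));  // false
console.log(emojiPresentation.test('🙃🙃')); // true
```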

In addition to those binary properties, you can check some properties against a specific value, using the property=value syntax. JavaScript supports this for General_Category, Script and Script_Extensions. In this example, I check if the string is written in the Greek or Latin alphabet:

/^\p{Script=Greek}+$/u.test('ελληνικά') //✅
/^\p{Script=Latin}+$/u.test('hey') //✅
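
General_Category values can also be written on their own, without the property name, so \p{Letter} (or its short alias \p{L}) matches a letter in any script. As a rough sketch, here is a way to pull words out of mixed-script text. The function name extractWords is hypothetical:

```javascript
// extractWords is a hypothetical helper: it returns every run of letters,
// whatever script they belong to. The g flag collects all the matches.
function extractWords(text) {
  return text.match(/\p{Letter}+/gu) || [];
}

console.log(extractWords('Hello, Привет! 你好'));
// [ 'Hello', 'Привет', '你好' ]

// The long and the short form are equivalent.
console.log(/^\p{General_Category=Uppercase_Letter}$/u.test('H')); // true
console.log(/^\p{Lu}$/u.test('H'));                                // true
```

Compared to \w, which only covers ASCII letters, digits and the underscore, this approach keeps working when the text is not in English.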

Read more about all the properties you can use directly on the TC39 proposal.
