Unicode is an industry standard for consistent encoding of written text. It aims to provide a unique number to identify every character for every language, on any platform.
Code Points
Unicode maps every character to a specific code, called a code point. A code point takes the form of U+<hex-code>, ranging from U+0000 to U+10FFFF.
For example: U+004F represents the letter “O”.
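To make the notation concrete, here is a minimal sketch that converts a numeric code point into the U+ form. The helper name `toCodePointNotation` is made up for this example and is not a built-in.

```javascript
// Hypothetical helper: format a numeric code point in U+XXXX notation.
function toCodePointNotation(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10FFFF) {
    throw new RangeError(`Not a valid Unicode code point: ${codePoint}`)
  }
  // Uppercase hex, zero-padded to at least four digits.
  return 'U+' + codePoint.toString(16).toUpperCase().padStart(4, '0')
}

toCodePointNotation('O'.codePointAt(0)) // 'U+004F'
toCodePointNotation(0x1F436) // 'U+1F436'
```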
Character Encodings
Unicode defines different character encodings:
- UTF-8: Variable width (1-4 bytes), most popular on the web
- UTF-16: Variable width (2 or 4 bytes), used internally by JavaScript
- UTF-32: Fixed width (4 bytes)
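The size differences between the three encodings can be checked with a short sketch. `encodedSizes` is a hypothetical helper written for this example. It relies on `TextEncoder`, which always produces UTF-8, and on the fact that a string's `length` counts UTF-16 code units.

```javascript
// Hypothetical helper: bytes needed for the first code point of `char`
// in each of the three encodings.
function encodedSizes(char) {
  // Keep only the first code point, in case a longer string is passed in.
  const single = String.fromCodePoint(char.codePointAt(0))
  return {
    utf8: new TextEncoder().encode(single).length, // TextEncoder emits UTF-8
    utf16: single.length * 2, // .length counts 16-bit code units
    utf32: 4 // always one 32-bit unit per code point
  }
}

encodedSizes('A') // { utf8: 1, utf16: 2, utf32: 4 }
encodedSizes('\u00E9') // é: { utf8: 2, utf16: 2, utf32: 4 }
encodedSizes('\u20AC') // €: { utf8: 3, utf16: 2, utf32: 4 }
encodedSizes('\u{1F436}') // 🐶: { utf8: 4, utf16: 4, utf32: 4 }
```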
UTF-8
UTF-8 is the most popular encoding, used on over 90% of web pages. It’s backwards compatible with ASCII—the first 128 characters are identical.
| Bytes | Range |
|---|---|
| 1 | U+0000 - U+007F |
| 2 | U+0080 - U+07FF |
| 3 | U+0800 - U+FFFF |
| 4 | U+10000 - U+10FFFF |
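The ranges in the table map directly onto the bit layout of UTF-8. The following is a simplified sketch of an encoder, not production code: `utf8Bytes` is a hypothetical name, and the function assumes a valid code point (it does not reject surrogates in the U+D800 to U+DFFF range). In real code, `TextEncoder` does this for you.

```javascript
// Hypothetical helper: encode one code point as an array of UTF-8 bytes.
// Assumes a valid code point; surrogates are not rejected in this sketch.
function utf8Bytes(codePoint) {
  // 1 byte: 0xxxxxxx
  if (codePoint <= 0x7F) return [codePoint]
  // 2 bytes: 110xxxxx 10xxxxxx
  if (codePoint <= 0x7FF) {
    return [0xC0 | (codePoint >> 6), 0x80 | (codePoint & 0x3F)]
  }
  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
  if (codePoint <= 0xFFFF) {
    return [
      0xE0 | (codePoint >> 12),
      0x80 | ((codePoint >> 6) & 0x3F),
      0x80 | (codePoint & 0x3F)
    ]
  }
  // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  return [
    0xF0 | (codePoint >> 18),
    0x80 | ((codePoint >> 12) & 0x3F),
    0x80 | ((codePoint >> 6) & 0x3F),
    0x80 | (codePoint & 0x3F)
  ]
}

utf8Bytes(0x41) // [0x41]
utf8Bytes(0xE9) // [0xC3, 0xA9]
utf8Bytes(0x1F436) // [0xF0, 0x9F, 0x90, 0xB6]
```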
Planes
Unicode organizes characters into 17 planes:
- Plane 0 (BMP): U+0000 - U+FFFF, contains most modern characters
- Planes 1-16 (Astral planes): U+10000 and above
Characters in astral planes are called astral code points.
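Since each plane spans exactly 0x10000 code points, the plane number is simply the code point shifted right by 16 bits. A small sketch, with `planeOf` and `isAstral` as illustrative names rather than built-ins:

```javascript
// Hypothetical helpers: each plane holds 0x10000 (65,536) code points.
function planeOf(codePoint) {
  return codePoint >> 16
}

function isAstral(codePoint) {
  return codePoint > 0xFFFF
}

planeOf(0x0041) // 0 (BMP)
planeOf(0x1F436) // 1
planeOf(0x10FFFF) // 16 (the last plane)
isAstral(0x1F436) // true
```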
Working with Unicode in JavaScript
Creating Strings from Code Points
String.fromCodePoint(70, 108, 97, 118, 105, 111) // 'Flavio'
Getting Code Points
'A'.codePointAt(0) // 65
'🐶'.codePointAt(0) // 128054
Unicode Escape Sequences
'\u0041' // 'A'
'\u{1F436}' // '🐶' (ES6 code point escape, needed for anything above U+FFFF)
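Before ES6, an astral character had to be written as two `\uXXXX` escapes, one per UTF-16 surrogate. The sketch below shows how that pair is derived. `toSurrogatePair` is a hypothetical helper for illustration only.

```javascript
// Hypothetical helper: split an astral code point into its UTF-16 surrogates.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError('Only astral code points have surrogate pairs')
  }
  const offset = codePoint - 0x10000 // 20 bits remain
  const high = 0xD800 + (offset >> 10) // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF) // bottom 10 bits
  return [high, low]
}

toSurrogatePair(0x1F436).map(unit => unit.toString(16)) // ['d83d', 'dc36']
'\uD83D\uDC36' === '\u{1F436}' // true
```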
Combining Characters
Unicode allows combining characters to form graphemes:
'e\u0301' // 'é' (e + combining acute accent)
'\u00E9' // 'é' (precomposed form)
Both represent the same visual character but are different strings.
Normalization
Because characters can be represented multiple ways, use normalize() for comparisons:
const a = '\u00E9' // é (precomposed)
const b = 'e\u0301' // é (combining)
a === b // false
a.normalize() === b.normalize() // true
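Called without arguments, `normalize()` uses the NFC form (composed). Passing `'NFD'` gives the decomposed form instead. The sketch below shows both directions. `sameText` is a hypothetical comparison helper, not a built-in.

```javascript
const precomposed = '\u00E9' // é as one code point
const decomposed = 'e\u0301' // e + combining acute accent

// NFD splits into base letter + combining mark, NFC merges them back.
precomposed.normalize('NFD').length // 2
decomposed.normalize('NFC').length // 1

// Hypothetical helper: compare two strings after normalizing both to NFC.
function sameText(a, b) {
  return a.normalize('NFC') === b.normalize('NFC')
}

sameText(precomposed, decomposed) // true
```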
String Length with Unicode
Be careful with string length for astral characters:
'🐶'.length // 2 (surrogate pair)
[...'🐶'].length // 1 (the spread iterates by code point, not by UTF-16 code unit)
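Counting code points is still not the same as counting what a reader sees as one character, because combining marks are separate code points. One possible approach, assuming a runtime that supports `Intl.Segmenter` (recent browsers and Node.js), is to count grapheme clusters. `countGraphemes` is a hypothetical helper name.

```javascript
// Hypothetical helper: count user-perceived characters (grapheme clusters).
// Assumes Intl.Segmenter is available in the runtime.
function countGraphemes(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' })
  return Array.from(segmenter.segment(text)).length
}

const accented = 'e\u0301' // é written as e + combining acute accent

accented.length // 2 (UTF-16 code units)
Array.from(accented).length // 2 (code points)
countGraphemes(accented) // 1 (grapheme clusters)
countGraphemes('\u{1F436}') // 1
```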
Emojis
Most emojis are astral plane characters, although some (such as ❤, U+2764) live in the BMP:
'🐶' // U+1F436
'👨\u200D👩\u200D👧' // Three emojis joined into a single family glyph
The family emoji is actually multiple code points joined with the Zero Width Joiner (ZWJ, U+200D).
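To see the joiner, the sequence can be taken apart code point by code point. This sketch writes the family emoji with explicit escapes so the invisible U+200D characters are not lost in copy and paste. The variable names are only illustrative.

```javascript
// Man + ZWJ + woman + ZWJ + girl, written with explicit escapes.
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}'

// Array.from iterates by code point; the mapper formats each one as U+XXXX.
const codePoints = Array.from(
  family,
  char => 'U+' + char.codePointAt(0).toString(16).toUpperCase()
)

codePoints // ['U+1F468', 'U+200D', 'U+1F469', 'U+200D', 'U+1F467']
family.length // 8 (three surrogate pairs plus two joiners)
```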