How can I search for a specific character?

Use the search box to find characters by name (e.g., "arrow", "heart", "copyright"). You can also filter by Unicode category (Letters, Symbols, Punctuation, etc.) to browse specific types of characters.

What does the HTML entity code do?

The HTML entity code (like &copy; for ©) can be used in HTML markup to display special characters. This is useful when typing the character directly isn't possible or for cross-browser compatibility.

Is Unicode Character Map free to use?

Yes, Unicode Character Map is completely free with no sign-up required. All processing happens in your browser for maximum privacy.

Is my data safe when using Unicode Character Map?

Absolutely. Unicode Character Map runs 100% in your browser. No files are uploaded to any server, and your data never leaves your device.

Do I need to install anything to use Unicode Character Map?

No installation needed. Unicode Character Map works directly in any modern web browser, including Chrome, Firefox, Safari, and Edge.

Free Character Map

Browse Unicode characters by category, search by name or code point, and copy to clipboard.

No data leaves your device

How to Use

Click a category tab to view characters in that group.
Click any character to view its details and copy options.
Use the search box to find characters by name (e.g., "heart") or hex code (e.g., "2665").
Click Copy Character to copy the selected character to your clipboard.

Frequently Asked Questions

What is a Unicode code point?

A Unicode code point is a unique number assigned to every character in the Unicode standard. It is written in hexadecimal format, often prefixed with U+ (e.g., U+2665 for ♥).

What is an HTML entity?

An HTML entity is a special code that represents a character in HTML. For example, &hearts; (or the numeric form &#9829;) represents ♥. Entities are useful when you cannot directly type a character.

What is CSS code?

CSS code uses backslash (\) notation to insert a character by its Unicode code point in stylesheets. For example, .heart::before { content: "\2665"; } inserts ♥.
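The three notations above (code point, HTML entity, CSS escape) are all derived from the same number. The following is a minimal Python sketch of that derivation, not the tool's actual implementation; the function name describe is hypothetical, and the named-entity lookup relies on Python's html.entities table, which only covers the HTML 4 names.

```python
import html.entities


def describe(ch: str) -> dict:
    """Return common notations for a single character (illustrative helper)."""
    cp = ord(ch)  # the code point as an integer
    named = html.entities.codepoint2name.get(cp)  # HTML 4 names only; may be None
    return {
        "code_point": f"U+{cp:04X}",        # at least four hex digits
        "html_decimal": f"&#{cp};",
        "html_hex": f"&#x{cp:X};",
        "html_named": f"&{named};" if named else None,
        "css": f"\\{cp:X}",                 # backslash followed by hex digits
    }


info = describe("♥")
print(info["code_point"], info["html_decimal"], info["html_named"], info["css"])
```

Characters without a named entity still have the numeric forms, which is why the sketch returns None for html_named instead of failing.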

A short history of Unicode

Before Unicode, every region had its own incompatible character encoding: ASCII for English, the ISO 8859 family for European and Middle Eastern languages (8859-1 Latin-1, 8859-5 Cyrillic, 8859-6 Arabic), Windows code pages 1252 / 1251 / 1253-1258, and multibyte sets for East Asian languages (Shift-JIS for Japanese, Big5 for Traditional Chinese, GB2312 for Simplified Chinese, EUC-KR for Korean). Mismatched encodings produced garbled text known by the Japanese term mojibake (文字化け, "character transformation"): opening a Japanese page in the wrong encoding gave you rows of question marks or random Latin-1 letters.

The work began in 1987, when Joe Becker at Xerox, together with Lee Collins and Mark Davis at Apple, started investigating a single universal character set that could replace the patchwork. Becker's August 1988 draft proposal, "Unicode 88," explained: "the name 'Unicode' is intended to suggest a unique, unified, universal encoding." The Unicode Consortium was incorporated in January 1991 and shipped Unicode 1.0 in October that year with about 7,100 characters across 24 scripts.

As of Unicode 17.0 (released 9 September 2025) the standard contains 159,801 characters across 172 scripts, out of a code space of 1,112,064 valid code points. Unicode has therefore assigned roughly 14% of its possible space and has decades of headroom. Major recent milestones: Unicode 6.0 (2010) was the first version to formally encode emoji (722 of them, taken from the Japanese carriers); Unicode 17.0 added four new scripts (Sidetic, Tolong Siki, Beria Erfe, Tai Yo) and pushed the total CJK ideograph count over 100,000.

Code points, planes, and encodings

A code point is just a number, written in hexadecimal with a U+ prefix, like U+2665 for ♥. Code points are grouped into 17 planes of 65,536 code points each. Almost everything you've ever read lives on Plane 0, the Basic Multilingual Plane (BMP, U+0000 to U+FFFF). Plane 1 (the Supplementary Multilingual Plane) holds historical scripts (Linear B, Egyptian hieroglyphs, Cuneiform), musical notation, and almost all emoji, including the skin-tone modifiers. Planes 2 and 3 are CJK ideograph extensions. Planes 4-13 are unassigned, reserved for the future. Plane 14 carries tag characters (used in some emoji flag sequences) and supplementary variation selectors. Planes 15 and 16 are private-use areas where fonts and apps assign their own meanings.
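Because each plane spans exactly 65,536 (0x10000) code points, the plane number is simply the code point shifted right by 16 bits. The short Python sketch below illustrates this and reproduces the 1,112,064 figure by subtracting the 2,048 surrogate code points from the full code space; the helper name plane_of is made up for this example.

```python
def plane_of(ch: str) -> int:
    """Plane index (0-16) of a single character: code point divided by 0x10000."""
    return ord(ch) >> 16


TOTAL_CODE_POINTS = 17 * 0x10000                 # 17 planes of 65,536 each
SURROGATES = 0xDFFF - 0xD800 + 1                 # reserved for UTF-16 pairs
VALID_CODE_POINTS = TOTAL_CODE_POINTS - SURROGATES

print(plane_of("♥"), plane_of("😀"), plane_of("\U000E0100"))
print(TOTAL_CODE_POINTS, SURROGATES, VALID_CODE_POINTS)
```

Here ♥ (U+2665) falls on Plane 0, 😀 (U+1F600) on Plane 1, and the variation selector U+E0100 on Plane 14.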

An encoding is how a code point gets stored as bytes. Unicode defines three:

UTF-8: variable width, 1 to 4 bytes per character. Designed by Ken Thompson and Rob Pike at Bell Labs in 1992 (sketched on a New Jersey diner placemat). The first 128 code points (ASCII) take exactly 1 byte with the same binary value as ASCII, so a pure-ASCII file is already a valid UTF-8 file. As of January 2026, UTF-8 is used by roughly 98.9% of websites; it is the WHATWG-recommended encoding and the default for new text protocols.
UTF-16: variable width, 2 or 4 bytes. BMP characters take 2 bytes; characters in supplementary planes take 4 bytes via surrogate pairs (a high surrogate U+D800-U+DBFF plus a low surrogate U+DC00-U+DFFF). Used internally by Windows APIs, Java, JavaScript (string .length counts UTF-16 code units, which is why an emoji often "counts as 2"), and Qt. Less than 0.004% of public web pages use it as transport.
UTF-32: fixed width, 4 bytes per code point. Simple to index but space-inefficient. Used internally by some Unix runtimes for direct code-point indexing; rare on disk or wire.
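To make the size differences concrete, here is a small Python sketch that encodes one character in each of the three forms and computes a UTF-16 surrogate pair by hand. The function names are illustrative only; the "-le" codec variants are used so that Python does not prepend a byte order mark and inflate the counts.

```python
def encoded_sizes(ch: str) -> dict:
    """Byte length of one character in each Unicode encoding form."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),   # -le: no BOM added
        "utf-32": len(ch.encode("utf-32-le")),
    }


def surrogate_pair(cp: int) -> tuple:
    """Split a supplementary-plane code point into (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above the BMP need surrogates")
    offset = cp - 0x10000                  # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


for ch in ("A", "♥", "😀"):
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

high, low = surrogate_pair(0x1F600)
print(f"{high:04X} {low:04X}")
```

An ASCII letter costs 1, 2, and 4 bytes respectively, ♥ costs 3, 2, and 4, and 😀 costs 4 bytes in every form; the two surrogates for 😀 are why JavaScript reports its length as 2.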

The 25 invisible whitespace characters

Unicode formally tags exactly 25 characters with the White_Space=yes property: the ASCII controls tab, line feed, vertical tab, form feed, and carriage return (U+0009-U+000D), regular space (U+0020), next line (U+0085), no-break space (U+00A0, the famous one that looks identical to a regular space but won't break across lines), the Ogham space mark (U+1680), the typographic widths in U+2000-U+200A, the line / paragraph separators (U+2028 / U+2029), the narrow no-break space common in French typography (U+202F), the medium mathematical space (U+205F), and the full-width ideographic space (U+3000) used in CJK text.

Several characters look invisible but are not classified as whitespace and behave differently from a regular space:

U+200B Zero-Width Space: allows a line break with no visible gap; not whitespace by Unicode classification.
U+200D Zero-Width Joiner: the glue inside multi-character emoji like family or profession sequences.
U+200C Zero-Width Non-Joiner: controls ligature joining.
U+00AD Soft Hyphen: invisible until the renderer breaks the line.
U+FEFF Byte Order Mark: at the start of a file declares endianness; in the middle, an invisible no-break space. Excel's UTF-8 CSV exports prepend one, which often shows up in downstream tools as an unexpected leading character on the first column header.

These invisible characters are routinely the cause of "why won't this string match?" debugging sessions. Paste any character into a character map's search and it will tell you the actual code point, so you can confirm whether you're looking at a smart quote masquerading as a straight one, or an NBSP where a regular space should be.
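The same check can be scripted. The sketch below is one possible approach, assuming Python's standard unicodedata module: it flags any character in the format (Cf) or separator (Zs, Zl, Zp) general categories other than the ordinary space. The function name find_invisibles is hypothetical, and the category filter is a heuristic that deliberately ignores control characters such as tab and newline.

```python
import unicodedata

# General categories that usually render as nothing or as blank space.
SUSPECT_CATEGORIES = {"Cf", "Zs", "Zl", "Zp"}


def find_invisibles(text: str) -> list:
    """List (index, code point, name) for invisible characters in text."""
    hits = []
    for i, ch in enumerate(text):
        if ch != " " and unicodedata.category(ch) in SUSPECT_CATEGORIES:
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((i, f"U+{ord(ch):04X}", name))
    return hits


# A CSV header with a leading BOM, and a number with an NBSP before the unit.
print(find_invisibles("\ufeffname,price"))
print(find_invisibles("100\u00a0km"))
```

Run on a CSV header exported with a leading BOM, this reports U+FEFF at index 0, which matches the "unexpected leading character" symptom described above.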

Useful character ranges

Block	Range	Examples
Latin-1 Supplement	U+0080-U+00FF	à ñ ü © ® ¥ § ° ¶
Greek	U+0370-U+03FF	α β γ π Σ Ω
Cyrillic	U+0400-U+04FF	Russian / Ukrainian / Bulgarian etc.
General Punctuation	U+2000-U+206F	– … “ ” ‘ ’ • † ZWSP NNBSP
Currency Symbols	U+20A0-U+20CF	€ ₩ ₱ ₽ ₹ ₿
Letterlike Symbols	U+2100-U+214F	™ ℠ № ℃ ℉ ℗
Arrows	U+2190-U+21FF	← → ↑ ↓ ↔ ⇒ ⇐
Mathematical Operators	U+2200-U+22FF	∑ ∫ ∞ √ ≠ ≤ ≥ ∂ ∇ ∈ ∪ ∩
Box Drawing	U+2500-U+257F	─ │ ┌ ┐ └ ┘ ├ ┤ ┬ ┴ ┼ ═ ║ ╔ ╗
Math Alphanumerics	U+1D400-U+1D7FF	"Fancy text" generators (𝓗𝓮𝓵𝓵𝓸) draw from here.
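As an illustration of how "fancy text" generators use the Math Alphanumerics block, the Python sketch below maps ASCII letters and digits onto the Mathematical Bold range by adding a fixed offset. It assumes the bold style because that range is contiguous (bold A starts at U+1D400, bold a at U+1D41A, bold 0 at U+1D7CE); other styles such as script or italic have gaps and would need a lookup table. The function name to_math_bold is invented for this example.

```python
def to_math_bold(text: str) -> str:
    """Replace ASCII letters and digits with Mathematical Bold equivalents."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(0x1D400 + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(0x1D41A + ord(ch) - ord("a")))
        elif "0" <= ch <= "9":
            out.append(chr(0x1D7CE + ord(ch) - ord("0")))
        else:
            out.append(ch)  # leave punctuation and spaces untouched
    return "".join(out)


styled = to_math_bold("Hello 2025")
print(styled, [f"U+{ord(c):04X}" for c in styled[:2]])
```

The result only looks like the original text: every letter is a different code point, so a plain search for "Hello" will not match the styled string.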

Special characters in everyday writing

A quick reference for the "I just need to type one symbol" use case, which is what this tool exists to deliver in two clicks:

Em dash — U+2014 (&mdash;), sentence-level break.
En dash – U+2013 (&ndash;), ranges (1950–1975) and pairings (Boston–Hartford).
Ellipsis … U+2026 (&hellip;), three dots as a single character.
Smart quotes: opening “ U+201C, closing ” U+201D, opening ‘ U+2018, closing ’ U+2019.
Non-breaking space U+00A0 (&nbsp;), keeps "100 km" together.
Section § U+00A7, Pilcrow ¶ U+00B6, Degree ° U+00B0.
Multiplication × U+00D7, Division ÷ U+00F7; neither is the letter x or a slash.

When you'd reach for a character map

Typing accented letters without the right keyboard layout: résumé, jalapeño, fiancée, naïve.
Math and science: pasting ∑, ∫, ≠, π, ±, ∞, μ, Ω into a doc without launching the equation editor.
Currency: the symbol you need is rarely on your keyboard. Euro €, yen ¥, peso ₱, rupee ₹.
Punctuation in legal and academic writing: em dashes, smart quotes, the section sign §, the dagger †.
Fancy display text for social-media bios and branding: Mathematical Alphanumeric Symbols (U+1D400-U+1D7FF) let you stylise text without using an image.
CLI and TUI design: Box Drawing characters for ASCII-art borders, ncurses programs, and README diagrams.
Debugging encoding bugs: paste a character to see its actual code point and confirm whether you've got a smart quote masquerading as a straight one.

Security: the homograph problem

Many Unicode characters look identical across scripts. The Cyrillic lowercase "а" (U+0430) is visually indistinguishable from the Latin "a" (U+0061). Attackers register internationalised domain names that look like legitimate ones (for instance an "apple.com" with a Cyrillic а in place of the Latin a) and use them for phishing. A 2017 campaign used the lookalike domain adoḅe.com, with the dotted-below ḅ (U+1E05) in place of b, to deliver malware. Modern browsers mitigate this with restrictive script-mixing rules, falling back to the ASCII Punycode form (xn--…) when a domain mixes scripts; Safari is particularly conservative. The same lookalike property that makes Unicode rich for human writing makes it dangerous in domains, and a character map is one way to confirm the actual code point of every character at a glance.
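A rough version of the script-mixing check can be sketched in Python. This is a simplified heuristic, not what any browser actually runs: it guesses each letter's script from the first word of its Unicode character name (for example "LATIN" or "CYRILLIC"), an assumption that works for common alphabets but is not a substitute for the real Script property. The function names are hypothetical, and Python's built-in idna codec implements the older IDNA 2003 rules.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Approximate the scripts used in a label from character names."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # First word of the name, e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC"
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed(domain: str) -> bool:
    """True if any dot-separated label mixes more than one script."""
    return any(len(scripts_in(label)) > 1 for label in domain.split("."))


spoof = "\u0430pple.com"           # Cyrillic а followed by Latin "pple"
print(looks_mixed(spoof), looks_mixed("apple.com"))
print(spoof.encode("idna"))        # the ASCII Punycode (xn--) form
print(looks_mixed("ado\u1e05e.com"))
```

Note the limitation the last line exposes: ḅ is itself a Latin letter, so a pure script-mixing test does not flag adoḅe.com, and checking the individual code points remains necessary.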
