Character Encoding Comparison

Understanding the differences between ASCII, Extended ASCII, and Unicode

Overview

ASCII (1963)

The original 7-bit encoding standard. It defines 128 characters (codes 0-127): 33 control characters and 95 printable characters, including the space.
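The 7-bit limit is easy to check with Python's built-in `ascii` codec. The sketch below is illustrative; the helper name `classify_ascii` is our own and not a standard function. It splits the 0-127 range into control and printable codes and shows that characters outside that range cannot be encoded.

```python
def classify_ascii(code: int) -> str:
    """Classify a code as 'control', 'printable', or 'non-ascii' (illustrative helper)."""
    if not 0 <= code <= 127:
        return "non-ascii"
    # Codes 0-31 and 127 (DEL) are control characters.
    if code < 32 or code == 127:
        return "control"
    # Codes 32-126 are printable, starting with the space character.
    return "printable"


controls = sum(classify_ascii(c) == "control" for c in range(128))
printable = sum(classify_ascii(c) == "printable" for c in range(128))
print(controls, printable)  # 33 95

# Every byte produced by the ascii codec has its high (eighth) bit clear.
data = "Hello, ASCII!".encode("ascii")
print(all(b < 0x80 for b in data))  # True

# Characters outside 0-127 cannot be represented.
try:
    "café".encode("ascii")
except UnicodeEncodeError as err:
    print(err.reason)  # ordinal not in range(128)
```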

Extended ASCII (1981)

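8-bit extensions that use the eighth bit to add 128 more characters (codes 128-255) for accented letters, other scripts, and symbols. There is no single Extended ASCII: each code page assigns these codes differently.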
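Because each code page fills codes 128-255 differently, the same byte can decode to unrelated characters. This minimal sketch uses Python's built-in `cp437` (original IBM PC), `cp1252` (Windows-1252), and `latin-1` (ISO-8859-1) codecs; the function name `decode_high_byte` is a hypothetical helper for illustration.

```python
def decode_high_byte(value: int) -> dict:
    """Decode a single byte under several 8-bit code pages."""
    raw = bytes([value])
    return {codec: raw.decode(codec) for codec in ("cp437", "cp1252", "latin-1")}


for codec, char in decode_high_byte(0x80).items():
    print(f"{codec}: U+{ord(char):04X} {char!r}")
# cp437: U+00C7 'Ç'
# cp1252: U+20AC '€'
# latin-1: U+0080 '\x80'
```

In ISO-8859-1, byte 0x80 is a C1 control character, while Windows-1252 reuses that slot for the euro sign and CP437 uses it for a letter.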

Unicode (1991)

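A universal character set that aims to give every character in the world's writing systems its own code point. The code points are then serialized as bytes by encodings such as UTF-8, UTF-16, and UTF-32.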
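Python strings are sequences of Unicode code points, so the standard library can show the code point and official name behind each character. This is an illustrative sketch; `code_point_label` is our own helper name, built on `ord` and `unicodedata.name`.

```python
import unicodedata


def code_point_label(ch: str) -> str:
    """Format a character as its U+XXXX code point and official Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in ("A", "é", "€", "中", "😀"):
    print(code_point_label(ch))
# U+0041 LATIN CAPITAL LETTER A
# U+00E9 LATIN SMALL LETTER E WITH ACUTE
# U+20AC EURO SIGN
# U+4E2D CJK UNIFIED IDEOGRAPH-4E2D
# U+1F600 GRINNING FACE

# The code space ends at U+10FFFF; chr() rejects anything larger.
print(hex(ord(chr(0x10FFFF))))  # 0x10ffff
try:
    chr(0x110000)
except ValueError:
    print("0x110000 is outside the Unicode code space")
```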

Feature	ASCII	Extended ASCII	Unicode
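Bit Depth	7-bit	8-bit	21-bit code points, stored as 8-, 16-, or 32-bit code units (UTF-8, UTF-16, UTF-32)
Character Range	0-127	0-255	0-1,114,111 (U+0000 to U+10FFFF)
Language Support	English and other unaccented Latin-script text	One extra script or region per code page (e.g., Western European, Cyrillic, Greek)	Virtually all modern and many historic writing systems
Control Characters	0-31, 127	0-31, 127; ISO-8859 also reserves 128-159 (C1)	U+0000-U+001F, U+007F-U+009F (C0, DEL, C1)
Common Implementations	Single standard (US-ASCII, ANSI X3.4)	CP437, ISO-8859, Windows-1252	UTF-8, UTF-16, UTF-32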
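The Control Characters row can be checked against the Unicode Character Database, which assigns control characters the general category `Cc`. This sketch (the helper `control_ranges` is our own name) scans all 1,114,112 code points with Python's `unicodedata.category`, so it takes a moment, and collapses the matches into ranges. It counts only `Cc`; format characters such as zero-width spaces belong to a different category and are not included.

```python
import unicodedata


def control_ranges(limit: int = 0x110000) -> list:
    """Collapse every code point in category Cc (control) into (start, end) ranges."""
    ranges = []
    for cp in range(limit):
        if unicodedata.category(chr(cp)) != "Cc":
            continue
        if ranges and ranges[-1][1] == cp - 1:
            # Extend the current run of consecutive control code points.
            ranges[-1] = (ranges[-1][0], cp)
        else:
            ranges.append((cp, cp))
    return ranges


print(control_ranges())  # [(0, 31), (127, 159)]
```

The result is exactly ASCII's controls (0-31 and 127) followed by the C1 block (128-159) that ISO-8859 also reserves.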

Encoding	Bytes Per Character	Advantages	Disadvantages
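UTF-8	1-4 bytes	Backward compatible with ASCII; compact for English and other Latin-script text; the most widely used encoding on the web	Variable width, so finding the nth character requires scanning; most CJK characters take 3 bytes (vs. 2 in UTF-16)
UTF-16	2 or 4 bytes	Java and JavaScript strings are sequences of UTF-16 code units, and the Windows API uses it; 2 bytes for every BMP character, including most CJK	Not backward compatible with ASCII; byte order must be known (BOM or explicit LE/BE); surrogate pairs for characters above U+FFFF complicate processing
UTF-32	4 bytes	Fixed width, so the nth code point sits at byte offset 4n; simple processing logic	4 bytes even for ASCII (4x the size of UTF-8 for English text); rarely used for files or network transfer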
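The trade-offs in this table come from how each encoding lays out bits. The sketch below is illustrative only: `utf8_encode`, `utf16_surrogates`, and `encoded_sizes` are our own helper names, and in real code Python's built-in `str.encode` should be used instead. It hand-encodes UTF-8 to show the 1-4 byte patterns, splits a supplementary character into a UTF-16 surrogate pair, and compares byte counts across the three encodings (the `-le` codecs are used so no BOM is added).

```python
def utf8_encode(cp: int) -> bytes:
    """Hand-rolled UTF-8 encoder for one code point, following the standard bit layout."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:                                   # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                                  # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                                # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),                # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf16_surrogates(cp: int) -> tuple:
    """Split a supplementary code point (U+10000 to U+10FFFF) into a UTF-16 surrogate pair."""
    offset = cp - 0x10000              # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low (trail) surrogate
    return high, low


def encoded_sizes(ch: str) -> tuple:
    """Bytes needed for one character in UTF-8, UTF-16, and UTF-32 (no BOM)."""
    return tuple(len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le"))


for ch in ("A", "é", "中", "😀"):
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")  # matches Python's codec
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
# U+0041 (1, 2, 4)
# U+00E9 (2, 2, 4)
# U+4E2D (3, 2, 4)
# U+1F600 (4, 4, 4)

print([hex(u) for u in utf16_surrogates(0x1F600)])  # ['0xd83d', '0xde00']
```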

Mojibake: garbled characters that appear when text is decoded with a different character encoding from the one used to encode it.
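A common case is UTF-8 bytes read as Windows-1252. This minimal sketch simulates the mistake and reverses it; the helper names `make_mojibake` and `repair_mojibake` are hypothetical, and the repair assumes both codecs are known and every byte survived the wrong decode. Bytes that are undefined in the wrong codec (Python's `cp1252` rejects 0x81, for example) raise an error instead, and real-world repair usually needs heuristics.

```python
def make_mojibake(text: str, wrote_as: str = "utf-8", read_as: str = "cp1252") -> str:
    """Simulate mojibake: encode with one codec, then decode the bytes with another."""
    return text.encode(wrote_as).decode(read_as)


def repair_mojibake(garbled: str, wrote_as: str = "utf-8", read_as: str = "cp1252") -> str:
    """Undo the mistake by re-encoding with the wrong codec and decoding with the right one."""
    return garbled.encode(read_as).decode(wrote_as)


garbled = make_mojibake("café")
print(garbled)                   # cafÃ©
print(repair_mojibake(garbled))  # café
```

The two UTF-8 bytes of "é" (0xC3 0xA9) are shown by Windows-1252 as the two characters "Ã" and "©".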

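Byte Order Mark (BOM): the invisible character U+FEFF placed at the start of a text file to signal its encoding and, for UTF-16 and UTF-32, its byte order (endianness). It is optional in UTF-8, which has no byte-order ambiguity.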
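A reader can detect a BOM by comparing the first bytes against the known signatures, which Python exposes as constants in the `codecs` module. This is a sketch under stated assumptions: `sniff_bom` and `decode_with_bom` are our own helper names, and a file without a BOM simply falls back to a default encoding. The UTF-32 checks come first because the UTF-32-LE BOM (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE); as a result, UTF-16-LE text whose first character is U+0000 would be misdetected.

```python
import codecs

# Longer signatures first so UTF-32-LE is not mistaken for UTF-16-LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes) -> tuple:
    """Return (encoding, bom_length) if data starts with a known BOM, else (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    """Decode using the BOM if present (and strip it); otherwise use `default`."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or default)


sample = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")
print(sniff_bom(sample))        # ('utf-16-be', 2)
print(decode_with_bom(sample))  # hi

# For UTF-8 alone, the built-in "utf-8-sig" codec strips a leading BOM.
print(b"\xef\xbb\xbfhi".decode("utf-8-sig"))  # hi
```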