Computer Science
Character Coding Schemes
Character coding schemes use binary patterns to represent character data (text).
A common code shared by all computers means that information can easily be transferred between machines.
American Standard Code For Information Interchange (ASCII)
7 bits are used allowing 128 different characters to be represented. Each character in the standard set is assigned a number. The 7 bit binary representation of that number is used to represent that character.
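As a minimal sketch of this mapping, the Python snippet below looks up the number assigned to a character and prints its 7-bit pattern. The helper name `ascii_bits` is made up for this illustration; `ord` and `format` are standard Python built-ins.

```python
def ascii_bits(ch: str) -> str:
    """Return the 7-bit binary pattern for one ASCII character."""
    code = ord(ch)  # the number assigned to the character
    if code > 127:
        raise ValueError(f"{ch!r} is not in the 7-bit ASCII set")
    return format(code, "07b")  # pad with leading zeros to 7 bits


for ch in ("A", "a", "0"):
    print(ch, ord(ch), ascii_bits(ch))
# A 65 1000001
# a 97 1100001
# 0 48 0110000
```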
The first 32 characters (codes 0 to 31) are control codes such as TAB or Line Feed, and code 127 (DEL) is also a control code. The remaining codes represent digits, lower and upper case letters and standard symbols. Extended ASCII uses the eighth bit to encode a further 128 characters and symbols.
A number will be stored differently depending on whether it is being displayed or used for calculations. The ASCII representation of the character data "23" is not the same as the pure binary pattern for this number using the same number of bits. The representation of this data within the computer system depends on the context and use of the data.
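The difference can be seen in a short Python sketch. It assumes each character is stored in one 8-bit byte (the 7-bit code with a leading zero), so the two characters of "23" occupy 16 bits, and it compares that pattern with the number 23 written in 16 bits of pure binary.

```python
text = "23"

# Character data: one byte per character, '2' is code 50 and '3' is code 51.
as_characters = "".join(format(byte, "08b") for byte in text.encode("ascii"))

# Numeric data: the value twenty-three in pure binary, padded to 16 bits.
as_number = format(int(text), "016b")

print(as_characters)  # 0011001000110011
print(as_number)      # 0000000000010111
```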
ASCII Table
It's worth looking carefully at the ASCII table. Look, for example, at the codes for upper case letters and compare them with the corresponding codes for lower case. They differ in only one bit (the bit with value 32). Another nice feature of the codes chosen is that you only need to change 2 bits to convert the character code of a single digit into the binary representation of that digit: the codes for "0" to "9" are the values 0 to 9 with the bits worth 32 and 16 set.
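Both patterns can be checked with bitwise operations. The sketch below is illustrative only: the names `CASE_BIT`, `DIGIT_BITS`, `toggle_case` and `digit_value` are invented here, and the functions assume they are given an ASCII letter or an ASCII digit respectively.

```python
CASE_BIT = 0b0100000    # the single bit that differs between 'A' and 'a'
DIGIT_BITS = 0b0110000  # the two bits set in the codes for '0' to '9'


def toggle_case(letter: str) -> str:
    """Flip one bit to switch an ASCII letter between upper and lower case."""
    return chr(ord(letter) ^ CASE_BIT)


def digit_value(digit: str) -> int:
    """Clear two bits to turn an ASCII digit code into the digit's value."""
    return ord(digit) & ~DIGIT_BITS


print(toggle_case("A"), toggle_case("z"))  # a Z
print(digit_value("7"))                    # 7
```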
Decimal | Binary | Hexadecimal | Character |
---|---|---|---|
0 | 0000000 | 0 | NUL (null) |
1 | 0000001 | 1 | SOH (start of heading) |
2 | 0000010 | 2 | STX (start of text) |
3 | 0000011 | 3 | ETX (end of text) |
4 | 0000100 | 4 | EOT (end of transmission) |
5 | 0000101 | 5 | ENQ (enquiry) |
6 | 0000110 | 6 | ACK (acknowledge) |
7 | 0000111 | 7 | BEL (bell) |
8 | 0001000 | 8 | BS (backspace) |
9 | 0001001 | 9 | TAB (horizontal tab) |
10 | 0001010 | A | LF (NL line feed, new line) |
11 | 0001011 | B | VT (vertical tab) |
12 | 0001100 | C | FF (NP form feed, new page) |
13 | 0001101 | D | CR (carriage return) |
14 | 0001110 | E | SO (shift out) |
15 | 0001111 | F | SI (shift in) |
16 | 0010000 | 10 | DLE (data link escape) |
17 | 0010001 | 11 | DC1 (device control 1) |
18 | 0010010 | 12 | DC2 (device control 2) |
19 | 0010011 | 13 | DC3 (device control 3) |
20 | 0010100 | 14 | DC4 (device control 4) |
21 | 0010101 | 15 | NAK (negative acknowledge) |
22 | 0010110 | 16 | SYN (synchronous idle) |
23 | 0010111 | 17 | ETB (end of trans. block) |
24 | 0011000 | 18 | CAN (cancel) |
25 | 0011001 | 19 | EM (end of medium) |
26 | 0011010 | 1A | SUB (substitute) |
27 | 0011011 | 1B | ESC (escape) |
28 | 0011100 | 1C | FS (file separator) |
29 | 0011101 | 1D | GS (group separator) |
30 | 0011110 | 1E | RS (record separator) |
31 | 0011111 | 1F | US (unit separator) |
32 | 0100000 | 20 | SPACE |
33 | 0100001 | 21 | ! |
34 | 0100010 | 22 | " |
35 | 0100011 | 23 | # |
36 | 0100100 | 24 | $ |
37 | 0100101 | 25 | % |
38 | 0100110 | 26 | & |
39 | 0100111 | 27 | ' |
40 | 0101000 | 28 | ( |
41 | 0101001 | 29 | ) |
42 | 0101010 | 2A | * |
43 | 0101011 | 2B | + |
44 | 0101100 | 2C | , |
45 | 0101101 | 2D | - |
46 | 0101110 | 2E | . |
47 | 0101111 | 2F | / |
48 | 0110000 | 30 | 0 |
49 | 0110001 | 31 | 1 |
50 | 0110010 | 32 | 2 |
51 | 0110011 | 33 | 3 |
52 | 0110100 | 34 | 4 |
53 | 0110101 | 35 | 5 |
54 | 0110110 | 36 | 6 |
55 | 0110111 | 37 | 7 |
56 | 0111000 | 38 | 8 |
57 | 0111001 | 39 | 9 |
58 | 0111010 | 3A | : |
59 | 0111011 | 3B | ; |
60 | 0111100 | 3C | < |
61 | 0111101 | 3D | = |
62 | 0111110 | 3E | > |
63 | 0111111 | 3F | ? |
64 | 1000000 | 40 | @ |
65 | 1000001 | 41 | A |
66 | 1000010 | 42 | B |
67 | 1000011 | 43 | C |
68 | 1000100 | 44 | D |
69 | 1000101 | 45 | E |
70 | 1000110 | 46 | F |
71 | 1000111 | 47 | G |
72 | 1001000 | 48 | H |
73 | 1001001 | 49 | I |
74 | 1001010 | 4A | J |
75 | 1001011 | 4B | K |
76 | 1001100 | 4C | L |
77 | 1001101 | 4D | M |
78 | 1001110 | 4E | N |
79 | 1001111 | 4F | O |
80 | 1010000 | 50 | P |
81 | 1010001 | 51 | Q |
82 | 1010010 | 52 | R |
83 | 1010011 | 53 | S |
84 | 1010100 | 54 | T |
85 | 1010101 | 55 | U |
86 | 1010110 | 56 | V |
87 | 1010111 | 57 | W |
88 | 1011000 | 58 | X |
89 | 1011001 | 59 | Y |
90 | 1011010 | 5A | Z |
91 | 1011011 | 5B | [ |
92 | 1011100 | 5C | \ |
93 | 1011101 | 5D | ] |
94 | 1011110 | 5E | ^ |
95 | 1011111 | 5F | _ |
96 | 1100000 | 60 | ` |
97 | 1100001 | 61 | a |
98 | 1100010 | 62 | b |
99 | 1100011 | 63 | c |
100 | 1100100 | 64 | d |
101 | 1100101 | 65 | e |
102 | 1100110 | 66 | f |
103 | 1100111 | 67 | g |
104 | 1101000 | 68 | h |
105 | 1101001 | 69 | i |
106 | 1101010 | 6A | j |
107 | 1101011 | 6B | k |
108 | 1101100 | 6C | l |
109 | 1101101 | 6D | m |
110 | 1101110 | 6E | n |
111 | 1101111 | 6F | o |
112 | 1110000 | 70 | p |
113 | 1110001 | 71 | q |
114 | 1110010 | 72 | r |
115 | 1110011 | 73 | s |
116 | 1110100 | 74 | t |
117 | 1110101 | 75 | u |
118 | 1110110 | 76 | v |
119 | 1110111 | 77 | w |
120 | 1111000 | 78 | x |
121 | 1111001 | 79 | y |
122 | 1111010 | 7A | z |
123 | 1111011 | 7B | { |
124 | 1111100 | 7C | \| |
125 | 1111101 | 7D | } |
126 | 1111110 | 7E | ~ |
127 | 1111111 | 7F | DEL (delete) |
Unicode
The ASCII codes can now be considered to be a subset of Unicode. In fact, the first 128 characters of both are the same. Many common character encoding systems, like UTF-8, are backwards-compatible with ASCII.
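A quick way to see this compatibility is to encode the same ASCII-only text both ways and compare the bytes. This is a small Python sketch using the standard `str.encode` method; the sample string is arbitrary.

```python
text = "Hello, World!"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# For text made only of ASCII characters, the two encodings are identical.
print(ascii_bytes == utf8_bytes)  # True

# The Unicode code point of each character equals its ASCII code.
print([ord(ch) for ch in "Az"])   # [65, 122]
```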
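Unicode is so named because of the intention that it describes a universal character set. That is, it aims to cover the characters of all languages and scripts. There are well over 100,000 characters in the Unicode character set, each identified by a number called a code point, in the range 0 to 10FFFF in hexadecimal. A single 16-bit value cannot represent all of these, so the number of bits used depends on the encoding: UTF-8 uses one to four 8-bit bytes per character, UTF-16 uses one or two 16-bit units, and UTF-32 always uses 32 bits.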
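Unicode gives each character a code point, and the storage needed for it depends on the encoding chosen. The Python sketch below prints the code point of a few sample characters with their size in bytes under UTF-8, UTF-16 and UTF-32. The helper name `encoded_sizes` and the choice of characters are just for illustration; the `-be` codec variants are used so that no byte order mark is added to the counts.

```python
def encoded_sizes(ch: str) -> tuple[int, int, int]:
    """Return the byte length of one character in UTF-8, UTF-16 and UTF-32."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-be")),
        len(ch.encode("utf-32-be")),
    )


# Latin capital A, e with acute accent, euro sign, grinning face emoji
for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
# U+0041 (1, 2, 4)
# U+00E9 (2, 2, 4)
# U+20AC (3, 2, 4)
# U+1F600 (4, 4, 4)
```

The last character has a code point above FFFF in hexadecimal, so it does not fit in a single 16-bit unit and UTF-16 stores it as a pair of units.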
The Unicode Consortium manages the standards for Unicode including the addition of new symbols where necessary.