Teach-ICT OCR GCSE Computing - representing characters using ASCII and Unicode character sets

CHARACTERS

THEORY

2. Use of binary codes to represent characters

You already know that computers can only process binary numbers, i.e. patterns made up of the digits 0 and 1.

While this works perfectly well for the computer, imagine how hard it would be to write your essays or emails using only 0s and 1s. How would you know if you had made a spelling mistake? How could you proofread your work?

So a system had to be developed which would allow you to use the full set of alphabetic characters, both upper and lower case, numbers and symbols.

The system uses codes to represent each character, number and symbol in the chosen language.

The most commonly used system is called the 7 bit standard ASCII code. It has space for 128 symbols, which is enough for standard English but not enough for many other common languages. This will be discussed a little later.

Of course 7 bits is not a convenient 'package' for a computer, so the code is stored as an 8 bit byte with the highest bit set to zero.

In ASCII every character is represented by a binary number, e.g:

The 8 bit ASCII code 01000001 (65 in decimal) represents the upper case letter A.

The 8 bit ASCII code 01100001 (97 in decimal) represents the lower case letter a.
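As a rough illustration of how a character maps to its code, the short Python sketch below uses the built-in ord function to look up the code number and then writes it as an 8 bit pattern. The helper name to_ascii_bits is made up for this example and is not part of any standard library.

```python
def to_ascii_bits(ch):
    """Return the 8 bit binary string for one standard ASCII character."""
    code = ord(ch)  # the code number of the character, e.g. 65 for 'A'
    if code > 127:
        raise ValueError("not a standard 7 bit ASCII character")
    # Pad to 8 binary digits, so the highest bit is always 0
    return format(code, "08b")


print(to_ascii_bits("A"))  # 01000001
print(to_ascii_bits("a"))  # 01100001
```

Notice that upper case and lower case versions of the same letter have different codes, so the computer treats them as different characters.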

If you wanted to represent the word JOHN in ASCII, it would look like this:

01001010 01001111 01001000 01001110

The word JOHN would take 4 bytes of memory to store.
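The JOHN example can be checked with a few lines of Python. This is only a sketch: the function name word_to_bits is an assumption chosen for the illustration, while str.encode is a standard Python method.

```python
def word_to_bits(word):
    """Encode a word as ASCII and return (bit pattern string, byte count)."""
    data = word.encode("ascii")  # one byte per character
    bits = " ".join(format(byte, "08b") for byte in data)
    return bits, len(data)


bits, size = word_to_bits("JOHN")
print(bits)  # 01001010 01001111 01001000 01001110
print(size)  # 4
```

Each character takes exactly one byte, so the number of bytes is the same as the number of characters in the word.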

Extended ASCII

Notice that the highest bit is zero in the examples shown above. This means that standard ASCII only uses the first 128 numbers available out of a possible 256.
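The arithmetic behind these numbers is that 7 bits give 2 to the power 7 = 128 patterns and 8 bits give 2 to the power 8 = 256 patterns. The sketch below, which assumes a made-up helper called is_standard_ascii, tests the highest bit of a byte to decide which half of the range it falls in.

```python
HIGHEST_BIT = 0b10000000  # the bit worth 128 in an 8 bit byte


def is_standard_ascii(byte_value):
    """Return True if the highest bit of the byte is 0 (codes 0 to 127)."""
    return byte_value & HIGHEST_BIT == 0


standard_codes = [b for b in range(2 ** 8) if is_standard_ascii(b)]
extended_codes = [b for b in range(2 ** 8) if not is_standard_ascii(b)]

print(len(standard_codes))  # 128
print(len(extended_codes))  # 128
```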

Other languages such as German, French, Finnish, Irish, Icelandic etc take advantage of the other 128 spaces to include their own special characters.

This is called the extended ASCII character set. Unfortunately there is no single agreed standard for extended ASCII. There are many different sets, as various countries wanted to use the upper 128 codes for their own alphabet.

For example the German lower case ü is FC Hex or 11111100 in the 'Latin 1 Western European' set which is the default set for Windows (the upper case Ü is DC Hex or 11011100).

There are other extended sets as well, such as 'Latin 2 Central European', which covers languages such as Croatian, Serbian and Czech.
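To see why the lack of a single standard matters, the sketch below decodes the same byte using two different extended sets. It assumes Python's built-in codec names "latin-1" and "iso8859_2" for the Latin 1 and Latin 2 sets, and the helper name decode_byte is made up for the illustration.

```python
def decode_byte(byte_value, encoding):
    """Decode a single byte value using the named character set."""
    return bytes([byte_value]).decode(encoding)


code = 0xA3  # 10100011 in binary, so the highest bit is 1

print(decode_byte(code, "latin-1"))    # pound sign in Latin 1
print(decode_byte(code, "iso8859_2"))  # L with stroke in Latin 2
print(decode_byte(0x41, "latin-1"), decode_byte(0x41, "iso8859_2"))  # A A
```

The byte is identical in both cases; only the chosen character set changes what appears on screen. Codes below 128, such as the letter A, come out the same in both sets because they follow standard ASCII.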

Final point

The main point to remember is that language and symbols can be represented as binary numbers within a computer. As long as the 'encoding' is recognised by the computer, it will display the correct symbol on the screen.

Challenge: see if you can find out one extra fact on this topic that we haven't already told you.
