Understanding character encoding is crucial for anyone who writes on a computer, especially in languages other than English or with emojis and other special characters. In this article you’ll learn the basics of Unicode and its UTF encodings, code points and code units, and we’ll look at ASCII art as well.
Without language, we wouldn’t be writing this and you wouldn’t be here reading it. Language is how human beings convey what exists in their minds to one another. There is no communication without language, and there is no communication with language either if the other person doesn’t understand what you are saying. This applies to computers too.
In principle, every developer could invent their own way of representing text in their software. If your phone ran one developer’s scheme and you texted someone whose phone ran another’s, the message would not come through intact. You have probably had someone send you an emoji that showed up on your device as ‘??’. To prevent situations like this, developers have adopted Unicode and stuck with it.
Unicode is the standard that lets your computer store human-readable letters, digits and symbols as numbers. This is necessary so that devices can exchange text and display it without the annoying ‘??’.
Now we are going to go deeper into Unicode, starting with what Unicode means.
What is Unicode?
|Character||Code point||Name|
|✔||U+2714||Heavy check mark|
|‚||U+201A||Single low-9 quotation mark|
|„||U+201E||Double low-9 quotation mark|
Character Encoding: A Beginner’s Guide
There is no Unicode without character encoding. Character encoding is the assignment of a number to each character. Unicode is a character encoding standard accepted worldwide. In a made-up scheme, for instance, the letter B could be assigned the number 6, a the number 12, s the number 15, and so on. In Unicode, B is assigned the number 66.
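As a minimal sketch, Python’s built-in `ord` and `chr` functions show this character-to-number mapping in practice. The helper name `code_point_label` is made up for this illustration.

```python
def code_point_label(ch: str) -> str:
    """Return the Unicode number of a single character in U+XXXX notation."""
    return f"U+{ord(ch):04X}"


for ch in ["B", "a", "s", "✔"]:
    number = ord(ch)  # character -> number
    # chr() goes the other way: number -> character
    print(ch, number, code_point_label(ch), chr(number))
```

The U+XXXX notation is simply the same number written in hexadecimal.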
As a character encoding standard, Unicode defines over 128,000 characters. It also comes with several encoding formats, called Unicode Transformation Formats (UTF). These formats are:
- UTF-8. This format uses a variable number of bytes per character, from one to four. ASCII characters, which cover basic English text, take one byte (8 bits). Other characters take two, three or four bytes. This format is very popular on the internet and has found a home in email systems.
- UTF-16. This format uses 16 bits (2 bytes) to encode the most commonly used characters. All other characters are represented as a pair of 16-bit numbers, known as a surrogate pair.
- UTF-32. This format always uses four bytes (32 bits) per character. It came into existence as advances in technology exposed the limitations of the 16-bit format. The most interesting thing about this format is that it never needs to represent a character as a pair of numbers. Rather, it is capable of representing any Unicode character as a single 32-bit number.
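To see how the three formats differ in size, here is a small sketch that counts the bytes each one needs for a few sample characters. The function name `encoded_sizes` and the sample characters are our own choices. The little-endian codec names (`utf-16-le`, `utf-32-le`) are used so that Python does not add a byte order mark, which would inflate the counts.

```python
def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes needed to encode ch in each format."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(ch.encode(enc)) for enc in encodings}


# An ASCII letter, an accented letter, a symbol and an emoji.
for ch in ["A", "é", "✔", "😀"]:
    print(repr(ch), encoded_sizes(ch))
```

UTF-8 grows from one to four bytes across these samples, UTF-16 needs two bytes for everything except the emoji, and UTF-32 stays at four bytes throughout.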
Why was Unicode created?
The American Standard Code for Information Interchange (ASCII) was the first widely used encoding, but it defines only 128 characters. It was well suited to English but not to the other languages of the world. As a result, developers in other parts of the world started creating their own encodings to suit their own languages.
The result was a jungle of encoding methods with limited communication between regions. Unicode was created as a common standard that developers around the world could share.
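The problem with regional encodings can be sketched in a few lines. Here one and the same byte is read with two older regional encodings that ship with Python, `latin-1` (Western European) and `cp1251` (Cyrillic). The helper `decode_with` is a hypothetical name used only for this example.

```python
def decode_with(raw: bytes, encodings) -> dict:
    """Decode the same bytes with several encodings and collect the results."""
    return {enc: raw.decode(enc) for enc in encodings}


raw = b"\xe9"  # a single byte with the value 233

# The same byte means a different letter in each regional encoding.
print(decode_with(raw, ["latin-1", "cp1251"]))

# ASCII cannot represent the accented letter at all.
try:
    "é".encode("ascii")
except UnicodeEncodeError as err:
    print("ASCII cannot encode this character:", err.reason)
```

A reader who assumes the wrong regional encoding sees the wrong letter, which is the kind of mix-up a single shared standard avoids.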
Why Should You Use Unicode?
Unicode is global and as such supports many languages. Different languages can be combined in a single text, whereas before it was one language at a time. Unicode is supported by tech giants such as Apple, Microsoft and HP. It is also the character encoding used in popular browsers such as Firefox and Google Chrome.
Using Unicode increases the chances that your text will look the same on every device.
What are Code Points?
A code point is the numeric value assigned to a character in Unicode. Code points are divided into 17 sections called planes, each of which holds 65,536 code points. The planes are numbered 0 to 16, with plane 0 holding the most commonly used characters.
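Because every plane holds exactly 65,536 code points, the plane of a character can be found by integer division. The sketch below assumes nothing beyond Python’s built-in `ord` and `chr`. The names `PLANE_SIZE` and `plane_of` are ours.

```python
PLANE_SIZE = 0x10000  # 65,536 code points per plane


def plane_of(ch: str) -> int:
    """Return the number (0 to 16) of the plane that contains ch."""
    return ord(ch) // PLANE_SIZE


for ch in ["A", "✔", "😀"]:
    print(repr(ch), f"U+{ord(ch):04X}", "plane", plane_of(ch))

# 17 planes of 65,536 code points give the total size of the code space.
print(17 * PLANE_SIZE)
```

Everyday letters and symbols land in plane 0, while most emojis live in plane 1.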
What are Code Units?
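A code unit is the fixed-size building block an encoding format uses to store code points: 8 bits in UTF-8, 16 bits in UTF-16 and 32 bits in UTF-32. A code point is stored as one or more code units, and a sequence of code units can always be converted back to code points.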
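As an illustration of how code units relate to code points, the sketch below splits a character into its UTF-16 code units and then rebuilds the code point from a pair of them. The function names are hypothetical. The constants 0xD800, 0xDC00 and 0x10000 come from the UTF-16 surrogate pair scheme.

```python
def utf16_code_units(ch: str) -> list:
    """Return the 16-bit code units that UTF-16 uses to store ch."""
    data = ch.encode("utf-16-le")  # little-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def combine_surrogates(high: int, low: int) -> int:
    """Rebuild one code point from a UTF-16 surrogate pair."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


print([hex(u) for u in utf16_code_units("B")])  # one code unit
units = utf16_code_units("😀")                   # two code units
print([hex(u) for u in units])
print(hex(combine_surrogates(*units)), hex(ord("😀")))
```

A character from plane 0 fits in a single 16-bit code unit, while the emoji needs two that only make sense together.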
We suggest that you not say out loud or post online that ASCII art is dead. Microsoft did the same in 1998, and Bill Gates was called an overzealous man desperate to force Microsoft fonts down people’s throats. ASCII art is still very much in use today, and it has influenced the creation of modern computer-generated images. Today’s emojis are also descendants of the old ASCII emoticons. In fact, some devices still retain the old ASCII emoticons.
/ _,-\ ()()_/:)
\ / , ` `|
'-..-| \-.,___, /
\ `-.__/ /
Visit TextPaint to draw with characters on a canvas!
ASCII art wasn’t originally created for artistic purposes but to serve a function similar to what modern-day printers provide. Creativity led to the birth of ASCII art, and its influence doesn’t seem to be disappearing soon. Some developers have even built an app that shows ASCII versions of modern art.
Text created with the Big Text Converter.
Arthur Evans is a veteran freelance writer, proof-reader and editor from the UK. Arthur also provides biostatistics help. He is excellent at his job, which is to liaise with students, collect information on their thesis and dissertation requirements. After getting what he needs from students, he then helps them write and deliver ingenious work, well within the deadline.