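Unicode is the standard for identifying and encoding text characters in computing. It assigns every character a unique number, called a code point, which removes the limitations and conflicts of traditional encodings. As of version 12.1 (2019) it defined 137,929 characters, and its code space of 1,114,112 code points leaves room to cover the scripts of the world’s current and historic languages. It also contains symbols and special characters like emojis.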
UTF-8 is a variable-length encoding of Unicode. Its first 128 characters, each stored in a single octet, are the original ASCII character set: bare-bones letters, digits, and simple punctuation, with no support for accented letters, non-Latin scripts, or special characters. Every character in the global Unicode character set is encoded using one to four 8-bit bytes (octets). UTF-8 is the dominant character encoding used on the Web, in email, and with XML/HTML.
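To make the one-to-four-byte rule concrete, here is a minimal Python sketch. The four sample characters and the helper name utf8_summary are chosen purely for illustration; only the built-in str.encode and ord are used.

```python
# Sample characters picked for illustration; each falls in a different
# UTF-8 length class.
SAMPLES = [
    "A",           # Latin capital letter A (ASCII)
    "\u00e9",      # e with acute accent
    "\u20ac",      # euro sign
    "\U0001F600",  # grinning face emoji
]


def utf8_summary(ch):
    """Return (code point label, byte count, hex bytes) for one character."""
    encoded = ch.encode("utf-8")
    hex_bytes = " ".join(f"{b:02x}" for b in encoded)
    return (f"U+{ord(ch):04X}", len(encoded), hex_bytes)


for ch in SAMPLES:
    code_point, size, hex_bytes = utf8_summary(ch)
    print(f"{code_point}: {size} byte(s) -> {hex_bytes}")
```

The ASCII letter takes a single byte with the same value it has in ASCII, which is why plain ASCII text is already valid UTF-8. The emoji, which lies outside the first 65,536 code points, takes four bytes.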
Unicode characters are at the heart of everything you read. Visual effects like typeface, font size and color embellish the characters. Line and paragraph spacing, tables and graphics make reading easier. Without these added features, text looks like simple typewriter characters.
Databases and spreadsheets output text. This data passes through systems where it’s formatted as reports and statements. In addition, text is extracted from databases to generate keywords, abstracts and excerpts. Storing and exchanging that text as Unicode avoids encoding conflicts in these operations. For example, a content management system that keeps all of its text in Unicode is free of the conflicts that arise when legacy character sets for different languages assign the same byte values to different characters.
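The kind of conflict described above can be sketched in a few lines of Python. The byte value 0xE9 and the two legacy encodings (Latin-1 and Windows-1251) are assumptions picked for illustration; any pair of overlapping single-byte character sets would show the same effect.

```python
# One byte above the ASCII range, chosen for illustration.
raw = bytes([0xE9])

# Two legacy single-byte character sets give this byte different meanings.
as_western = raw.decode("latin-1")   # Latin small letter e with acute
as_cyrillic = raw.decode("cp1251")   # Cyrillic small letter short i

# In Unicode each character has its own code point, so both can sit in
# the same string and be stored together as UTF-8.
combined = as_western + as_cyrillic
utf8_bytes = combined.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

print([f"U+{ord(c):04X}" for c in combined])  # two distinct code points
print(utf8_bytes.hex())                       # two bytes per character here
print(round_trip == combined)                 # decoding restores the text
```

Without a note saying which legacy encoding produced the byte, a reader cannot tell which character was meant. Once the text is in Unicode, that ambiguity is gone.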
More on Unicode
Visit the Unicode Consortium
Fun Fact – Emojis
Emojis are used almost everywhere. The Unicode Consortium approves new emojis and assigns each one a code point, while device and platform makers draw the actual images. They represent things like faces, weather, emotions, animals and flags. They also express love, thanks and congratulations. New emojis are added in most Unicode releases.