Unicode character coding description

Unicode is a standard for coding the characters of national languages (Czech, Russian, Turkish, Chinese, etc.) in binary form for computers.
Originally, each character was stored in one byte, so a total of 256 different characters could be coded. The first 128 characters are standardized as the so-called ASCII table, which defines the lowercase letters (a-z), the capital letters (A-Z) without diacritics and the digits (0-9), together with special characters such as the comma, semicolon, colon and many others.

The remaining 128 characters can be used for coding the special characters of national languages, for example á, č, ü, etc. used in Central European languages and ф, и, б, ъ, etc. used in Russian. Unfortunately, there are too many of these national characters to fit into the remaining 128 positions. That is why separate code pages were created for different groups of national languages, for example the Win-1250 code page for Central European languages, the Win-1251 code page containing the Cyrillic alphabet characters, etc. This made it possible to save texts in various national languages, but not to combine characters from different code pages in one text file.
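The code page limitation can be illustrated with a minimal Python sketch. It assumes Python's built-in `cp1250` and `cp1251` codecs as stand-ins for the Win-1250 and Win-1251 code pages; the helper name `can_encode` is hypothetical and used only for this illustration.

```python
# One byte value, two different characters depending on the code page.
raw = bytes([0xE8])
as_central_european = raw.decode("cp1250")  # Win-1250 reads it as 'č'
as_cyrillic = raw.decode("cp1251")          # Win-1251 reads it as 'и'
print(f"cp1250: U+{ord(as_central_european):04X}")
print(f"cp1251: U+{ord(as_cyrillic):04X}")


def can_encode(text: str, codepage: str) -> bool:
    """Return True if every character of text exists in the code page."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


# A text mixing both alphabets fits into neither code page.
mixed = "č и"
print(can_encode(mixed, "cp1250"), can_encode(mixed, "cp1251"))
```

The same stored byte is therefore only meaningful together with the information about which code page was used when the text was saved.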

The coding of national language characters was finally solved by the introduction of the Unicode character coding (for more details, see the website of the Unicode Consortium). This coding system is able to code the characters of all the world's languages. The solution is based on the fact that a character is no longer stored in 1 byte (only 256 possible values) but in 2 bytes (i.e. 65536 possible values). This coding system is called UTF-16. Characters that do not fit into these 65536 values are stored in UTF-16 as a pair of 2-byte units.
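The 2-byte storage can be checked with a short Python sketch. It assumes the built-in `utf-16-le` codec (little-endian, without a byte order mark); the helper name `utf16_size` is hypothetical. The last line shows a character above U+FFFF, which UTF-16 stores as two 2-byte units.

```python
def utf16_size(ch: str) -> int:
    """Number of bytes one character occupies in UTF-16 (no byte order mark)."""
    return len(ch.encode("utf-16-le"))


# Latin, Central European, Cyrillic and Chinese characters all take 2 bytes.
for ch in ["A", "č", "и", "中"]:
    print(f"U+{ord(ch):04X} -> {utf16_size(ch)} bytes")

# A character above U+FFFF is stored as a pair of 2-byte units.
print(f"U+1F600 -> {utf16_size(chr(0x1F600))} bytes")
```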

The main advantage of UTF-16 is the very simple handling of all possible characters; the disadvantages are the doubled size and the incompatibility with the ASCII table. The incompatibility is a considerable problem when text files are saved. Therefore an alternative Unicode coding system was created that stores characters with variable length. ASCII table characters are stored in 1 byte, while non-ASCII characters are stored in 2 or more bytes (the first byte of a character indicates how many further bytes follow). This coding system is called UTF-8. It is mainly used for text files (XML, HTM). When working with such texts, the characters are converted into UTF-16 in the computer's operating memory, which makes processing faster.
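The variable length of UTF-8 can be sketched in Python as follows. This is only an illustration: the helper names `utf8_size` and `sequence_length` are hypothetical, and `sequence_length` simply inspects the leading bits of the first byte as defined by the UTF-8 format.

```python
def utf8_size(ch: str) -> int:
    """Number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def sequence_length(first_byte: int) -> int:
    """Read the total length of a UTF-8 sequence from its first byte."""
    if first_byte < 0x80:            # 0xxxxxxx: plain ASCII, single byte
        return 1
    if first_byte >> 5 == 0b110:     # 110xxxxx: one more byte follows
        return 2
    if first_byte >> 4 == 0b1110:    # 1110xxxx: two more bytes follow
        return 3
    if first_byte >> 3 == 0b11110:   # 11110xxx: three more bytes follow
        return 4
    raise ValueError("not the first byte of a UTF-8 sequence")


for ch in ["A", "č", "и", "中"]:
    first = ch.encode("utf-8")[0]
    print(f"U+{ord(ch):04X}: {utf8_size(ch)} bytes, first byte 0x{first:02X}")

# Pure ASCII text is byte-for-byte identical in ASCII and UTF-8.
print("PROMOTIC".encode("utf-8") == "PROMOTIC".encode("ascii"))
```

Because ASCII characters keep their single-byte form, a UTF-8 file that contains only ASCII text is identical to the corresponding plain ASCII file.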

The PROMOTIC system has used the UTF-16 and UTF-8 character coding since version 7, so multilingual applications can be created without switching character code pages or using special language versions of OS Windows. UTF-16 is used while the application is running (panel texts, scripts, etc.), while UTF-8 is used for text files (e.g. XML text files for Macro expression $.text).

The transition to Unicode is only possible in OS Windows 2000 and higher. For the older OS Windows 98/Me, the Promotic7A version is therefore available. This version is identical to Promotic7, except that the UTF-16 text coding is not used. As a result, creating multilingual applications (for example Russian) is not as easy in that version.

© MICROSYS, spol. s r. o., Tavičská 845/21, 703 00 Ostrava-Vítkovice