16-bit Unicode

What is 16-bit Unicode?

16-bit Unicode, more precisely UTF-16 (a Unicode Transformation Format), is an encoding that represents each of the 1,112,064 encodable Unicode code points (every code point except the surrogates) as one or two 16-bit code units.

At the outset, Unicode was devised as a fixed-width 16-bit encoding to encompass all contemporary scripts. Alongside the fundamental sequence of 16-bit code units, three encoding schemes convert those units to sequences of 8-bit bytes (octets): UTF-16BE (big-endian), UTF-16LE (little-endian), and UTF-16, which may begin with a byte order mark (BOM) to signal the byte order.
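Converting 16-bit units to octets depends on byte order. The following is a minimal Python sketch using the standard codec names `utf-16-be`, `utf-16-le`, and `utf-16`; the sample character is an arbitrary choice for illustration.

```python
# One character, U+0041, serialized as UTF-16 in both byte orders.
text = "A"

big_endian = text.encode("utf-16-be")     # most significant byte first
little_endian = text.encode("utf-16-le")  # least significant byte first

# The plain "utf-16" codec prepends a byte order mark (U+FEFF) and then
# writes the data in the platform's native byte order.
with_bom = text.encode("utf-16")

print(big_endian.hex())      # 0041
print(little_endian.hex())   # 4100
print(with_bom[:2].hex())    # fffe on little-endian machines, feff on big-endian

# A decoder reads the BOM to recover the byte order.
print(with_bom.decode("utf-16"))  # A
```

Because the BOM output depends on the machine, explicit `-be` or `-le` codecs are the safer choice when a fixed byte layout is required.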

Over time, however, it became clear that 16 bits were not enough for the user community, particularly after approximately 14,500 composite characters were added for compatibility with existing character sets. This led to the invention of UTF-16. With UTF-16, roughly 63,000 characters can be accessed as single 16-bit code units. By using surrogate pairs, about one million more (1,048,576 code points) can be accessed.
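The character counts quoted for UTF-16 follow from simple arithmetic on the code space. A short Python sketch of that arithmetic is shown here; the constant names are illustrative and do not come from any standard.

```python
# Surrogate code points are reserved for pairing and never stand alone.
HIGH_SURROGATES = 0xDBFF - 0xD800 + 1   # 1,024 possible first halves
LOW_SURROGATES = 0xDFFF - 0xDC00 + 1    # 1,024 possible second halves

# Code points that fit in a single 16-bit unit: all 65,536 values
# of the 16-bit range, minus the reserved surrogates.
SINGLE_UNIT = 0x10000 - (HIGH_SURROGATES + LOW_SURROGATES)

# Code points reachable through a surrogate pair: every high/low combination.
PAIRED = HIGH_SURROGATES * LOW_SURROGATES

# Everything UTF-16 can encode.
ENCODABLE = SINGLE_UNIT + PAIRED

print(f"{SINGLE_UNIT:,}")  # 63,488
print(f"{PAIRED:,}")       # 1,048,576
print(f"{ENCODABLE:,}")    # 1,112,064
```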

Two ranges of Unicode code values are reserved for the high and low halves of these pairs. High surrogates lie between 0xD800 and 0xDBFF, and low surrogates lie between 0xDC00 and 0xDFFF. Characters that require surrogate pairs are comparatively rare, because the most commonly used characters are already encoded within the first 65,536 code points. UTF-16 therefore holds most frequently used characters in a single code unit per code point, a reasonable trade-off between ease of processing and storage size. UTF-16 is one of the encoding forms defined by the Unicode Standard, alongside UTF-8 and UTF-32.
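To make the pairing concrete, here is a minimal Python sketch that splits a code point above U+FFFF into its high and low surrogates. The function name `to_surrogate_pair` is hypothetical, chosen for this example; the bit arithmetic is the standard UTF-16 calculation.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point in U+10000..U+10FFFF into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = code_point - 0x10000       # a 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits select the high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits select the low surrogate
    return high, low


high, low = to_surrogate_pair(0x1F600)
print(f"{high:04X} {low:04X}")  # D83D DE00

# Cross-check against Python's built-in UTF-16 encoder.
print("\U0001F600".encode("utf-16-be").hex())  # d83dde00
```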

Is Unicode a 16-bit encoding?

Not exactly. Unicode itself is a character set whose code points run from U+0000 to U+10FFFF. A code point is usually written as U+hhhh, with hhhh being its hexadecimal value. Those code points can be stored in 8-bit, 16-bit, or 32-bit encoding forms (UTF-8, UTF-16, and UTF-32). In the 16-bit encoding form, as a rule of thumb, each character is one code unit (two bytes) wide. That form yields in excess of 65,000 single-unit code values, enough to encode most characters of the world's major languages.
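The U+hhhh notation can be produced from any character with a one-line Python sketch. The helper name `u_notation` is hypothetical, introduced only for this illustration.

```python
def u_notation(ch: str) -> str:
    """Format a single character's code point as U+hhhh (at least 4 hex digits)."""
    return f"U+{ord(ch):04X}"


print(u_notation("A"))            # U+0041
print(u_notation("\u20ac"))       # U+20AC (euro sign)
print(u_notation("\U0001F600"))   # U+1F600 (beyond the 16-bit range)
```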

The Unicode Standard also provides a mechanism for encoding up to one million additional characters. This extension encodes an extended, or supplementary, character as a pair of surrogate code points, one high and one low. The first (high) surrogate ranges from U+D800 to U+DBFF, while the second (low) surrogate ranges from U+DC00 to U+DFFF.
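Reading such a pair back into a single character reverses the calculation. The sketch below assumes a hypothetical helper named `from_surrogate_pair`; the ranges it checks are the ones given above.

```python
def from_surrogate_pair(high: int, low: int) -> int:
    """Combine a high and a low surrogate into one supplementary code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("first value is not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("second value is not a low surrogate")
    # Each surrogate contributes 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


code_point = from_surrogate_pair(0xD83D, 0xDE00)
print(f"U+{code_point:04X}")  # U+1F600
```

The smallest pair (U+D800, U+DC00) maps to U+10000 and the largest (U+DBFF, U+DFFF) maps to U+10FFFF, which is why the code space ends there.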

Can Unicode text be represented in more than one way?

Unicode data can be represented in several encoding forms, such as UTF-8, UTF-16, and UTF-32. All of them can represent the entirety of Unicode; they differ in the bit length of their code units (8, 16, and 32 bits respectively). Additionally, UTS #6: A Standard Compression Scheme for Unicode (SCSU) defines a compression transformation.
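As a rough illustration of how the three forms differ, the Python sketch below counts the code units each one needs for a few sample characters. The helper name `code_units` and the choice of samples are assumptions made for this example.

```python
def code_units(ch: str) -> dict:
    """Count the code units needed for one character in each encoding form."""
    return {
        "utf-8": len(ch.encode("utf-8")),            # 8-bit units
        "utf-16": len(ch.encode("utf-16-le")) // 2,  # 16-bit units
        "utf-32": len(ch.encode("utf-32-le")) // 4,  # 32-bit units
    }


for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", code_units(ch))
# U+0041 {'utf-8': 1, 'utf-16': 1, 'utf-32': 1}
# U+00E9 {'utf-8': 2, 'utf-16': 1, 'utf-32': 1}
# U+20AC {'utf-8': 3, 'utf-16': 1, 'utf-32': 1}
# U+1F600 {'utf-8': 4, 'utf-16': 2, 'utf-32': 1}
```

UTF-32 always uses one unit per code point, while UTF-8 and UTF-16 are variable-length.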

What is a UTF?

A Unicode Transformation Format (UTF) is a character encoding format that can represent every Unicode character. The most commonly used variety is UTF-8, a variable-length encoding with 8-bit code units that was developed to be compatible with ASCII. UTF is sometimes also expanded as Universal Transformation Format.
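The ASCII compatibility of UTF-8 can be checked directly. This is a small Python sketch with arbitrary sample strings.

```python
ascii_text = "Hello"

# Pure ASCII text produces identical bytes in ASCII and UTF-8.
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")
print(same_bytes)  # True

# A non-ASCII character becomes a multi-byte sequence in which
# every byte has its high bit set, so it cannot be mistaken for ASCII.
euro = "\u20ac".encode("utf-8")
print(euro.hex())                   # e282ac
print(all(b >= 0x80 for b in euro)) # True
```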

Unicode text is encoded with one of two families of encodings: the Unicode Transformation Format (UTF) encodings and the Universal Character Set (UCS) encodings. Each encoding maps the range of Unicode code points to sequences of code values. The number in an encoding's name indicates the size of each code value: bits for UTF encodings (UTF-8, UTF-16) and bytes for UCS encodings (UCS-2, UCS-4). The code point itself is what uniquely identifies each character.