UTF-8 character encoding

Lee ler762@gmail.com
Wed Jun 27 09:31:00 GMT 2018

On 6/26/18, Thomas Wolff wrote:

> This encoding scheme is wrong; where did you get it from? Maybe it's the
> obsolete UTF-8...


I thought I saw something about UTF-8 being able to handle a 31-bit
value. Is that also obsolete/wrong?

How about this for the current encoding scheme:

Table 3-6.  UTF-8 Bit Distribution
Bits  Scalar Value                 First Byte  Second Byte  Third Byte  Fourth Byte
  7   00000000 0xxxxxxx            0xxxxxxx
 11   00000yyy yyxxxxxx            110yyyyy    10xxxxxx
 16   zzzzyyyy yyxxxxxx            1110zzzz    10yyyyyy     10xxxxxx
 21   000uuuuu zzzzyyyy yyxxxxxx   11110uuu    10uuzzzz     10yyyyyy    10xxxxxx
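
As a rough sketch of how the table above maps onto code, the four rows
become four branches. This is only an illustration, not anything taken
from Cygwin: the function name utf8_encode is made up, and Python is used
just for brevity. It assumes the input is a single Unicode scalar value
(0 to 0x10FFFF, excluding the surrogate range 0xD800 to 0xDFFF).

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one scalar value following the four rows of the bit table.

    Hypothetical helper for illustration; not part of any library.
    """
    # Assumption: only Unicode scalar values are accepted, so anything
    # above 0x10FFFF or inside the surrogate range is rejected.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")

    if cp < 0x80:
        # 7 bits: 0xxxxxxx
        return bytes([cp])

    if cp < 0x800:
        # 11 bits: 110yyyyy 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])

    if cp < 0x10000:
        # 16 bits: 1110zzzz 10yyyyyy 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])

    # 21 bits: 11110uuu 10uuzzzz 10yyyyyy 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# One sample per row of the table, checked against Python's own encoder.
for sample in (0x41, 0xE9, 0x20AC, 0x1F600):
    assert utf8_encode(sample) == chr(sample).encode("utf-8")
```

The range check at the top is what keeps this sketch to at most four
bytes: with the upper limit at 0x10FFFF, the 21-bit row is the last one
needed.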

