charset changes

Thomas Wolff towo@towo.net
Sat Feb 6 23:44:00 GMT 2010


Corinna Vinschen wrote:
>> ...
>>     
> I just read the GB18030 entry in the German Wikipedia again and, boy,
> I dislike that codeset immediately every time.  2-byte sequences have
> a trailing byte in the range 0x40-0xfe, 3-byte sequences don't exist,
> 4-byte sequences have a second and fourth byte in the range 0x30-0x39.
> Why, oh why, do codeset implementors have to overload the ASCII range
> without need?
>   
While unwieldy to handle, this has a historical explanation. It is an
immediate consequence of the design requirement to be upward compatible
with GBK, which already used trailing bytes from 0x40 upward. To
distinguish the new, longer sequences from the GBK 2-byte sequences,
there was no choice but to use an even lower range for them.
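To illustrate the point: because GBK already owns second bytes 0x40-0xFE, a decoder can tell a 2-byte sequence from a 4-byte one by looking at the second byte alone, and the 4-byte form only works because its second byte (0x30-0x39) can never be a valid GBK trailing byte. The sketch below shows that length classification. The function name gb18030_seq_len is hypothetical, it assumes the standard lead/trail ranges (0x81-0xFE lead bytes, 0x7F excluded as a 2-byte trail), and it does not check whether a 4-byte value actually falls in a mapped range, so treat it as an illustration rather than a complete validator.

```python
def gb18030_seq_len(buf: bytes, i: int = 0) -> int:
    """Return the length of the GB18030 sequence starting at buf[i].

    Returns 0 if the bytes at that position are not a well-formed
    sequence or the buffer is truncated. Sketch only: range checks are
    structural, and mapped-range validation is omitted.
    """
    b0 = buf[i]
    if b0 <= 0x7F:
        return 1  # single byte, identical to ASCII
    if not (0x81 <= b0 <= 0xFE):
        return 0  # 0x80 and 0xFF are not treated as lead bytes here
    if i + 1 >= len(buf):
        return 0  # lead byte without a second byte
    b1 = buf[i + 1]
    # GBK-compatible 2-byte sequence: trailing byte 0x40-0xFE, minus 0x7F.
    if 0x40 <= b1 <= 0xFE and b1 != 0x7F:
        return 2
    # New 4-byte sequence: the second byte drops below GBK's range, into
    # ASCII digits 0x30-0x39, so it cannot be mistaken for a GBK trail.
    if 0x30 <= b1 <= 0x39:
        if i + 3 >= len(buf):
            return 0  # truncated 4-byte sequence
        b2, b3 = buf[i + 2], buf[i + 3]
        if 0x81 <= b2 <= 0xFE and 0x30 <= b3 <= 0x39:
            return 4
    return 0  # e.g. lead byte followed by 0x7F or another invalid byte
```

Checked against Python's built-in gb18030 codec, "中".encode("gb18030") is a 2-byte sequence, while a character outside the BMP such as U+1F600 encodes to 4 bytes whose second byte lies in 0x30-0x39. Note that there is no 3-byte case, exactly as described in the quoted text.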



More information about the Cygwin-developers mailing list