charset changes
Thomas Wolff
towo@towo.net
Sat Feb 6 23:44:00 GMT 2010
Corinna Vinschen wrote:
>> ...
>>
> I just read the GB18030 entry in the German Wikipedia again and, boy,
> I dislike that codeset immediately every time. 2-byte sequences have
> a trailing byte in the range 0x40-0xfe, 3-byte sequences don't exist,
> 4-byte sequences have a second and fourth byte in the range 0x30-0x39.
> Why, oh why, do codeset implementors have to overload the ASCII range
> without need?
>
While unwieldy to handle, this has a historical explanation. It is a
direct consequence of the design requirement to be upward compatible
with GBK, which already used trailing bytes from 0x40 upward. To
distinguish the new, longer sequences from the GBK 2-byte sequences,
there was no choice but to use an even lower range for them.
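
To make the overlap concrete, here is a minimal sketch of how a decoder
can tell the sequence length from the first two bytes. It is only an
illustration, not code from Cygwin or newlib, and the function name
gb18030_seq_len is made up for this example. The byte ranges are the ones
quoted above, with the additional assumptions that lead bytes and the
third byte of a 4-byte sequence lie in 0x81-0xfe and that 0x7f is not a
valid trailing byte.

```python
def gb18030_seq_len(buf: bytes, pos: int = 0) -> int:
    """Return the length (1, 2 or 4) of the GB18030 sequence at buf[pos].

    Hypothetical helper for illustration; raises ValueError on a
    truncated or malformed sequence.
    """
    b1 = buf[pos]
    if b1 <= 0x7F:
        return 1  # plain ASCII byte
    if not 0x81 <= b1 <= 0xFE:
        raise ValueError("invalid lead byte")
    if pos + 1 >= len(buf):
        raise ValueError("truncated sequence")

    b2 = buf[pos + 1]
    if 0x40 <= b2 <= 0xFE and b2 != 0x7F:
        return 2  # GBK-compatible 2-byte form
    if 0x30 <= b2 <= 0x39:
        # 4-byte form: the second byte is an ASCII digit, i.e. below the
        # range GBK already used for trailing bytes.
        if pos + 3 >= len(buf):
            raise ValueError("truncated sequence")
        b3, b4 = buf[pos + 2], buf[pos + 3]
        if 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39:
            return 4
    raise ValueError("invalid GB18030 sequence")
```

The second byte alone decides between the 2-byte and the 4-byte form,
which is why ASCII digits can show up inside a multibyte character.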