This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.
Re: [PATCH/RFA] Internationalize ctype functionality
On Mar 27 21:13, Howland Craig D (Craig) wrote:
> OTOH, does it make sense to only do tolower and toupper but not the
> rest of the others at the same time? (Should these tolower and toupper
> changes be tabled until later?)
What rest? isupper/islower etc.? That's done with my patch.
> That is, will either of them change the 1-byte value into a different
> 1-byte value?
Yes.
> Couldn't then the value just be given straight to towlower() and the
> return therefrom used directly? (It would be much more efficient.)
> For example,
>
> else if (c != EOF && MB_CUR_MAX == 1)
> - {
> - char s[MB_LEN_MAX] = { c, '\0' };
> - wchar_t wc;
> - if (mbtowc (&wc, s, 1) >= 0
> - && wctomb (s, (wchar_t) towupper ((wint_t) wc)) == 1)
> - c = s[0];
> + c = (unsigned char) towupper ((wint_t) (unsigned char) c);
> - }
No, that doesn't work. It only works for the ISO-8859-1 charset, because
the characters 0xa0 to 0xff of ISO-8859-1 map one-to-one onto Unicode
U+00A0 to U+00FF (the Latin-1 Supplement block). So only for ISO-8859-1
is the single-byte character value equal to the wide char value.
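The lines the proposed patch removes already do the right thing: they let
the current locale's charset decide what a byte means. A minimal,
self-contained sketch of that round trip follows. The name sb_toupper is
hypothetical (not newlib's actual code), and it assumes a single-byte
locale (MB_CUR_MAX == 1); in any other case the byte is returned
unchanged.

```c
#include <limits.h>  /* MB_LEN_MAX */
#include <stdio.h>   /* EOF */
#include <stdlib.h>  /* MB_CUR_MAX, mbtowc, wctomb */
#include <wchar.h>   /* wchar_t */
#include <wctype.h>  /* wint_t, towupper */

/* Hypothetical helper: locale-aware toupper for single-byte charsets.
   The byte is decoded with the current locale's charset, upper-cased as
   a wide char, and encoded back.  If any step fails, or the locale is
   multibyte, c is returned unchanged. */
int sb_toupper (int c)
{
  if (c != EOF && MB_CUR_MAX == 1)
    {
      char s[MB_LEN_MAX] = { (char) c, '\0' };
      wchar_t wc;

      /* Step 1: byte -> wide char in the current charset (e.g. 0xaf
         becomes 0x017b under ISO-8859-3, but 0x00af under ISO-8859-1).
         Step 2: upper-case the wide char, then encode it back to one
         byte; the result is used only if it fits in exactly one byte. */
      if (mbtowc (&wc, s, 1) >= 0
          && wctomb (s, (wchar_t) towupper ((wint_t) wc)) == 1)
        c = (unsigned char) s[0];  /* avoid a negative result if char is signed */
    }
  return c;
}
```

In the default "C" locale this behaves like plain toupper for ASCII, e.g.
sb_toupper('a') yields 'A'. The cast to unsigned char on the way out is a
choice made in this sketch so that bytes above 0x7f are not returned as
negative values where char is signed.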
Assume you're using ISO-8859-3 (Latin 3) instead of ISO-8859-1. This
codepage contains characters which are not in the Unicode Latin-1 range.
For example:
tolower(0xaf) == 0xbf
The character 0xaf in ISO-8859-3 is a Latin capital Z with dot above. The
Unicode representation of this character is 0x017b. Its lowercase
equivalent is Unicode 0x017c. Transform that back to ISO-8859-3 and you
get 0xbf, the Latin small z with dot above in ISO-8859-3.
Do as you suggest and 0xaf is treated as Unicode 0x00af, which is the
macron sign, a modifier symbol which obviously has no lowercase
equivalent. Result: 0xaf, not 0xbf.
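To make the ISO-8859-3 example concrete without depending on an installed
Latin 3 locale, here is a toy sketch. The tables and all function names
(latin3_to_ucs, ucs_to_latin3, toy_towlower, tolower_roundtrip,
tolower_naive) are hypothetical and deliberately tiny: they only know
ASCII plus the two bytes 0xaf and 0xbf from the example, and report
anything else as unmappable. tolower_roundtrip follows the decode,
lowercase, encode path; tolower_naive follows the proposed cast.

```c
#include <stdint.h>

/* Hypothetical toy tables for the ISO-8859-3 example only: ASCII plus
   the two bytes 0xaf and 0xbf.  A real implementation would use the
   full charset tables (or mbtowc/wctomb in a Latin 3 locale). */
#define UNMAPPED 0xfffdu  /* U+FFFD REPLACEMENT CHARACTER */

static uint32_t latin3_to_ucs (unsigned char b)
{
  switch (b)
    {
    case 0xaf: return 0x017b;  /* LATIN CAPITAL LETTER Z WITH DOT ABOVE */
    case 0xbf: return 0x017c;  /* LATIN SMALL LETTER Z WITH DOT ABOVE */
    default:   return b < 0x80 ? b : UNMAPPED;  /* ASCII is identical */
    }
}

static int ucs_to_latin3 (uint32_t wc)
{
  switch (wc)
    {
    case 0x017b: return 0xaf;
    case 0x017c: return 0xbf;
    default:     return wc < 0x80 ? (int) wc : -1;  /* -1: not mappable here */
    }
}

/* Toy case mapping: ASCII A-Z plus U+017B.  U+00AF MACRON has no
   lowercase form, so it is returned unchanged. */
static uint32_t toy_towlower (uint32_t wc)
{
  if (wc >= 'A' && wc <= 'Z')
    return wc + ('a' - 'A');
  if (wc == 0x017b)
    return 0x017c;
  return wc;
}

/* Round trip: byte -> Unicode -> lowercase -> byte. */
int tolower_roundtrip (int c)
{
  int r = ucs_to_latin3 (toy_towlower (latin3_to_ucs ((unsigned char) c)));
  return r < 0 ? c : r;
}

/* Proposed shortcut: treat the byte as if it were a Unicode code point. */
int tolower_naive (int c)
{
  return (unsigned char) toy_towlower ((unsigned char) c);
}
```

With these tables, tolower_roundtrip(0xaf) yields 0xbf, while
tolower_naive(0xaf) yields 0xaf, because U+00AF has no case mapping.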
Corinna
--
Corinna Vinschen
Cygwin Project Co-Leader
Red Hat