This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.



Re: broken check in wchar testsuite


On 2019-09-03 10:00, Giacomo Tesio wrote:
> Apparently the check at newlib/testsuite/newlib.wctype/twctype.c:40
> is wrong since the Unicode character 0x0967 is a number, not a letter.
> See https://www.fileformat.info/info/unicode/char/0967/index.htm

$ grep '^0967' /usr/share/unicode/ucd/UnicodeData.txt
0967;DEVANAGARI DIGIT ONE;Nd;0;L;;1;1;1;N;;;;;

The Devanagari digits have been encoded as digits since Unicode 1.0 in 1991:
http://www.unicode.org/versions/Unicode1.0.0/ch04.pdf

> The nearest letter I've found is 0x0961, see
> https://www.fileformat.info/info/unicode/char/0961/index.htm

$ egrep '^1?096.;[^;]+LETTER[^;]+;L' /usr/share/unicode/ucd/UnicodeData.txt
0960;DEVANAGARI LETTER VOCALIC RR;Lo;0;L;;;;;N;;;;;
0961;DEVANAGARI LETTER VOCALIC LL;Lo;0;L;;;;;N;;;;;

Perhaps 0x0967 was intended to be one of the letters 0x0[0124578F]67:

$ egrep '^1?0.67;[^;]+LETTER[^;]+;L' /usr/share/unicode/ucd/UnicodeData.txt
0067;LATIN SMALL LETTER G;Ll;0;L;;;;;N;;;0047;;0047
0167;LATIN SMALL LETTER T WITH STROKE;Ll;0;L;;;;;N;LATIN SMALL LETTER T BAR;;0166;;0166
0267;LATIN SMALL LETTER HENG WITH HOOK;Ll;0;L;;;;;N;LATIN SMALL LETTER HENG HOOK;;;;
0467;CYRILLIC SMALL LETTER LITTLE YUS;Ll;0;L;;;;;N;;;0466;;0466
0567;ARMENIAN SMALL LETTER EH;Ll;0;L;;;;;N;;;0537;;0537
0767;ARABIC LETTER NOON WITH TWO DOTS BELOW;Lo;0;AL;;;;;N;;;;;
0867;SYRIAC LETTER MALAYALAM RA;Lo;0;AL;;;;;N;;;;;
0F67;TIBETAN LETTER HA;Lo;0;L;;;;;N;;;;;
10367;OLD PERMIC LETTER YRY;Lo;0;L;;;;;N;;;;;
10467;SHAVIAN LETTER EGG;Lo;0;L;;;;;N;;;;;
10867;PALMYRENE LETTER HETH;Lo;0;R;;;;;N;;;;;
10A67;OLD SOUTH ARABIAN LETTER RESH;Lo;0;R;;;;;N;;;;;
10B67;INSCRIPTIONAL PAHLAVI LETTER HETH;Lo;0;R;;;;;N;;;;;

We could just pick the first one, 0x0067 (LATIN SMALL LETTER G), for a patch:

--- a/twctype.c 2019-03-23 20:44:45.229950600 -0600
+++ b/twctype.c 2019-09-03 14:10:30.440326700 -0600
@@ -37,7 +37,7 @@ int main()
   else
     {
       setlocale (LC_CTYPE, "C-UTF-8");
-      CHECK (iswalpha(0x0967));
+      CHECK (iswalpha(0x0067));
       CHECK (!iswalpha(0x128e));
       CHECK (iswalnum(0x1d7ce));
       CHECK (!iswalnum(0x1d800));

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.

