broken check in wchar testsuite

Brian Inglis Brian.Inglis@SystematicSw.ab.ca
Tue Sep 3 20:55:00 GMT 2019


On 2019-09-03 10:00, Giacomo Tesio wrote:
> Apparently the check at newlib/testsuite/newlib.wctype/twctype.c:40
> is wrong since the Unicode character 0x0967 is a number, not a letter.
> See https://www.fileformat.info/info/unicode/char/0967/index.htm

$ grep '^0967' /usr/share/unicode/ucd/UnicodeData.txt
0967;DEVANAGARI DIGIT ONE;Nd;0;L;;1;1;1;N;;;;;

The Devanagari digits have been encoded as digits since Unicode 1.0 in 1991:
http://www.unicode.org/versions/Unicode1.0.0/ch04.pdf

> The nearest letter I've found is 0x0961, see
> https://www.fileformat.info/info/unicode/char/0961/index.htm

$ egrep '^1?096.;[^;]+LETTER[^;]+;L' /usr/share/unicode/ucd/UnicodeData.txt
0960;DEVANAGARI LETTER VOCALIC RR;Lo;0;L;;;;;N;;;;;
0961;DEVANAGARI LETTER VOCALIC LL;Lo;0;L;;;;;N;;;;;
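The grep checks above read the General Category, which is the third
semicolon-separated field of a UnicodeData.txt record. As a rough sketch
only, the same check could be written in C as below. The function names
ucd_general_category, ucd_is_letter and ucd_is_decimal_digit are made up
for illustration and are not part of newlib or its testsuite.

```c
#include <stddef.h>
#include <string.h>

/* Copy the General Category (third ';'-separated field) of one
   UnicodeData.txt record into out.  Returns 0 on success, or -1 if the
   record is malformed or the buffer is too small.
   Hypothetical helper, for illustration only. */
int ucd_general_category(const char *record, char *out, size_t outsz)
{
    const char *p = record;
    const char *end;
    size_t len;
    int i;

    /* Skip the code point and character name fields. */
    for (i = 0; i < 2; i++) {
        p = strchr(p, ';');
        if (p == NULL)
            return -1;
        p++;
    }

    end = strchr(p, ';');
    if (end == NULL)
        return -1;

    len = (size_t)(end - p);
    if (len == 0 || len >= outsz)
        return -1;

    memcpy(out, p, len);
    out[len] = '\0';
    return 0;
}

/* Letter categories all start with 'L' (Lu, Ll, Lt, Lm, Lo). */
int ucd_is_letter(const char *record)
{
    char cat[8];
    return ucd_general_category(record, cat, sizeof cat) == 0
           && cat[0] == 'L';
}

/* Decimal digits carry the category Nd. */
int ucd_is_decimal_digit(const char *record)
{
    char cat[8];
    return ucd_general_category(record, cat, sizeof cat) == 0
           && strcmp(cat, "Nd") == 0;
}
```

Fed the records quoted above, ucd_is_decimal_digit() accepts the 0967
line and ucd_is_letter() rejects it, while ucd_is_letter() accepts the
0960 and 0961 lines.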

Perhaps 0x0967 was intended to be one of the letters 0x0[0124578F]67:

$ egrep '^1?0.67;[^;]+LETTER[^;]+;L' /usr/share/unicode/ucd/UnicodeData.txt
0067;LATIN SMALL LETTER G;Ll;0;L;;;;;N;;;0047;;0047
0167;LATIN SMALL LETTER T WITH STROKE;Ll;0;L;;;;;N;LATIN SMALL LETTER T BAR;;0166;;0166
0267;LATIN SMALL LETTER HENG WITH HOOK;Ll;0;L;;;;;N;LATIN SMALL LETTER HENG HOOK;;;;
0467;CYRILLIC SMALL LETTER LITTLE YUS;Ll;0;L;;;;;N;;;0466;;0466
0567;ARMENIAN SMALL LETTER EH;Ll;0;L;;;;;N;;;0537;;0537
0767;ARABIC LETTER NOON WITH TWO DOTS BELOW;Lo;0;AL;;;;;N;;;;;
0867;SYRIAC LETTER MALAYALAM RA;Lo;0;AL;;;;;N;;;;;
0F67;TIBETAN LETTER HA;Lo;0;L;;;;;N;;;;;
10367;OLD PERMIC LETTER YRY;Lo;0;L;;;;;N;;;;;
10467;SHAVIAN LETTER EGG;Lo;0;L;;;;;N;;;;;
10867;PALMYRENE LETTER HETH;Lo;0;R;;;;;N;;;;;
10A67;OLD SOUTH ARABIAN LETTER RESH;Lo;0;R;;;;;N;;;;;
10B67;INSCRIPTIONAL PAHLAVI LETTER HETH;Lo;0;R;;;;;N;;;;;

We could just pick the first, 0x0067 (LATIN SMALL LETTER G), for a patch:

--- a/twctype.c 2019-03-23 20:44:45.229950600 -0600
+++ b/twctype.c 2019-09-03 14:10:30.440326700 -0600
@@ -37,7 +37,7 @@ int main()
   else
     {
       setlocale (LC_CTYPE, "C-UTF-8");
-      CHECK (iswalpha(0x0967));
+      CHECK (iswalpha(0x0067));
       CHECK (!iswalpha(0x128e));
       CHECK (iswalnum(0x1d7ce));
       CHECK (!iswalnum(0x1d800));
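
To try a candidate code point outside the testsuite, a small helper
along these lines might do. This is only a sketch: alpha_in_locale is a
made-up name, and the locale name to pass ("C-UTF-8" in the test above)
depends on the C library in use, so treat any particular name as an
assumption.

```c
#include <locale.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Switch LC_CTYPE to locale_name and ask iswalpha() about wc.
   Returns 1 if alphabetic, 0 if not, -1 if the locale could not be set.
   Note that this changes the process-wide locale as a side effect.
   Hypothetical helper, for illustration only. */
int alpha_in_locale(const char *locale_name, wint_t wc)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    return iswalpha(wc) != 0;
}
```

For example, alpha_in_locale("C-UTF-8", 0x0067) mirrors the patched
check. What it returns for 0x0967 depends on the library's character
tables, so no expected result is given here.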

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.


