This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.
Re: broken check in wchar testsuite
- From: Brian Inglis <Brian dot Inglis at SystematicSw dot ab dot ca>
- To: newlib at sourceware dot org
- Date: Tue, 3 Sep 2019 14:40:25 -0600
- Subject: Re: broken check in wchar testsuite
- References: <CAHL7psEVh4iR05KWW8PM-x4yDsSAyQ1C_QFkH6RC3uKc=cTnHg@mail.gmail.com>
- Reply-to: Brian dot Inglis at SystematicSw dot ab dot ca
On 2019-09-03 10:00, Giacomo Tesio wrote:
> Apparently the check at newlib/testsuite/newlib.wctype/twctype.c:40
> is wrong since the Unicode character 0x0967 is a number, not a letter.
> See https://www.fileformat.info/info/unicode/char/0967/index.htm
$ grep '^0967' /usr/share/unicode/ucd/UnicodeData.txt
0967;DEVANAGARI DIGIT ONE;Nd;0;L;;1;1;1;N;;;;;
The Devanagari digits have been encoded as digits since Unicode 1.0 in 1991:
http://www.unicode.org/versions/Unicode1.0.0/ch04.pdf
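
The same lookup can be sketched in C for anyone who wants to check a
UnicodeData.txt record programmatically rather than by eye. This is only an
illustration: the helper names ucd_general_category and ucd_is_letter are
hypothetical and not part of newlib, and treating "General_Category starts
with L" as "letter" is a simplifying assumption, since a C library's
iswalpha may be generated from a broader set of properties.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the General_Category field (the third
   ';'-separated field) of one UnicodeData.txt record into out.
   Returns 0 on success, -1 if the record is malformed or out is too small. */
int ucd_general_category(const char *record, char *out, size_t outsz)
{
    assert(record != NULL && out != NULL && outsz > 0);

    const char *p = strchr(record, ';');   /* end of the code point field */
    if (p == NULL)
        return -1;
    p = strchr(p + 1, ';');                /* end of the name field */
    if (p == NULL)
        return -1;

    const char *start = p + 1;             /* start of General_Category */
    const char *end = strchr(start, ';');
    if (end == NULL)
        return -1;

    size_t len = (size_t)(end - start);
    if (len == 0 || len >= outsz)
        return -1;

    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}

/* Hypothetical helper: 1 if the record's category is a letter (L*),
   0 if it is anything else, -1 if the record cannot be parsed.
   Assumption: "letter" means General_Category beginning with 'L'. */
int ucd_is_letter(const char *record)
{
    char cat[8];

    if (ucd_general_category(record, cat, sizeof cat) != 0)
        return -1;
    return cat[0] == 'L';
}
```

Passing the 0967 record shown above to ucd_is_letter yields 0 because its
category is Nd, while a record whose category is Lo or Ll yields 1.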
> The nearest letter I've found is 0x0961, see
> https://www.fileformat.info/info/unicode/char/0961/index.htm
$ egrep '^1?096.;[^;]+LETTER[^;]+;L' /usr/share/unicode/ucd/UnicodeData.txt
0960;DEVANAGARI LETTER VOCALIC RR;Lo;0;L;;;;;N;;;;;
0961;DEVANAGARI LETTER VOCALIC LL;Lo;0;L;;;;;N;;;;;
Perhaps 0x0967 was intended to be one of the letters 0x0[0124578F]67:
$ egrep '^1?0.67;[^;]+LETTER[^;]+;L' /usr/share/unicode/ucd/UnicodeData.txt
0067;LATIN SMALL LETTER G;Ll;0;L;;;;;N;;;0047;;0047
0167;LATIN SMALL LETTER T WITH STROKE;Ll;0;L;;;;;N;LATIN SMALL LETTER T BAR;;0166;;0166
0267;LATIN SMALL LETTER HENG WITH HOOK;Ll;0;L;;;;;N;LATIN SMALL LETTER HENG HOOK;;;;
0467;CYRILLIC SMALL LETTER LITTLE YUS;Ll;0;L;;;;;N;;;0466;;0466
0567;ARMENIAN SMALL LETTER EH;Ll;0;L;;;;;N;;;0537;;0537
0767;ARABIC LETTER NOON WITH TWO DOTS BELOW;Lo;0;AL;;;;;N;;;;;
0867;SYRIAC LETTER MALAYALAM RA;Lo;0;AL;;;;;N;;;;;
0F67;TIBETAN LETTER HA;Lo;0;L;;;;;N;;;;;
10367;OLD PERMIC LETTER YRY;Lo;0;L;;;;;N;;;;;
10467;SHAVIAN LETTER EGG;Lo;0;L;;;;;N;;;;;
10867;PALMYRENE LETTER HETH;Lo;0;R;;;;;N;;;;;
10A67;OLD SOUTH ARABIAN LETTER RESH;Lo;0;R;;;;;N;;;;;
10B67;INSCRIPTIONAL PAHLAVI LETTER HETH;Lo;0;R;;;;;N;;;;;
Could just pick the first for a patch:
--- a/twctype.c 2019-03-23 20:44:45.229950600 -0600
+++ b/twctype.c 2019-09-03 14:10:30.440326700 -0600
@@ -37,7 +37,7 @@ int main()
   else
     {
       setlocale (LC_CTYPE, "C-UTF-8");
-      CHECK (iswalpha(0x0967));
+      CHECK (iswalpha(0x0067));
       CHECK (!iswalpha(0x128e));
       CHECK (iswalnum(0x1d7ce));
       CHECK (!iswalnum(0x1d800));
--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada
This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.