This is the mail archive of the
glibc-bugs@sources.redhat.com
mailing list for the glibc project.
[Bug localedata/368] localedef fails with coplex LC_COLLATE rules
- From: "barbier at linuxfr dot org" <sourceware-bugzilla at sources dot redhat dot com>
- To: glibc-bugs at sources dot redhat dot com
- Date: 17 Jan 2005 21:38:31 -0000
- Subject: [Bug localedata/368] localedef fails with coplex LC_COLLATE rules
- References: <20040905204900.368.pablo@mandrakesoft.com>
- Reply-to: sourceware-bugzilla at sources dot redhat dot com
------- Additional Comments From barbier at linuxfr dot org 2005-01-17 21:38 -------
As this patch only changes the multi-byte sequence, we can check
whether wide-char and multi-byte collations give the same results,
in which case this patch is certainly right.
I created a file containing sequences of 2 Tibetan characters:
$ for i in `seq 0x0F00 0x0FCF`; do
    for j in `seq 0x0F00 0x0FCF`; do
      printf "0: %08x %08x 0000000a " $i $j | xxd -r -g4
    done
  done | iconv -f ucs4 -t utf8 > input_file
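As a sanity check on the generated file (an addition, not part of the
original test run), the expected number of lines follows from the range
bounds: one line per ordered pair of code points. A minimal sketch,
assuming the file is named input_file as above:

```shell
# Number of code points in the inclusive range 0x0F00..0x0FCF
n=$(( 0x0FCF - 0x0F00 + 1 ))
# One line per ordered pair (i, j) of code points
expected=$(( n * n ))
echo "$n code points, $expected lines expected"

# Compare against the real file only if it is present
if [ -f input_file ]; then
  actual=$(( $(wc -l < input_file) ))
  if [ "$actual" -eq "$expected" ]; then
    echo "line count OK"
  else
    echo "line count mismatch: $actual"
  fi
fi
```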
Then I ran:
$ LC_ALL=en_US.UTF-8 ./tst-wcscoll < input_file > out.wc-en_US
$ LC_ALL=en_US.UTF-8 ./tst-strcoll < input_file > out.mb-en_US
$ cmp out.wc-en_US out.mb-en_US
$
So the results are identical. But to show that this patch allows
more than 256 collating elements, we need to check with a more complex
LC_COLLATE section. I took Pablo's locale file, applied s/^%%%%</</
to get more than 256 collating elements, and re-ran the test:
$ export LOCPATH=`mktemp -d /tmp/test.XXXXXX`
$ localedef.patched -i dz_BT -f UTF-8 $LOCPATH/dz_BT
$ LC_ALL=dz_BT ./tst-wcscoll < input_file > out.wc-dz_BT
$ LC_ALL=dz_BT ./tst-strcoll < input_file > out.mb-dz_BT
$ cmp out.wc-dz_BT out.mb-dz_BT
$
Looks good.
Note that tst-strcoll is much slower than tst-wcscoll, which seems
quite logical since the primary key is the first UTF-8 byte and does
not change in the range 0x0F00-0x0FCF.
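
To illustrate the point about the first UTF-8 byte, here is a small
sketch that computes the leading byte of the UTF-8 encoding from the
code point alone, using the standard UTF-8 length boundaries. The
helper name utf8_lead_byte is hypothetical and not part of glibc or of
the test programs:

```shell
# Hypothetical helper: print the leading UTF-8 byte (hex) of a code point.
# The argument may be decimal or 0x-prefixed hex.
utf8_lead_byte() {
  cp=$(( $1 ))
  if [ "$cp" -lt 128 ]; then
    # 1-byte form: the byte is the code point itself
    printf '%02x\n' "$cp"
  elif [ "$cp" -lt 2048 ]; then
    # 2-byte form: 110xxxxx
    printf '%02x\n' $(( 0xC0 | (cp >> 6) ))
  elif [ "$cp" -lt 65536 ]; then
    # 3-byte form: 1110xxxx
    printf '%02x\n' $(( 0xE0 | (cp >> 12) ))
  else
    # 4-byte form: 11110xxx
    printf '%02x\n' $(( 0xF0 | (cp >> 18) ))
  fi
}

# Both ends of the tested range share the same leading byte
for cp_hex in 0x0F00 0x0FCF; do
  echo "U+${cp_hex#0x}: lead byte $(utf8_lead_byte "$cp_hex")"
done
```

Both ends of the range give e0, and every code point in between falls
in the same 3-byte form with the same top four bits, so the multi-byte
lookup cannot discriminate on the first byte.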
--
http://sources.redhat.com/bugzilla/show_bug.cgi?id=368