This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.
[Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
- From: "maiku.fabian at gmail dot com" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sourceware dot org
- Date: Mon, 15 Jan 2018 14:56:21 +0000
- Subject: [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
- Auto-submitted: auto-generated
- References: <bug-21547-131@http.sourceware.org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #10 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Elie Roux from comment #9)
> I have to say I don't really understand why ICU behaves like that... I think
> we should do two things:
>
> - change my rule file so that it contains just one line and fix this oddity
> - report a bug on ICU (maybe it's not a bug per se, but I can't see any
> other way to solve this mystery)
The glibc rules I implemented showed the same behaviour, so I doubt this
is an ICU bug. I think it is a bug in the rules.
> I'll fix the rule file (possibly today). If you have some time do you think
> you could report the ICU bug?
>
> Thank you!
The order of the rules does matter. Look at this:
A somewhat longer test input:
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$ cat localedata/dz_BT.UTF-8.in.mini
ཉ
ྋྙ
གཉ
གཉྫ
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
Again, the same two lines in the rules file:
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$ cat rules-mini.txt
&གཉ<གཉྫ
&ཉ<<ྋྙ<གཉ<མཉ<རྙ=ཪྙ<སྙ<བརྙ=བཪྙ<བསྙ
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
Testing:
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$ ~/bin/icu-collation-test.py -r rules-mini.txt -i localedata/dz_BT.UTF-8.in.mini -o /tmp/dz_BT.UTF-8.out
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
The result looks unexpected and is probably not what you want:
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$ diff -u /local/mfabian/src/glibc/localedata/dz_BT.UTF-8.in.mini /tmp/dz_BT.UTF-8.out
--- /local/mfabian/src/glibc/localedata/dz_BT.UTF-8.in.mini 2018-01-15 15:45:48.332377013 +0100
+++ /tmp/dz_BT.UTF-8.out 2018-01-15 15:50:46.357040054 +0100
@@ -1,4 +1,4 @@
+གཉྫ
 ཉ
 ྋྙ
 གཉ
-གཉྫ
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
Now I reverse the order of the two lines in the rules file:
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$ cat rules-mini.txt
&ཉ<<ྋྙ<གཉ<མཉ<རྙ=ཪྙ<སྙ<བརྙ=བཪྙ<བསྙ
&གཉ<གཉྫ
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
Testing again:
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$ ~/bin/icu-collation-test.py -r rules-mini.txt -i localedata/dz_BT.UTF-8.in.mini -o /tmp/dz_BT.UTF-8.out
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
No difference; I get the expected order:
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$ diff -u /local/mfabian/src/glibc/localedata/dz_BT.UTF-8.in.mini /tmp/dz_BT.UTF-8.out
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$
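
For anyone who wants to reproduce the comparison above without my script,
here is a minimal sketch. It assumes PyICU is installed and that building a
RuleBasedCollator directly from the two rule lines is close enough to what
icu-collation-test.py does; the helper names (build_rules, sort_with_rules)
are made up for this illustration and are not taken from that script.

```python
# Sketch: sort the same four test strings with two ICU tailorings that
# differ only in the order of their two rules.
# Assumption: PyICU is available (the module is imported as "icu").
# The helper names are hypothetical, not from icu-collation-test.py.

RULE_CONTRACTION = "&གཉ<གཉྫ"
RULE_CHAIN = "&ཉ<<ྋྙ<གཉ<མཉ<རྙ=ཪྙ<སྙ<བརྙ=བཪྙ<བསྙ"

# Same content as localedata/dz_BT.UTF-8.in.mini in the transcript.
TEST_INPUT = ["ཉ", "ྋྙ", "གཉ", "གཉྫ"]


def build_rules(*rules):
    """Join individual rules into one tailoring string, one rule per line."""
    return "\n".join(rules)


def sort_with_rules(lines, rules):
    """Return lines sorted by an ICU RuleBasedCollator built from rules."""
    import icu  # imported lazily so build_rules works without PyICU

    collator = icu.RuleBasedCollator(rules)
    return sorted(lines, key=collator.getSortKey)


if __name__ == "__main__":
    variants = [
        ("contraction rule first", build_rules(RULE_CONTRACTION, RULE_CHAIN)),
        ("chain rule first", build_rules(RULE_CHAIN, RULE_CONTRACTION)),
    ]
    for label, rules in variants:
        try:
            result = sort_with_rules(TEST_INPUT, rules)
        except ImportError:
            print("PyICU is not installed; cannot run the comparison")
            break
        print(label + ":")
        for line in result:
            print("  " + line)
```

If the sketch matches what my script does, the first variant should show
the unexpected order from the first diff and the second variant should
leave the input order unchanged, but I have not checked it against every
ICU version.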