This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)


https://sourceware.org/bugzilla/show_bug.cgi?id=21547

--- Comment #10 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Elie Roux from comment #9)
> I have to say I don't really understand why ICU behaves like that... I think
> we should do two things: 
> 
> - change my rule file so that it contains just one line and fix this oddity
> - report a bug on ICU (maybe it's not a bug per se, but I can't see any
> other way to solve this mystery)

The glibc rules I implemented showed the same behaviour, so I doubt this
is an ICU bug. I think it is a bug in the rules.

> I'll fix the rule file (possibly today). If you have some time do you think
> you could report the ICU bug?
> 
> Thank you!

The order of the rules does matter. Look at this:

A somewhat longer test input:

     mfabian@taka:/local/mfabian/src/glibc (locales *$%)
     $ cat localedata/dz_BT.UTF-8.in.mini
     ཉ
     ྋྙ
     གཉ
     གཉྫ
     mfabian@taka:/local/mfabian/src/glibc (locales *$%)

Again, the same two lines in the rules file:

     mfabian@taka:/local/mfabian/src/glibc (locales *$%)
     $ cat rules-mini.txt
     &གཉ<གཉྫ
     &ཉ<<ྋྙ<གཉ<མཉ<རྙ=ཪྙ<སྙ<བརྙ=བཪྙ<བསྙ
     mfabian@taka:/local/mfabian/src/glibc (locales *$%)

Testing:

     mfabian@taka:/local/mfabian/src/glibc (locales *$%)
     $ ~/bin/icu-collation-test.py -r rules-mini.txt -i localedata/dz_BT.UTF-8.in.mini -o /tmp/dz_BT.UTF-8.out
     mfabian@taka:/local/mfabian/src/glibc (locales *$%)

The result looks unexpected and is probably not what you want
(གཉྫ now sorts before ཉ instead of after གཉ):

     mfabian@taka:/local/mfabian/src/glibc (locales *$%)
     $ diff -u /local/mfabian/src/glibc/localedata/dz_BT.UTF-8.in.mini /tmp/dz_BT.UTF-8.out
     --- /local/mfabian/src/glibc/localedata/dz_BT.UTF-8.in.mini 2018-01-15 15:45:48.332377013 +0100
     +++ /tmp/dz_BT.UTF-8.out   2018-01-15 15:50:46.357040054 +0100
     @@ -1,4 +1,4 @@
     +གཉྫ
      ཉ
      ྋྙ
      གཉ
     -གཉྫ
     mfabian@taka:/local/mfabian/src/glibc (locales *$%)

Now I reverse the order of the two lines in the rules file:

     mfabian@taka:/local/mfabian/src/glibc (locales *$%)
     $ cat rules-mini.txt
     &ཉ<<ྋྙ<གཉ<མཉ<རྙ=ཪྙ<སྙ<བརྙ=བཪྙ<བསྙ
     &གཉ<གཉྫ
     mfabian@taka:/local/mfabian/src/glibc (locales *$%)

Testing again:

     mfabian@taka:/local/mfabian/src/glibc (locales *$%)
     $ ~/bin/icu-collation-test.py -r rules-mini.txt -i localedata/dz_BT.UTF-8.in.mini -o /tmp/dz_BT.UTF-8.out
     mfabian@taka:/local/mfabian/src/glibc (locales *$%)

This time diff reports no difference, so I get the expected order:

     mfabian@taka:/local/mfabian/src/glibc (locales *$%)
     $ diff -u /local/mfabian/src/glibc/localedata/dz_BT.UTF-8.in.mini /tmp/dz_BT.UTF-8.out
     mfabian@taka:/local/mfabian/src/glibc (locales *$%)
     $
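
One plausible reading of the two runs above, sketched as a toy model in
Python. This is not ICU's or glibc's actual algorithm, only an
illustration under stated assumptions: the reset string after "&" is
resolved against whatever tailoring exists at the moment the rule is
read, untailored characters get their code point as a stand-in primary
weight, and the helper names (ces_for, build_table, sort_key) are made up
for this sketch. If the rule "&གཉ<གཉྫ" comes first, གཉ is not yet a
contraction, so གཉྫ gets anchored to "ག followed by ཉ" and stays under ག
even after the second rule moves གཉ behind ཉ.

```python
import re

RULE_CONTRACTION = "&གཉ<གཉྫ"
RULE_NYA = "&ཉ<<ྋྙ<གཉ<མཉ<རྙ=ཪྙ<སྙ<བརྙ=བཪྙ<བསྙ"
WORDS = ["ཉ", "ྋྙ", "གཉ", "གཉྫ"]  # same four lines as the test input


def default_ce(ch):
    # One collation element is (primary, secondary). Both are tuples so a
    # tailored weight can be slotted in directly after an existing one.
    # Assumption: the code point stands in for the real default weight.
    return ((ord(ch),), ())


def ces_for(text, table):
    """Map text to collation elements, preferring the longest tailored match."""
    out, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in table:
                out.extend(table[text[i:j]])
                i = j
                break
        else:  # no tailored entry starts here: fall back to the default
            out.append(default_ce(text[i]))
            i += 1
    return out


def build_table(rules):
    """Process rules in order; each reset sees only the tailoring so far."""
    table, counter = {}, 0
    for rule in rules:
        parts = re.split(r"(<<|<|=)", rule.lstrip("&"))
        prev = ces_for(parts[0], table)  # reset position, resolved now
        for op, elem in zip(parts[1::2], parts[2::2]):
            primary, secondary = prev[-1]
            if op == "<":      # primary difference, placed right after prev
                counter += 1
                last = (primary + (-counter,), ())
            elif op == "<<":   # secondary difference
                counter += 1
                last = (primary, secondary + (-counter,))
            else:              # "=": identical weights
                last = (primary, secondary)
            prev = prev[:-1] + [last]
            table[elem] = prev
    return table


def sort_key(text, table):
    # Compare all primaries first, then all secondaries.
    ces = ces_for(text, table)
    return (tuple(p for p, _ in ces), tuple(s for _, s in ces))


def collate(rules, words=WORDS):
    table = build_table(rules)
    return sorted(words, key=lambda w: sort_key(w, table))


if __name__ == "__main__":
    print(collate([RULE_CONTRACTION, RULE_NYA]))
    print(collate([RULE_NYA, RULE_CONTRACTION]))
```

In this toy model the first call puts གཉྫ at the top, like the diff
above, and the second call leaves the four lines in their input order.
That would be consistent with the rules file needing the &ཉ line before
the &གཉ line, but it does not prove that ICU works this way internally.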

