Sorting between Indic scripts should follow Unicode code point order, like the order given in http://www.unicode.org/Public/UCA/latest/allkeys.txt. At present it does not work like that: Bengali script should come right after Devanagari, but it sorts at the end. The order should be Devanagari first, then Bengali, Gurmukhi, Gujarati and so on, as per Unicode code point.
Can you post some example Unicode data that shows the incorrect sorting? That will make it easier for us to integrate it into tests to prevent future regressions.
Hi Mike, this is not about incorrect sorting, but about what the order should be when different scripts come together. Example: I think the order for these code points should be as follows: U+0915, U+0995, U+0A15, U+0A95, U+0B15, U+0B95. I don't remember any reference as of now, but when we decide the sorting order between different Unicode scripts, what should we follow? IMO the answer is http://www.unicode.org/Public/UCA/latest/allkeys.txt
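To make the example above concrete for a test, here is a minimal sketch in Python. It only demonstrates plain code point ordering of the six letters listed; it does not consult any locale or the DUCET. The names `KA_LETTERS`, `sort_by_code_point` and `describe` are made up for this illustration.

```python
import unicodedata

# The six code points from the comment above, deliberately shuffled.
KA_LETTERS = ["\u0b95", "\u0995", "\u0a95", "\u0915", "\u0b15", "\u0a15"]


def sort_by_code_point(chars):
    """Sort single characters by their numeric Unicode code point."""
    return sorted(chars, key=ord)


def describe(chars):
    """Return 'U+XXXX NAME' strings so the script of each letter is visible."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in chars]


if __name__ == "__main__":
    for line in describe(sort_by_code_point(KA_LETTERS)):
        print(line)
```

The character names printed for each code point show which script it belongs to, so the list can be checked against the script order proposed in the first comment.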
In 2018, we updated the iso14651_t1_common to a 2016 version and then adapted the sort order of many locales. So the sort order of these Indic languages should now be in sync with the DUCET (http://www.unicode.org/Public/UCA/latest/allkeys.txt) as approximately defined in 2016. So I think the problem in the original comment is fixed.

commit 9479b6d5e08eacce06c6ab60abc9b2f4eb8b71e4
Author: Mike FABIAN <mfabian@redhat.com>
Date:   Tue Jan 30 17:59:00 2018 +0100

    Update iso14651_t1_common file to ISO14651_2016_TABLE1_en.txt [BZ #14095]

    [BZ #14095] - Review / update collation data from Unicode / ISO 14651

    File downloaded from:
    http://standards.iso.org/iso-iec/14651/ed-4/ISO14651_2016_TABLE1_en.txt

    Updating this file alone is not enough, there are problems in the new
    file which need to be fixed and the collation rules for many locales
    need to be adapted. This is done by the following patches.

    This update also fixes the problem that many characters are treated
    as identical when sorting because they were not yet in the old
    iso14651_t1_common file, see:

    https://bugzilla.redhat.com/show_bug.cgi?id=1336308
    - Infinite (∞) and empty set (∅) are treated as if they were the same
      character by sort and uniq

            [BZ #14095]
            * localedata/locales/iso14651_t1_common: Update file
            to latest version from ISO (ISO14651_2016_TABLE1_en.txt).
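For anyone who wants to check the result on their own system, a rough sketch follows. It sorts the six letters from the earlier comment with the collation of an installed locale via Python's `locale.strxfrm`. The locale name `hi_IN.UTF-8` and the helper `sort_with_locale` are assumptions made for this illustration; the locale may not be installed, in which case the helper returns `None`. No particular output is claimed here, since it depends on the installed locale data.

```python
import locale

# The six code points from the earlier comment, deliberately shuffled.
KA_LETTERS = ["\u0b95", "\u0995", "\u0a95", "\u0915", "\u0b15", "\u0a15"]


def sort_with_locale(chars, locale_name):
    """Sort chars using LC_COLLATE of locale_name.

    Returns None if the locale cannot be selected on this system.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        return sorted(chars, key=locale.strxfrm)
    finally:
        # Restore whatever collation setting was active before.
        locale.setlocale(locale.LC_COLLATE, saved)


if __name__ == "__main__":
    result = sort_with_locale(KA_LETTERS, "hi_IN.UTF-8")  # assumed locale name
    if result is None:
        print("locale not available, nothing to compare")
    else:
        print(" ".join(f"U+{ord(c):04X}" for c in result))
```

Comparing the printed order against the order requested in the original comment shows whether the installed locale data already carries the update.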
Closing as FIXED.