Bug 19922 - iso14651_t1_common: Define collation for Malayalam chillu characters
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: localedata
Version: 2.25
Importance: P2 normal
Target Milestone: 2.26
Assignee: Not yet assigned to anyone
Reported: 2016-04-08 05:31 UTC by Santhosh Thottingal
Modified: 2017-07-13 15:37 UTC
fweimer: security-


Attachments
iso14651_t1_common: define collation for Malayalam chillu characters (740 bytes, patch)
2016-04-08 05:31 UTC, Santhosh Thottingal

Description Santhosh Thottingal 2016-04-08 05:31:03 UTC
Created attachment 9164
iso14651_t1_common: define collation for Malayalam chillu characters

The Malayalam chillu characters that were added in Unicode 5.1 are not considered in the collation rules for Malayalam. These 6 characters are
U+0D7A to U+0D7F.

Unicode defines them as an alternate representation of the ZWJ-based chillus (consonant + virama + ZWJ). The ZWJ-based chillus are already represented in the collation rules.

So U+0D7A to U+0D7F should have a primary collation weight equal to that of the ZWJ-based chillus. Note that ZWJ has zero collation weight (it is ignorable in collation). So:

U+0D7A (ൺ) and U+0D23 (ണ) + U+0D4D (്) have the same primary weight and differ in their secondary-level weight.
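To make the code points concrete, here is a minimal Python sketch that pairs each atomic chillu with its ZWJ-based sequence. The pairing is taken from the CLDR excerpt quoted in this report; the names `CHILLUS` and `zwj_chillu` are hypothetical and not part of glibc or CLDR.

```python
import unicodedata

VIRAMA = "\u0D4D"  # MALAYALAM SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER

# Base consonant -> atomic chillu (Unicode 5.1), following the CLDR pairs.
CHILLUS = {
    "\u0D15": "\u0D7F",  # KA  -> CHILLU K
    "\u0D23": "\u0D7A",  # NNA -> CHILLU NN
    "\u0D28": "\u0D7B",  # NA  -> CHILLU N
    "\u0D30": "\u0D7C",  # RA  -> CHILLU RR
    "\u0D32": "\u0D7D",  # LA  -> CHILLU L
    "\u0D33": "\u0D7E",  # LLA -> CHILLU LL
}


def zwj_chillu(consonant):
    """Return the pre-5.1 chillu sequence: consonant + virama + ZWJ."""
    return consonant + VIRAMA + ZWJ


for consonant, atomic in CHILLUS.items():
    sequence = " ".join(f"U+{ord(c):04X}" for c in zwj_chillu(consonant))
    print(f"U+{ord(atomic):04X} {unicodedata.name(atomic)} <-> {sequence}")
```

Printing the code points rather than the glyphs avoids any confusion caused by fonts that render both forms identically.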

Unicode CLDR collation follows exactly the same logic. See http://unicode.org/cldr/trac/browser/trunk/common/collation/ml.xml

 [...]
 #  Pre-5.1 Chillus secondary equal to 5.1 chillus.
 #  Chillus primary equal to their consonant_dead form.
 &ക്<<ക്\u200D<<<ൿ
 &ണ്<<ണ്\u200D<<<ൺ
 &ന്<<ന്\u200D<<<ൻ
 &ര്<<ര്\u200D<<<ർ
 &ല്<<ല്\u200D<<<ൽ
 &ള്<<ള്\u200D<<<ൾ
 [...]
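In CLDR rule syntax, `<<` marks a secondary difference and `<<<` a tertiary difference from the preceding element. The following Python sketch illustrates how such a rule yields a sort order by comparing (primary, secondary, tertiary) tuples. The numeric weights in `TOY_KEYS` are invented for illustration only; they are not the weights used by CLDR or by the glibc patch.

```python
DEAD = "\u0D23\u0D4D"            # NNA + virama (dead consonant)
ZWJ_CHILLU = DEAD + "\u200D"     # pre-5.1 chillu: NNA + virama + ZWJ
ATOMIC = "\u0D7A"                # atomic CHILLU NN

# Hypothetical weights modelling the rule  &DEAD << ZWJ_CHILLU <<< ATOMIC:
# same primary everywhere, secondary step at "<<", tertiary step at "<<<".
TOY_KEYS = {
    DEAD: (100, 1, 1),
    ZWJ_CHILLU: (100, 2, 1),
    ATOMIC: (100, 2, 2),
}


def toy_sort(strings):
    """Sort by the toy multi-level keys; tuples compare level by level."""
    return sorted(strings, key=TOY_KEYS.__getitem__)


for s in toy_sort([ATOMIC, ZWJ_CHILLU, DEAD]):
    print(" ".join(f"U+{ord(c):04X}" for c in s), TOY_KEYS[s])
```

Because tuple comparison only looks at a later level when all earlier levels are equal, the three forms group together at the primary level and are then ordered dead form, ZWJ-based chillu, atomic chillu.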


The attached patch implements this.

To test, create a text file with the following content:
ണ്‍
ണ്
ൺ

$ LANG=ml_IN.UTF-8 sort ~/sort.txt
ണ്
ണ്‍
ൺ

The same input can be tested with http://demo.icu-project.org/icu-bin/collation.html to verify that its output is the same as the output above.

Explanation of output:

1. ണ\u0D4D - This is ണ + ്, the dead form of ണ.
2. ണ\u0D4D\u200D - This is ണ + ് + ZWJ, the ZWJ-based chillu. It sorts after the ZWJ-less dead form of ണ.
3. ൺ - This is the atomic chillu ൺ (U+0D7A), with a secondary-level collation weight differing from the ZWJ-based chillu above.
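Since the three test lines look nearly identical in most fonts, it can help to build them from explicit code points. The Python sketch below does that and sorts them with `locale.strxfrm`, which should follow the same LC_COLLATE data that sort(1) uses; that equivalence, and the helper name `sort_with_locale`, are assumptions of this sketch. If the ml_IN.UTF-8 locale is not installed, the check is skipped.

```python
import locale

# The three test lines, in the same order as the input file above.
LINES = [
    "\u0D23\u0D4D\u200D",  # NNA + virama + ZWJ (ZWJ-based chillu)
    "\u0D23\u0D4D",        # NNA + virama (dead form)
    "\u0D7A",              # atomic CHILLU NN
]


def sort_with_locale(lines, name="ml_IN.UTF-8"):
    """Sort lines by the named locale's collation, or return None if the
    locale is not available on this system."""
    previous = locale.setlocale(locale.LC_COLLATE)  # query current setting
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None
    try:
        return sorted(lines, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # restore


result = sort_with_locale(LINES)
if result is None:
    print("ml_IN.UTF-8 is not installed; skipping the collation check")
else:
    for line in result:
        print(" ".join(f"U+{ord(c):04X}" for c in line))
```

With the patched locale data, the printed order is expected to match the sort output shown above: dead form, ZWJ-based chillu, atomic chillu.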
Comment 1 Santhosh Thottingal 2017-04-18 09:44:28 UTC
Patch submitted to libc-alpha list https://sourceware.org/ml/libc-alpha/2017-04/msg00306.html
Comment 2 Pravin S 2017-04-18 09:50:51 UTC
Verified the patch: it applies cleanly and gives the expected results.


Thank you, Santhosh, for this patch.
The atomic chillus U+0D7A-U+0D7F have been in Unicode since version 5.1, and it is important to have sorting support for them in glibc.
Comment 3 Santhosh Thottingal 2017-04-19 10:38:30 UTC
Changelog: 
BZ 19922: Defined collation for 6 Malayalam chillu characters U+0D7A to U+0D7F

(I hope the format is correct, feel free to edit as required)
Comment 4 Pravin S 2017-04-19 11:05:13 UTC
Hi Santhosh,

   The ChangeLog entry should follow the format shown in https://sourceware.org/bugzilla/show_bug.cgi?id=17588#c11
    
   You can also check glibc/localedata/ChangeLog for examples.
Comment 5 cvs-commit@gcc.gnu.org 2017-06-11 14:10:27 UTC
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  b05eca0e1d96aecb25516287913c54bbb93d3d92 (commit)
      from  8458956a6219b6dbd97b0e9e97caf742f3c6342e (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=b05eca0e1d96aecb25516287913c54bbb93d3d92

commit b05eca0e1d96aecb25516287913c54bbb93d3d92
Author: Santhosh Thottingal <santhosh.thottingal@gmail.com>
Date:   Sun Jun 11 10:08:37 2017 -0400

    Correct collation rules for Malayalam.
    
    	[BZ #19922]
    	* locales/iso14651_t1_common: Add collation rules for U+07DA to U+07DF.
    
    	[BZ #19919]
    	* locales/iso14651_t1_common: Correct collation of U+0D36 and U+0D37.

-----------------------------------------------------------------------

Summary of changes:
 localedata/ChangeLog                  |    8 ++++++++
 localedata/locales/iso14651_t1_common |   26 ++++++++++++++++++++++----
 2 files changed, 30 insertions(+), 4 deletions(-)
Comment 6 Zack Weinberg 2017-06-11 14:17:10 UTC
Will be fixed in 2.26.
Comment 7 cvs-commit@gcc.gnu.org 2017-06-11 14:27:37 UTC
The branch, release/2.25/master has been updated
       via  f92b1025980a939645b1ec7e550411a05ac7c76f (commit)
      from  b8d2e394a2900cef5bbbe0503f15960f64a943b1 (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=f92b1025980a939645b1ec7e550411a05ac7c76f

commit f92b1025980a939645b1ec7e550411a05ac7c76f
Author: Santhosh Thottingal <santhosh.thottingal@gmail.com>
Date:   Sun Jun 11 10:08:37 2017 -0400

    Correct collation rules for Malayalam.
    
    	[BZ #19922]
    	* locales/iso14651_t1_common: Add collation rules for U+07DA to U+07DF.
    
    	[BZ #19919]
    	* locales/iso14651_t1_common: Correct collation of U+0D36 and U+0D37.

-----------------------------------------------------------------------

Summary of changes:
 localedata/ChangeLog                  |    8 ++++++++
 localedata/locales/iso14651_t1_common |   26 ++++++++++++++++++++++----
 2 files changed, 30 insertions(+), 4 deletions(-)
Comment 8 cvs-commit@gcc.gnu.org 2017-06-11 14:30:36 UTC
The branch, release/2.23/master has been updated
       via  9f172a30acdd64e140bedd438458830fa8c27ad8 (commit)
      from  0be74c5c7cb239e4884d1ee0fd48c746a0bd1a65 (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=9f172a30acdd64e140bedd438458830fa8c27ad8

commit 9f172a30acdd64e140bedd438458830fa8c27ad8
Author: Santhosh Thottingal <santhosh.thottingal@gmail.com>
Date:   Sun Jun 11 10:08:37 2017 -0400

    Correct collation rules for Malayalam.
    
    	[BZ #19922]
    	* locales/iso14651_t1_common: Add collation rules for U+07DA to U+07DF.
    
    	[BZ #19919]
    	* locales/iso14651_t1_common: Correct collation of U+0D36 and U+0D37.

-----------------------------------------------------------------------

Summary of changes:
 localedata/ChangeLog                  |    8 ++++++++
 localedata/locales/iso14651_t1_common |   26 ++++++++++++++++++++++----
 2 files changed, 30 insertions(+), 4 deletions(-)
Comment 9 cvs-commit@gcc.gnu.org 2017-06-11 14:31:25 UTC
The branch, release/2.24/master has been updated
       via  4e291e7c5277af2eec279e2047653f04fad483e1 (commit)
      from  0505a57d4381f2baaeed73e96b161d0fb313fa5c (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=4e291e7c5277af2eec279e2047653f04fad483e1

commit 4e291e7c5277af2eec279e2047653f04fad483e1
Author: Santhosh Thottingal <santhosh.thottingal@gmail.com>
Date:   Sun Jun 11 10:08:37 2017 -0400

    Correct collation rules for Malayalam.
    
    	[BZ #19922]
    	* locales/iso14651_t1_common: Add collation rules for U+07DA to U+07DF.
    
    	[BZ #19919]
    	* locales/iso14651_t1_common: Correct collation of U+0D36 and U+0D37.

-----------------------------------------------------------------------

Summary of changes:
 localedata/ChangeLog                  |    8 ++++++++
 localedata/locales/iso14651_t1_common |   26 ++++++++++++++++++++++----
 2 files changed, 30 insertions(+), 4 deletions(-)
Comment 10 cvs-commit@gcc.gnu.org 2017-07-13 15:37:31 UTC
The branch, linaro/2.23/master has been updated
       via  ceeb0740ed04c48170f9f6f15fef55637ad84e1b (commit)
       via  24adabbe17d24b9cf4f42d81f546359f72515ce3 (commit)
       via  8224a992e15369224860c891e7367e6ab66f6fde (commit)
       via  ed739093d19855c71b3f38bfed7d318340b22612 (commit)
       via  fec2dc4089f6688e0f4ffc962700a0858f08bef9 (commit)
      from  6636d6f4fe5e6905bfe463874b4f958ed1ae4a84 (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ceeb0740ed04c48170f9f6f15fef55637ad84e1b

commit ceeb0740ed04c48170f9f6f15fef55637ad84e1b
Author: Siddhesh Poyarekar <siddhesh@sourceware.org>
Date:   Tue Mar 7 20:52:04 2017 +0530

    Ignore and remove LD_HWCAP_MASK for AT_SECURE programs (bug #21209)
    
    The LD_HWCAP_MASK environment variable may alter the selection of
    function variants for some architectures.  For AT_SECURE process it
    means that if an outdated routine has a bug that would otherwise not
    affect newer platforms by default, LD_HWCAP_MASK will allow that bug
    to be exploited.
    
    To be on the safe side, ignore and disable LD_HWCAP_MASK for setuid
    binaries.
    
    	[BZ #21209]
    	* elf/rtld.c (process_envvars): Ignore LD_HWCAP_MASK for
    	AT_SECURE processes.
    	* sysdeps/generic/unsecvars.h: Add LD_HWCAP_MASK.
    
    (cherry picked from commit 1c1243b6fc33c029488add276e56570a07803bfd)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=24adabbe17d24b9cf4f42d81f546359f72515ce3

commit 24adabbe17d24b9cf4f42d81f546359f72515ce3
Author: Florian Weimer <fweimer@redhat.com>
Date:   Mon Jun 19 22:32:12 2017 +0200

    ld.so: Reject overly long LD_AUDIT path elements
    
    Also only process the last LD_AUDIT entry.
    
    (cherry picked from commit 81b82fb966ffbd94353f793ad17116c6088dedd9)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=8224a992e15369224860c891e7367e6ab66f6fde

commit 8224a992e15369224860c891e7367e6ab66f6fde
Author: Florian Weimer <fweimer@redhat.com>
Date:   Mon Jun 19 22:31:04 2017 +0200

    ld.so: Reject overly long LD_PRELOAD path elements
    
    (cherry picked from commit 6d0ba622891bed9d8394eef1935add53003b12e8)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ed739093d19855c71b3f38bfed7d318340b22612

commit ed739093d19855c71b3f38bfed7d318340b22612
Author: Florian Weimer <fweimer@redhat.com>
Date:   Mon Jun 19 18:34:53 2017 +0200

    CVE-2017-1000366: Ignore LD_LIBRARY_PATH for AT_SECURE=1 programs [BZ #21624]
    
    LD_LIBRARY_PATH can only be used to reorder system search paths, which
    is not useful functionality.
    
    This makes an exploitable unbounded alloca in _dl_init_paths unreachable
    for AT_SECURE=1 programs.
    
    (cherry picked from commit f6110a8fee2ca36f8e2d2abecf3cba9fa7b8ea7d)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=fec2dc4089f6688e0f4ffc962700a0858f08bef9

commit fec2dc4089f6688e0f4ffc962700a0858f08bef9
Author: Santhosh Thottingal <santhosh.thottingal@gmail.com>
Date:   Sun Jun 11 10:08:37 2017 -0400

    Correct collation rules for Malayalam.
    
    	[BZ #19922]
    	* locales/iso14651_t1_common: Add collation rules for U+07DA to U+07DF.
    
    	[BZ #19919]
    	* locales/iso14651_t1_common: Correct collation of U+0D36 and U+0D37.

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                             |   32 ++++++
 NEWS                                  |    2 +
 elf/rtld.c                            |  198 +++++++++++++++++++++++++++------
 localedata/ChangeLog                  |    8 ++
 localedata/locales/iso14651_t1_common |   26 ++++-
 sysdeps/generic/unsecvars.h           |    1 +
 6 files changed, 230 insertions(+), 37 deletions(-)