Bug 19919 - iso14651_t1_common: Correct the Malayalam sorting order of 0D36 and 0D37
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: localedata
Version: 2.25
Importance: P2 normal
Target Milestone: 2.26
Assignee: Not yet assigned to anyone
Reported: 2016-04-07 07:26 UTC by Santhosh Thottingal
Modified: 2017-07-13 15:37 UTC
fweimer: security-


Attachments
Patch to correct the sorting order of 0D36 and 0D37 (415 bytes, patch)
2016-04-07 07:26 UTC, Santhosh Thottingal

Description Santhosh Thottingal 2016-04-07 07:26:08 UTC
Created attachment 9161
Patch to correct the sorting order of 0D36 and 0D37

The Malayalam characters ശ (U+0D36) and ഷ (U+0D37) should sort in the order of their Unicode code points. In the master version of glibc, the order is swapped.

It sorts ശ (U+0D36) after ഷ (U+0D37), which is a bug in the patch I submitted a long time back (Bug 12541).

$ LANG=ml_IN.UTF-8 sort ~/sort.txt
വ
ഷ
ശ
സ
ഹ

This is wrong.

With the attached patch, I get the following, which is correct:

$ LANG=ml_IN.UTF-8 sort ~/sort.txt
വ
ശ
ഷ
സ
ഹ

The mistake probably happened because of the confusing naming of Unicode characters that have very similar pronunciation.
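
For anyone who wants to check the expected order without relying on an installed locale, here is a minimal sketch in Python (it is not part of the attached patch). It sorts the five letters from the example by code point, prints their Unicode names (U+0D36 is MALAYALAM LETTER SHA and U+0D37 is MALAYALAM LETTER SSA, which shows how easily the two are mixed up), and then compares the result against the ml_IN.UTF-8 collation if that locale happens to be installed. The helper name locale_sorted is made up for this illustration.

```python
import locale
import unicodedata

# The five consonants from the report, listed in the (wrong) order that the
# unpatched locale produced: U+0D35, U+0D37, U+0D36, U+0D38, U+0D39.
letters = ["\u0d35", "\u0d37", "\u0d36", "\u0d38", "\u0d39"]

# Expected order according to the report: plain code point order.
expected = sorted(letters, key=ord)
for ch in expected:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}  {ch}")


def locale_sorted(items, name="ml_IN.UTF-8"):
    """Sort items with the named locale's collation rules.

    Hypothetical helper for this sketch. Returns None when the locale is
    not installed, so the script still runs on systems without ml_IN.
    """
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None
    try:
        return sorted(items, key=locale.strxfrm)
    finally:
        # Restore the default collation so later code is unaffected.
        locale.setlocale(locale.LC_COLLATE, "C")


actual = locale_sorted(letters)
if actual is None:
    print("ml_IN.UTF-8 is not installed; skipping the locale check")
else:
    print("locale order matches code point order:", actual == expected)
```

Going by the two outputs shown above, the last line should report False on a glibc without the fix and True once the patch is applied.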
Comment 1 Pravin S 2016-04-07 09:03:33 UTC
Hi Santhosh :)

The patch looks good to me. Do you have a reference for this change?
Comment 2 Santhosh Thottingal 2016-04-07 10:33:12 UTC
References:
(1) Unicode Malayalam code chart http://www.unicode.org/charts/PDF/U0D00.pdf - This patch simply follows the code point order; under the language's rules, collation does not deviate from code point order.
(2) https://en.wikipedia.org/wiki/Malayalam_script#Consonants
(3) Samkshiptha Sabdatharavali Malayalam Dictionary 2011 DC Books
Comment 3 Mike Frysinger 2016-04-07 18:00:13 UTC
iiuc, the iso tables are generated now ...
Comment 4 Santhosh Thottingal 2016-04-08 03:27:17 UTC
The collation tables are not automatically generated. The character database is generated from Unicode data.
Comment 5 Santhosh Thottingal 2016-05-15 11:08:08 UTC
ping. The references are given above. Can somebody look into this?
Comment 6 Pravin S 2017-04-18 06:58:03 UTC
I have tested this patch by applying it to the master branch, and it gives the expected results.

Reviewed+

Santhosh, did you submit this patch to the libc-alpha mailing list? https://sourceware.org/ml/libc-alpha/
Comment 7 Santhosh Thottingal 2017-04-18 09:43:32 UTC
The patch was submitted to the libc-alpha mailing list: https://sourceware.org/ml/libc-alpha/2017-04/msg00305.html
Comment 8 Santhosh Thottingal 2017-04-19 10:27:33 UTC
Changelog: 
BZ 19919: Corrected the Malayalam sorting order of 0D36 and 0D37

(I hope the format is correct, feel free to edit as required)
Comment 9 cvs-commit@gcc.gnu.org 2017-06-11 14:10:30 UTC
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  b05eca0e1d96aecb25516287913c54bbb93d3d92 (commit)
      from  8458956a6219b6dbd97b0e9e97caf742f3c6342e (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=b05eca0e1d96aecb25516287913c54bbb93d3d92

commit b05eca0e1d96aecb25516287913c54bbb93d3d92
Author: Santhosh Thottingal <santhosh.thottingal@gmail.com>
Date:   Sun Jun 11 10:08:37 2017 -0400

    Correct collation rules for Malayalam.
    
    	[BZ #19922]
    	* locales/iso14651_t1_common: Add collation rules for U+07DA to U+07DF.
    
    	[BZ #19919]
    	* locales/iso14651_t1_common: Correct collation of U+0D36 and U+0D37.

-----------------------------------------------------------------------

Summary of changes:
 localedata/ChangeLog                  |    8 ++++++++
 localedata/locales/iso14651_t1_common |   26 ++++++++++++++++++++++----
 2 files changed, 30 insertions(+), 4 deletions(-)
Comment 10 Zack Weinberg 2017-06-11 14:16:39 UTC
Will be fixed in 2.26.
Comment 11 cvs-commit@gcc.gnu.org 2017-06-11 14:27:38 UTC
The branch, release/2.25/master has been updated
       via  f92b1025980a939645b1ec7e550411a05ac7c76f (commit)
      from  b8d2e394a2900cef5bbbe0503f15960f64a943b1 (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=f92b1025980a939645b1ec7e550411a05ac7c76f

commit f92b1025980a939645b1ec7e550411a05ac7c76f
Author: Santhosh Thottingal <santhosh.thottingal@gmail.com>
Date:   Sun Jun 11 10:08:37 2017 -0400

    Correct collation rules for Malayalam.
    
    	[BZ #19922]
    	* locales/iso14651_t1_common: Add collation rules for U+07DA to U+07DF.
    
    	[BZ #19919]
    	* locales/iso14651_t1_common: Correct collation of U+0D36 and U+0D37.

-----------------------------------------------------------------------

Summary of changes:
 localedata/ChangeLog                  |    8 ++++++++
 localedata/locales/iso14651_t1_common |   26 ++++++++++++++++++++++----
 2 files changed, 30 insertions(+), 4 deletions(-)
Comment 12 cvs-commit@gcc.gnu.org 2017-06-11 14:30:39 UTC
The branch, release/2.23/master has been updated
       via  9f172a30acdd64e140bedd438458830fa8c27ad8 (commit)
      from  0be74c5c7cb239e4884d1ee0fd48c746a0bd1a65 (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=9f172a30acdd64e140bedd438458830fa8c27ad8

commit 9f172a30acdd64e140bedd438458830fa8c27ad8
Author: Santhosh Thottingal <santhosh.thottingal@gmail.com>
Date:   Sun Jun 11 10:08:37 2017 -0400

    Correct collation rules for Malayalam.
    
    	[BZ #19922]
    	* locales/iso14651_t1_common: Add collation rules for U+07DA to U+07DF.
    
    	[BZ #19919]
    	* locales/iso14651_t1_common: Correct collation of U+0D36 and U+0D37.

-----------------------------------------------------------------------

Summary of changes:
 localedata/ChangeLog                  |    8 ++++++++
 localedata/locales/iso14651_t1_common |   26 ++++++++++++++++++++++----
 2 files changed, 30 insertions(+), 4 deletions(-)
Comment 13 cvs-commit@gcc.gnu.org 2017-06-11 14:31:39 UTC
The branch, release/2.24/master has been updated
       via  4e291e7c5277af2eec279e2047653f04fad483e1 (commit)
      from  0505a57d4381f2baaeed73e96b161d0fb313fa5c (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=4e291e7c5277af2eec279e2047653f04fad483e1

commit 4e291e7c5277af2eec279e2047653f04fad483e1
Author: Santhosh Thottingal <santhosh.thottingal@gmail.com>
Date:   Sun Jun 11 10:08:37 2017 -0400

    Correct collation rules for Malayalam.
    
    	[BZ #19922]
    	* locales/iso14651_t1_common: Add collation rules for U+07DA to U+07DF.
    
    	[BZ #19919]
    	* locales/iso14651_t1_common: Correct collation of U+0D36 and U+0D37.

-----------------------------------------------------------------------

Summary of changes:
 localedata/ChangeLog                  |    8 ++++++++
 localedata/locales/iso14651_t1_common |   26 ++++++++++++++++++++++----
 2 files changed, 30 insertions(+), 4 deletions(-)
Comment 14 cvs-commit@gcc.gnu.org 2017-07-13 15:37:32 UTC
The branch, linaro/2.23/master has been updated
       via  ceeb0740ed04c48170f9f6f15fef55637ad84e1b (commit)
       via  24adabbe17d24b9cf4f42d81f546359f72515ce3 (commit)
       via  8224a992e15369224860c891e7367e6ab66f6fde (commit)
       via  ed739093d19855c71b3f38bfed7d318340b22612 (commit)
       via  fec2dc4089f6688e0f4ffc962700a0858f08bef9 (commit)
      from  6636d6f4fe5e6905bfe463874b4f958ed1ae4a84 (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ceeb0740ed04c48170f9f6f15fef55637ad84e1b

commit ceeb0740ed04c48170f9f6f15fef55637ad84e1b
Author: Siddhesh Poyarekar <siddhesh@sourceware.org>
Date:   Tue Mar 7 20:52:04 2017 +0530

    Ignore and remove LD_HWCAP_MASK for AT_SECURE programs (bug #21209)
    
    The LD_HWCAP_MASK environment variable may alter the selection of
    function variants for some architectures.  For AT_SECURE process it
    means that if an outdated routine has a bug that would otherwise not
    affect newer platforms by default, LD_HWCAP_MASK will allow that bug
    to be exploited.
    
    To be on the safe side, ignore and disable LD_HWCAP_MASK for setuid
    binaries.
    
    	[BZ #21209]
    	* elf/rtld.c (process_envvars): Ignore LD_HWCAP_MASK for
    	AT_SECURE processes.
    	* sysdeps/generic/unsecvars.h: Add LD_HWCAP_MASK.
    
    (cherry picked from commit 1c1243b6fc33c029488add276e56570a07803bfd)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=24adabbe17d24b9cf4f42d81f546359f72515ce3

commit 24adabbe17d24b9cf4f42d81f546359f72515ce3
Author: Florian Weimer <fweimer@redhat.com>
Date:   Mon Jun 19 22:32:12 2017 +0200

    ld.so: Reject overly long LD_AUDIT path elements
    
    Also only process the last LD_AUDIT entry.
    
    (cherry picked from commit 81b82fb966ffbd94353f793ad17116c6088dedd9)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=8224a992e15369224860c891e7367e6ab66f6fde

commit 8224a992e15369224860c891e7367e6ab66f6fde
Author: Florian Weimer <fweimer@redhat.com>
Date:   Mon Jun 19 22:31:04 2017 +0200

    ld.so: Reject overly long LD_PRELOAD path elements
    
    (cherry picked from commit 6d0ba622891bed9d8394eef1935add53003b12e8)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ed739093d19855c71b3f38bfed7d318340b22612

commit ed739093d19855c71b3f38bfed7d318340b22612
Author: Florian Weimer <fweimer@redhat.com>
Date:   Mon Jun 19 18:34:53 2017 +0200

    CVE-2017-1000366: Ignore LD_LIBRARY_PATH for AT_SECURE=1 programs [BZ #21624]
    
    LD_LIBRARY_PATH can only be used to reorder system search paths, which
    is not useful functionality.
    
    This makes an exploitable unbounded alloca in _dl_init_paths unreachable
    for AT_SECURE=1 programs.
    
    (cherry picked from commit f6110a8fee2ca36f8e2d2abecf3cba9fa7b8ea7d)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=fec2dc4089f6688e0f4ffc962700a0858f08bef9

commit fec2dc4089f6688e0f4ffc962700a0858f08bef9
Author: Santhosh Thottingal <santhosh.thottingal@gmail.com>
Date:   Sun Jun 11 10:08:37 2017 -0400

    Correct collation rules for Malayalam.
    
    	[BZ #19922]
    	* locales/iso14651_t1_common: Add collation rules for U+07DA to U+07DF.
    
    	[BZ #19919]
    	* locales/iso14651_t1_common: Correct collation of U+0D36 and U+0D37.

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                             |   32 ++++++
 NEWS                                  |    2 +
 elf/rtld.c                            |  198 +++++++++++++++++++++++++++------
 localedata/ChangeLog                  |    8 ++
 localedata/locales/iso14651_t1_common |   26 ++++-
 sysdeps/generic/unsecvars.h           |    1 +
 6 files changed, 230 insertions(+), 37 deletions(-)