Bug 17225

Summary: ar_SY: localized month names for May and June are incorrect
Product: glibc
Reporter: Muhammad Fawwaz Orabi <mfawwaz93>
Component: localedata
Assignee: Mike FABIAN <maiku.fabian>
Status: RESOLVED FIXED
Severity: normal
CC: libc-locales, maiku.fabian, msaied93
Priority: P2
Flags: fweimer: security-
Version: 2.23
Target Milestone: 2.26
Host:
Target:
Build:
Last reconfirmed:
Attachments: Patch to fix the month names

Description Muhammad Fawwaz Orabi 2014-08-03 11:29:12 UTC
https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/locales/ar_SY

Full month names of May and June at lines 127 and 128 are incorrect.
May is translated as "نواران" (I'm not even sure this is an Arabic word), and should be "أيار".

June is translated as "حزير", which is missing the last two letters. That would be OK as an abbreviated form (though one that is not used very much), but the full name should be "حزيران".

Also, the abbreviated form for May is given as "نوار" (line 109), which should be "أيار" (there is no abbreviated form), and the one for June as "حزيران", which is the FULL form (I see they are swapped).

To sum up:

109 should be "<U0623><U064A><U0627><U0631>" (أيار)
110 (حزيران) should not be changed (this is the FULL form, but it should stay because abbreviated forms are not familiar in Arabic)

127 should be same as 109
128 should be same as 110 "<U062D><U0632><U064A><U0631><U0627><U0646>" (حزيران)
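
For readers who want to check the `<UXXXX>` sequences above against the Arabic spellings, here is a minimal sketch that decodes them. The helper name `decode_ucs_escapes` is made up for this illustration and is not part of glibc or its tooling.

```python
import re


def decode_ucs_escapes(text: str) -> str:
    """Replace each <UXXXX> escape with the character it names."""
    return re.sub(
        r"<U([0-9A-Fa-f]{4,8})>",
        lambda match: chr(int(match.group(1), 16)),
        text,
    )


# The two sequences proposed in this report for lines 109/127 and 110/128.
may = decode_ucs_escapes("<U0623><U064A><U0627><U0631>")
june = decode_ucs_escapes("<U062D><U0632><U064A><U0631><U0627><U0646>")

print(may)
print(june)
```

Text outside the escapes passes through unchanged, so the same helper can be run over a whole line copied from the locale file.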


This is a long-standing bug and I'm still wondering why it has not been reported before.

For reference, see Arabic month names on Wikipedia https://en.wikipedia.org/wiki/Arabic_names_of_calendar_months
Comment 1 msaied93 2016-01-17 13:01:33 UTC
Created attachment 8905
Patch to fix the month names

I've made a patch to fix the month names in this locale.

Links:
CLDR data for ar_SY locale: http://unicode.org/repos/cldr/trunk/common/main/ar_SY.xml
Locale Explorer: http://demo.icu-project.org/icu-bin/locexp?_=ar_SY

There are actually a lot of mistakes in ar_* locales and maybe others; ar_LB, for instance, suffers from the same issue reported here.

I suggest creating some script or so to periodically incorporate updates and new locales from CLDR into localedata.
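
As a rough idea of what the first step of such a script could look like, here is a minimal sketch that pulls the wide (full) month names out of LDML-style XML. `SAMPLE_LDML` is a hand-written fragment that only mimics the layout of a CLDR locale file, with the two names discussed in this report filled in; it is not a copy of the real ar_SY.xml, and `wide_month_names` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the LDML layout; NOT the real CLDR file.
SAMPLE_LDML = """<ldml>
  <dates><calendars><calendar type="gregorian"><months>
    <monthContext type="format">
      <monthWidth type="wide">
        <month type="5">أيار</month>
        <month type="6">حزيران</month>
      </monthWidth>
    </monthContext>
  </months></calendar></calendars></dates>
</ldml>"""


def wide_month_names(ldml_text: str) -> dict:
    """Map month number -> full month name for the Gregorian calendar."""
    root = ET.fromstring(ldml_text)
    path = (
        "dates/calendars/calendar[@type='gregorian']/months/"
        "monthContext[@type='format']/monthWidth[@type='wide']/month"
    )
    return {int(m.get("type")): m.text for m in root.findall(path)}


names = wide_month_names(SAMPLE_LDML)
print(names[5], names[6])
```

A real import would also have to deal with the abbreviated width, with months that a locale file leaves out because they are inherited from a parent locale, and with writing the result back into localedata.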
Comment 3 Mike Frysinger 2016-02-09 08:01:57 UTC
from what i can tell, that's not what the CLDR db says:
https://ssl.icu-project.org/icu-bin/locexp?d_=en&_=ar

it shows May as "مايو" and June as "يونيو".

unfortunately, i don't understand Arabic, so i'd lean towards the automated db import and just use the CLDR values directly.
Comment 4 msaied93 2016-02-09 10:07:59 UTC
Please note that month names are not the same in all Arab countries. You should compare with ar_SY instead: https://ssl.icu-project.org/icu-bin/locexp?d_=en&_=ar_SY

I vote for the automated import too; updating locales manually is cumbersome and error-prone. I'm ready to help as needed, although I'm not very familiar with glibc itself.
Comment 5 Mike Frysinger 2016-02-09 10:16:06 UTC
(In reply to msaied93 from comment #4)

thanks, i've updated my script to search the specific locale first before falling back to the common language.  this is what it shows now:

ar_SY: abmon: changing
{كانون الثاني";"شباط";"آذار";"نيسان";"نوار";"حزيران";"تموز";"آب";"أيلول";"تشرين الأول";"تشرين الثاني";"كانون الأول}
to
{كانون الثاني";"شباط";"آذار";"نيسان";"أيار";"حزيران";"تموز";"آب";"أيلول";"تشرين الأول";"تشرين الثاني";"كانون الأول}

so this seems to agree with the original request

also, although i linked to the ICU project while saying "CLDR", much of its data comes from the CLDR, so it's practically the same.  its web interface is a bit nicer than the CLDR site :).
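
The lookup order described in this comment (the specific locale first, then the bare language) can be sketched roughly as follows. This is not the actual script; `MAY_NAMES` and `lookup_with_fallback` are hypothetical names, and the table holds only the two values for May quoted in this thread.

```python
# Hypothetical stand-in for the imported data: name of May per locale,
# using the two values quoted in this thread.
MAY_NAMES = {
    "ar": "مايو",
    "ar_SY": "أيار",
}


def lookup_with_fallback(table: dict, locale: str) -> str:
    """Return the entry for the full locale, else for its bare language."""
    if locale in table:
        return table[locale]
    language = locale.split("_", 1)[0]
    return table[language]


print(lookup_with_fallback(MAY_NAMES, "ar_SY"))  # specific entry wins
print(lookup_with_fallback(MAY_NAMES, "ar_XX"))  # made-up locale, falls back to "ar"
```

Checking only the bare language, as in the first attempt in comment 3, is what produced the names that do not apply to ar_SY.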
Comment 6 msaied93 2016-02-09 11:03:30 UTC
(In reply to Mike Frysinger from comment #5)

> ar_SY: abmon: changing
> {كانون
> الثاني";"شباط";"آذار";"نيسان";"نوار";"حزيران";"تموز";"آب";"أيلول";"تشرين
> الأول";"تشرين الثاني";"كانون الأول}
> to
> {كانون
> الثاني";"شباط";"آذار";"نيسان";"أيار";"حزيران";"تموز";"آب";"أيلول";"تشرين
> الأول";"تشرين الثاني";"كانون الأول}
> 
> so this seems to agree with the original request

Yes it does. This is for `abmon`; what about `mon`? I also noticed some differences in the date format (d_t_fmt and d_fmt use `,` instead of `،`). There is also LC_NAME, which has English values: I didn't see an equivalent for LC_NAME in CLDR data.
Comment 7 Mike Frysinger 2016-02-09 12:03:32 UTC
(In reply to msaied93 from comment #6)

this is the abmon update:
ar_SY: abmon: changing
{كانون الثاني";"شباط";"آذار";"نيسان";"نوار";"حزيران";"تموز";"آب";"أيلول";"تشرين الأول";"تشرين الثاني";"كانون الأول}
to
{كانون الثاني";"شباط";"آذار";"نيسان";"أيار";"حزيران";"تموز";"آب";"أيلول";"تشرين الأول";"تشرين الثاني";"كانون الأول}

the abmon/mon/abday/day are all handled the same way now in my script

the xxx_fmt fields will take more time, but i think i can get them out of cldr

i haven't found a solution for LC_NAME fields as they aren't in cldr
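
Since the locale source quoted in the description stores these names as `<UXXXX>` sequences, values taken from CLDR have to be converted back before they can be compared with or written into the file. A minimal sketch of that conversion, assuming every character is in the Basic Multilingual Plane (true for the Arabic letters here); `to_ucs_escapes` is a made-up name, not the helper used by the real script:

```python
def to_ucs_escapes(text: str) -> str:
    """Encode each character as <UXXXX> with at least four hex digits."""
    return "".join(f"<U{ord(ch):04X}>" for ch in text)


# The corrected name for May proposed in this report.
escaped_may = to_ucs_escapes("أيار")
print(escaped_may)
```

Applying the same function to each of the abmon, mon, abday and day entries gives strings that can be diffed directly against the existing lines.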
Comment 8 Sourceware Commits 2017-07-06 13:52:25 UTC
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  1bea5858dd5b2615288e96525f3918e35f42dd2d (commit)
      from  3adfef7eaafae8dc00fa12cdecde68c01b7d565a (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1bea5858dd5b2615288e96525f3918e35f42dd2d

commit 1bea5858dd5b2615288e96525f3918e35f42dd2d
Author: Rafal Luzynski <digitalfreak@lingonborough.com>
Date:   Sat Jul 1 02:22:37 2017 +0200

    Arabic scripts: More fixes after the recent import.
    
    After the recent import of month names from CLDRv31 (bug 21217,
    commit c853f14) more imports are also needed, mostly abbreviated month
    names.
    
    This patch also updates May (full month name) in ps_AF which was
    skipped in the previous patch.
    
    Incidentally, this import fixes bug 17225 (ar_SY) and partially
    bug 19066 (ar_SA).
    
    CLDR currently has a bug in the full month name for October for ar_IQ, see
    http://unicode.org/cldr/trac/ticket/10460
    
    	* localedata/locales/ar_DZ (abmon): Full import from CLDR, abmon
    	is no longer abbreviated.
    	* localedata/locales/ar_IQ (abmon): Likewise.
    	* localedata/locales/ar_MA (abmon): Likewise.
    	* localedata/locales/ar_TN (abmon): Likewise.
    	* localedata/locales/ps_AF (abmon): Likewise.
    	* localedata/locales/ug_CN (abmon): Likewise.
    	* localedata/locales/ar_SA (abmon): Likewise, partially
    	fixes bug 19066.
    	* localedata/locales/ks_IN (abmon): A copy of mon.
    	* localedata/locales/ur_IN (abmon): Oct reworded "اكتوبر" to
    	"اکتوبر" (same change as mon).
    	* localedata/locales/ur_PK (abmon): Same changes as mon applied.
    
    	* localedata/locales/ps_AF (mon): May reworded "می" to "مۍ".
    
    	[BZ #17225]
    	* localedata/locales/ar_SY (abmon): May reworded "نوار" to
    	"أيار", this closes bug 17225.
    	* localedata/locales/ar_JO (abmon): Likewise.
    	* localedata/locales/ar_LB (abmon): Likewise.

-----------------------------------------------------------------------

Summary of changes:
 localedata/ChangeLog     |   34 ++++++++++++++++++++++++++++++++++
 localedata/locales/ar_DZ |   18 ++++++++++++------
 localedata/locales/ar_IQ |   18 ++++++++++++------
 localedata/locales/ar_JO |    2 +-
 localedata/locales/ar_LB |    2 +-
 localedata/locales/ar_MA |   18 ++++++++++++------
 localedata/locales/ar_SA |   24 ++++++++++++------------
 localedata/locales/ar_SY |    2 +-
 localedata/locales/ar_TN |   18 ++++++++++++------
 localedata/locales/ks_IN |   20 ++++++++++----------
 localedata/locales/ps_AF |   24 ++++++++++++------------
 localedata/locales/ug_CN |   24 ++++++++++++------------
 localedata/locales/ur_IN |    2 +-
 localedata/locales/ur_PK |   12 ++++++------
 14 files changed, 138 insertions(+), 80 deletions(-)
Comment 9 Mike FABIAN 2017-07-06 14:01:21 UTC
Fixed in glibc master by the recent patches by Rafal Luzynski <digitalfreak@lingonborough.com>.