This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: [PATCH 1/2] Add new locale: ckb_IQ (Kurdish/Sorani spoken in Iraq) [BZ #9809]

From: Rafal Luzynski <digitalfreak at lingonborough dot com>
To: Mike FABIAN <mfabian at redhat dot com>, libc-alpha at sourceware dot org
Date: Tue, 14 Jan 2020 22:52:22 +0100 (CET)
Subject: Re: [PATCH 1/2] Add new locale: ckb_IQ (Kurdish/Sorani spoken in Iraq) [BZ #9809]
References: <s9dmuaqie8i.fsf@taka.site>
Hi,

This is an incomplete review because it's always tricky with
a locale which we are completely unfamiliar with.  That means,
there may be more issues which I am unable to notice but otherwise
it is OK to push if you fix the issues I have found and consider
whether or not to apply my suggestions.

Also here I skip the issues which are fixed by the second patch.

14.01.2020 10:22 Mike FABIAN <mfabian@redhat.com> wrote:
> Add new locale: ckb_IQ (Kurdish/Sorani spoken in Iraq) [BZ #9809]

I like this subject (a short git log line) more than the original
one...

> Part 1 of the patch is what was submitted by the contributor.
> Part 2 are my fixes.
> 
> From 17786a4aa7d43a48c5187e33c328005716b2501d Mon Sep 17 00:00:00 2001
> From: Jwtiyar Nariman <jwtiyar@gmail.com>
> Date: Mon, 13 Jan 2020 10:06:06 +0100
> Subject: [PATCH 1/2] Add ckb_IQ locale

... more than this one.  Please consider which one looks better.

> ---
>  localedata/locales/ckb_IQ | 452 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 452 insertions(+)
>  create mode 100644 localedata/locales/ckb_IQ
> 
> diff --git a/localedata/locales/ckb_IQ b/localedata/locales/ckb_IQ
> new file mode 100644
> index 0000000000..08574db72e
> --- /dev/null
> +++ b/localedata/locales/ckb_IQ
> @@ -0,0 +1,452 @@
> +comment_char %
> +escape_char /
> +
> +% This file is part of the GNU C Library and contains locale data.
> +% The Free Software Foundation does not claim any copyright interest
> +% in the locale data contained in this file.  The foregoing does not
> +% affect the license of the GNU C Library as a whole.  It does not
> +% exempt you from the conditions of the license if your use would
> +% otherwise be governed by that license.
> +
> +
> +% FOR ALL KURDISH DIALECTS USING ARABIC ALPHABETS
> +% Central Kurdish language locale for Iraq (using arabic letters):

^arabic^Arabic

> +% Contributed by Aras Noori <aras.noori@gmail.com>
> +% Filename: ckb_iq
> +% Locale name: ckb_iq.UTF-8
> +% Language: Central Kurdish (Sorani)
> +% Language abbrivation: KU-AR(Kurdish - Arabic letters)

^abbrivation^abbreviation

also add a space: "KU-AR(Kurdish" -> "KU-AR (Kurdish"

> +% Charset: UTF-8
> +% Creation Date: 2009-11-03
> +% History:
> +% January 2009: Defining CKB locale
> +% March 2009: Adding rule for CKB
> +% October 2009: bug fixing and redefine
> +% April 2010 fixing "not found category" problems
> +% March 2011 fixing all bugs
> +% Sept 2014: fixing alphabet and unicode bugs

^unicode^Unicode

> +% Oct 2014: fixing Address and fmt_name issues

What is "Address" in this context?  If it refers to a field or
a section then please use that field or section identifier.
If it's about an address in general then please use lowercase
"address".

What about using consistent month names: all full (September,
October) or all abbreviated (Jan, Mar, Oct,...)

I don't understand the format of this comment but as this is
just a comment we can put anything here and it does not change
the output.

> +
> +LC_IDENTIFICATION
> +title	   "Central Kurdish language locale for Iraq"
> +source	   "Aras Noori, Jwtiyar Nariman"
> +address    "see e-mail"
> +contact    "Aras Noori, Jwtiyar Nariman"
> +email	   "aras.noori@gmail.com, Jwtiyar@gmail.com"
> +tel	 	   ""
> +fax		   ""
> +language   "Central Kurdish"
> +territory  "Iraq"
> +revision   "0.9"
> +date	   "2020-01-05"
> +
> +category "i18n:2012";LC_IDENTIFICATION
> +category "i18n:2012";LC_CTYPE
> +category "i18n:2012";LC_COLLATE
> +category "i18n:2012";LC_TIME
> +category "i18n:2012";LC_NUMERIC
> +category "i18n:2012";LC_MONETARY
> +category "i18n:2012";LC_MESSAGES
> +category "i18n:2012";LC_PAPER
> +category "i18n:2012";LC_NAME
> +category "i18n:2012";LC_ADDRESS
> +category "i18n:2012";LC_TELEPHONE
> +category "i18n:2012";LC_MEASUREMENT
> +
> +END LC_IDENTIFICATION
> +
> +LC_CTYPE
> +copy "i18n"
> +END LC_CTYPE
> +

OK so far.

> +LC_COLLATE
> +% The Kurdish Sorani, Bahdini, and others dialects is mainly written
> using a modified (Arabic-based alphabet) with 33 letters. 

In the second patch you have removed all trailing spaces but there is
one more in the line above.

> +% Unlike the regular Arabic alphabet, which is an abjad, kurdish is an
> alphabet in which vowels are mandatory, making the script easy to read.

^kurdish^Kurdish

Please rewrap this comment so the lines are shorter, below 80 chars,
preferably about 72 chars.

> +%
> +% The kurdish alphabet order is: 

^kurdish^Kurdish

The second patch correctly removes the trailing space.

> +% in Latin: a, b, c, ç, d, e, ê, f, g, h, i, î, j, k, l, ll, m, n, o, p,
> q, r, rr, s, sh, t, u, uu, v, w, x, y, z

Any chance to rewrap this?  No problem if it is impossible here.

> +% vowels: A, E, I, O, U, UU
> +%
> +% Copy the template from ISO/IEC 14651

[ cut the part which is removed by the second patch ]

> +END LC_COLLATE
> +
> +LC_MONETARY

[ cut the part which is removed by the second patch ]

> +int_frac_digits       3
> +frac_digits	      3
> +p_cs_precedes	      1
> +p_sep_by_space	      1
> +n_cs_precedes	      1
> +n_sep_by_space	      1
> +p_sign_posn	      1
> +n_sign_posn	      2
> +
> +END LC_MONETARY

Probably OK, I am unable to verify.

> +
> +
> +LC_NUMERIC

[ cut the part which is removed by the second patch ]

> +grouping	       3
> +END LC_NUMERIC

Probably OK, I am unable to verify.

> +
> +
> +LC_TIME
> +% These are generated based on XML base Locale difintion file
> +%
> +% Abbreviated weekday names

[ cut the part which is removed by the second patch ]

Again, I am unable to verify the Arabic weekday and month names.

> +%
> +% Equivalent of AM PM
> +am_pm	    "<U067E><U0646>";"<U062F><U0646>"

Probably OK.

> +%
> +% Appropriate date and time representation
> +d_t_fmt     "%d %b %Y, %Z %I:%M:%S %p"

As a resolution of bug 24054 [1] I removed "%Z" from almost all d_t_fmt
in order to make the locales consistent with the builtin C locale.

Also the date formats usually contain "%a".  I can see that d_fmt
in this file uses "%A" instead.  I don't know why this is better here
but for the consistency please consider this format:

"%A %d %b %Y, %I:%M:%S %p"

> +%
> +% Appropriate date representation(%x)
> +d_fmt	 "%A %d %b %Y"

As I said, I don't understand why "%A" is better than "%a" but maybe
there are good reasons so OK.

> +%
> +% Appropriate time representation
> +t_fmt	 "%Z %I:%M:%S %p"
> +%
> +% Appropriate 12 h time representation
> +t_fmt_ampm  "%I:%M:%S %p"
> +%

OK

> +% Appropriate date representation 
> +date_fmt	"%A %d %B %Y"

It's not specified what date_fmt should contain but usually it contains
not just the date but also the time and the time zone.  Please consider:

"%A %d %B %Y, %Z %I:%M:%S %p"

even although I still don't understand why "%A" is better than "%a"
and (new thing here) why "%B" is better than "%b".

> +
> +week 7;19971130;1
> +first_weekday 7
> +first_workday	2
> +cal_direction	3
> +END LC_TIME
> +
> +

I am unable to verify but probably OK.

> +LC_MESSAGES

[ cut the part which is removed by the second patch ]

> +yesstr   "<U0628><U06D5><U06B5><U06CE>"
> +nostr    "<U0646><U06D5><U062E><U06CE><U0631>"
> +END LC_MESSAGES

I am unable to verify but probably OK.

> +
> +
> +LC_PAPER
> +copy "ar_IQ"
> +END LC_PAPER

Looks reasonable.

> +
> +
> +LC_NAME
> +name_fmt    "%d%t%g%t%m%t%f"
> +name_gen    "<U0628><U06D5><U0695><U06CE><U0632>"
> +name_miss   "<U062E><U0627><U062A><U0648>"
> +name_ms     "<U062E><U0627><U062A><U0648>"
> +name_mrs    "<U062E><U0627><U062A><U0648>"
> +name_mr     "<U0643><U0627><U0643>"
> +END LC_NAME

I am unable to verify but probably OK.

> +%
> +LC_ADDRESS
> +postal_fmt  "%z%c%T%s%b%e%r"
> +country_name "Iraq"
> +country_ab2 "<U0049><U0051>"
> +country_ab3 "<U0049><U0052><U0051>"
> +country_post "<U0049><U0052><U0051>"
> +country_num 368
> +country_car "<U0049><U0051>"
> +%
> +END LC_ADDRESS
> +
> +
> +LC_TELEPHONE
> +%
> +tel_int_fmt "+%c %a%t%l"
> +int_select     "<U0030><U0030>"  % 00
> +int_prefix  "<U0039><U0036><U0034>" %964
> +
> +END LC_TELEPHONE

This is already fixed by the second patch.

> +
> +
> +LC_MEASUREMENT
> +copy "ar_IQ"
> +END LC_MEASUREMENT
> -- 
> 2.24.1
> 

Looks reasonable.

Regards,

Rafal


[1] https://sourceware.org/bugzilla/show_bug.cgi?id=24054
References:
- [PATCH 1/2] Add new locale: ckb_IQ (Kurdish/Sorani spoken in Iraq) [BZ #9809]
  - From: Mike FABIAN
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]