This is the mail archive of the mailing list for the glibc project.

Re: [PATCH] [BZ 14094] Update locale data to Unicode 7.0.0

>----- Original Message -----
>From: "Joseph S. Myers" <>
>To: "Pravin Satpute" <>
>Cc:, "Carlos O'Donell" <>
>Sent: Sunday, June 22, 2014 2:34:30 AM
>Subject: Re: [PATCH] [BZ 14094] Update locale data to Unicode 7.0.0

>>  A.  Process for updating locales/i18n ctype with new Unicode release is
>> documented @ [1], I think it should get added either in WIKI, or docs
>> folder of glibc.

>The process should ideally be running a single command - no manual editing 
>at all.  (That command might be a script that wraps some other commands.)  
>If tempted to write instructions for running a sequence of commands and 
>editing the result, writing a script to automate that is better.

Agreed. I will improve it with more automation.
I am still not sure why the six characters that I added manually are present in i18n, or where they came from.
I will analyze this further and see whether we can simply get rid of them.

>>      Report/Analysis for backward compatibility is available AT
>> backward-compatibility5_1-to-7_0 [3]

>That report is a very useful starting point, but doesn't seem to explain 
>things at the human level.  What changes have there been to previously 
>supported characters, and why, in terms of Unicode character properties, 
>are those changes correct changes?  Maybe something more verbose that 
>names the characters individually and states what the old ctype 
>information was, and what the new information is, and what the relevant 
>Unicode properties are that explain the new information, would help.

This report is my own analysis of the compatibility report.
Yes, I can provide more information there. I am sure the next update to ctype
will not require such a long analysis :)

>You're changing how upper/lower/alpha properties are generated.  Does that 
>fix bug 14010?  If so, you can include [BZ #14010] in your ChangeLog 

Yes, it fixes the 14010 issues as well. I will add this.

>Does it obsolete the special cases in 
>gen-unicode-ctype.c:is_alpha?  If so, you should remove the parts of 
>gen-unicode-ctype.c that are no longer used.  You should also confirm that 
>each of the special cases there is properly handled by the new logic - or 
>state explicitly that the handling of certain identified characters with 
>special cases is being deliberately changed, because the Unicode 
>properties for those characters are better than the special-case handling.

Yes. DerivedCoreProperties.txt handles the special cases better. Its Alphabetic property is derived from
"Uppercase + Lowercase + Lt + Lm + Lo + Nl + Other_Alphabetic".

Sure. I will modify gen-unicode-ctype.c so that it no longer generates the alpha, upper and lower classes.
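
To illustrate what reading the Alphabetic property from DerivedCoreProperties.txt involves, here is a minimal sketch in C. It is not the code from the patch: parse_derived_line and line_covers_alphabetic are hypothetical helper names, and the sketch assumes only the usual data-line layout of that file ("XXXX..YYYY ; Property # comment" or "XXXX ; Property # comment").

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of glibc: parse one data line of
   DerivedCoreProperties.txt, for example
     "0041..005A    ; Alphabetic # L&  [26] LATIN CAPITAL LETTER A..."
   Returns 1 and fills FIRST, LAST and PROP on success; returns 0 for
   comment lines, blank lines and anything else that does not parse.  */
static int
parse_derived_line (const char *line, unsigned int *first,
                    unsigned int *last, char prop[64])
{
  assert (line != NULL);

  /* Range form: "XXXX..YYYY ; Property".  */
  if (sscanf (line, "%x..%x ; %63[A-Za-z_]", first, last, prop) == 3)
    return 1;

  /* Single code point form: "XXXX ; Property".  */
  if (sscanf (line, "%x ; %63[A-Za-z_]", first, prop) == 2)
    {
      *last = *first;
      return 1;
    }

  return 0;
}

/* Return nonzero if LINE assigns the Alphabetic property to CH.  */
static int
line_covers_alphabetic (const char *line, unsigned int ch)
{
  unsigned int first, last;
  char prop[64];

  return parse_derived_line (line, &first, &last, prop)
         && strcmp (prop, "Alphabetic") == 0
         && first <= ch && ch <= last;
}
```

A generator could call line_covers_alphabetic for each line of the file and mark the matching code points as alpha, instead of keeping a hand-written list of special cases.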

>> -#define __STDC_ISO_10646__		201103L
>> +   Unicode 6.0.
>> +   Unicode 7.0.0 Published on 2014 June 16   */
>> +#define __STDC_ISO_10646__		201406L

>Now, the most recent published amendment is amendment 1 from 2013-04-15 
>(Linear A, Palmyrene, Manichaean, Khojki, Khudawadi, Bassa Vah, Duployan, 
>and other characters).  WG2 N4566 states an intent for Unicode 7.0 to 
>synchronize with amendment 2 to the 2012 edition of ISO/IEC 10646.  
>However, I can't locate a proposed publication date for that amendment (or 
>for the 2014 edition of ISO/IEC 10646 - and work appears to be underway on 
>amendments 1 and 2 to the 2014 edition, even before it's published).  So 
>maybe put 201304L there until such an amendment is published.

Thank you for this. I could not find the proper date.
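
For context on why the exact value matters: __STDC_ISO_10646__ is an integer constant of the form yyyymmL, and a program can compare against it to learn which edition and amendments of ISO/IEC 10646 wchar_t values conform to. A minimal sketch in C follows; iso_10646_version and conforms_as_of are hypothetical helper names used only for illustration.

```c
#include <wchar.h>

/* Hypothetical helper: return the yyyymm value advertised by the
   implementation, or 0 if __STDC_ISO_10646__ is not defined.  */
static long
iso_10646_version (void)
{
#ifdef __STDC_ISO_10646__
  return __STDC_ISO_10646__;
#else
  return 0L;
#endif
}

/* Return nonzero if the implementation claims conformance to
   ISO/IEC 10646 with all amendments as of YYYYMM (e.g. 201304L).  */
static int
conforms_as_of (long yyyymm)
{
  return iso_10646_version () >= yyyymm;
}
```

With the value suggested above, conforms_as_of (201304L) would be true, while a check against a later amendment date would be false until the macro is updated again.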

>> diff --git a/scripts/ b/scripts/
>> +++ b/scripts/

>I think in scripts/ the name should be more specific about *what* is 
>having compatibility checked - scripts/ is for all of glibc, not just 
>locale data.

A more specific name might be good.

>> +# Copyright (C) 2013-14, Pravin Satpute <>

>glibc contributions should be assigned to the FSF (and miscellaneous 
>programs would normally be GPLv2+ / LGPLv2.1+ unless there is some reason 
>to deviate from the norm for such programs in glibc).

I will update this.

Thank you for the analysis. I will submit an improved patch soon. :)

Pravin Satpute
