This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] [BZ 14094] Update locale data to Unicode 7.0.0
- From: Pravin Satpute <psatpute at redhat dot com>
- To: "Joseph S. Myers" <joseph at codesourcery dot com>
- Cc: libc-alpha at sourceware dot org, "Carlos O'Donell" <carlos at redhat dot com>
- Date: Mon, 23 Jun 2014 04:54:36 -0400 (EDT)
- Subject: Re: [PATCH] [BZ 14094] Update locale data to Unicode 7.0.0
- Authentication-results: sourceware.org; auth=none
- References: <53A5DCA3 dot 4010108 at redhat dot com> <Pine dot LNX dot 4 dot 64 dot 1406212027560 dot 29257 at digraph dot polyomino dot org dot uk>
>----- Original Message -----
>From: "Joseph S. Myers" <email@example.com>
>To: "Pravin Satpute" <firstname.lastname@example.org>
>Cc: email@example.com, "Carlos O'Donell" <firstname.lastname@example.org>
>Sent: Sunday, June 22, 2014 2:34:30 AM
>Subject: Re: [PATCH] [BZ 14094] Update locale data to Unicode 7.0.0
>> A. Process for updating locales/i18n ctype with new Unicode release is
>> documented @ , I think it should get added either in WIKI, or docs
>> folder of glibc.
>The process should ideally be running a single command - no manual editing
>at all. (That command might be a script that wraps some other commands.)
>If tempted to write instructions for running a sequence of commands and
>editing the result, writing a script to automate that is better.
Agreed. I will improve it with more automation.
I am still not sure why the six characters that I added manually are present in i18n, or where they came from.
I will analyze this further and see whether we can simply get rid of them.
>> Report/Analysis for backward compatibility is available AT
>> backward-compatibility5_1-to-7_0 
>That report is a very useful starting point, but doesn't seem to explain
>things at the human level. What changes have there been to previously
>supported characters, and why, in terms of Unicode character properties,
>are those changes correct changes? Maybe something more verbose that
>names the characters individually and states what the old ctype
>information was, and what the new information is, and what the relevant
>Unicode properties are that explain the new information, would help.
This report is my own analysis of the output of check-backcompatibility.py.
Yes, I can provide more information there. I am sure the next update to ctype
will not require such a long analysis :)
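A more verbose, per-character report of the kind requested above could be sketched roughly as follows. This is only an illustration, not the logic of check-backcompatibility.py: the function name describe_changes and the old/new class tables are hypothetical, the sample data is made up, and Python's unicodedata module reflects whatever Unicode version the interpreter ships with rather than the 7.0.0 data files.

```python
import unicodedata


def describe_changes(old, new):
    """Yield one line per (class, code point) whose membership changed.

    `old` and `new` map a ctype class name to a set of code points.
    Both tables are assumed inputs; how they are extracted from the
    locale files is outside this sketch.
    """
    for cls in sorted(set(old) | set(new)):
        before = old.get(cls, set())
        after = new.get(cls, set())
        # Symmetric difference: code points present in exactly one table.
        for cp in sorted(before ^ after):
            name = unicodedata.name(chr(cp), "<unnamed>")
            category = unicodedata.category(chr(cp))
            action = "added to" if cp in after else "removed from"
            yield f"U+{cp:04X} {name} (category {category}): {action} {cls}"


# Made-up sample tables, for illustration only.
old = {"alpha": {0x41, 0xAA}, "upper": {0x41}}
new = {"alpha": {0x41, 0xAA, 0xB5}, "upper": {0x41}}
report = list(describe_changes(old, new))
for line in report:
    print(line)
```

Each output line names the character, its general category, and the class it entered or left, which gives a reviewer a starting point for checking the change against the Unicode properties.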
>You're changing how upper/lower/alpha properties are generated. Does that
>fix bug 14010? If so, you can include [BZ #14010] in your ChangeLog
Yes, it fixes the issues in bug 14010 as well. I will add this.
>Does it obsolete the special cases in
>gen-unicode-ctype.c:is_alpha? If so, you should remove the parts of
>gen-unicode-ctype.c that are no longer used. You should also confirm that
>each of the special cases there is properly handled by the new logic - or
>state explicitly that the handling of certain identified characters with
>special cases is being deliberately changed, because the Unicode
>properties for those characters are better than the special-case handling.
Yes. DerivedCoreProperties.txt handles these special cases better. Its Alphabetic property is derived from
"Uppercase + Lowercase + Lt + Lm + Lo + Nl + Other_Alphabetic".
Sure, I will modify gen-unicode-ctype.c so that it no longer generates the alpha, upper and lower classes.
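As a rough illustration of reading a derived property straight from DerivedCoreProperties.txt, a minimal parser might look like the sketch below. It assumes the usual data-line layout of that file ("START..END ; Property # comment" or a single code point). The function name parse_derived_property and the embedded excerpt are illustrative, and this is not the code from the patch.

```python
import re

# Matches data lines such as
#   "0041..005A    ; Alphabetic # L&  [26] ..."   (a range)
#   "00AA          ; Alphabetic # Lo ..."         (a single code point)
LINE_RE = re.compile(
    r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)"
)


def parse_derived_property(lines, wanted):
    """Return the set of code points that carry the property `wanted`."""
    code_points = set()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # comment or blank line
        if match.group(3) != wanted:
            continue  # data line for a different property
        start = int(match.group(1), 16)
        end = int(match.group(2) or match.group(1), 16)
        code_points.update(range(start, end + 1))
    return code_points


# Short excerpt in the file's format, for illustration only.
SAMPLE = """\
# Derived Property: Alphabetic
0041..005A    ; Alphabetic # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
00AA          ; Alphabetic # Lo       FEMININE ORDINAL INDICATOR
0061..007A    ; Lowercase # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
""".splitlines()

alpha = parse_derived_property(SAMPLE, "Alphabetic")
print(len(alpha))
```

With the property taken directly from the data file, a generator would not need its own per-character special cases for alpha; the same function can be called with "Uppercase" or "Lowercase".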
>> -#define __STDC_ISO_10646__ 201103L
>> + Unicode 6.0.
>> + Unicode 7.0.0 Published on 2014 June 16 */
>> +#define __STDC_ISO_10646__ 201406L
>Now, the most recent published amendment is amendment 1 from 2013-04-15
>(Linear A, Palmyrene, Manichaean, Khojki, Khudawadi, Bassa Vah, Duployan,
>and other characters). WG2 N4566 states an intent for Unicode 7.0 to
>synchronize with amendment 2 to the 2012 edition of ISO/IEC 10646.
>However, I can't locate a proposed publication date for that amendment (or
>for the 2014 edition of ISO/IEC 10646 - and work appears to be underway on
>amendments 1 and 2 to the 2014 edition, even before it's published). So
>maybe put 201304L there until such an amendment is published.
Thank you for this. I could not find the proper date myself.
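For reference, the C standard defines __STDC_ISO_10646__ as an integer constant of the form yyyymmL, giving the year and month of the ISO/IEC 10646 edition, amendments and corrigenda that wchar_t values conform to. The small sketch below only shows how a publication date maps to that form; the helper name stdc_iso_10646_value is hypothetical.

```python
from datetime import date


def stdc_iso_10646_value(published):
    """Format a publication date as a yyyymmL constant."""
    return f"{published.year:04d}{published.month:02d}L"


# Amendment 1 to the 2012 edition, dated 2013-04-15 in the review above.
value = stdc_iso_10646_value(date(2013, 4, 15))
print(value)
```

The Unicode 7.0.0 release date of 2014-06-16 would give 201406L by the same rule, which is the value the review suggests avoiding until a matching ISO/IEC 10646 amendment is actually published.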
>> diff --git a/scripts/check-backcompatibility.py b/scripts/check-backcompatibility.py
>> +++ b/scripts/check-backcompatibility.py
>I think in scripts/ the name should be more specific about *what* is
>having compatibility checked - scripts/ is for all of glibc, not just
Perhaps ctype-backcompatibility.py would be a good name.
>> +# Copyright (C) 2013-14, Pravin Satpute <email@example.com>
>glibc contributions should be assigned to the FSF (and miscellaneous
>programs would normally be GPLv2+ / LGPLv2.1+ unless there is some reason
>to deviate from the norm for such programs in glibc).
I will update this.
Thank you for the analysis. I will submit an improved patch soon. :)