This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: PATCH] [BZ 14094 13064] Update locale data to Unicode 7.0.0


Hi All,

 I have done further work on the patch and updated the utf8-gen.py script to generate the UTF-8 WIDTH data as well.
I have also added a new script that checks the newly generated UTF-8 file for backward compatibility.

 The new patch [1] is attached to bug [2].

 The backward compatibility report is available at https://raw.githubusercontent.com/pravins/glibc-i18n/master/report-utf8
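
 For readers who want a feel for what such a check involves, here is a minimal sketch. It is not the script from the patch; the function names and the two reported categories (code points that disappeared, and entries whose first byte sequence changed) are assumptions made for illustration only.

```python
import re

# Matches CHARMAP lines such as:
#   <U0041>     /x41         LATIN CAPITAL LETTER A
#   <U3400>..<U343F>     /xe3/x90/x80         <CJK Ideograph Extension A>
# Lines without a /xNN byte sequence (comments, WIDTH entries) do not match.
CHARMAP_LINE = re.compile(
    r'^<U([0-9A-F]{4,8})>(?:\.\.<U([0-9A-F]{4,8})>)?\s+((?:/x[0-9a-f]{2})+)')


def parse_charmap(lines):
    """Return (covered code points, {first code point of a line: bytes})."""
    covered = set()
    explicit = {}
    for line in lines:
        match = CHARMAP_LINE.match(line)
        if match is None:
            continue
        first = int(match.group(1), 16)
        last = int(match.group(2), 16) if match.group(2) else first
        covered.update(range(first, last + 1))
        explicit[first] = bytes.fromhex(match.group(3).replace('/x', ''))
    return covered, explicit


def compare_charmaps(old_lines, new_lines):
    """Hypothetical check: report what the new file dropped or re-encoded."""
    old_covered, old_explicit = parse_charmap(old_lines)
    new_covered, new_explicit = parse_charmap(new_lines)
    # Code points present in the old file but absent from the new one.
    missing = sorted(old_covered - new_covered)
    # Lines starting at the same code point but with different bytes.
    changed = sorted(cp for cp, seq in old_explicit.items()
                     if cp in new_explicit and new_explicit[cp] != seq)
    return missing, changed
```

 Tracking coverage separately from the byte sequences means an old file that lists a range on one line and a new file that splits the same range across several lines still compare as equal.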

 With this patch I have completed the work of updating the CTYPE and UTF-8 (CHARMAP and WIDTH) data to Unicode 7.0.0.

 Please help review the patch and get it into the git repository.


Best Regards,
Pravin Satpute


1. https://sourceware.org/bugzilla/attachment.cgi?id=7715
2. https://sourceware.org/bugzilla/show_bug.cgi?id=14094


>
>
>----- Original Message -----
>From: "Pravin Satpute" <psatpute@redhat.com>
>To: libc-alpha@sourceware.org
>Sent: Friday, July 4, 2014 2:48:20 PM
>Subject: PATCH] [BZ 14094 13064] Update locale data to Unicode 7.0.0
>
>Hi,
>
>  I have worked on updating the UTF-8 file to Unicode 7.0.
>  The patch is around 1.2MB, and it looks like libc-alpha does not allow me to post
>an attachment of that size.
>  I have attached the patch [1] to bug [2].
>
>  1. The present patch is only for CHARMAP; a patch for updating WIDTH will be
>available soon.
>  2. utf8-gen.py: New script to generate the UTF-8 file.
>  3. The patch was created ignoring whitespace changes (-w).
>  4.
>   ''' Where the UnicodeData.txt file gives characters as a range
>    Example:
>    3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
>    4DB5;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;
>
>    The UTF-8 file splits such a range into sub-ranges by adding 0x3F to
>each start, from the First to the Last Unicode character.
>    Example:
>    <U3400>..<U343F>     /xe3/x90/x80         <CJK Ideograph Extension A>
>    .
>    .
>    <U4D80>..<U4DB5>     /xe4/xb6/x80         <CJK Ideograph Extension A>
>
>    Note: It is unclear why the Hangul syllables (AC00 to D7A3) were not
>    expanded in the Unicode 5.0 UTF-8 file. For consistency, we are
>    expanding Hangul as well.
>    '''
>
>    5. Names have changed in UnicodeData.txt in some cases.
>    ''' Some characters have <control> as their name, so the "Unicode 1.0
>Name" is used instead.
>     Characters U+0080, U+0081, U+0084 and U+0099 have "<control>" as their
>     name and no "Unicode 1.0 Name" (10th field) in UnicodeData.txt either.
>     We can write code to take their alternate names from NameAliases.txt '''
>
>    Let me know about any issues, doubts or possible improvements.
>
>Best Regards,
>Pravin Satpute
>
>1. https://sourceware.org/bugzilla/attachment.cgi?id=7679
>2. https://sourceware.org/bugzilla/show_bug.cgi?id=14094
>
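
Points 4 and 5 of the quoted message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code of utf8-gen.py: all function names are hypothetical, the output spacing is only approximated, the 8-digit symbol form for code points above U+FFFF is assumed, and the NameAliases.txt fallback is the idea proposed in the message rather than something the patch is known to do.

```python
def symbol(code_point):
    # Assumption: 4 hex digits up to U+FFFF, 8 hex digits above that.
    width = 4 if code_point <= 0xFFFF else 8
    return '<U{:0{}X}>'.format(code_point, width)


def utf8_bytes(code_point):
    # 'surrogatepass' allows U+D800..U+DFFF to be encoded as well.
    encoded = chr(code_point).encode('utf-8', 'surrogatepass')
    return ''.join('/x{:02x}'.format(byte) for byte in encoded)


def range_lines(first, last, name):
    """Yield one line per block of at most 0x40 code points (point 4)."""
    start = first
    while start <= last:
        end = min(start | 0x3F, last)  # block ends 0x3F after an aligned start
        yield '{}..{}     {:<20} {}'.format(
            symbol(start), symbol(end), utf8_bytes(start), name)
        start = end + 1


def parse_aliases(lines):
    """Read 'CODE;ALIAS;TYPE' lines, keeping the first alias per code point."""
    aliases = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        code, alias, _kind = line.split(';')
        aliases.setdefault(code, alias)
    return aliases


def resolve_name(fields, aliases):
    """Name fallback for <control> characters (point 5)."""
    name = fields[1]
    if name != '<control>':
        return name
    # fields[10] is the "Unicode 1.0 Name"; the alias lookup is the proposal.
    return fields[10] or aliases.get(fields[0]) or name


def charmap_lines(unicode_data_lines, aliases):
    range_first = None
    for line in unicode_data_lines:
        if not line.strip():
            continue
        fields = line.strip().split(';')
        code_point, name = int(fields[0], 16), fields[1]
        if name.endswith(', First>'):
            range_first = code_point  # remember where the range starts
        elif name.endswith(', Last>'):
            yield from range_lines(range_first, code_point,
                                   name.replace(', Last>', '>'))
        else:
            yield '{}     {:<20} {}'.format(
                symbol(code_point), utf8_bytes(code_point),
                resolve_name(fields, aliases))
```

Fed the two CJK Ideograph Extension A lines from the message, this sketch yields sub-ranges running from `<U3400>..<U343F>` with `/xe3/x90/x80` through `<U4D80>..<U4DB5>` with `/xe4/xb6/x80`, matching the example quoted above.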

