This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: PATCH] [BZ 14094 13064] Update locale data to Unicode 7.0.0
- From: Pravin Satpute <psatpute at redhat dot com>
- To: libc-alpha at sourceware dot org
- Cc: "Joseph S. Myers" <joseph at codesourcery dot com>, "Carlos O'Donell" <carlos at redhat dot com>
- Date: Thu, 17 Jul 2014 06:49:09 -0400 (EDT)
- Subject: Re: PATCH] [BZ 14094 13064] Update locale data to Unicode 7.0.0
- Authentication-results: sourceware.org; auth=none
- References: <53B65D65 dot 5050008 at redhat dot com> <53B6715C dot 5010903 at redhat dot com>
Hi All,
I have worked further on the patch and updated the utf8-gen.py script so that it also generates the UTF-8 WIDTH data.
I have also added a new script that checks the newly generated UTF-8 file for backward compatibility.
The new patch [1] is attached to bug [2]: https://sourceware.org/bugzilla/attachment.cgi?id=7715
The backward compatibility report is available at https://raw.githubusercontent.com/pravins/glibc-i18n/master/report-utf8
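To illustrate what a backward compatibility check of this kind might look like, here is a minimal sketch. It is not the script from the patch: the function names parse_charmap and compare are hypothetical, and it assumes the caller passes only lines from the CHARMAP section, in the "<Uxxxx> /xNN... NAME" form shown later in this thread. It reports code points that the old file covers but the new one does not, and code points whose listed byte sequence changed.

```python
import re

# Matches "<U0041> /x41 ..." and range lines "<U3400>..<U343F> /xe3/x90/x80 ...".
LINE_RE = re.compile(r"^<U([0-9A-Fa-f]+)>(?:\.\.<U([0-9A-Fa-f]+)>)?\s+(\S+)")


def parse_charmap(lines):
    """Map each covered code point to the byte string listed on its line.

    For a range line the listed bytes belong to the first code point only,
    so the remaining members of the range are stored with the value None.
    """
    mapping = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # comments, keywords, blank lines
        first = int(match.group(1), 16)
        last = int(match.group(2), 16) if match.group(2) else first
        for code_point in range(first, last + 1):
            mapping[code_point] = match.group(3) if code_point == first else None
    return mapping


def compare(old, new):
    """Return (missing, changed) code points, both as sorted lists."""
    missing = sorted(cp for cp in old if cp not in new)
    changed = sorted(
        cp for cp in old
        if cp in new
        and old[cp] is not None
        and new[cp] is not None
        and old[cp] != new[cp]
    )
    return missing, changed
```

Expanding ranges into individual code points keeps the comparison meaningful even when the old file lists a block as one range and the new file splits it into several.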
With this patch I have completed the work of updating the CTYPE and UTF-8 (CHARMAP and WIDTH) data to Unicode 7.0.0.
Please help review it and get it into the git repository.
Best Regards,
Pravin Satpute
1. https://sourceware.org/bugzilla/attachment.cgi?id=7715
2. https://sourceware.org/bugzilla/show_bug.cgi?id=14094
>
>
>----- Original Message -----
>From: "Pravin Satpute" <psatpute@redhat.com>
>To: libc-alpha@sourceware.org
>Sent: Friday, July 4, 2014 2:48:20 PM
>Subject: PATCH] [BZ 14094 13064] Update locale data to Unicode 7.0.0
>
>Hi,
>
> I have worked on updating the UTF-8 file to Unicode 7.0.
> The patch is around 1.2MB, and libc-alpha does not seem to allow
> attachments of that size, so I have attached the patch [1] to bug [2].
>
> 1. The present patch covers only CHARMAP; a patch updating WIDTH will be
> available soon.
> 2. utf8-gen.py: new script to generate the UTF-8 file.
> 3. The patch was created ignoring whitespace changes (-w).
> 4.
> ''' Where the UnicodeData.txt file gives characters as a range
> Example:
> 3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
> 4DB5;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;
>
> The UTF-8 file lists such a range in chunks between the First and Last
> Unicode characters, each chunk ending 0x3F after its first code point.
> Example:
> <U3400>..<U343F> /xe3/x90/x80 <CJK Ideograph Extension A>
> .
> .
> <U4D80>..<U4DB5> /xe4/xb6/x80 <CJK Ideograph Extension A>
>
> Note: No idea why the Hangul syllables (AC00 to D7A3) were not expanded
> in the Unicode 5.0 UTF-8 file. For consistency, we expand Hangul as well.
> '''
>
> 5. Name changes are in UnicodeData.txt in some cases.
> ''' Some characters have <control> as a name, so we use the "Unicode 1.0
> Name" instead.
> Characters U+0080, U+0081, U+0084 and U+0099 have "<control>" as a
> name and no "Unicode 1.0 Name" (10th field) in UnicodeData.txt either.
> We could write code to take their alternate names from NameAliases.txt '''
>
> Let me know of any issues, doubts, or possible improvements.
>
>Best Regards,
>Pravin Satpute
>
>1. https://sourceware.org/bugzilla/attachment.cgi?id=7679
>2. https://sourceware.org/bugzilla/show_bug.cgi?id=14094
>
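Items 4 and 5 of the quoted message describe two conventions: splitting a First/Last range from UnicodeData.txt into chunks that each end 0x3F after their start, and falling back to the "Unicode 1.0 Name" field when the name is <control>. The sketch below illustrates both under stated assumptions. It is not the code in utf8-gen.py: the function names utf8_escape, range_lines and pick_name are hypothetical, the output uses single spaces as in the quoted examples, and only the <Uxxxx> form for code points in the Basic Multilingual Plane is handled. The aliases argument stands in for data that would have to be read from NameAliases.txt.

```python
def utf8_escape(code_point):
    """Format the UTF-8 encoding of a code point as /xNN/xNN... ."""
    encoded = chr(code_point).encode("utf-8")
    return "".join("/x{:02x}".format(byte) for byte in encoded)


def range_lines(first, last, name):
    """Split first..last into chunks that end on a ...3F/7F/BF/FF boundary.

    Each output line carries the UTF-8 bytes of the chunk's first code point.
    """
    lines = []
    start = first
    while start <= last:
        end = min(start | 0x3F, last)
        lines.append(
            "<U{:04X}>..<U{:04X}> {} <{}>".format(
                start, end, utf8_escape(start), name
            )
        )
        start = end + 1
    return lines


def pick_name(fields, aliases=None):
    """Choose a name from one UnicodeData.txt record split on ';'.

    fields[1] is the character name and fields[10] the Unicode 1.0 name.
    aliases is an optional, hypothetical dict {code_point: name}.
    """
    code_point = int(fields[0], 16)
    name, unicode_1_name = fields[1], fields[10]
    if name != "<control>":
        return name
    if unicode_1_name:
        return unicode_1_name
    # No Unicode 1.0 name either: use an alias if one was supplied.
    return (aliases or {}).get(code_point, name)


if __name__ == "__main__":
    for line in range_lines(0x3400, 0x4DB5, "CJK Ideograph Extension A")[:2]:
        print(line)
    print(pick_name("0000;<control>;Cc;0;BN;;;;;N;NULL;;;;".split(";")))
```

Using `start | 0x3F` rather than `start + 0x3F` keeps the chunk boundaries aligned even when a range does not start on a multiple of 0x40.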