This is the mail archive of the
glibc-bugs@sourceware.org
mailing list for the glibc project.
[Bug localedata/14094] Update locale data to Unicode 7.0.0
- From: "pravin.d.s at gmail dot com" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sourceware dot org
- Date: Fri, 04 Jul 2014 09:13:23 +0000
- Subject: [Bug localedata/14094] Update locale data to Unicode 7.0.0
- Auto-submitted: auto-generated
- References: <bug-14094-131 at http dot sourceware dot org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #13 from Pravin S <pravin.d.s at gmail dot com> ---
Created attachment 7679
--> https://sourceware.org/bugzilla/attachment.cgi?id=7679&action=edit
Patch to update UTF-8 CHARMAP to Unicode 7.0
I have worked on updating the UTF-8 file to Unicode 7.0. The following are the
important points to note before reviewing this patch.
1. The present patch is only for CHARMAP; a patch updating WIDTH will be
available soon.
2. utf8-gen.py: new script to generate the UTF-8 file.
3. The patch was created ignoring whitespace changes (-w).
4. Character ranges in UnicodeData.txt are expanded in the UTF-8 file.
''' Where the UnicodeData.txt file gives characters as a range,
Example:
3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
4DB5;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;
the UTF-8 file lists the range in steps, adding 0x3F to each starting
code point between the First and Last Unicode characters.
Example:
<U3400>..<U343F> /xe3/x90/x80 <CJK Ideograph Extension A>
.
.
<U4D80>..<U4DB5> /xe4/xb6/x80 <CJK Ideograph Extension A>
Note: No idea why the Hangul syllables (AC00 to D7A3) were not expanded in
the Unicode 5.0 UTF-8 file. We are staying consistent and expanding Hangul
as well. '''
5. Names differ from UnicodeData.txt in some cases.
''' Some characters have <control> as their name, so we use the "Unicode 1.0
Name" instead.
Characters U+0080, U+0081, U+0084 and U+0099 have "<control>" as their
name and no "Unicode 1.0 Name" (10th field) in UnicodeData.txt either.
We could write code to take their alternate names from NameAliases.txt. '''
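To make points 4 and 5 easier to review, here is a minimal sketch of the two
rules in Python. It is not the code from utf8-gen.py: the function names
(ucs_symbol, utf8_escape, expand_range, pick_name) and the aliases dictionary
are hypothetical, and the <U%08X> form for code points above U+FFFF is an
assumption about the charmap notation. The chunking uses `start | 0x3F`, which
equals "adding 0x3F" whenever the start is aligned to 64, as in the example.

```python
def ucs_symbol(cp):
    """Charmap symbol for a code point, e.g. <U3400>."""
    # Assumption: code points above U+FFFF are written with 8 hex digits.
    return "<U%04X>" % cp if cp <= 0xFFFF else "<U%08X>" % cp


def utf8_escape(cp):
    """UTF-8 bytes in charmap notation, e.g. /xe3/x90/x80 for U+3400."""
    return "".join("/x%02x" % b for b in chr(cp).encode("utf-8"))


def expand_range(first, last, name):
    """Split a First..Last range into lines of at most 64 code points."""
    lines = []
    start = first
    while start <= last:
        # Each line ends at the next 0x3F boundary or at Last, whichever
        # comes first.
        end = min(start | 0x3F, last)
        lines.append("%s..%s %s %s" % (ucs_symbol(start), ucs_symbol(end),
                                       utf8_escape(start), name))
        start = end + 1
    return lines


def pick_name(unicode_data_line, aliases=None):
    """Choose a name for one UnicodeData.txt line.

    If the name is <control>, fall back to the Unicode 1.0 Name
    (index 10 when the fields are counted from zero), then to a
    caller-supplied alias table (hypothetical; e.g. built from
    NameAliases.txt), and finally keep <control>.
    """
    fields = unicode_data_line.split(";")
    name = fields[1]
    if name == "<control>":
        cp = int(fields[0], 16)
        name = fields[10] or (aliases or {}).get(cp, name)
    return name


if __name__ == "__main__":
    for line in expand_range(0x3400, 0x4DB5, "<CJK Ideograph Extension A>")[:2]:
        print(line)
```

With the CJK Extension A range from point 4, the first and last lines produced
match the two example lines quoted above. The same function would apply to
the Hangul range AC00 to D7A3.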