This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.


[Bug localedata/14094] Update locale data to Unicode 7.0.0

From: "pravin.d.s at gmail dot com" <sourceware-bugzilla at sourceware dot org>
To: glibc-bugs at sourceware dot org
Date: Fri, 04 Jul 2014 09:13:23 +0000
Subject: [Bug localedata/14094] Update locale data to Unicode 7.0.0
Auto-submitted: auto-generated
References: <bug-14094-131 at http dot sourceware dot org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=14094

--- Comment #13 from Pravin S <pravin.d.s at gmail dot com> ---
Created attachment 7679
  --> https://sourceware.org/bugzilla/attachment.cgi?id=7679&action=edit
Patch to update UTF-8 CHARMAP to Unicode 7.0

 I have worked on updating the UTF-8 file to Unicode 7.0. The following points
are important to note before reviewing this patch.

  1. The present patch covers only CHARMAP; a patch updating WIDTH will be
available soon.
  2. utf8-gen.py: a new script that generates the UTF-8 file.
  3. The patch was created ignoring whitespace changes (-w).
  4.
   ''' UnicodeData.txt gives some characters only as a range.
    Example:
    3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
    4DB5;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;

    The UTF-8 file lists such a range in blocks, each ending 0x3F past its
start, between the First and Last Unicode characters.
    Example:
    <U3400>..<U343F>     /xe3/x90/x80         <CJK Ideograph Extension A>
    .
    .
    <U4D80>..<U4DB5>     /xe4/xb6/x80         <CJK Ideograph Extension A>

    Note: I have no idea why the Hangul syllables (AC00..D7A3) were not
expanded in the Unicode 5.0 UTF-8 file. For consistency, we are expanding
Hangul as well.
    '''

  5. In some cases the name is changed from the one in UnicodeData.txt.
    ''' Some characters have <control> as their name, so the "Unicode 1.0
Name" is used instead.
     Characters U+0080, U+0081, U+0084 and U+0099 have "<control>" as their
name and no "Unicode 1.0 Name" (10th field) in UnicodeData.txt either.
     We could write code to take their alternate names from NameAliases.txt. '''
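
To illustrate point 4, the range expansion could be sketched in Python
roughly as follows. This is not the code from utf8-gen.py; the function names
utf8_escape and expand_range are made up for this example, and it assumes
BMP code points outside the surrogate range, written as 4-digit <UXXXX>
names.

```python
def utf8_escape(code_point):
    """Return the UTF-8 bytes of code_point in the charmap's /xNN notation."""
    # Assumption: code_point is not a surrogate, which cannot be encoded.
    return "".join("/x%02x" % byte for byte in chr(code_point).encode("utf-8"))


def expand_range(first, last, name):
    """Yield one charmap line per block of up to 0x40 code points.

    Each block ends 0x3F past its start, except that the final block
    stops at `last`.
    """
    start = first
    while start <= last:
        end = min(start + 0x3F, last)
        yield "<U%04X>..<U%04X>     %s         <%s>" % (
            start, end, utf8_escape(start), name)
        start = end + 1


if __name__ == "__main__":
    # First/Last pair taken from the UnicodeData.txt example in point 4.
    for line in expand_range(0x3400, 0x4DB5, "CJK Ideograph Extension A"):
        print(line)
```

With the CJK Extension A pair from the example, the first and last lines
produced cover <U3400>..<U343F> and <U4D80>..<U4DB5>.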

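For point 5, the name selection with a NameAliases.txt fallback might look
like the sketch below. It is only a suggestion, not part of the attached
patch: load_aliases and pick_name are hypothetical helpers, and taking the
first alias listed for a code point is an assumption that would need
review.

```python
def load_aliases(lines):
    """Map code point -> first alias found in NameAliases.txt-style lines."""
    aliases = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code, alias, _kind = line.split(";")
        # Assumption: keep the first alias listed for each code point.
        aliases.setdefault(int(code, 16), alias)
    return aliases


def pick_name(fields, aliases):
    """Choose a charmap name for one UnicodeData.txt line split on ';'."""
    name = fields[1]
    if name != "<control>":
        return name
    unicode_1_name = fields[10]  # the "Unicode 1.0 Name" field
    if unicode_1_name:
        return unicode_1_name
    # Neither name is usable: fall back to an alias, if one is known.
    return aliases.get(int(fields[0], 16), name)


if __name__ == "__main__":
    # Sample alias line for illustration only.
    sample_aliases = load_aliases(["0080;PADDING CHARACTER;figment"])
    print(pick_name("0080;<control>;Cc;0;BN;;;;;N;;;;;".split(";"),
                    sample_aliases))
```

A character with a regular name is returned unchanged, so only the
<control> entries are affected by the fallback.
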
