[Bug localedata/14094] Update locale data to Unicode 7.0.0

Tue Oct 14 08:08:00 GMT 2014

https://sourceware.org/bugzilla/show_bug.cgi?id=14094

--- Comment #18 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Pravin S from comment #14)
> Created attachment 7715 [details]
> Patch to update UTF-8 CHARMAP and WIDTH to unicode 7.0
> 
> Done with all work on the UTF-8 file.
> Added two scripts and a report:
> 1. utf8-gen.py to generate the UTF-8 file
> 2. utf8-compatibility.py to check backward compatibility of the newly
> generated UTF-8 file
> 3. A report on the backward compatibility of the new UTF-8 file is available at
> https://raw.githubusercontent.com/pravins/glibc-i18n/master/report-utf8
> 
> Submitting to glibc-alpha, please help with a quick review and push to git.

I checked the scripts Pravin used and the resulting UTF-8 file.

I found only one minor problem:

In some cases, both UnicodeData.txt and EastAsianWidth.txt have information
about width. For example, EastAsianWidth.txt has:

    302A..302D;W     # Mn     [4] IDEOGRAPHIC LEVEL TONE MARK..IDEOGRAPHIC ENTERING TONE MARK

which gives width 2 for these four characters (because of the “W”, i.e. wide,
property), but UnicodeData.txt has:

    302A;IDEOGRAPHIC LEVEL TONE MARK;Mn;218;NSM;;;;;N;;;;;
    302B;IDEOGRAPHIC RISING TONE MARK;Mn;228;NSM;;;;;N;;;;;
    302C;IDEOGRAPHIC DEPARTING TONE MARK;Mn;232;NSM;;;;;N;;;;;
    302D;IDEOGRAPHIC ENTERING TONE MARK;Mn;222;NSM;;;;;N;;;;;

which would give width 0 (because of the bidirectional class “NSM”,
non-spacing mark).

I changed Pravin’s script slightly so that it prefers the width from
EastAsianWidth.txt whenever the two files conflict.
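
To illustrate the precedence rule, here is a minimal sketch. It is not the
code from utf8-gen.py: the function names (parse_east_asian_width,
parse_unicode_data, merge_widths) are hypothetical, the input is just the
excerpts quoted above, and treating “F” (fullwidth) like “W” is my own
assumption.

```python
# Illustrative sketch only; names are hypothetical, not from utf8-gen.py.

EAST_ASIAN_WIDTH = """\
302A..302D;W     # Mn     [4] IDEOGRAPHIC LEVEL TONE MARK..IDEOGRAPHIC ENTERING TONE MARK
"""

UNICODE_DATA = """\
302A;IDEOGRAPHIC LEVEL TONE MARK;Mn;218;NSM;;;;;N;;;;;
302B;IDEOGRAPHIC RISING TONE MARK;Mn;228;NSM;;;;;N;;;;;
302C;IDEOGRAPHIC DEPARTING TONE MARK;Mn;232;NSM;;;;;N;;;;;
302D;IDEOGRAPHIC ENTERING TONE MARK;Mn;222;NSM;;;;;N;;;;;
"""


def parse_east_asian_width(text):
    """Return {code point: 2} for ranges marked W (or F, by assumption)."""
    widths = {}
    for line in text.splitlines():
        line = line.split('#', 1)[0].strip()  # drop trailing comment
        if not line:
            continue
        fields = [f.strip() for f in line.split(';')]
        code_range, prop = fields[0], fields[1]
        if prop not in ('W', 'F'):
            continue
        start, _, end = code_range.partition('..')
        first = int(start, 16)
        last = int(end, 16) if end else first  # single code point or range
        for cp in range(first, last + 1):
            widths[cp] = 2
    return widths


def parse_unicode_data(text):
    """Return {code point: 0} for entries whose bidi class (field 4) is NSM."""
    widths = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(';')
        if fields[4] == 'NSM':
            widths[int(fields[0], 16)] = 0
    return widths


def merge_widths(unicode_data_widths, east_asian_widths):
    """Combine both tables; EastAsianWidth.txt wins on conflict."""
    merged = dict(unicode_data_widths)
    merged.update(east_asian_widths)  # later update overrides earlier value
    return merged


widths = merge_widths(parse_unicode_data(UNICODE_DATA),
                      parse_east_asian_width(EAST_ASIAN_WIDTH))
for cp in sorted(widths):
    print('U+%04X width %d' % (cp, widths[cp]))
```

With this ordering, U+302A..U+302D end up with width 2 rather than 0.
Swapping the two arguments of merge_widths would give the old behaviour.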

Pravin has already merged my change into his git repository.
