[Bug localedata/14094] Update locale data to Unicode 7.0.0
maiku.fabian at gmail dot com
sourceware-bugzilla@sourceware.org
Tue Oct 14 08:08:00 GMT 2014
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #18 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Pravin S from comment #14)
> Created attachment 7715 [details]
> Patch to update UTF-8 CHARMAP and WIDTH to unicode 7.0
>
> Done with all the work on the UTF-8 file.
> Added two scripts:
> 1. utf8-gen.py, to generate the UTF-8 file
> 2. utf8-compatibility.py, to check backward compatibility of the newly
> generated UTF-8 file
> A report on the backward compatibility of the new UTF-8 file is available at
> https://raw.githubusercontent.com/pravins/glibc-i18n/master/report-utf8
>
> Submitting to glibc-alpha; please help with a quick review and push to git.
I checked the scripts Pravin used and the resulting UTF-8 file.
I found only one minor problem:
In some cases, both UnicodeData.txt and EastAsianWidth.txt have information
about width. For example, EastAsianWidth.txt has:
302A..302D;W # Mn [4] IDEOGRAPHIC LEVEL TONE MARK..IDEOGRAPHIC ENTERING TONE MARK
which gives us width 2 for these 4 characters (because of “W”) but
UnicodeData.txt has:
302A;IDEOGRAPHIC LEVEL TONE MARK;Mn;218;NSM;;;;;N;;;;;
302B;IDEOGRAPHIC RISING TONE MARK;Mn;228;NSM;;;;;N;;;;;
302C;IDEOGRAPHIC DEPARTING TONE MARK;Mn;232;NSM;;;;;N;;;;;
302D;IDEOGRAPHIC ENTERING TONE MARK;Mn;222;NSM;;;;;N;;;;;
which would give width 0 (because of “NSM”).
I changed Pravin’s script a bit to prefer the information from
EastAsianWidth.txt in case of conflicts.
Pravin has already merged my change into his git repository.
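A minimal sketch of the precedence rule described above, for illustration only. It is not the code from utf8-gen.py: the function names and sample constants are hypothetical, the sample data is limited to the lines quoted in this comment, and treating "F" the same as "W" is an assumption.

```python
# Illustrative sketch only; not taken from utf8-gen.py.
# Sample data copied from the lines quoted in this comment.
UNICODE_DATA_SAMPLE = """\
302A;IDEOGRAPHIC LEVEL TONE MARK;Mn;218;NSM;;;;;N;;;;;
302B;IDEOGRAPHIC RISING TONE MARK;Mn;228;NSM;;;;;N;;;;;
302C;IDEOGRAPHIC DEPARTING TONE MARK;Mn;232;NSM;;;;;N;;;;;
302D;IDEOGRAPHIC ENTERING TONE MARK;Mn;222;NSM;;;;;N;;;;;
"""

EAST_ASIAN_WIDTH_SAMPLE = """\
302A..302D;W # Mn [4] IDEOGRAPHIC LEVEL TONE MARK..IDEOGRAPHIC ENTERING TONE MARK
"""


def widths_from_unicode_data(text):
    """Return {code point: 0} for entries whose bidi class is NSM."""
    widths = {}
    for line in text.splitlines():
        fields = line.split(";")
        if len(fields) < 5:
            continue
        if fields[4] == "NSM":
            widths[int(fields[0], 16)] = 0
    return widths


def widths_from_east_asian_width(text):
    """Return {code point: 2} for entries marked W (or F, by assumption)."""
    widths = {}
    for line in text.splitlines():
        # Drop the trailing comment, then split "range;property".
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        code_range, prop = [part.strip() for part in data.split(";")]
        if prop not in ("W", "F"):  # "F" is an assumption, see lead-in
            continue
        start, _, end = code_range.partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        for code_point in range(first, last + 1):
            widths[code_point] = 2
    return widths


def merge_widths(unicode_data_widths, east_asian_widths):
    """Merge both sources; EastAsianWidth.txt wins on conflicts."""
    merged = dict(unicode_data_widths)
    merged.update(east_asian_widths)  # later update overrides earlier value
    return merged
```

With the sample data, UnicodeData.txt alone yields width 0 for U+302A..U+302D, and the merged result yields width 2 because the EastAsianWidth.txt entries are applied last.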