This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.

[Bug localedata/17588] Update UTF-8 charmap and width to Unicode 7.0.0

From: "maiku.fabian at gmail dot com" <sourceware-bugzilla at sourceware dot org>
To: glibc-bugs at sourceware dot org
Date: Wed, 03 Dec 2014 07:17:25 +0000
Subject: [Bug localedata/17588] Update UTF-8 charmap and width to Unicode 7.0.0
Auto-submitted: auto-generated
References: <bug-17588-131 at http dot sourceware dot org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=17588

--- Comment #9 from Mike FABIAN <maiku.fabian at gmail dot com> ---
I built glibc with the patch from comment #8.

It produces some FAILs in “make check”:

    FAIL: localedata/cs_CZ.UTF-8/LC_CTYPE
    ... similar FAILs ...

Shortly after starting “make check”, one sees:

    ./charmaps/UTF-8:42734: unknown character `U00009FCD'
    ... similar messages ...

All the above problems are caused by ranges of reserved code points
which are listed in EastAsianWidth.txt like this:

    9FCD..9FFF;W     # Cn    [51] <reserved-9FCD>..<reserved-9FFF>

and these code points are not in UnicodeData.txt.

Therefore, they are not generated into the CHARMAP section
of glibc’s UTF-8 file, and generating them into the WIDTH
section of the same file causes the problems above.
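
To make the mismatch concrete, here is a minimal Python sketch (not part of utf8-gen.py) that parses EastAsianWidth.txt lines in the format quoted above and reports W/F code points that are missing from a set of code points taken from UnicodeData.txt. The names `parse_eaw_line`, `undefined_wide_points` and the toy `defined` set are hypothetical. A real check would have to build that set from UnicodeData.txt, including its First/Last range entries; the toy set assumes the CJK block ends at U+9FCC, consistent with the error message above.

```python
import re

# Sketch only: finds code points that EastAsianWidth.txt marks as W or F
# but that are absent from a set of code points defined in UnicodeData.txt.

# A data line starts with a code point or a "start..end" range, then ";"
# and the width property, e.g. "9FCD..9FFF;W     # Cn    [51] ...".
EAW_LINE = re.compile(r'^([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s*(\w+)')


def parse_eaw_line(line):
    """Return (start, end, prop) for a data line, or None for comments and blanks."""
    m = EAW_LINE.match(line)
    if m is None:
        return None
    start = int(m.group(1), 16)
    end = int(m.group(2), 16) if m.group(2) else start
    return start, end, m.group(3)


def undefined_wide_points(eaw_lines, defined):
    """Yield W/F code points that are not in `defined`.

    These would land in the WIDTH section without a matching CHARMAP entry.
    """
    for line in eaw_lines:
        parsed = parse_eaw_line(line)
        if parsed is None:
            continue
        start, end, prop = parsed
        if prop not in ('W', 'F'):
            continue
        for cp in range(start, end + 1):
            if cp not in defined:
                yield cp


# Hypothetical stand-in for UnicodeData.txt: CJK Unified Ideographs U+4E00..U+9FCC.
defined = set(range(0x4E00, 0x9FCD))
sample = [
    "# comment lines are ignored",
    "4E00..9FCC;W     # Lo [20941] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FCC",
    "9FCD..9FFF;W     # Cn    [51] <reserved-9FCD>..<reserved-9FFF>",
]
missing = list(undefined_wide_points(sample, defined))
print(len(missing), "U%08X" % missing[0])  # prints: 51 U00009FCD
```

The first missing code point is printed in the same `U%08X` style that localedef uses in the error message above.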

This can be fixed by not generating reserved code points into
the WIDTH section, i.e. by ignoring the reserved code points
mentioned in EastAsianWidth.txt. Patch for utf8-gen.py:

diff --git a/utf8-gen.py b/utf8-gen.py
index 57875b6..20b68bb 100755
--- a/utf8-gen.py
+++ b/utf8-gen.py
@@ -218,6 +218,8 @@ if __name__ == "__main__":
         write_comments(outfile, 1)
         elines = []
         for line in easta_file.readlines():
+                if re.match(r'.*<reserved-.+>\.\.<reserved-.+>.*', line):
+                        continue
                 if re.match(r'^[^;]*;[WF]', line):
                         elines.append(line.strip())
         process_width(outfile, flines, elines)
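
The effect of the patched loop can be checked in isolation. The sketch below restates it as a standalone function; the name `select_wide_lines` and the sample lines are hypothetical (utf8-gen.py builds `elines` inline while reading the real EastAsianWidth.txt), but the two regular expressions are the ones from the patch.

```python
import re

# Standalone restatement of the patched loop from utf8-gen.py.
# The function name select_wide_lines is hypothetical.


def select_wide_lines(lines):
    """Return stripped EastAsianWidth.txt lines with property W or F,
    skipping ranges whose comment names reserved code points."""
    elines = []
    for line in lines:
        # Added by the patch: reserved ranges have no CHARMAP entry.
        if re.match(r'.*<reserved-.+>\.\.<reserved-.+>.*', line):
            continue
        # Pre-existing filter: keep only wide (W) and fullwidth (F) entries.
        if re.match(r'^[^;]*;[WF]', line):
            elines.append(line.strip())
    return elines


# Illustrative input lines modeled on the EastAsianWidth.txt format.
sample = [
    "3000;F           # Zs         IDEOGRAPHIC SPACE\n",
    "9FCD..9FFF;W     # Cn    [51] <reserved-9FCD>..<reserved-9FFF>\n",
    "0041;Na          # Lu         LATIN CAPITAL LETTER A\n",
]
for kept in select_wide_lines(sample):
    print(kept)  # prints only the U+3000 line
```

The skip is keyed on the `<reserved-...>..<reserved-...>` text in the comment field, so it relies on EastAsianWidth.txt naming reserved ranges that way; the existing `[WF]` filter still drops narrow entries such as the `Na` line.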
