Summary: | charmaps/UTF-8: incorrect wcwidth for U+3099 and U+309A | ||
---|---|---|---|
Product: | glibc | Reporter: | Egmont Koblinger <egmont> |
Component: | localedata | Assignee: | Mike FABIAN <maiku.fabian> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | aoliva, libc-locales, maiku.fabian, mfabian, tg |
Priority: | P2 | Flags: | fweimer: security- |
Version: | 2.23 | ||
Target Milestone: | 2.27 | ||
See Also: | https://sourceware.org/bugzilla/show_bug.cgi?id=14094 https://sourceware.org/bugzilla/show_bug.cgi?id=19919 https://sourceware.org/bugzilla/show_bug.cgi?id=4335 | ||
Host: | Target: | ||
Build: | Last reconfirmed: |
Description
Egmont Koblinger
2016-03-22 09:12:18 UTC
Forwarding VTE maintainer's observation here:

The bug was introduced in glibc commit 4a4839c94a4c93ffc0d5b95c69a08b02a57007f2. It's due to a bug in the unicode generation scripts, see https://sourceware.org/bugzilla/show_bug.cgi?id=14094#c18 where the problem was mentioned but the wrong choice made; the script needs to be smarter.

isn't the issue fundamentally that the official unicode's data is wrong ? so once this is fixed in unicode.org, glibc will roll the fix automatically ? they have a form for it: http://unicode.org/reporting.html

I cannot tell if it's a bug or an unfortunate design in Unicode database, sorry. At least, even if it's a Unicode bug, glibc used to contain a workaround for this bug which was accidentally removed and probably should be restored for the time being.

i think we should get this clarified/documented before we continue to stumble blindly hoping for the best :)

seems like bug 4335 is also related ...

(In reply to Mike Frysinger from comment #4)
> seems like bug 4335 is also related ...

Not too much, I think. That one is about defining locales where ambiguous width characters take up 2 cells instead of 1. This one is about the width of combining accents themselves that are intended to be applied on top of double wide (not ambiguous but clearly double wide) characters.

I’ve filed https://sourceware.org/bugzilla/show_bug.cgi?id=21750 noting _all_ differences from Markus Kuhn’s xterm code (updated for Unicode 10) to the current glibc localedata. For this particular problem, the fix is easy (interestingly enough, I had a similar bug in MirBSD when redoing the wcwidth code): read EastAsianWidth before, not after, UnicodeData, so the NSM bidi class overrides the width set by the former.

This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources".

The branch, master has been updated
       via  bb6274ee1293a6bc76d9d7c889783303de181295 (commit)
       via  c14b84baae83bfb73f7cd00ba7c24964ad1c712c (commit)
       via  7a79e321c6f85b204036c33d85f6b2aa794e7c76 (commit)
       via  267ee5d7ab57591a6b1bc2d2a010c88188427063 (commit)
       via  41b6f0ce85d98c62739b04863e8c38a1f4154e80 (commit)
       via  580be3035d2e0f479c4ac955bf719b0bf936f5cf (commit)
      from  038d1cafafb3094a9fbebd35f4aa8d0ebae0e55b (commit)

Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=bb6274ee1293a6bc76d9d7c889783303de181295

commit bb6274ee1293a6bc76d9d7c889783303de181295
Author: Akhilesh Kumar <akhilesh.k@samsung.com>
Date:   Wed Aug 16 15:33:58 2017 +0530

    Fix abmon for bem_ZM

    Until now the abbreviated month names were in English.

    [BZ #21960]
    * locales/bem_ZM (LC_TIME): Fix abmon, make it agree with CLDR.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c14b84baae83bfb73f7cd00ba7c24964ad1c712c

commit c14b84baae83bfb73f7cd00ba7c24964ad1c712c
Author: Akhilesh Kumar <akhilesh.k@samsung.com>
Date:   Wed Aug 16 18:01:53 2017 +0530

    Fix country name for xh_ZA

    [BZ #21959]
    * locales/xh_ZA (LC_ADDRESS): Fix country name.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=7a79e321c6f85b204036c33d85f6b2aa794e7c76

commit 7a79e321c6f85b204036c33d85f6b2aa794e7c76
Author: Thorsten Glaser <tg@mirbsd.de>
Date:   Fri Jul 14 14:02:50 2017 +0200

    Refresh generated charmap data and ChangeLog

    [BZ #21750]
    * charmaps/UTF-8: Refresh.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=267ee5d7ab57591a6b1bc2d2a010c88188427063

commit 267ee5d7ab57591a6b1bc2d2a010c88188427063
Author: Thorsten Glaser <tg@mirbsd.de>
Date:   Fri Jul 14 14:02:46 2017 +0200

    Resolve some historically special cases of ambiguous width

    [BZ #21750]
    * unicode-gen/utf8_gen.py (U+00AD): Set width to 1.
    * unicode-gen/utf8_gen.py (U+1160..U+11FF): Set width to 0.
    * unicode-gen/utf8_gen.py (U+3248..U+324F): Set width to 2.
    * unicode-gen/utf8_gen.py (U+4DC0..U+4DFF): Likewise.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=41b6f0ce85d98c62739b04863e8c38a1f4154e80

commit 41b6f0ce85d98c62739b04863e8c38a1f4154e80
Author: Thorsten Glaser <tg@mirbsd.de>
Date:   Fri Jul 14 14:02:44 2017 +0200

    Handle more cases of combining characters

    [BZ #21750]
    * unicode-gen/utf8_gen.py: Treat category Me and Mn as combining.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=580be3035d2e0f479c4ac955bf719b0bf936f5cf

commit 580be3035d2e0f479c4ac955bf719b0bf936f5cf
Author: Thorsten Glaser <tg@mirbsd.de>
Date:   Fri Jul 14 14:02:37 2017 +0200

    UnicodeData has precedence over EastAsianWidth

    [BZ #19852]
    [BZ #21750]
    * unicode-gen/utf8_gen.py: Process EastAsianWidth lines before UnicodeData lines so the latter have precedence; remove hack to group output by EastAsianWidth ranges.

-----------------------------------------------------------------------

Summary of changes:
 localedata/ChangeLog               |    24 +
 localedata/charmaps/UTF-8          |111468 +++++++++++++++++++++++++++++++++++-
 localedata/locales/bem_ZM          |    25 +-
 localedata/locales/xh_ZA           |     5 +-
 localedata/unicode-gen/utf8_gen.py |    38 +-
 5 files changed, 111400 insertions(+), 160 deletions(-)

FIXED thanks to Thorsten Glaser.
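
For readers following the thread, the ordering fix from commit 580be3035d2e ("read EastAsianWidth before, not after, UnicodeData") can be illustrated with a minimal Python sketch. This is not the code of utf8_gen.py: the function name `compute_widths`, the two inline data excerpts, and the simplified rules (width 2 for East Asian Width W/F, width 0 for category Me/Mn or bidi class NSM) are assumptions made for illustration only.

```python
# Hypothetical sketch of the processing order; NOT the actual utf8_gen.py.
# Small excerpts written in the formats of EastAsianWidth.txt and
# UnicodeData.txt (illustrative sample data, not the full files).
EAST_ASIAN_WIDTH = """\
3041..3096;W
3099..309A;W
"""

UNICODE_DATA = """\
3042;HIRAGANA LETTER A;Lo;0;L;;;;;N;;;;;
3099;COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK;Mn;8;NSM;;;;;N;;;;;
309A;COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK;Mn;8;NSM;;;;;N;;;;;
"""


def compute_widths(eaw_text, ucd_text):
    """Return {code point: width}; UnicodeData is applied last so it wins."""
    widths = {}

    # Pass 1: East Asian Width. Wide (W) and Fullwidth (F) get width 2.
    for line in eaw_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        code_range, prop = [field.strip() for field in line.split(";")[:2]]
        if prop in ("W", "F"):
            first, _, last = code_range.partition("..")
            for cp in range(int(first, 16), int(last or first, 16) + 1):
                widths[cp] = 2

    # Pass 2: UnicodeData. Combining marks (category Me or Mn, or bidi
    # class NSM) overwrite whatever width pass 1 assigned.
    for line in ucd_text.splitlines():
        fields = line.split(";")
        if len(fields) < 5:
            continue
        cp = int(fields[0], 16)
        category, bidi_class = fields[2], fields[4]
        if category in ("Me", "Mn") or bidi_class == "NSM":
            widths[cp] = 0

    return widths


widths = compute_widths(EAST_ASIAN_WIDTH, UNICODE_DATA)
for cp in (0x3042, 0x3099, 0x309A):
    print(f"U+{cp:04X} width {widths[cp]}")
```

With the passes in this order, U+3099 and U+309A end up with width 0 while U+3042 keeps width 2. Swapping the two passes in this sketch would let the W entry overwrite the combining-mark width, which is the behaviour this bug reports.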