[Bug localedata/24658] New: wcwidth inconsistencies with Unicode 12.1

rob.ross at ymail dot com sourceware-bugzilla@sourceware.org
Mon Jun 10 16:49:00 GMT 2019


https://sourceware.org/bugzilla/show_bug.cgi?id=24658

            Bug ID: 24658
           Summary: wcwidth inconsistencies with Unicode 12.1
           Product: glibc
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: rob.ross at ymail dot com
                CC: libc-locales at sourceware dot org
  Target Milestone: ---

For "en_US.utf8", the 2019-06-10 trunk closely follows the Unicode Standard
except for U+3248 to U+324F (circled numbers with Ambiguous [A] width) and
U+4DC0 to U+4DFF (Yijing hexagram symbols with Neutral [N] width), where
wcwidth returns 2 instead of 1.  Those deviations were intentionally added to
"localedata/unicode-gen/utf8_gen.py" starting at line 262.  The rationale
starting at line 263 refers to
<http://www.unicode.org/mail-arch/unicode-ml/y2017-m08/0023.html>, which
applies only to the first range and depends on the definition of "context".  The
interpretation that glibc is a context, regardless of locale, is likely not
what was intended.  In particular, UAX 11
(<http://www.unicode.org/reports/tr11/tr11-36.html>) makes it clear that the
"EastAsianWidth.txt" context is either "East Asian" or "non-East Asian".  It
also states that "narrow characters include N, Na, H, and A (when not in East
Asian context)."
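
As a rough illustration of the UAX 11 rule quoted above, the sketch below maps
an East_Asian_Width class to a column count.  It is not glibc's
implementation: the names width_from_class and uax11_width are hypothetical,
treating A as wide in an East Asian context is an assumption, and zero-width
and control characters are ignored.  The class reported for a given code point
depends on the Unicode version bundled with the Python interpreter, so it may
differ from Unicode 12.1.

```python
import unicodedata

# East_Asian_Width classes that are wide regardless of context.
ALWAYS_WIDE = {"W", "F"}


def width_from_class(eaw_class, east_asian_context=False):
    """Map an East_Asian_Width class to a column count of 1 or 2.

    Assumption: Ambiguous (A) is wide only in an East Asian context;
    N, Na, and H are always narrow.
    """
    if eaw_class in ALWAYS_WIDE:
        return 2
    if eaw_class == "A":
        return 2 if east_asian_context else 1
    return 1  # N, Na, H


def uax11_width(ch, east_asian_context=False):
    """Look up the class of one character and apply the mapping above."""
    return width_from_class(unicodedata.east_asian_width(ch),
                            east_asian_context)


if __name__ == "__main__":
    # Print the class and derived widths for the first code point of
    # each range discussed in this report.
    for cp in (0x3248, 0x4DC0):
        ch = chr(cp)
        print(f"U+{cp:04X} class={unicodedata.east_asian_width(ch)} "
              f"non-EA={uax11_width(ch)} EA={uax11_width(ch, True)}")
```

Under this mapping, a width of 2 for an A or N character in a non-East Asian
locale such as "en_US.utf8" would be a deviation from the quoted rule.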

This bug relates to bug 21750
(<https://sourceware.org/bugzilla/show_bug.cgi?id=21750>) item 5.  Part of the
rationale there for forcing a width of 2 was based on xterm's implementation,
but xterm defaults to using wcwidth (unless you set mkWidth), so that argument
is not very convincing.  Another rationale was "glyphs for these characters are
quadratic in most fonts", which is a good point, but lots of characters have
this problem.
Should there be wcwidth bugs for those characters?  Why should some ranges
receive special treatment?  The last rationale related to application
compatibility.  Changing widths to better track the Unicode database will break
old versions of applications, but programs are increasingly tracking that
database themselves, so the problem will resolve itself.  A concrete example is
vim, which needs its own table in order to function consistently on platforms
without wcwidth.  Egmont Koblinger provided good rationales for a width of 1,
and I don't see why they were discounted.
