Summary: | charmaps/UTF-8: EastAsianAmbiguous character width is always 1 | ||
---|---|---|---|
Product: | glibc | Reporter: | VDR dai (bugzilla) <d+bugzilla> |
Component: | localedata | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | RESOLVED WONTFIX | ||
Severity: | normal | CC: | glibc-bugs, libc-locales, maiku.fabian |
Priority: | P2 | Flags: | fweimer: security- |
Version: | unspecified | ||
Target Milestone: | --- | ||
See Also: | https://sourceware.org/bugzilla/show_bug.cgi?id=19852 | ||
Host: | Target: | ||
Build: | Last reconfirmed: |
Description
VDR dai (bugzilla)
2007-04-08 13:18:00 UTC
The "character width" is mostly useful when dealing with cell-based terminal emulators. IMO it makes no sense to make such a change in glibc (i.e. to create an alternative charmap UTF-8-CJK and to build locales like ja_JP.UTF-8 against it) in isolation. What needs to be considered is the majority of the terminal emulators; see for example the list at http://packages.debian.org/stable/virtual/x-terminal-emulator

If you change the most important among these terminal emulators to choose their font configuration according to the locale, in such a way that in CJK locales the Ambiguous Width characters have width 2, and in other locales they have width 1, _then_ IMO the change also makes sense in glibc.

I created UTF-8-CJK (EastAsianAmbiguous character width 2) and built ja_JP.UTF-8 against it. Then I tested the terminal emulators in Debian's x-terminal-emulator list. The terminal emulators that can handle UTF-8 work well and choose the font correctly. (I left terminal emulators that cannot handle UTF-8 out of consideration.)

works well:
  gnome-terminal
  konsole
  mlterm (mlterm-tiny)
  rxvt (rxvt-ml)
  rxvt-beta
  rxvt-unicode (rxvt-unicode-ml, rxvt-unicode-lite)
  tilda
  xfce4-terminal
  xterm

does not handle UTF-8:
  aterm (aterm-ml)
  eterm
  kterm
  mrxvt (mrxvt-cjk, mrxvt-mini)
  multi-gnome-terminal
  wterm (wterm-ml)

does not handle ja_JP.eucJP:
  hanterm-xf
  powershell
  pterm
  terminal.app
  xvt

Any progress? It is still present in glibc 2.7 (Debian).

% /lib/libc.so.6
GNU C Library stable release version 2.7, by Roland McGrath et al.
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.2.3 20071123 (prerelease) (Debian 4.2.2-4).
Compiled on a Linux >>2.6.22.12<< system on 2007-11-26.
Available extensions:
    crypt add-on version 2.1 by Michael Glad and others
    GNU Libidn by Simon Josefsson
    Native POSIX Threads Library by Ulrich Drepper et al
    BIND-8.2.3-T5B
For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.

% cat test.c
#define _XOPEN_SOURCE 600  /* request wcwidth(); must precede all includes */
#include <stdio.h>
#include <locale.h>
#include <wchar.h>

int main( void )
{
    wchar_t i;
    int euc, utf8;  /* wcwidth() returns int */

    for( i = 0x00; i < 0x100; i++ ) {
        /* Width of code point i in the EUC-JP locale. */
        setlocale( LC_CTYPE, "ja_JP.eucJP" );
        euc = wcwidth( i );
        /* Width of the same code point in the UTF-8 locale. */
        setlocale( LC_CTYPE, "ja_JP.UTF-8" );
        utf8 = wcwidth( i );
        /* Report printable characters whose widths disagree. */
        if( euc > 0 && euc != utf8 ) {
            fprintf( stdout, "%02x : %d : %d : [%lc]\n",
                     (unsigned)i, euc, utf8, (wint_t)i );
        }
    }
    return 0;
}

Using default UTF-8 locale:

% ./a.out
a1 : 2 : 1 : [¡]
a2 : 2 : 1 : [¢]
a3 : 2 : 1 : [£]
a4 : 2 : 1 : [¤]
a6 : 2 : 1 : [¦]
a7 : 2 : 1 : [§]
a8 : 2 : 1 : [¨]
a9 : 2 : 1 : [©]
aa : 2 : 1 : [ª]
ac : 2 : 1 : [¬]
ae : 2 : 1 : [®]
af : 2 : 1 : [¯]
b0 : 2 : 1 : [°]
b1 : 2 : 1 : [±]
b4 : 2 : 1 : [´]
b6 : 2 : 1 : [¶]
b8 : 2 : 1 : [¸]
ba : 2 : 1 : [º]
bf : 2 : 1 : [¿]
c0 : 2 : 1 : [À]
c1 : 2 : 1 : [Á]
c2 : 2 : 1 : [Â]
c3 : 2 : 1 : [Ã]
c4 : 2 : 1 : [Ä]
c5 : 2 : 1 : [Å]
c6 : 2 : 1 : [Æ]
c7 : 2 : 1 : [Ç]
c8 : 2 : 1 : [È]
c9 : 2 : 1 : [É]
ca : 2 : 1 : [Ê]
cb : 2 : 1 : [Ë]
cc : 2 : 1 : [Ì]
cd : 2 : 1 : [Í]
ce : 2 : 1 : [Î]
cf : 2 : 1 : [Ï]
d1 : 2 : 1 : [Ñ]
d2 : 2 : 1 : [Ò]
d3 : 2 : 1 : [Ó]
d4 : 2 : 1 : [Ô]
d5 : 2 : 1 : [Õ]
d6 : 2 : 1 : [Ö]
d7 : 2 : 1 : [×]
d8 : 2 : 1 : [Ø]
d9 : 2 : 1 : [Ù]
da : 2 : 1 : [Ú]
db : 2 : 1 : [Û]
dc : 2 : 1 : [Ü]
dd : 2 : 1 : [Ý]
de : 2 : 1 : [Þ]
df : 2 : 1 : [ß]
e0 : 2 : 1 : [à]
e1 : 2 : 1 : [á]
e2 : 2 : 1 : [â]
e3 : 2 : 1 : [ã]
e4 : 2 : 1 : [ä]
e5 : 2 : 1 : [å]
e6 : 2 : 1 : [æ]
e7 : 2 : 1 : [ç]
e8 : 2 : 1 : [è]
e9 : 2 : 1 : [é]
ea : 2 : 1 : [ê]
eb : 2 : 1 : [ë]
ec : 2 : 1 : [ì]
ed : 2 : 1 : [í]
ee : 2 : 1 : [î]
ef : 2 : 1 : [ï]
f0 : 2 : 1 : [ð]
f1 : 2 : 1 : [ñ]
f2 : 2 : 1 : [ò]
f3 : 2 : 1 : [ó]
f4 : 2 : 1 : [ô]
f5 : 2 : 1 : [õ]
f6 : 2 : 1 : [ö]
f7 : 2 : 1 : [÷]
f8 : 2 : 1 : [ø]
f9 : 2 : 1 : [ù]
fa : 2 : 1 : [ú]
fb : 2 : 1 : [û]
fc : 2 : 1 : [ü]
fd : 2 : 1 : [ý]
fe : 2 : 1 : [þ]
ff : 2 : 1 : [ÿ]

Using modified (EastAsianAmbiguous character width == 2, according to EastAsianWidth-5.0.0.txt) UTF-8 locale:

% ./a.out
a2 : 2 : 1 : [¢]
a3 : 2 : 1 : [£]
a6 : 2 : 1 : [¦]
a9 : 2 : 1 : [©]
ac : 2 : 1 : [¬]
af : 2 : 1 : [¯]
c0 : 2 : 1 : [À]
c1 : 2 : 1 : [Á]
c2 : 2 : 1 : [Â]
c3 : 2 : 1 : [Ã]
c4 : 2 : 1 : [Ä]
c5 : 2 : 1 : [Å]
c7 : 2 : 1 : [Ç]
c8 : 2 : 1 : [È]
c9 : 2 : 1 : [É]
ca : 2 : 1 : [Ê]
cb : 2 : 1 : [Ë]
cc : 2 : 1 : [Ì]
cd : 2 : 1 : [Í]
ce : 2 : 1 : [Î]
cf : 2 : 1 : [Ï]
d1 : 2 : 1 : [Ñ]
d2 : 2 : 1 : [Ò]
d3 : 2 : 1 : [Ó]
d4 : 2 : 1 : [Ô]
d5 : 2 : 1 : [Õ]
d6 : 2 : 1 : [Ö]
d9 : 2 : 1 : [Ù]
da : 2 : 1 : [Ú]
db : 2 : 1 : [Û]
dc : 2 : 1 : [Ü]
dd : 2 : 1 : [Ý]
e2 : 2 : 1 : [â]
e3 : 2 : 1 : [ã]
e4 : 2 : 1 : [ä]
e5 : 2 : 1 : [å]
e7 : 2 : 1 : [ç]
eb : 2 : 1 : [ë]
ee : 2 : 1 : [î]
ef : 2 : 1 : [ï]
f1 : 2 : 1 : [ñ]
f4 : 2 : 1 : [ô]
f5 : 2 : 1 : [õ]
f6 : 2 : 1 : [ö]
fb : 2 : 1 : [û]
fd : 2 : 1 : [ý]
ff : 2 : 1 : [ÿ]

% diff -u utf8-cjk-default utf8-cjk-modified
--- utf8-cjk-default 2007-11-28 01:03:07.000000000 +0900
+++ utf8-cjk-modified 2007-11-28 01:02:55.000000000 +0900
@@ -1,29 +1,15 @@
-a1 : 2 : 1 : [¡]
 a2 : 2 : 1 : [¢]
 a3 : 2 : 1 : [£]
-a4 : 2 : 1 : [¤]
 a6 : 2 : 1 : [¦]
-a7 : 2 : 1 : [§]
-a8 : 2 : 1 : [¨]
 a9 : 2 : 1 : [©]
-aa : 2 : 1 : [ª]
 ac : 2 : 1 : [¬]
-ae : 2 : 1 : [®]
 af : 2 : 1 : [¯]
-b0 : 2 : 1 : [°]
-b1 : 2 : 1 : [±]
-b4 : 2 : 1 : [´]
-b6 : 2 : 1 : [¶]
-b8 : 2 : 1 : [¸]
-ba : 2 : 1 : [º]
-bf : 2 : 1 : [¿]
 c0 : 2 : 1 : [À]
 c1 : 2 : 1 : [Á]
 c2 : 2 : 1 : [Â]
 c3 : 2 : 1 : [Ã]
 c4 : 2 : 1 : [Ä]
 c5 : 2 : 1 : [Å]
-c6 : 2 : 1 : [Æ]
 c7 : 2 : 1 : [Ç]
 c8 : 2 : 1 : [È]
 c9 : 2 : 1 : [É]
@@ -39,44 +25,23 @@
 d4 : 2 : 1 : [Ô]
 d5 : 2 : 1 : [Õ]
 d6 : 2 : 1 : [Ö]
-d7 : 2 : 1 : [×]
-d8 : 2 : 1 : [Ø]
 d9 : 2 : 1 : [Ù]
 da : 2 : 1 : [Ú]
 db : 2 : 1 : [Û]
 dc : 2 : 1 : [Ü]
 dd : 2 : 1 : [Ý]
-de : 2 : 1 : [Þ]
-df : 2 : 1 : [ß]
-e0 : 2 : 1 : [à]
-e1 : 2 : 1 : [á]
 e2 : 2 : 1 : [â]
 e3 : 2 : 1 : [ã]
 e4 : 2 : 1 : [ä]
 e5 : 2 : 1 : [å]
-e6 : 2 : 1 : [æ]
 e7 : 2 : 1 : [ç]
-e8 : 2 : 1 : [è]
-e9 : 2 : 1 : [é]
-ea : 2 : 1 : [ê]
 eb : 2 : 1 : [ë]
-ec : 2 : 1 : [ì]
-ed : 2 : 1 : [í]
 ee : 2 : 1 : [î]
 ef : 2 : 1 : [ï]
-f0 : 2 : 1 : [ð]
 f1 : 2 : 1 : [ñ]
-f2 : 2 : 1 : [ò]
-f3 : 2 : 1 : [ó]
 f4 : 2 : 1 : [ô]
 f5 : 2 : 1 : [õ]
 f6 : 2 : 1 : [ö]
-f7 : 2 : 1 : [÷]
-f8 : 2 : 1 : [ø]
-f9 : 2 : 1 : [ù]
-fa : 2 : 1 : [ú]
 fb : 2 : 1 : [û]
-fc : 2 : 1 : [ü]
 fd : 2 : 1 : [ý]
-fe : 2 : 1 : [þ]
 ff : 2 : 1 : [ÿ]

Here is the rxvt-unicode author's opinion.
http://lists.schmorp.de/pipermail/rxvt-unicode/2007q1/000402.html

> > > > ja_JP.eucJP locale is fixed by src/rxvt.h r1.265.
> > > > But ja_JP.UTF-8 locale is still weird.
> > >
> > > No, its correct, thats what the locale specified.
> >
> > Do you mean that ja_JP.UTF-8 locale specifies
> > "0xd7" (EastAsianAmbiguous) is HALFWIDTH and
> > rxvt-unicode simply respects it?
>
> Basically, yes. At least that is how it *should* be: urxvt always respects
> your locale, as should all other programs do too. If your locale says
> something and urxvt doesn't follow that, that is considered a bug in
> urxvt.
>
> > > > Do you plan to merge doc/solaris9.patch?
> > >
> > > No, thats an ugly hack around solaris being broken.
> >
> > Uh, I mean mk_wcwidth() that is a part of doc/solaris9.patch.
> > mk_wcwidth() variant with configurable option is imported into vim,
> > xterm and so on.
>
> Yes, they are all buggy as long as they use that.
>
> > Yes, rxvt-unicode respects that locale tells.
> > But vim, xterm, etc have option that gives EastAsianAmbiguous
> > special treatment that EastAsianAmbiguous char width is 2.
> > vim has ambiwidth=double option, xterm has -cjk_width option.
>
> Yes, I know. But its stupid to add such hacks to each and every program
> and force the user to enable them. The right way is to use or modify the
> locale, then suddenly all well-written programs with or without such hacks
> just magically work.
>
> Ignoring the locale is just wrong. It leads to interoperability
> problems between programs that simply wouldn't exist if everybody just
> respected the locale instead of relying on their own hacks.
>
> The only justification for adding hacks is for systems that do not support
> required locales (such as one providing utf-8), but those systems either
> die or get upgraded, so the time is much better spent improving the locale
> system on those rare systems rather than adding hacks to each and every
> program.
>
> > Do you mean locale is wrong/broken then programs do not need to
>
> If the locale specifies a character width that you do not want, then the
> locale is pretty much broken from your perspective, isn't it? At least its
> not the locale you want.
>
> > Do I need to ask not rxvt-unicode but glibc?
>
> I think glibc (or any software distribution either using it or something
> else) should provide the means to configure it regarding such details such
> as character width, at least for commonly wanted cases such as east asian
> widths.
>
> I am open to reasoning against my arguments, but to change my mind one
> would have to overcome the arguments above. It just plain makes no
> sense to hack each and every program on the world to workaround locale
> limitations: there are far more editors and terminals around than libcs.

At present, each application has to implement its own approach to the EastAsianAmbiguous character width issue, for example its own code or Markus Kuhn's wcwidth() (http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c).
If glibc's current wcwidth() implementation and locale definitions cannot be extended, could glibc at least offer a common method for handling this issue?

*** Bug 260998 has been marked as a duplicate of this bug. ***

Restoring changes lost in system crash and restore from backup.
https://sourceware.org/ml/glibc-bugs/2017-08/msg00369.html
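
For illustration only, here is a minimal sketch of the kind of per-application workaround discussed in this report (in the spirit of vim's ambiwidth=double and xterm's -cjk_width). It is not glibc code and not Markus Kuhn's implementation. The names cp_range, ambiguous_latin1, is_ambiguous and width_with_ambiguous are hypothetical, and the table is an assumption built only from the Latin-1 code points that the diff in this report shows changing width; a real table would be generated from EastAsianWidth.txt.

```c
#define _XOPEN_SOURCE 700  /* must precede all includes so <wchar.h> declares wcwidth() */
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Inclusive range of code points (hypothetical helper type). */
struct cp_range { wchar_t first, last; };

/* Illustrative subset only: the Latin-1 code points whose width changed
 * in the diff shown in this report. Not a complete Ambiguous table. */
const struct cp_range ambiguous_latin1[] = {
    {0x00A1, 0x00A1}, {0x00A4, 0x00A4}, {0x00A7, 0x00A8}, {0x00AA, 0x00AA},
    {0x00AE, 0x00AE}, {0x00B0, 0x00B1}, {0x00B4, 0x00B4}, {0x00B6, 0x00B6},
    {0x00B8, 0x00B8}, {0x00BA, 0x00BA}, {0x00BF, 0x00BF}, {0x00C6, 0x00C6},
    {0x00D7, 0x00D8}, {0x00DE, 0x00E1}, {0x00E6, 0x00E6}, {0x00E8, 0x00EA},
    {0x00EC, 0x00ED}, {0x00F0, 0x00F0}, {0x00F2, 0x00F3}, {0x00F7, 0x00FA},
    {0x00FC, 0x00FC}, {0x00FE, 0x00FE},
};

/* Return 1 if wc falls in one of the ranges above, else 0. */
int is_ambiguous(wchar_t wc)
{
    size_t n = sizeof ambiguous_latin1 / sizeof ambiguous_latin1[0];
    for (size_t k = 0; k < n; k++) {
        if (wc >= ambiguous_latin1[k].first && wc <= ambiguous_latin1[k].last)
            return 1;
    }
    return 0;
}

/* Width of wc with a caller-chosen width (1 or 2) for the listed
 * ambiguous characters; everything else is delegated to the locale's
 * wcwidth(). */
int width_with_ambiguous(wchar_t wc, int ambiguous_width)
{
    assert(ambiguous_width == 1 || ambiguous_width == 2);
    if (is_ambiguous(wc))
        return ambiguous_width;
    return wcwidth(wc);
}
```

A caller would pass 2 as ambiguous_width in a CJK setup and 1 otherwise. This is exactly the per-program override that the quoted rxvt-unicode author argues against, since every program then needs its own table and its own option instead of taking the width from the locale.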