[PATCH] locale modifier @cjkwide

Thomas Wolff towo@towo.net
Tue Feb 27 22:59:00 GMT 2018


On 27.02.2018 at 15:58, Corinna Vinschen wrote:
> Hi Thomas,
>
> On Feb 26 22:42, Thomas Wolff wrote:
>> I wrote yesterday:
>>> It had been discussed how to reflect ambiguous character widths in
>>> cygwin locales, with the result of an implicit wide property assumed for
>>> the CJK locales, and an overriding @cjknarrow modifier:
>>> https://sourceware.org/ml/cygwin/2009-06/msg00240.html
>>> https://sourceware.org/ml/cygwin/2009-06/msg00521.html
>>> https://sourceware.org/ml/cygwin/2009-06/msg00616.html
>>>
>>> Now I’m getting occasional complaints about mintty support for wide
>>> display of certain symbol characters, particularly as used for some
>>> fancy “Powerline” add-on, and it seems that other terminals apply
>>> “ambiguous wide mode” (e.g. xterm -cjk_width) in order to enable
>>> Powerline.
>>> While mintty now has an option Charwidth=ambig-wide, using this
>>> option clearly has the drawback that it makes character width handling
>>> inconsistent with the locale model as used by wcwidth.
>>> Actually for mintty, the desired behaviour can be achieved in a
>>> locale-consistent way by selecting one of the CJK locales for LC_CTYPE;
>>> that’s not what most people would expect, however, and if they do it the
>>> easy way, using LANG or LC_ALL, they are baffled that their message
>>> language changes as well.
>>> So I would prefer the option to use ambiguous wide mode in combination
>>> with non-CJK locales in a locale-compatible way.
>> So I suggest revisiting the proposal of another generic modifier, also for
>> symmetry: @cjkwide, applicable to non-CJK locales.
>> Patch attached.
>> Thomas
> Just one point:
>> ...
>> Subject: [PATCH] locale modifier @cjkwide
> It would be most helpful to get a v2 patch with a commit message
> describing why adding cjkwide makes sense, for later reference.
> The subject "locale modifier @cjkwide" is rather terse.
New patch attached. I'll also send a patch for the Cygwin user guide to
cygwin-patches.
Thomas
-------------- next part --------------
From f97028789cb8e18fd97a65fb8f5b08f25856bb94 Mon Sep 17 00:00:00 2001
From: Thomas Wolff <towo@towo.net>
Date: Tue, 27 Feb 2018 23:47:21 +0100
Subject: [PATCH] Locale modifier @cjkwide makes Unicode "ambiguous width"
 characters wide. So ambiguous width characters can be forced to have width
 2 even in non-CJK locales. This gives e.g. users of "Powerline symbols" the
 opportunity to adjust their width to the desired behaviour (and the behaviour
 apparently expected by some tools) without having to set a CJK locale and
 without losing consistency of terminal character width with wcwidth/wcswidth
 locale width.

---
 newlib/libc/locale/locale.c | 39 +++++++++++++++++++++++----------------
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/newlib/libc/locale/locale.c b/newlib/libc/locale/locale.c
index baa5451..e654c5c 100644
--- a/newlib/libc/locale/locale.c
+++ b/newlib/libc/locale/locale.c
@@ -74,15 +74,16 @@ Cygwin additionally supports locales from the file
 (<<"">> is also accepted; if given, the settings are read from the
 corresponding LC_* environment variables and $LANG according to POSIX rules.)
 
-This implementation also supports the modifier <<"cjknarrow">>, which
-affects how the functions <<wcwidth>> and <<wcswidth>> handle characters
-from the "CJK Ambiguous Width" category of characters described at
-http://www.unicode.org/reports/tr11/#Ambiguous. These characters have a width
-of 1 for singlebyte charsets and a width of 2 for multibyte charsets
-other than UTF-8. For UTF-8, their width depends on the language specifier:
+This implementation also supports the modifiers <<"cjknarrow">> and
+<<"cjkwide">>, which affect how the functions <<wcwidth>> and <<wcswidth>>
+handle characters from the "CJK Ambiguous Width" category of characters
+described at http://www.unicode.org/reports/tr11/#Ambiguous.
+These characters have a width of 1 for singlebyte charsets and a width of 2
+for multibyte charsets other than UTF-8.
+For UTF-8, their width depends on the language specifier:
 it is 2 for <<"zh">> (Chinese), <<"ja">> (Japanese), and <<"ko">> (Korean),
-and 1 for everything else. Specifying <<"cjknarrow">> forces a width of 1,
-independent of charset and language.
+and 1 for everything else. Specifying <<"cjknarrow">> or <<"cjkwide">>
+forces a width of 1 or 2, respectively, independent of charset and language.
 
 If you use <<NULL>> as the <[locale]> argument, <<setlocale>> returns a
 pointer to the string representing the current locale.  The acceptable
@@ -480,6 +481,7 @@ __loadlocale (struct __locale_t *loc, int category, const char *new_locale)
   wctomb_p l_wctomb;
   mbtowc_p l_mbtowc;
   int cjknarrow = 0;
+  int cjkwide = 0;
 
   /* Avoid doing everything twice if nothing has changed.
 
@@ -593,11 +595,13 @@ restart:
   if (c && c[0] == '@')
     {
       /* Modifier */
-      /* Only one modifier is recognized right now.  "cjknarrow" is used
-         to modify the behaviour of wcwidth() for East Asian languages.
+      /* Modifiers "cjknarrow" or "cjkwide" are recognized to modify the 
+         behaviour of wcwidth() and wcswidth() for East Asian languages.
          For details see the comment at the end of this function. */
       if (!strcmp (c + 1, "cjknarrow"))
 	cjknarrow = 1;
+      else if (!strcmp (c + 1, "cjkwide"))
+	cjkwide = 1;
     }
   /* We only support this subset of charsets. */
   switch (charset[0])
@@ -894,12 +898,15 @@ restart:
          single-byte charsets, and double width for multi-byte charsets
          other than UTF-8. For UTF-8, use double width for the East Asian
          languages ("ja", "ko", "zh"), and single width for everything else.
-         Single width can also be forced with the "@cjknarrow" modifier. */
-      loc->cjk_lang = !cjknarrow && mbc_max > 1
-		      && (charset[0] != 'U'
-			  || strncmp (locale, "ja", 2) == 0
-			  || strncmp (locale, "ko", 2) == 0
-			  || strncmp (locale, "zh", 2) == 0);
+         Single width can also be forced with the "@cjknarrow" modifier.
+         Double width can also be forced with the "@cjkwide" modifier.
+       */
+      loc->cjk_lang = cjkwide ||
+		      (!cjknarrow && mbc_max > 1
+		       && (charset[0] != 'U'
+			   || strncmp (locale, "ja", 2) == 0
+			   || strncmp (locale, "ko", 2) == 0
+			   || strncmp (locale, "zh", 2) == 0));
 #ifdef __HAVE_LOCALE_INFO__
       ret = __ctype_load_locale (loc, locale, (void *) l_wctomb, charset,
 				 mbc_max);
-- 
2.16.2
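
For later reference, the width decision in the last hunk can be restated
as a small standalone C function. This is only a sketch of the logic under
stated assumptions, not newlib code: the helper name ambiguous_is_wide is
hypothetical, and the caller is assumed to pass the full locale string
together with the charset name and maximum multibyte length that
__loadlocale would have derived from it.

```c
#include <string.h>

/* Hypothetical helper, not part of newlib.  Returns 1 if characters of
   the "CJK Ambiguous Width" category should be treated as double width
   for the given locale, and 0 if they should be single width.

   locale   full locale string, e.g. "en_US.UTF-8@cjkwide"
   charset  charset name; only its first character is inspected
   mbc_max  maximum number of bytes per multibyte character */
static int
ambiguous_is_wide (const char *locale, const char *charset, int mbc_max)
{
  /* The modifier, if any, follows the '@'. */
  const char *mod = strchr (locale, '@');
  int cjknarrow = mod != NULL && strcmp (mod + 1, "cjknarrow") == 0;
  int cjkwide = mod != NULL && strcmp (mod + 1, "cjkwide") == 0;

  /* "@cjkwide" forces double width and "@cjknarrow" forces single width.
     Otherwise: single width for single-byte charsets, double width for
     multibyte charsets other than UTF-8, and for UTF-8 double width
     only for the languages "ja", "ko" and "zh". */
  return cjkwide
	 || (!cjknarrow && mbc_max > 1
	     && (charset[0] != 'U'
		 || strncmp (locale, "ja", 2) == 0
		 || strncmp (locale, "ko", 2) == 0
		 || strncmp (locale, "zh", 2) == 0));
}
```

With an assumed mbc_max of 6 for UTF-8, ambiguous_is_wide
("en_US.UTF-8@cjkwide", "UTF-8", 6) yields 1, while the same locale
without the modifier yields 0.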


