[PATCH] locale modifier @cjkwide

Thomas Wolff towo@towo.net
Mon Feb 26 21:42:00 GMT 2018


I wrote yesterday:
> It had been discussed how to reflect ambiguous character widths in 
> cygwin locales, with the result of an implicit wide property assumed 
> for the CJK locales, and an overriding @cjknarrow modifier:
> https://sourceware.org/ml/cygwin/2009-06/msg00240.html
> https://sourceware.org/ml/cygwin/2009-06/msg00521.html
> https://sourceware.org/ml/cygwin/2009-06/msg00616.html
>
> Now I’m getting occasional complaints about mintty support for wide 
> display of certain symbol characters, particularly as used for some 
> fancy “Powerline” add-on, and it seems that other terminals apply 
> “ambiguous wide mode” (e.g. xterm -cjk_width) in order to enable 
> Powerline.
> While mintty now has an option Charwidth=ambig-wide, using this 
> option has the clear drawback that it makes character width handling 
> inconsistent with the locale model as used by wcwidth.
> Actually for mintty, the desired behaviour can be achieved in a 
> locale-consistent way by selecting one of the CJK locales for LC_CTYPE;
> that’s not what most people would expect, however, and if they do it 
> the easy way, using LANG or LC_ALL, they are baffled by also getting
> their message language obscured.
> So I would prefer the option to use ambiguous wide mode in combination 
> with non-CJK locales in a locale-compatible way.

So I suggest revisiting the proposal of a second generic modifier, also 
for symmetry: @cjkwide, which is applicable to non-CJK locales.
Patch attached.
Thomas
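
For illustration only (this is not part of the patch): a minimal sketch of how an application could check which width the C library reports for an ambiguous-width character under a given locale name. The helper name probe_ambiguous_width and the return value -2 are hypothetical; U+00A1 is used as a sample character from the East Asian Ambiguous category. It assumes a C library that declares wcwidth() under _XOPEN_SOURCE.

```c
#define _XOPEN_SOURCE 700   /* request the wcwidth() declaration */
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not part of newlib or Cygwin.
   Returns wcwidth() of U+00A1 (an East Asian Ambiguous character)
   under LC_CTYPE = locale_name, or -2 if the C library rejects that
   locale name.  The previous LC_CTYPE setting is restored. */
int
probe_ambiguous_width (const char *locale_name)
{
  const char *cur;
  char *saved = NULL;
  int width;

  assert (locale_name != NULL);

  /* setlocale() may reuse its result buffer, so keep a private copy. */
  cur = setlocale (LC_CTYPE, NULL);
  if (cur)
    {
      saved = malloc (strlen (cur) + 1);
      if (saved)
        strcpy (saved, cur);
    }

  if (!setlocale (LC_CTYPE, locale_name))
    {
      free (saved);
      return -2;              /* locale name not accepted */
    }

  width = wcwidth ((wchar_t) 0x00A1);

  if (saved)
    setlocale (LC_CTYPE, saved);
  free (saved);
  return width;
}
```

Going by the documentation change in the patch, probe_ambiguous_width ("en_US.UTF-8@cjkwide") should yield 2 on a Cygwin built with it, and 1 for plain "en_US.UTF-8". That is an expectation, not a tested result; other C libraries will probably reject the modifier, in which case the helper returns -2.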
-------------- next part --------------
From 12b87350eb70c83cd654eec37dae3773bf58d231 Mon Sep 17 00:00:00 2001
From: Thomas Wolff <towo@towo.net>
Date: Sun, 25 Feb 2018 16:27:33 +0100
Subject: [PATCH] locale modifier @cjkwide

---
 newlib/libc/locale/locale.c | 39 +++++++++++++++++++++++----------------
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/newlib/libc/locale/locale.c b/newlib/libc/locale/locale.c
index baa5451..e654c5c 100644
--- a/newlib/libc/locale/locale.c
+++ b/newlib/libc/locale/locale.c
@@ -74,15 +74,16 @@ Cygwin additionally supports locales from the file
 (<<"">> is also accepted; if given, the settings are read from the
 corresponding LC_* environment variables and $LANG according to POSIX rules.)
 
-This implementation also supports the modifier <<"cjknarrow">>, which
-affects how the functions <<wcwidth>> and <<wcswidth>> handle characters
-from the "CJK Ambiguous Width" category of characters described at
-http://www.unicode.org/reports/tr11/#Ambiguous. These characters have a width
-of 1 for singlebyte charsets and a width of 2 for multibyte charsets
-other than UTF-8. For UTF-8, their width depends on the language specifier:
+This implementation also supports the modifiers <<"cjknarrow">> and
+<<"cjkwide">>, which affect how the functions <<wcwidth>> and <<wcswidth>>
+handle characters from the "CJK Ambiguous Width" category of characters
+described at http://www.unicode.org/reports/tr11/#Ambiguous.
+These characters have a width of 1 for singlebyte charsets and a width of 2
+for multibyte charsets other than UTF-8.
+For UTF-8, their width depends on the language specifier:
 it is 2 for <<"zh">> (Chinese), <<"ja">> (Japanese), and <<"ko">> (Korean),
-and 1 for everything else. Specifying <<"cjknarrow">> forces a width of 1,
-independent of charset and language.
+and 1 for everything else. Specifying <<"cjknarrow">> or <<"cjkwide">>
+forces a width of 1 or 2, respectively, independent of charset and language.
 
 If you use <<NULL>> as the <[locale]> argument, <<setlocale>> returns a
 pointer to the string representing the current locale.  The acceptable
@@ -480,6 +481,7 @@ __loadlocale (struct __locale_t *loc, int category, const char *new_locale)
   wctomb_p l_wctomb;
   mbtowc_p l_mbtowc;
   int cjknarrow = 0;
+  int cjkwide = 0;
 
   /* Avoid doing everything twice if nothing has changed.
 
@@ -593,11 +595,13 @@ restart:
   if (c && c[0] == '@')
     {
       /* Modifier */
-      /* Only one modifier is recognized right now.  "cjknarrow" is used
-         to modify the behaviour of wcwidth() for East Asian languages.
+      /* Modifiers "cjknarrow" or "cjkwide" are recognized to modify the 
+         behaviour of wcwidth() and wcswidth() for East Asian languages.
          For details see the comment at the end of this function. */
       if (!strcmp (c + 1, "cjknarrow"))
 	cjknarrow = 1;
+      else if (!strcmp (c + 1, "cjkwide"))
+	cjkwide = 1;
     }
   /* We only support this subset of charsets. */
   switch (charset[0])
@@ -894,12 +898,15 @@ restart:
          single-byte charsets, and double width for multi-byte charsets
          other than UTF-8. For UTF-8, use double width for the East Asian
          languages ("ja", "ko", "zh"), and single width for everything else.
-         Single width can also be forced with the "@cjknarrow" modifier. */
-      loc->cjk_lang = !cjknarrow && mbc_max > 1
-		      && (charset[0] != 'U'
-			  || strncmp (locale, "ja", 2) == 0
-			  || strncmp (locale, "ko", 2) == 0
-			  || strncmp (locale, "zh", 2) == 0);
+         Single width can also be forced with the "@cjknarrow" modifier.
+         Double width can also be forced with the "@cjkwide" modifier.
+       */
+      loc->cjk_lang = cjkwide ||
+		      (!cjknarrow && mbc_max > 1
+		       && (charset[0] != 'U'
+			   || strncmp (locale, "ja", 2) == 0
+			   || strncmp (locale, "ko", 2) == 0
+			   || strncmp (locale, "zh", 2) == 0));
 #ifdef __HAVE_LOCALE_INFO__
       ret = __ctype_load_locale (loc, locale, (void *) l_wctomb, charset,
 				 mbc_max);
-- 
2.16.2
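
The width decision in the last hunk above can also be read as a small stand-alone function. The following is only a sketch of that expression for experimenting outside newlib: the function name ambiguous_is_wide and its parameter list are hypothetical, and it assumes the charset name has already been normalised so that UTF-8 starts with 'U', as the test on charset[0] in the patch implies.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical restatement of the loc->cjk_lang expression.
   lang:     locale name, e.g. "ja_JP" or "en_US"
   charset:  normalised charset name, e.g. "UTF-8" (assumption)
   mbc_max:  maximum number of bytes per character for the charset
   modifier: text after '@', or NULL if no modifier was given
   Returns 1 if ambiguous-width characters get width 2, else 0. */
int
ambiguous_is_wide (const char *lang, const char *charset, int mbc_max,
                   const char *modifier)
{
  int cjknarrow = modifier != NULL && strcmp (modifier, "cjknarrow") == 0;
  int cjkwide = modifier != NULL && strcmp (modifier, "cjkwide") == 0;

  assert (lang != NULL && charset != NULL);

  /* "@cjkwide" wins unconditionally; otherwise the previous rule applies:
     wide for multibyte charsets other than UTF-8, and for UTF-8 only with
     the languages "ja", "ko", "zh", unless "@cjknarrow" is given. */
  return cjkwide
         || (!cjknarrow && mbc_max > 1
             && (charset[0] != 'U'
                 || strncmp (lang, "ja", 2) == 0
                 || strncmp (lang, "ko", 2) == 0
                 || strncmp (lang, "zh", 2) == 0));
}
```

Since only a single modifier string is compared, cjknarrow and cjkwide cannot both be set, so the order of the two tests does not matter in practice.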


