[PATCH] locale modifier @cjkwide

Thomas Wolff towo@towo.net
Fri Mar 2 19:31:00 GMT 2018


On 01.03.2018 at 18:21, Corinna Vinschen wrote:
> On Feb 27 23:58, Thomas Wolff wrote:
>> On 27.02.2018 at 15:58, Corinna Vinschen wrote:
>>> It would be most helpful to get a v2 patch with a commit message
>>> describing why adding cjkwide makes sense, for later reference.
>>> The subject "locale modifier @cjkwide" is rather terse.
>> New patch attached. I'll also provide a patch for the Cygwin user guide, to
>> cygwin-patches.
> Thanks, but the commit message is incorrect.  A single, short first
> line, followed by an empty line, followed by the more detailed commit
> message.  Otherwise, as you can see, the entire message will become the
> commit title.
Update attached, hope it's OK this time.
Thomas
-------------- next part --------------
From 3979072d80a2b4cc079aa719776d9e338fc62fd3 Mon Sep 17 00:00:00 2001
From: Thomas Wolff <towo@towo.net>
Date: Fri, 2 Mar 2018 20:21:09 +0100
Subject: [PATCH] Locale modifier @cjkwide to adjust ambiguous-width in non-CJK locales

Locale modifier @cjkwide makes Unicode "ambiguous width" characters wide,
so they can be forced to width 2 even in non-CJK locales. This gives
e.g. users of "Powerline symbols" the opportunity to adjust their width to
the desired behaviour (and the behaviour apparently expected by some tools)
without having to set a CJK locale and without losing consistency between
terminal character width and the wcwidth/wcswidth locale width.
---
 newlib/libc/locale/locale.c | 39 +++++++++++++++++++++++----------------
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/newlib/libc/locale/locale.c b/newlib/libc/locale/locale.c
index baa5451..e654c5c 100644
--- a/newlib/libc/locale/locale.c
+++ b/newlib/libc/locale/locale.c
@@ -74,15 +74,16 @@ Cygwin additionally supports locales from the file
 (<<"">> is also accepted; if given, the settings are read from the
 corresponding LC_* environment variables and $LANG according to POSIX rules.)
 
-This implementation also supports the modifier <<"cjknarrow">>, which
-affects how the functions <<wcwidth>> and <<wcswidth>> handle characters
-from the "CJK Ambiguous Width" category of characters described at
-http://www.unicode.org/reports/tr11/#Ambiguous. These characters have a width
-of 1 for singlebyte charsets and a width of 2 for multibyte charsets
-other than UTF-8. For UTF-8, their width depends on the language specifier:
+This implementation also supports the modifiers <<"cjknarrow">> and
+<<"cjkwide">>, which affect how the functions <<wcwidth>> and <<wcswidth>>
+handle characters from the "CJK Ambiguous Width" category of characters
+described at http://www.unicode.org/reports/tr11/#Ambiguous.
+These characters have a width of 1 for singlebyte charsets and a width of 2
+for multibyte charsets other than UTF-8.
+For UTF-8, their width depends on the language specifier:
 it is 2 for <<"zh">> (Chinese), <<"ja">> (Japanese), and <<"ko">> (Korean),
-and 1 for everything else. Specifying <<"cjknarrow">> forces a width of 1,
-independent of charset and language.
+and 1 for everything else. Specifying <<"cjknarrow">> or <<"cjkwide">>
+forces a width of 1 or 2, respectively, independent of charset and language.
 
 If you use <<NULL>> as the <[locale]> argument, <<setlocale>> returns a
 pointer to the string representing the current locale.  The acceptable
@@ -480,6 +481,7 @@ __loadlocale (struct __locale_t *loc, int category, const char *new_locale)
   wctomb_p l_wctomb;
   mbtowc_p l_mbtowc;
   int cjknarrow = 0;
+  int cjkwide = 0;
 
   /* Avoid doing everything twice if nothing has changed.
 
@@ -593,11 +595,13 @@ restart:
   if (c && c[0] == '@')
     {
       /* Modifier */
-      /* Only one modifier is recognized right now.  "cjknarrow" is used
-         to modify the behaviour of wcwidth() for East Asian languages.
+      /* Modifiers "cjknarrow" or "cjkwide" are recognized to modify the
+         behaviour of wcwidth() and wcswidth() for East Asian languages.
          For details see the comment at the end of this function. */
       if (!strcmp (c + 1, "cjknarrow"))
 	cjknarrow = 1;
+      else if (!strcmp (c + 1, "cjkwide"))
+	cjkwide = 1;
     }
   /* We only support this subset of charsets. */
   switch (charset[0])
@@ -894,12 +898,15 @@ restart:
          single-byte charsets, and double width for multi-byte charsets
          other than UTF-8. For UTF-8, use double width for the East Asian
          languages ("ja", "ko", "zh"), and single width for everything else.
-         Single width can also be forced with the "@cjknarrow" modifier. */
-      loc->cjk_lang = !cjknarrow && mbc_max > 1
-		      && (charset[0] != 'U'
-			  || strncmp (locale, "ja", 2) == 0
-			  || strncmp (locale, "ko", 2) == 0
-			  || strncmp (locale, "zh", 2) == 0);
+         Single width can also be forced with the "@cjknarrow" modifier.
+         Double width can also be forced with the "@cjkwide" modifier.
+       */
+      loc->cjk_lang = cjkwide ||
+		      (!cjknarrow && mbc_max > 1
+		       && (charset[0] != 'U'
+			   || strncmp (locale, "ja", 2) == 0
+			   || strncmp (locale, "ko", 2) == 0
+			   || strncmp (locale, "zh", 2) == 0));
 #ifdef __HAVE_LOCALE_INFO__
       ret = __ctype_load_locale (loc, locale, (void *) l_wctomb, charset,
 				 mbc_max);
-- 
2.16.2
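
For reference, the width decision made in the last hunk can be sketched as a
standalone function. This is only an illustration of the patched expression,
not code from newlib: the helper name ambiguous_is_wide and the function
check_examples are hypothetical, and the mbc_max values passed in the checks
(6 for UTF-8, 2 for SJIS, 1 for ISO-8859-1) are assumed sample values.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of newlib): mirrors the expression the
   patch assigns to loc->cjk_lang.  Returns 1 if "ambiguous width"
   characters should be treated as width 2, else 0. */
static int
ambiguous_is_wide (const char *locale, const char *charset, int mbc_max,
                   int cjknarrow, int cjkwide)
{
  /* "@cjkwide" forces wide regardless of charset and language.
     Otherwise "@cjknarrow" forces narrow, and without a modifier the
     charset and language decide.  The parser in the patch sets at most
     one of the two flags, so they never conflict. */
  return cjkwide
         || (!cjknarrow && mbc_max > 1
             && (charset[0] != 'U'
                 || strncmp (locale, "ja", 2) == 0
                 || strncmp (locale, "ko", 2) == 0
                 || strncmp (locale, "zh", 2) == 0));
}

/* Hypothetical self-check; the mbc_max arguments are assumed values. */
static void
check_examples (void)
{
  /* en_US.UTF-8: narrow by default, wide with @cjkwide. */
  assert (ambiguous_is_wide ("en_US", "UTF-8", 6, 0, 0) == 0);
  assert (ambiguous_is_wide ("en_US", "UTF-8", 6, 0, 1) == 1);
  /* ja_JP.UTF-8: wide by default, narrow with @cjknarrow. */
  assert (ambiguous_is_wide ("ja_JP", "UTF-8", 6, 0, 0) == 1);
  assert (ambiguous_is_wide ("ja_JP", "UTF-8", 6, 1, 0) == 0);
  /* Single-byte charset: narrow unless @cjkwide is given. */
  assert (ambiguous_is_wide ("en_US", "ISO-8859-1", 1, 0, 0) == 0);
  assert (ambiguous_is_wide ("en_US", "ISO-8859-1", 1, 0, 1) == 1);
}
```

Before the patch, a single-byte or non-CJK UTF-8 locale could not reach the
wide case at all; the added leading "cjkwide ||" term is what makes it
reachable.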


