[PATCH] locale modifier @cjkwide
Thomas Wolff
towo@towo.net
Mon Feb 26 21:42:00 GMT 2018
I wrote yesterday:
> It had been discussed how to reflect ambiguous character widths in
> cygwin locales, with the result of an implicit wide property assumed
> for the CJK locales, and an overriding @cjknarrow modifier:
> https://sourceware.org/ml/cygwin/2009-06/msg00240.html
> https://sourceware.org/ml/cygwin/2009-06/msg00521.html
> https://sourceware.org/ml/cygwin/2009-06/msg00616.html
>
> Now I’m getting occasional complaints about mintty support for wide
> display of certain symbol characters, particularly as used for some
> fancy “Powerline” add-on, and it seems that other terminals apply
> “ambiguous wide mode” (e.g. xterm -cjk_width) in order to enable
> Powerline.
> While mintty has an option Charwidth=ambig-wide meanwhile, using this
> option clearly has the drawback that it makes character width handling
> inconsistent with the locale model as used by wcwidth.
> Actually for mintty, the desired behaviour can be achieved in a
> locale-consistent way by selecting one of the CJK locales for LC_CTYPE;
> that’s not what most people would expect, however, and if they do it
> the easy way, using LANG or LC_ALL, they are baffled by also getting
> their message language obscured.
> So I would prefer the option to use ambiguous wide mode in combination
> with non-CJK locales in a locale-compatible way.
So I suggest revisiting the proposal of another generic modifier,
@cjkwide, which is applicable to non-CJK locales and also provides
symmetry with @cjknarrow.
Patch attached.
Thomas
-------------- next part --------------
From 12b87350eb70c83cd654eec37dae3773bf58d231 Mon Sep 17 00:00:00 2001
From: Thomas Wolff <towo@towo.net>
Date: Sun, 25 Feb 2018 16:27:33 +0100
Subject: [PATCH] locale modifier @cjkwide
---
newlib/libc/locale/locale.c | 39 +++++++++++++++++++++++----------------
1 file changed, 23 insertions(+), 16 deletions(-)
diff --git a/newlib/libc/locale/locale.c b/newlib/libc/locale/locale.c
index baa5451..e654c5c 100644
--- a/newlib/libc/locale/locale.c
+++ b/newlib/libc/locale/locale.c
@@ -74,15 +74,16 @@ Cygwin additionally supports locales from the file
(<<"">> is also accepted; if given, the settings are read from the
corresponding LC_* environment variables and $LANG according to POSIX rules.)
-This implementation also supports the modifier <<"cjknarrow">>, which
-affects how the functions <<wcwidth>> and <<wcswidth>> handle characters
-from the "CJK Ambiguous Width" category of characters described at
-http://www.unicode.org/reports/tr11/#Ambiguous. These characters have a width
-of 1 for singlebyte charsets and a width of 2 for multibyte charsets
-other than UTF-8. For UTF-8, their width depends on the language specifier:
+This implementation also supports the modifiers <<"cjknarrow">> and
+<<"cjkwide">>, which affect how the functions <<wcwidth>> and <<wcswidth>>
+handle characters from the "CJK Ambiguous Width" category of characters
+described at http://www.unicode.org/reports/tr11/#Ambiguous.
+These characters have a width of 1 for singlebyte charsets and a width of 2
+for multibyte charsets other than UTF-8.
+For UTF-8, their width depends on the language specifier:
it is 2 for <<"zh">> (Chinese), <<"ja">> (Japanese), and <<"ko">> (Korean),
-and 1 for everything else. Specifying <<"cjknarrow">> forces a width of 1,
-independent of charset and language.
+and 1 for everything else. Specifying <<"cjknarrow">> or <<"cjkwide">>
+forces a width of 1 or 2, respectively, independent of charset and language.
If you use <<NULL>> as the <[locale]> argument, <<setlocale>> returns a
pointer to the string representing the current locale. The acceptable
@@ -480,6 +481,7 @@ __loadlocale (struct __locale_t *loc, int category, const char *new_locale)
wctomb_p l_wctomb;
mbtowc_p l_mbtowc;
int cjknarrow = 0;
+ int cjkwide = 0;
/* Avoid doing everything twice if nothing has changed.
@@ -593,11 +595,13 @@ restart:
if (c && c[0] == '@')
{
/* Modifier */
- /* Only one modifier is recognized right now. "cjknarrow" is used
- to modify the behaviour of wcwidth() for East Asian languages.
+ /* Modifiers "cjknarrow" or "cjkwide" are recognized to modify the
+ behaviour of wcwidth() and wcswidth() for East Asian languages.
For details see the comment at the end of this function. */
if (!strcmp (c + 1, "cjknarrow"))
cjknarrow = 1;
+ else if (!strcmp (c + 1, "cjkwide"))
+ cjkwide = 1;
}
/* We only support this subset of charsets. */
switch (charset[0])
@@ -894,12 +898,15 @@ restart:
single-byte charsets, and double width for multi-byte charsets
other than UTF-8. For UTF-8, use double width for the East Asian
languages ("ja", "ko", "zh"), and single width for everything else.
- Single width can also be forced with the "@cjknarrow" modifier. */
- loc->cjk_lang = !cjknarrow && mbc_max > 1
- && (charset[0] != 'U'
- || strncmp (locale, "ja", 2) == 0
- || strncmp (locale, "ko", 2) == 0
- || strncmp (locale, "zh", 2) == 0);
+ Single width can also be forced with the "@cjknarrow" modifier.
+ Double width can also be forced with the "@cjkwide" modifier.
+ */
+ loc->cjk_lang = cjkwide ||
+ (!cjknarrow && mbc_max > 1
+ && (charset[0] != 'U'
+ || strncmp (locale, "ja", 2) == 0
+ || strncmp (locale, "ko", 2) == 0
+ || strncmp (locale, "zh", 2) == 0));
#ifdef __HAVE_LOCALE_INFO__
ret = __ctype_load_locale (loc, locale, (void *) l_wctomb, charset,
mbc_max);
--
2.16.2
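
For readers who want to check the resulting behaviour without building newlib, the patched `loc->cjk_lang` expression can be sketched as a standalone function. This is only an illustration: `ambig_is_wide` and its parameter list are hypothetical and do not exist in newlib. The function takes the language part of the locale, the charset name, the maximum multibyte length, and the modifier string without the leading `@` (or NULL), and returns non-zero when ambiguous-width characters would be treated as double width.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of newlib: mirrors the expression that
   the patch assigns to loc->cjk_lang.  `modifier` is the text after '@'
   in the locale string, or NULL if no modifier was given.  */
static int
ambig_is_wide (const char *lang, const char *charset, int mbc_max,
               const char *modifier)
{
  assert (lang != NULL && charset != NULL);

  int cjknarrow = modifier != NULL && strcmp (modifier, "cjknarrow") == 0;
  int cjkwide = modifier != NULL && strcmp (modifier, "cjkwide") == 0;

  /* "cjkwide" forces double width regardless of charset and language.
     Otherwise double width applies to multibyte charsets, restricted to
     ja/ko/zh for UTF-8, unless "cjknarrow" forces single width.  */
  return cjkwide
         || (!cjknarrow && mbc_max > 1
             && (charset[0] != 'U'
                 || strncmp (lang, "ja", 2) == 0
                 || strncmp (lang, "ko", 2) == 0
                 || strncmp (lang, "zh", 2) == 0));
}
```

With this sketch, `ambig_is_wide ("en_US", "UTF-8", 6, "cjkwide")` is non-zero, while the same call with a NULL modifier yields 0, which matches the intent described above: ambiguous wide mode for a non-CJK locale without changing the message language.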