[Fwd: [1.7] wcwidth failing configure tests]

Thomas Wolff towo@towo.net
Fri Jun 5 16:25:00 GMT 2009


IWAMURO Motonori wrote:
> 2009/5/21 Thomas Wolff <towo@towo.net>:
> >> > Therefore, I propose to use *_cjk() when the language part of LC_CTYPE
> >> > is 'ja', 'ko', 'vi' or 'zh'.
> > The problem with this is
> > 1. As you say, there is no standard.

> But,
> - I think that my proposal doesn't violate any specification.
I think it does. Part of the locale information is the "charmap" 
(called "codepage" on DOS/Windows). It may be implicit, as with 
LC_CTYPE=zh_CN, which defines "GB2312" as its charmap, but it is 
typically explicit, as in en_US.UTF-8. The intention is that the 
"codepage" information should be the same for all locales having 
the "UTF-8" (or any other) charmap. So you cannot freely change 
width information among locales with the same charmap.
Also, if ja_JP.UTF-8 meant "CJK width", how would you specify 
a working locale setting for a terminal that does not run a CJK width 
font but should still use the other Japanese settings? E.g. with rxvt, 
which does not support CJK width.
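
To make concrete what "CJK width" changes, here is a minimal Python sketch.
It assumes that the *_cjk() variants differ from the plain width functions
only in treating characters of the Unicode "East Asian Ambiguous" class as
two columns wide. The function name char_width is made up for illustration,
and combining and control characters are deliberately ignored.

```python
import unicodedata


def char_width(ch, cjk=False):
    """Rough column width of one character (illustrative only).

    Assumption: "CJK width" means that East Asian Ambiguous characters
    occupy two columns instead of one. Combining characters and control
    characters are not handled here.
    """
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):      # Wide / Fullwidth: two columns in any mode
        return 2
    if eaw == "A" and cjk:     # Ambiguous: two columns only in CJK mode
        return 2
    return 1


if __name__ == "__main__":
    for ch in ("a", "\u03b1", "\u65e5"):   # Latin a, Greek alpha, CJK ideograph
        print(repr(ch), char_width(ch), char_width(ch, cjk=True))
```

Only the middle character changes between the two modes, which is why one
charmap with two different width tables is a problem.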

However, there is one way out within the locale mechanism: the locale 
syntax allows for an optional "modifier" which can be used to specify 
deviations, e.g.
	de_DE           has charmap ISO-8859-1
	de_DE@euro      has charmap ISO-8859-15
	uz_UZ           has charmap ISO-8859-1
	uz_UZ@cyrillic  has charmap UTF-8
	aa_ER and aa_ER@saaho both have charmap UTF-8 (with some other difference).
Thus you could define e.g.
	ja_JP.UTF-8@cjk
or
	ja_JP.UTF-8@cjkwidth
to indicate CJK width properties. I guess this is the most compliant way to go.
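
As a sketch of how such a name decomposes, the following Python snippet
splits a locale name of the form language[_territory][.codeset][@modifier]
into its parts. The helper name split_locale is hypothetical, and the
"cjkwidth" modifier is only the suggestion made above, not something any
system is known to provide.

```python
import re

# language[_territory][.codeset][@modifier]
_LOCALE = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z0-9]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


def split_locale(name):
    """Split a locale name into its four parts; absent parts are None."""
    match = _LOCALE.match(name)
    if match is None:
        raise ValueError("not a locale name: %r" % (name,))
    return match.groupdict()


if __name__ == "__main__":
    for name in ("de_DE@euro", "ja_JP.UTF-8", "ja_JP.UTF-8@cjkwidth"):
        print(name, split_locale(name))
```

The codeset stays "UTF-8" in both Japanese examples; only the modifier
differs, which is the point of the proposal.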


> - I heard that there is an existing implementation that behave like my
> proposal. (Sorry, I didn't hear the system name.)
Even if so, I think the way I described is more compatible with the locale 
mechanism as used elsewhere.


> > 2. If you wish to handle character widths compliant with the terminal
> >    your application is running in, there is no guarantee that your
> >    assumption of CJK width (or the actual locale setting if that model
> >    would be implemented) does indeed reflect the terminal's width properties.

> Yes, I understand it, too. My proposal is completely workaround.
> But it is the best solution because we have no specification/standard
> for my wish.
A well-chosen option like above, that stays within the described standard 
options, would be best accepted by other communities, I think, and could 
be established for this purpose.


> > 3. In mintty, you can dynamically change width properties by selecting
> >    different fonts; mintty changes CJK width behaviour according to certain
> >    font properties. "static" configuration in your shell using a locale
> >    variable would not reflect this change

> It is no problem because we -- most Japanese language users -- need
> not change the settings of mintty and locale after first setup.
> We set LANG=ja_JP.UTF-8 and select a Japanese font for mintty.
In any case, mined running in mintty will detect CJK width itself, 
regardless of the locale setting; with the coming versions of both 
programs, even when it gets changed on the fly :)


> >   b) Determine the actual CJK width behaviour dynamically. That's what
> >      mined does (in addition to other width property detection in general).

> It is the best solution. I think that we need specify the following:
> - the escape sequence about language context for terminal emulator.
> -- setting language context
> -- getting language context
> -- getting capability of language context
>    (context is fixed, static or dynamic / acceptable languages)
> - new multilingualized string/terminal API for terminal based applications.
This sounds complicated.
With my proposal, an application that wishes to auto-adjust on width 
properties (maybe even when changing) and which (unlike mined) uses 
the system wcwidth functions could proceed as follows:
* Detect CJK width by measuring the displayed width of a simple test string.
* (Optional) When receiving a SIGWINCH signal (future version of MinTTY), 
  repeat this detection.
* If e.g. LC_CTYPE starts with "ja_JP.UTF-8", call setlocale with 
  either "ja_JP.UTF-8@cjkwidth" or "ja_JP.UTF-8".
  The application would need to stay with the same locale prefix 
  "ja_JP..." because there is no reasonable way to choose a completely 
  different locale, which is another reason to just use the modifier 
  suffix, rather than reserving the complete "ja_JP..." setting for 
  CJK width.

Advantage of this approach: The system does not have to care about 
this issue and can just follow the locale setting.
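
The steps above could be sketched in Python roughly as follows. This is only
an illustration under several assumptions: the terminal answers the usual
cursor position request (ESC [ 6 n), Greek alpha (U+03B1) is used as the
ambiguous-width test character, and the "cjkwidth" modifier is the
hypothetical one suggested above. All function names are made up.

```python
import os
import re
import select
import termios
import tty

CJK_MODIFIER = "cjkwidth"   # hypothetical modifier, as proposed in this mail

_CPR = re.compile(rb"\x1b\[(\d+);(\d+)R")


def parse_cursor_report(reply):
    """Return (row, column) from a reply ESC [ row ; col R, or None."""
    match = _CPR.search(reply)
    return (int(match.group(1)), int(match.group(2))) if match else None


def probe_cjk_width(fd_in=0, fd_out=1, timeout=0.5):
    """Print one ambiguous-width character and ask where the cursor is.

    Returns True (two columns), False (one column) or None (no terminal
    or no answer within the timeout).
    """
    if not (os.isatty(fd_in) and os.isatty(fd_out)):
        return None
    saved = termios.tcgetattr(fd_in)
    try:
        tty.setcbreak(fd_in)   # no echo, no line buffering
        # Go to column 1, print the test character, request cursor position.
        os.write(fd_out, b"\r" + "\u03b1".encode("utf-8") + b"\x1b[6n")
        reply = b""
        while parse_cursor_report(reply) is None:
            ready, _, _ = select.select([fd_in], [], [], timeout)
            if not ready:
                break
            chunk = os.read(fd_in, 32)
            if not chunk:
                break
            reply += chunk
        os.write(fd_out, b"\r\x1b[K")   # erase the test character
        position = parse_cursor_report(reply)
        # Column 3 means the character took two columns, column 2 means one.
        return None if position is None else position[1] == 3
    finally:
        termios.tcsetattr(fd_in, termios.TCSADRAIN, saved)


def choose_locale(current, cjk_wide):
    """Add or drop the width modifier, keeping the rest of the locale name."""
    base, _, modifier = current.partition("@")
    if modifier not in ("", CJK_MODIFIER):
        return current   # leave unrelated modifiers such as @euro alone
    return base + "@" + CJK_MODIFIER if cjk_wide else base
```

An application would pass the result of choose_locale() to
locale.setlocale(locale.LC_CTYPE, ...) and be prepared for locale.Error,
since no such locale is installed anywhere today. To follow font changes it
could repeat the probe from a handler installed with
signal.signal(signal.SIGWINCH, ...).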


> And, we need rewrite too many applications by new API.
Well, alternatively, the system could follow the approach outlined 
above, but maybe that's not the proper level to do it (?)


> > I'm not happy with the idea of a cygwin-specific solution (or workaround).
> I think that it is not cygwin-specific solution.
As I tried to suggest above, using "UTF-8" for different width data on one 
system would be quite specific; using the "@" modifier syntax would not.


Kind regards,
Thomas



