[PATCH] Add "@cjknarrow" modifier (was Re: [Fwd: [1.7] wcwidth failing configure tests])
Corinna Vinschen
vinschen@redhat.com
Wed Jun 17 16:40:00 GMT 2009
Jeff,
do you have any opinion about this change? I would like to get it (or
some variation of it) into Cygwin 1.7.
Thanks,
Corinna
On Jun 15 10:44, Corinna Vinschen wrote:
> On Jun 14 22:18, IWAMURO Motonori wrote:
> > 2009/6/13 Corinna Vinschen
> > > The problem appears to be that there is no standard for the handling
> > > of ambiguous characters.
> >
> > Yes, but the guideline exists.
> > http://cygwin.com/ml/cygwin/2009-05/msg00444.html
>
> A single mail on a single mailing list of a single project. That's more
> a suggestion than a guideline...
>
> > > > Ambiguous characters behave like wide or narrow characters depending
> > > > on the context (language tag, script identification, associated
> > > > font, source of data, or explicit markup; all can provide the
> > > > context). If the context cannot be established reliably, they should
> > > > be treated as narrow characters by default.
> >
> > > Define the default for ja, ko, and zh to use width = 2, with a
> > > @cjknarrow (or whatever) modifier to use width = 1.
> >
> > I think it is a good idea.
>
> If everybody agrees to this suggestion, here's the patch. Tested
> with various combinations like
>
> LANG=ja_JP.UTF-8@cjknarrow
> LANG=ja_JP@cjknarrow
> LANG=ja.UTF-8@cjknarrow
> LANG=ja@cjknarrow
>
>
> Corinna
>
>
> * libc/locale/locale.c (loadlocale): Add handling of "@cjknarrow"
> modifier on _MB_CAPABLE targets. Add comment to explain.
>
>
> Index: libc/locale/locale.c
> ===================================================================
> RCS file: /cvs/src/src/newlib/libc/locale/locale.c,v
> retrieving revision 1.20
> diff -u -p -r1.20 locale.c
> --- libc/locale/locale.c 3 Jun 2009 19:28:22 -0000 1.20
> +++ libc/locale/locale.c 15 Jun 2009 08:40:46 -0000
> @@ -397,6 +397,9 @@ loadlocale(struct _reent *p, int categor
> int (*l_wctomb) (struct _reent *, char *, wchar_t, const char *, mbstate_t *);
> int (*l_mbtowc) (struct _reent *, wchar_t *, const char *, size_t,
> const char *, mbstate_t *);
> +#ifdef _MB_CAPABLE
> + int cjknarrow = 0;
> +#endif
>
> /* "POSIX" is translated to "C", as on Linux. */
> if (!strcmp (locale, "POSIX"))
> @@ -427,10 +430,14 @@ loadlocale(struct _reent *p, int categor
> if (c[0] == '.')
> {
> /* Charset */
> - strcpy (charset, c + 1);
> - if ((c = strchr (charset, '@')))
> + char *chp;
> +
> + ++c;
> + strcpy (charset, c);
> + if ((chp = strchr (charset, '@')))
> /* Strip off modifier */
> - *c = '\0';
> + *chp = '\0';
> + c += strlen (charset);
> }
> else if (c[0] == '\0' || c[0] == '@')
> /* End of string or just a modifier */
> @@ -442,6 +449,17 @@ loadlocale(struct _reent *p, int categor
> else
> /* Invalid string */
> return NULL;
> +#ifdef _MB_CAPABLE
> + if (c[0] == '@')
> + {
> + /* Modifier */
> + /* Only one modifier is recognized right now. "cjknarrow" is used
> + to modify the behaviour of wcwidth() for East Asian languages.
> + For details see the comment at the end of this function. */
> + if (!strcmp (c + 1, "cjknarrow"))
> + cjknarrow = 1;
> + }
> +#endif
> }
> /* We only support this subset of charsets. */
> switch (charset[0])
> @@ -604,13 +622,15 @@ loadlocale(struct _reent *p, int categor
> __mbtowc = l_mbtowc;
> __set_ctype (charset);
> /* Check for the language part of the locale specifier. In case
> - of "ja", "ko", or "zh", assume the use of CJK fonts. This is
> - stored in lc_ctype_cjk_lang and tested in wcwidth() to figure
> - out the width to return (1 or 2) for the "CJK Ambiguous Width"
> - category of characters. */
> - lc_ctype_cjk_lang = (strncmp (locale, "ja", 2) == 0
> - || strncmp (locale, "ko", 2) == 0
> - || strncmp (locale, "zh", 2) == 0);
> + of "ja", "ko", or "zh", assume the use of CJK fonts, unless the
> + "@cjknarrow" modifier has been specified.
> + The result is stored in lc_ctype_cjk_lang and tested in wcwidth()
> + to figure out the width to return (1 or 2) for the "CJK Ambiguous
> + Width" category of characters. */
> + lc_ctype_cjk_lang = !cjknarrow
> + && ((strncmp (locale, "ja", 2) == 0
> + || strncmp (locale, "ko", 2) == 0
> + || strncmp (locale, "zh", 2) == 0));
> #endif
> }
> else if (category == LC_MESSAGES)
>
>
> --
> Corinna Vinschen Please, send mails regarding Cygwin to
> Cygwin Project Co-Leader cygwin AT cygwin DOT com
> Red Hat
--
Corinna Vinschen
Cygwin Project Co-Leader
Red Hat
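
For readers who want to experiment with the rule outside of newlib, the decision made by the patch can be sketched as a small standalone function. This is only an illustration under simplifying assumptions: the helper name cjk_ambiguous_wide is hypothetical and does not exist in newlib, and the sketch locates the modifier with a plain strchr() instead of walking through the language, territory, and charset parts the way loadlocale() does. It returns 1 when the "CJK Ambiguous Width" characters would be treated as wide (width 2) and 0 otherwise.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of newlib.  Returns 1 if ambiguous-width
   characters should be treated as wide (width 2) for the given locale
   string, 0 if they should be treated as narrow (width 1).  */
static int
cjk_ambiguous_wide (const char *locale)
{
  const char *modifier;
  int cjknarrow;
  int cjk_lang;

  assert (locale != NULL);

  /* Simplification: the modifier, if any, starts at the first '@'.
     Only "cjknarrow" is recognized, mirroring the patch.  */
  modifier = strchr (locale, '@');
  cjknarrow = modifier != NULL && strcmp (modifier + 1, "cjknarrow") == 0;

  /* Same language test as in the patch: "ja", "ko", or "zh" prefix.  */
  cjk_lang = strncmp (locale, "ja", 2) == 0
	     || strncmp (locale, "ko", 2) == 0
	     || strncmp (locale, "zh", 2) == 0;

  return cjk_lang && !cjknarrow;
}
```

With this sketch, "ja_JP.UTF-8" and "zh_CN" yield 1, while the four tested combinations from the mail (such as "ja_JP.UTF-8@cjknarrow" and "ja@cjknarrow") and non-CJK locales such as "en_US.UTF-8" yield 0. An unrecognized modifier leaves the CJK default untouched.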