[PATCH] CJK ambiguous width for non-Unicode charsets

Andy Koppe andy.koppe@gmail.com
Fri Nov 26 04:42:00 GMT 2010


On 18 November 2010 11:03, Corinna Vinschen wrote:
> On Nov 17 21:34, Andy Koppe wrote:
>> On 16 November 2010 17:58, Corinna Vinschen wrote:
>> > On Nov  9 22:06, Andy Koppe wrote:
>> >> The attached small patch affects character widths as reported by
>> >> wcwidth(). It addresses an obscure issue.
>> >>[...]
>> >>       * libc/locale/locale.c: Fix ambiguous width to one for singlebyte
>> >>       charsets and two for non-Unicode multibyte charsets.
>> >
>> > This appears to make a lot of sense.  Would you mind to enhance your
>> > patch slightly to fix also the description in the locale.c
>> > documentation?  There's a related paragraph starting with "This
>> > implementation also supports a single modifier, <<"cjknarrow">>..."
>>
>> Sorry, I hadn't seen that. Amended patch attached.
>>
>>       * libc/locale/locale.c (loadlocale): Fix width of CJK ambiguous
>>       characters to 1 for singlebyte charsets and 2 for non-Unicode
>>       multibyte charsets. Change documentation accordingly.
>
> Thank you.  Applied with a minor change.  @ is a special character
> in the docs and has to be doubled ("@@") to be treated literally.
> I just removed it entirely since the @ is not part of the modifier
> itself.

Thanks.

In further testing I realised that the cjknarrow modifier wasn't
implemented for "C.<charset>" locales (previously it would have had no
effect there, so there was no point). Patch attached to make it work.

	* libc/locale/locale.c (loadlocale): Recognise the "cjknarrow"
	modifier on "C.<charset>" locales too.

Here's a small test for this:

$ cat width.c
#include <wchar.h>
#include <locale.h>
#include <stdio.h>

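int main(void) {
  /* Take the locale from the environment (LC_ALL, LC_CTYPE, LANG). */
  setlocale(LC_CTYPE, "");
  /* Print the name of the LC_CTYPE locale now in effect. */
  puts(setlocale(LC_CTYPE, NULL));
  /* U+00A1 (inverted exclamation mark) has East Asian ambiguous width. */
  puts(wcwidth(0xA1) == 1 ? "narrow" : "wide");
  return 0;
}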

$ cc width.c

$ ./a
C.UTF-8
narrow

$ LANG=C.GBK ./a
C.GBK
wide

$ LANG=C.GBK@cjknarrow ./a
C.GBK@cjknarrow
narrow

$ LANG=ja_JP.UTF-8 ./a
ja_JP.UTF-8
wide

$ LANG=ja_JP.UTF-8@cjknarrow ./a
ja_JP.UTF-8@cjknarrow
narrow

$ LANG=de_DE.UTF-8 ./a
de_DE.UTF-8
narrow
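
For reference, the results above are consistent with the rule sketched
below. This is only an illustration, not the code in locale.c: the
function names, the charset list and the string parsing are all made up
for the example, and the "wide only for ja, ko and zh in UTF-8" rule is
an assumption inferred from the ja_JP and de_DE results. Charset names
are compared case-sensitively to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does the locale name start with one of the
   languages assumed to default to wide ambiguous characters? */
static int is_cjk_language(const char *locale)
{
  static const char *const langs[] = { "ja", "ko", "zh" };
  size_t i;

  for (i = 0; i < sizeof langs / sizeof langs[0]; i++)
    if (strncmp(locale, langs[i], 2) == 0
        && (locale[2] == '_' || locale[2] == '.'))
      return 1;
  return 0;
}

/* Hypothetical helper, not newlib code. Given a locale name such as
   "ja_JP.UTF-8@cjknarrow", return the width (1 or 2) that the rules
   discussed in this thread would give a CJK ambiguous-width character. */
int ambiguous_width(const char *locale)
{
  /* Illustrative and incomplete list of non-Unicode multibyte charsets. */
  static const char *const multibyte[] =
    { "GBK", "BIG5", "SJIS", "EUCJP", "EUCKR" };
  const char *dot, *at;
  size_t i, len;

  assert(locale != NULL);
  dot = strchr(locale, '.');
  at = strchr(locale, '@');

  /* The "cjknarrow" modifier always forces narrow. */
  if (at != NULL && strcmp(at + 1, "cjknarrow") == 0)
    return 1;

  /* No charset given (for example plain "C"): treated as narrow here. */
  if (dot == NULL || (at != NULL && at < dot))
    return 1;

  /* Length of the charset name between '.' and '@' (or end of string). */
  len = at != NULL ? (size_t)(at - (dot + 1)) : strlen(dot + 1);

  /* Unicode: assumed wide only for the ja, ko and zh languages. */
  if (len == 5 && strncmp(dot + 1, "UTF-8", 5) == 0)
    return is_cjk_language(locale) ? 2 : 1;

  /* Non-Unicode multibyte charsets: wide. */
  for (i = 0; i < sizeof multibyte / sizeof multibyte[0]; i++)
    if (strlen(multibyte[i]) == len
        && strncmp(dot + 1, multibyte[i], len) == 0)
      return 2;

  /* Everything else is assumed to be a single-byte charset: narrow. */
  return 1;
}
```

With this sketch, ambiguous_width("C.GBK") gives 2 and
ambiguous_width("C.GBK@cjknarrow") gives 1, matching the test output.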

Andy
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ambiwidth3.patch
Type: application/octet-stream
Size: 1868 bytes
Desc: not available
URL: <http://sourceware.org/pipermail/newlib/attachments/20101126/462ffe7e/attachment.obj>

