Bug 13757

Summary: mbstowcs(3) unable to handle 8bit characters.
Product: glibc Reporter: Steven Drake <sdrake>
Component: libc Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED INVALID    
Severity: normal CC: bugdal, drepper.fsp
Priority: P2 Flags: fweimer: security-
Version: 2.13   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:
Attachments: Simple mbstowcs test program

Description Steven Drake 2012-02-26 08:07:46 UTC
Created attachment 6246
Simple mbstowcs test program

Compiling and running the attached program with glibc emits:
ERROR: mbstowcs: Invalid or incomplete multibyte or wide character

Compiling and running the program on a system with a different libc
implementation gives the expected output.
Comment 1 Andreas Schwab 2012-02-26 08:47:46 UTC
You need to use a locale that defines a meaning to this byte.  The default (ASCII) locale doesn't.
Comment 2 Steven Drake 2012-02-26 09:11:06 UTC
(In reply to comment #1)
> You need to use a locale that defines a meaning to this byte.  The default
> (ASCII) locale doesn't.

$ env LANG=en_US.iso88591 ./test-mbstowcs
ERROR: mbstowcs: Invalid or incomplete multibyte or wide character
Comment 3 Andreas Schwab 2012-02-26 09:37:05 UTC
Setting LANG has no effect by itself: a program does not use a non-default locale unless it calls setlocale.
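
For illustration only (this is not the attached test program): a minimal sketch of the pattern described in this comment, assuming a hypothetical helper named decode_in_locale. It selects the LC_CTYPE locale first and only then calls mbstowcs. Passing "" as the locale name asks setlocale to take the locale from the environment (LC_ALL, LC_CTYPE, LANG).

```c
#include <errno.h>
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper, not taken from the attachment.
   Selects the LC_CTYPE locale named by locale_name ("" means "use the
   environment"), then converts the multibyte string src into dst, which
   has room for dst_len wide characters.
   Returns the number of wide characters written, or (size_t)-1 on
   failure. On failure errno is EINVAL if setlocale rejected the name,
   otherwise it is whatever mbstowcs set for the invalid input. */
size_t decode_in_locale(const char *locale_name, const char *src,
                        wchar_t *dst, size_t dst_len)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL) {
        errno = EINVAL;
        return (size_t)-1;
    }
    return mbstowcs(dst, src, dst_len);
}
```

Calling decode_in_locale("", ...) with LANG=en_US.iso88591 in the environment would correspond to the run in comment 2 with the missing setlocale call added, assuming that locale is installed on the system.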
Comment 4 Steven Drake 2012-02-29 05:54:36 UTC
(In reply to comment #1)
> You need to use a locale that defines a meaning to this byte.  The default
> (ASCII) locale doesn't.

That's wrong: 'locale charmap' gives 'ANSI_X3.4-1968', and there lies the
problem. The charmap for the 'C' locale should probably be ISO-8859-1.

To be more accurate it should be the charmap that is used by system
calls (e.g. readdir and readlink).
Comment 5 Andreas Schwab 2012-02-29 09:16:58 UTC
If you want something other than the C locale you must use setlocale.
Comment 6 Steven Drake 2012-03-02 02:29:16 UTC
Please ignore comment 2; the problem is not the locale in use but the charmap of the C locale.
Comment 7 Rich Felker 2012-03-03 13:41:43 UTC
The charmap for the C locale should definitely not be ISO-8859-anything. All that does is encourage broken, non-portable program behavior. If you are going to use mbrtowc and family and intend to process characters not in the portable character set, you MUST call setlocale for the LC_CTYPE category.

The system calls you referred to (e.g. readdir and readlink) do not use any character map. They process bytes. In any case, if you wanted the C locale to match the filesystem's encoding, it would have to be UTF-8, not ISO-8859-1, at least on any modern system, and I'm pretty sure that's not what you want since you seem to be advocating for very backwards behavior...
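
To illustrate the point that readdir deals in bytes rather than characters, here is a rough sketch (the function names count_high_bytes and list_names_as_bytes are made up for this example). It walks a directory and dumps each name as raw bytes without ever calling setlocale or any conversion function.

```c
#include <dirent.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: counts the bytes in name that lie outside
   7-bit ASCII (values 0x80 and above). */
size_t count_high_bytes(const char *name)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)name; *p != '\0'; p++) {
        if (*p >= 0x80)
            n++;
    }
    return n;
}

/* Hypothetical helper: prints every entry of the directory at path as a
   hex dump of its bytes. No locale or charmap is consulted anywhere.
   Returns 0 on success, -1 if the directory cannot be opened. */
int list_names_as_bytes(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        printf("%zu high byte(s):", count_high_bytes(entry->d_name));
        for (const unsigned char *p = (const unsigned char *)entry->d_name;
             *p != '\0'; p++)
            printf(" %02x", (unsigned)*p);
        putchar('\n');
    }
    closedir(dir);
    return 0;
}
```

A name stored as UTF-8 and the same name stored as ISO-8859-1 would simply show up as different byte sequences here; deciding what characters those bytes represent is left to the program and its LC_CTYPE setting.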
Comment 8 Ulrich Drepper 2012-03-07 08:44:55 UTC
The charmap for the C locale is ANSI.  Just use an appropriate locale as you have been told several times already.