Bug 13757 - mbstowcs(3) unable to handle 8bit characters.
Summary: mbstowcs(3) unable to handle 8bit characters.
Status: RESOLVED INVALID
Alias: None
Product: glibc
Classification: Unclassified
Component: libc
Version: 2.13
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-02-26 08:07 UTC by Steven Drake
Modified: 2014-06-26 14:36 UTC

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
Simple mbstowcs test program (179 bytes, text/plain)
2012-02-26 08:07 UTC, Steven Drake

Description Steven Drake 2012-02-26 08:07:46 UTC
Created attachment 6246 [details]
Simple mbstowcs test program

Compiling and running the attached program with glibc emits:
ERROR: mbstowcs: Invalid or incomplete multibyte or wide character

Compiling and running the program on a system with a different libc
implementation gives the expected output.
Comment 1 Andreas Schwab 2012-02-26 08:47:46 UTC
You need to use a locale that defines a meaning to this byte.  The default (ASCII) locale doesn't.
Comment 2 Steven Drake 2012-02-26 09:11:06 UTC
(In reply to comment #1)
> You need to use a locale that defines a meaning to this byte.  The default
> (ASCII) locale doesn't.

$ env LANG=en_US.iso88591 ./test-mbstowcs
ERROR: mbstowcs: Invalid or incomplete multibyte or wide character
Comment 3 Andreas Schwab 2012-02-26 09:37:05 UTC
You don't use a non-default locale without calling setlocale.
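A minimal sketch of the point made in comment 3, assuming a C program similar to the attached test (the attachment itself is not reproduced here). The helper name `convert_with_locale` is hypothetical. It selects the `LC_CTYPE` category with `setlocale` before calling `mbstowcs`, because setting `LANG` in the environment has no effect on a program that never calls `setlocale`.

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper, not taken from the attached test program.
 * Selects the LC_CTYPE locale named by locale_name, then converts the
 * multibyte string src into at most dst_len wide characters in dst.
 * Returns the number of wide characters written (excluding any
 * terminating null), or (size_t)-1 if the locale cannot be selected
 * or src contains a byte sequence that is invalid in that locale. */
size_t convert_with_locale(const char *locale_name, const char *src,
                           wchar_t *dst, size_t dst_len)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return (size_t)-1;
    return mbstowcs(dst, src, dst_len);
}
```

Passing `""` as `locale_name` makes `setlocale` pick the locale up from the environment (`LC_ALL`, `LC_CTYPE`, `LANG`), which is what the `env LANG=...` invocation in comment 2 would need in order to take effect. Whether a given 8-bit byte then converts depends on the named locale being installed on the system.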
Comment 4 Steven Drake 2012-02-29 05:54:36 UTC
(In reply to comment #1)
> You need to use a locale that defines a meaning to this byte.  The default
> (ASCII) locale doesn't.

That's wrong: 'locale charmap' gives 'ANSI_X3.4-1968', and there lies the
problem. The charmap for the 'C' locale should probably be ISO-8859-1.

To be more accurate it should be the charmap that is used by system
calls (e.g. readdir and readlink).
Comment 5 Andreas Schwab 2012-02-29 09:16:58 UTC
If you want something other than the C locale you must use setlocale.
Comment 6 Steven Drake 2012-03-02 02:29:16 UTC
Please ignore comment 2. The problem is not the locale in use but the charmap of the C locale.
Comment 7 Rich Felker 2012-03-03 13:41:43 UTC
The charmap for the C locale should definitely not be ISO-8859-anything. All that does is encourage broken, non-portable program behavior. If you are going to use mbrtowc and family and intend to process characters not in the portable character set, you MUST call setlocale for the LC_CTYPE category.

The system calls you referred to (e.g. readdir and readlink) do not use any character map. They process bytes. In any case, if you wanted the C locale to match the filesystem's encoding, it would have to be UTF-8, not ISO-8859-1, at least on any modern system, and I'm pretty sure that's not what you want since you seem to be advocating for very backwards behavior...
Comment 8 Ulrich Drepper 2012-03-07 08:44:55 UTC
The charmap for the C locale is ANSI.  Just use an appropriate locale as you have been told several times already.
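For readers who want to check which charmap a locale uses from within a program, the following is a small sketch using the POSIX `nl_langinfo(CODESET)` query, which reports the same information as the `locale charmap` command mentioned in comment 4. The helper name `current_codeset` is hypothetical, and the exact string returned is implementation-specific.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: select the LC_CTYPE locale named by locale_name
 * and return the name of its character encoding as reported by the C
 * library, or NULL if the locale cannot be selected. The returned
 * string is owned by the C library and may be overwritten by later
 * calls to nl_langinfo or setlocale. */
const char *current_codeset(const char *locale_name)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return NULL;
    return nl_langinfo(CODESET);
}
```

On the glibc version discussed in this report, calling it with `"C"` would be expected to report the `ANSI_X3.4-1968` charmap quoted in comment 4. Other C libraries may use a different name for their C locale encoding.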