Summary: | mbstowcs(3) unable to handle 8bit characters. | ||
---|---|---|---|
Product: | glibc | Reporter: | Steven Drake <sdrake> |
Component: | libc | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | RESOLVED INVALID | ||
Severity: | normal | CC: | bugdal, drepper.fsp |
Priority: | P2 | Flags: | fweimer: security- |
Version: | 2.13 | ||
Target Milestone: | --- | ||
Host: | Target: | ||
Build: | Last reconfirmed: | ||
Attachments: | Simple mbstowcs test program |
You need to use a locale that defines a meaning to this byte. The default (ASCII) locale doesn't.

(In reply to comment #1)
> You need to use a locale that defines a meaning to this byte. The default
> (ASCII) locale doesn't.

$ env LANG=en_US.iso88591 ./test-mbstowcs
ERROR: mbstowcs: Invalid or incomplete multibyte or wide character

You don't use a non-default locale without calling setlocale.

(In reply to comment #1)
> You need to use a locale that defines a meaning to this byte. The default
> (ASCII) locale doesn't.

That's wrong: 'locale charmap' gives 'ANSI_X3.4-1968', and there lies the problem. The charmap for the 'C' locale should probably be ISO-8859-1. To be more accurate, it should be the charmap that is used by system calls (e.g. readdir and readlink).

If you want something other than the C locale, you must use setlocale.

Please ignore comment 2; the problem is not the locale in use but the charmap of the C locale.

The charmap for the C locale should definitely not be ISO-8859-anything. All that does is encourage broken, non-portable program behavior. If you are going to use mbrtowc and family and intend to process characters not in the portable character set, you MUST call setlocale for the LC_CTYPE category. The system calls you referred to (e.g. readdir and readlink) do not use any character map. They process bytes. In any case, if you wanted the C locale to match the filesystem's encoding, it would have to be UTF-8, not ISO-8859-1, at least on any modern system, and I'm pretty sure that's not what you want, since you seem to be advocating for very backwards behavior...

The charmap for the C locale is ANSI. Just use an appropriate locale, as you have been told several times already.
Created attachment 6246 Simple mbstowcs test program

Compiling and running the attached program with glibc emits:

ERROR: mbstowcs: Invalid or incomplete multibyte or wide character

Compiling and running the program on a system with a different libc implementation gives the expected output.