This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.



[Bug localedata/19932] New: mbrtowc returns (size_t) -1 in C locale


https://sourceware.org/bugzilla/show_bug.cgi?id=19932

            Bug ID: 19932
           Summary: mbrtowc returns (size_t) -1 in C locale
           Product: glibc
           Version: 2.22
            Status: NEW
          Severity: normal
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: eggert at gnu dot org
                CC: libc-locales at sourceware dot org
  Target Milestone: ---

Created attachment 9173
  --> https://sourceware.org/bugzilla/attachment.cgi?id=9173&action=edit
test mbrtowc in the C locale

This follows up on a bug reported by Björn Jacke against GNU grep 2.23; see
<http://bugs.gnu.org/23234>. The bug occurs because GNU grep uses mbrtowc to
detect encoding errors, and because glibc mbrtowc reports an encoding error in
the C locale when given a byte in the range 128-255 decimal.

It was always the intent of POSIX that all 256 bytes be valid characters in the
C locale, as that was the traditional behavior. The standard did not state this
clearly, and that omission is planned to be fixed in a future version of POSIX;
see <http://austingroupbugs.net/view.php?id=663#c2738> (2015-07-02).
Glibc should be fixed to conform to this, i.e., mbrtowc should never return
(size_t) -1 in the C locale.

I plan to work around this bug in the gnulib mbrtowc module, which should fix
the grep bug; but this is a hack and will slow grep down a bit. The bug should
be fixed in glibc.
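
For illustration only, a workaround of this kind could look roughly like the
sketch below. It is not the gnulib code: the names ctype_locale_is_c and
c_safe_mbrtowc are hypothetical, and the locale check by name is an assumed
simplification. The idea is to call mbrtowc and, when it reports a failure
while LC_CTYPE is the C or POSIX locale, to treat the first byte as a
one-byte character instead.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: nonzero if the current LC_CTYPE locale is named
   "C" or "POSIX".  Checking by name is an assumption of this sketch.  */
int
ctype_locale_is_c (void)
{
  const char *name = setlocale (LC_CTYPE, NULL);
  return name && (strcmp (name, "C") == 0 || strcmp (name, "POSIX") == 0);
}

/* Hypothetical wrapper around mbrtowc.  In the C locale it never returns
   (size_t) -1 or (size_t) -2; a byte that mbrtowc rejects is returned
   as a one-byte character whose value is that of the byte.  */
size_t
c_safe_mbrtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *ps)
{
  static mbstate_t internal_state;
  if (ps == NULL)
    ps = &internal_state;

  size_t ret = mbrtowc (pwc, s, n, ps);

  if (ret >= (size_t) -2 && s != NULL && n != 0 && ctype_locale_is_c ())
    {
      if (pwc != NULL)
        *pwc = (unsigned char) *s;
      /* Put the conversion state back into the initial state.  */
      memset (ps, 0, sizeof *ps);
      return 1;
    }
  return ret;
}
```

The extra locale check on every failing call is one plausible source of the
slowdown mentioned above. The value stored for a rejected byte is a choice
made in this sketch, not something the bug report specifies.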

Please see the attached program for an illustration of the bug. The program
should output nothing and exit with status 0, but on glibc it outputs lines
like the following:

byte 0x80 (0200) encoding error
byte 0x81 (0201) encoding error
...
byte 0xff (0377) encoding error

and exits with status 1.
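
The attachment is not reproduced in this message. A minimal sketch of a test
along the lines described above might look as follows; the function name
count_c_locale_encoding_errors is hypothetical, and the output format is
inferred from the sample lines above rather than taken from the attachment.
It feeds each nonzero byte value to mbrtowc in the C locale and counts the
bytes reported as encoding errors.

```c
#include <limits.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical test helper.  Switch to the C locale, convert every
   single byte 1..UCHAR_MAX with mbrtowc, print one line per byte that
   is reported as an encoding error, and return the number of such
   bytes.  Return -1 if the C locale cannot be selected.  */
int
count_c_locale_encoding_errors (void)
{
  int errors = 0;

  if (setlocale (LC_ALL, "C") == NULL)
    return -1;

  for (int i = 1; i <= UCHAR_MAX; i++)
    {
      unsigned char byte = (unsigned char) i;
      wchar_t wc;
      mbstate_t state;

      /* Start each conversion from the initial shift state.  */
      memset (&state, 0, sizeof state);

      if (mbrtowc (&wc, (const char *) &byte, 1, &state) == (size_t) -1)
        {
          printf ("byte 0x%02x (0%03o) encoding error\n", i, i);
          errors++;
        }
    }
  return errors;
}
```

A main function that calls this helper and exits with status 1 when the
result is nonzero would behave like the program described above: silent with
status 0 where all bytes are accepted, and a list of failing bytes with
status 1 otherwise.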

