This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug libc/21093] New: mbsnrtowcs: *src is not left pointing to the next multibyte sequence to be converted when input buffer ends with incomplete multibyte sequence


https://sourceware.org/bugzilla/show_bug.cgi?id=21093

            Bug ID: 21093
           Summary: mbsnrtowcs: *src is not left pointing to the next
                    multibyte sequence to be converted when input buffer
                    ends with incomplete multibyte sequence
           Product: glibc
           Version: 2.24
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: libc
          Assignee: unassigned at sourceware dot org
          Reporter: igor.liferenko at gmail dot com
                CC: drepper.fsp at gmail dot com
  Target Milestone: ---

According to this bug report [1], if the input buffer (whose end is determined
by the nms argument) ends with an incomplete multibyte sequence, mbsnrtowcs()
stops conversion before that sequence.

"Stops conversion" means that the bytes of the incomplete sequence are not
processed. Thus, *src must be left pointing to the first byte of that sequence,
that is, the byte before which the conversion stopped.

However, as the following example shows, mbsnrtowcs() advances *src past the
incomplete multibyte character, as if it had continued converting. It does not
actually *stop* the conversion, which contradicts the behavior described above.

In the example the following UTF-8 sequences are used:

\320     = incomplete (lead byte of a two-byte sequence, no continuation byte)
\321\215 = U+044D (CYRILLIC SMALL LETTER E)

    #include <locale.h>
    #include <wchar.h>
    #include <stdio.h>
    int main(void)
    {
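      setlocale(LC_CTYPE, "en_US.UTF-8");
      /* U+044D followed by a lone lead byte (incomplete sequence) */
      const char *s = "\321\215\320";
      const char *x = s;
      wchar_t wcs[3];
      /* mbsnrtowcs() returns size_t, so print it with %zu rather than %d */
      printf("status: %zu\n", mbsnrtowcs(wcs,&x,3,3,NULL));
      perror(NULL);
      printf("ori=%p\nnew=%p\n",(void *)s,(void *)x);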
      return 0;
    }

Output:

    status: 1
    Success
    ori=0x556497c29980
    new=0x556497c29983

As the output confirms, conversion stopped before the incomplete multibyte
sequence: one character was converted and errno was not set. The problem is
that *src (0x556497c29983) does not point to the next multibyte sequence to be
converted, which starts at 0x556497c29982.
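
To make the expected offset concrete, here is a minimal sketch that computes how many leading bytes of a buffer form complete UTF-8 sequences. The helper name complete_utf8_prefix is hypothetical and not part of glibc, and the sketch assumes the input is well-formed UTF-8 apart from a possibly truncated final sequence. It inspects bytes directly, so it does not depend on the locale.

```c
#include <stddef.h>

/* Hypothetical helper (not a glibc function): returns the number of leading
   bytes of s[0..n) that make up complete UTF-8 sequences. Assumes the input
   is well-formed except for a possibly truncated final sequence. */
size_t complete_utf8_prefix(const char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char c = (unsigned char)s[i];
        size_t need;                    /* length implied by the lead byte */
        if (c < 0x80)
            need = 1;
        else if ((c & 0xE0) == 0xC0)
            need = 2;
        else if ((c & 0xF0) == 0xE0)
            need = 3;
        else if ((c & 0xF8) == 0xF0)
            need = 4;
        else
            break;                      /* stray continuation or invalid lead byte */
        if (need > n - i)
            break;                      /* sequence is cut off by the end of the buffer */
        i += need;
    }
    return i;
}
```

For the 3-byte input "\321\215\320" this returns 2, which is the offset from the start of the buffer at which *src would be expected to point after the call.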

Compare this with the following example, in which the input contains no
incomplete sequence. It likewise does not set errno to EILSEQ (perror() prints
"Success") and returns the same number of successfully converted characters
(status: 1). But this time *src is left pointing just past the last converted
sequence, where the next multibyte sequence to be converted would start:

    #include <locale.h>
    #include <wchar.h>
    #include <stdio.h>
    int main(void)
    {
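      setlocale(LC_CTYPE, "en_US.UTF-8");
      /* U+044D only, with no trailing incomplete sequence */
      const char *s = "\321\215";
      const char *x = s;
      wchar_t wcs[2];
      /* mbsnrtowcs() returns size_t, so print it with %zu rather than %d */
      printf("status: %zu\n", mbsnrtowcs(wcs,&x,2,2,NULL));
      perror(NULL);
      printf("ori=%p\nnew=%p\n",(void *)s,(void *)x);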
      return 0;
    }

Output:

    status: 1
    Success
    ori=0x556497c29980
    new=0x556497c29982


[1]: https://sourceware.org/bugzilla/show_bug.cgi?id=20860
