Bug 21093 - mbsnrtowcs: *src is not left pointing to the next multibyte sequence to be converted when input buffer ends with incomplete multibyte sequence
Status: UNCONFIRMED
Alias: None
Product: glibc
Classification: Unclassified
Component: locale
Version: 2.24
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-01-30 02:46 UTC by Igor Liferenko
Modified: 2018-01-24 16:28 UTC

See Also:
Host:
Target:
Build:
Last reconfirmed:


Description Igor Liferenko 2017-01-30 02:46:25 UTC
According to this bug report [1], if the input buffer (whose end is determined
by the nms argument) ends with an incomplete multibyte sequence, mbsnrtowcs()
stops conversion before it.

"Stops conversion" means that the next byte is not processed. Thus, *src must
be left pointing to that next byte, i.e. the byte before which the conversion
was stopped.

But, as can be seen from the following example, mbsnrtowcs() tries to convert
further and advances *src to point past the incomplete multibyte character.
It does not actually *stop* the conversion, which is a contradiction.

In the example the following UTF-8 sequences are used:

\320     = incomplete
\321\215 = U+044D (CYRILLIC SMALL LETTER E)

    #include <locale.h>
    #include <wchar.h>
    #include <stdio.h>
    int main(void)
    {
      setlocale(LC_CTYPE, "en_US.UTF-8");
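      const char *s = "\321\215\320";  /* U+044D, then an incomplete lead byte */
      const char *x = s;
      wchar_t wcs[3];
      /* mbsnrtowcs() returns size_t, so it is printed with %zu */
      printf("status: %zu\n", mbsnrtowcs(wcs,&x,3,3,NULL));
      perror(NULL);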
      printf("ori=%p\nnew=%p\n",(void *)s,(void *)x);
      return 0;
    }

Output:

    status: 1
    Success
    ori=0x556497c29980
    new=0x556497c29983

As the output confirms, conversion was stopped before the incomplete multibyte
sequence: only one character was converted. The problem is that *src does not
point to the next multibyte sequence to be converted (ori + 2, i.e.
0x556497c29982); it points one byte past it (0x556497c29983).
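
The expected offset can be cross-checked without calling into the locale
machinery at all. The following is only a sketch: utf8_complete_prefix() is a
hypothetical helper, not a glibc or POSIX function. It assumes the buffer holds
otherwise well-formed UTF-8 and only inspects the trailing bytes to find where
an incomplete sequence begins.

```c
#include <stddef.h>

/* Hypothetical helper (not part of glibc or POSIX).
   Returns the offset at which a trailing incomplete UTF-8 sequence
   begins in buf[0..n), or n if the buffer does not end with one.
   Assumption: everything before the tail is well-formed UTF-8;
   anything that is not a plain truncation is left for the real
   converter to diagnose. */
size_t utf8_complete_prefix(const char *buf, size_t n)
{
    size_t i = n;
    size_t back = 0;

    /* Step back over at most 3 continuation bytes (10xxxxxx). */
    while (i > 0 && back < 3 && ((unsigned char)buf[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0)
        return n;               /* no lead byte in the buffer */

    unsigned char lead = (unsigned char)buf[i - 1];
    size_t need;                /* sequence length announced by the lead byte */

    if (lead < 0x80)
        need = 1;
    else if ((lead & 0xE0) == 0xC0)
        need = 2;
    else if ((lead & 0xF0) == 0xE0)
        need = 3;
    else if ((lead & 0xF8) == 0xF0)
        need = 4;
    else
        return n;               /* not a lead byte: not a truncation */

    /* back + 1 bytes of the last sequence are present. */
    return (back + 1 < need) ? i - 1 : n;
}
```

For the buffer of the first example ("\321\215\320", 3 bytes) this helper
returns 2, which is the offset the report expects *src to be left at.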

Compare this with the following example, which also does not set errno to
EILSEQ (perror() prints "Success") and returns the same number of successfully
converted characters (status: 1). This time, however, *src is left pointing to
the next multibyte sequence to be converted:

    #include <locale.h>
    #include <wchar.h>
    #include <stdio.h>
    int main(void)
    {
      setlocale(LC_CTYPE, "en_US.UTF-8");
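      const char *s = "\321\215";  /* U+044D only, no trailing incomplete byte */
      const char *x = s;
      wchar_t wcs[2];
      /* mbsnrtowcs() returns size_t, so it is printed with %zu */
      printf("status: %zu\n", mbsnrtowcs(wcs,&x,2,2,NULL));
      perror(NULL);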
      printf("ori=%p\nnew=%p\n",(void *)s,(void *)x);
      return 0;
    }

Output:

    status: 1
    Success
    ori=0x556497c29980
    new=0x556497c29982


[1]: https://sourceware.org/bugzilla/show_bug.cgi?id=20860
Comment 1 Igor Liferenko 2017-02-17 07:38:17 UTC
Let me pose the problem in another way:

The problem here is not that *src is changed incorrectly: according to
bug report [1] it is changed correctly.

The problem is that the input in the first example ends with an incomplete
multibyte sequence, so mbsnrtowcs() *must exit with error*, but it exits with
success.