This is the mail archive of the
glibc-bugs@sourceware.org
mailing list for the glibc project.
[Bug libc/21093] New: mbsnrtowcs: *src is not left pointing to the next multibyte sequence to be converted when input buffer ends with incomplete multibyte sequence
- From: "igor.liferenko at gmail dot com" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sourceware dot org
- Date: Mon, 30 Jan 2017 02:46:25 +0000
- Subject: [Bug libc/21093] New: mbsnrtowcs: *src is not left pointing to the next multibyte sequence to be converted when input buffer ends with incomplete multibyte sequence
- Auto-submitted: auto-generated
https://sourceware.org/bugzilla/show_bug.cgi?id=21093
Bug ID: 21093
Summary: mbsnrtowcs: *src is not left pointing to the next
multibyte sequence to be converted when input buffer
ends with incomplete multibyte sequence
Product: glibc
Version: 2.24
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: libc
Assignee: unassigned at sourceware dot org
Reporter: igor.liferenko at gmail dot com
CC: drepper.fsp at gmail dot com
Target Milestone: ---
According to this bug report [1], if the input buffer ends (the end of the
buffer is determined by the nms argument) with an incomplete multibyte sequence,
mbsnrtowcs() stops conversion before it.
The words "stops conversion" mean that the next byte is not processed.
Thus, *src must be left pointing to that next byte, i.e. the byte before which
the conversion was stopped.
But, as can be seen from the following example, mbsnrtowcs() tries to do further
conversion (and advances *src to point past the incomplete multibyte character).
It does not actually *stop* the conversion, which is a contradiction.
In the example the following UTF-8 sequences are used:
\320 = incomplete (lead byte of a two-byte sequence; the continuation byte is missing)
\321\215 = U+044D (CYRILLIC SMALL LETTER E)
#include <locale.h>
#include <wchar.h>
#include <stdio.h>
int main(void)
{
    setlocale(LC_CTYPE, "en_US.UTF-8");
    const char *s = "\321\215\320";  /* U+044D followed by a lone lead byte */
    const char *x = s;
    wchar_t wcs[3];
    /* mbsnrtowcs() returns a size_t, so print it with %zu */
    printf("status: %zu\n", mbsnrtowcs(wcs, &x, 3, 3, NULL));
    perror(NULL);
    printf("ori=%p\nnew=%p\n", (void *)s, (void *)x);
    return 0;
}
Output:
status: 1
Success
ori=0x556497c29980
new=0x556497c29983
As the output confirms, conversion was stopped before the incomplete multibyte
sequence. The problem is that *src does not point to the next multibyte
sequence to be converted (0x556497c29982); it points one byte past it
(0x556497c29983).
Compare this with the following example, which also does not set errno to
EILSEQ (perror() prints "Success") and returns the same number of successfully
converted characters (status: 1). This time, however, *src is left pointing to
the next multibyte sequence to be converted:
#include <locale.h>
#include <wchar.h>
#include <stdio.h>
int main(void)
{
    setlocale(LC_CTYPE, "en_US.UTF-8");
    const char *s = "\321\215";  /* U+044D only, no trailing lead byte */
    const char *x = s;
    wchar_t wcs[2];
    /* mbsnrtowcs() returns a size_t, so print it with %zu */
    printf("status: %zu\n", mbsnrtowcs(wcs, &x, 2, 2, NULL));
    perror(NULL);
    printf("ori=%p\nnew=%p\n", (void *)s, (void *)x);
    return 0;
}
Output:
status: 1
Success
ori=0x556497c29980
new=0x556497c29982
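For callers that know their input is UTF-8, one possible way to sidestep the
ambiguity is to exclude an incomplete trailing sequence from nms before calling
mbsnrtowcs(). The following is only a minimal sketch under that assumption:
utf8_incomplete_tail() and convert_complete_prefix() are hypothetical helper
names, not glibc functions, and the tail check looks only at lead-byte bit
patterns; it does not validate the input.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not part of glibc). Assumes UTF-8 input.
   Returns how many bytes at the end of buf[0..n) begin a multibyte
   sequence that is not yet complete, or 0 if the buffer ends on a
   sequence boundary (or the tail is not a recognizable prefix). */
size_t utf8_incomplete_tail(const char *buf, size_t n)
{
    /* A UTF-8 sequence is at most 4 bytes long, so an incomplete one
       occupies at most the last 3 bytes of the buffer. */
    for (size_t back = 1; back <= 3 && back <= n; back++) {
        unsigned char c = (unsigned char)buf[n - back];
        size_t need;

        if ((c & 0xC0) == 0x80)
            continue;                    /* continuation byte: look further back */

        if (c < 0x80)                need = 1;  /* ASCII */
        else if ((c & 0xE0) == 0xC0) need = 2;
        else if ((c & 0xF0) == 0xE0) need = 3;
        else if ((c & 0xF8) == 0xF0) need = 4;
        else return 0;                   /* not a lead byte: leave it to the converter */

        /* The sequence starting here is incomplete if it needs more
           bytes than are available up to the end of the buffer. */
        return need > back ? back : 0;
    }
    return 0;
}

/* Hypothetical wrapper: hand mbsnrtowcs() only the part of the input
   that ends on a sequence boundary, so an incomplete tail is never
   seen by the converter. */
size_t convert_complete_prefix(wchar_t *dst, const char **src,
                               size_t nms, size_t len)
{
    size_t usable = nms - utf8_incomplete_tail(*src, nms);
    return mbsnrtowcs(dst, src, usable, len, NULL);
}
```

With the first example's input ("\321\215\320", nms = 3) the helper reports a
tail of 1 byte, so only the first 2 bytes would be passed on for conversion;
with the second example's input ("\321\215", nms = 2) it reports 0 and the call
is unchanged.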
[1]: https://sourceware.org/bugzilla/show_bug.cgi?id=20860