This is the mail archive of the
libc-alpha@sources.redhat.com
mailing list for the glibc project.
Re: should mbrtowc(&wc, "", 1, &ps) set wc?
- To: Bruno Haible <haible at ilog dot fr>
- Subject: Re: should mbrtowc(&wc, "", 1, &ps) set wc?
- From: Edmund GRIMLEY EVANS <edmundo at rano dot org>
- Date: Sun, 12 Nov 2000 23:39:46 +0000
- Bcc: Edmund GRIMLEY EVANS <edmundo at rano dot org>
- Cc: libc-alpha at sources dot redhat dot com
- References: <20001111211606.P13595@rano.org> <14862.61490.683148.47554@honolulu.ilog.fr>
Bruno Haible <haible@ilog.fr>:
> > mbrtowc(&wc, "\302\240", (size_t)(-1), &ps) returns -1. So does
> > mbrtowc(&wc, s, (char *)0 - s, &ps), where s is "\302\240"
>
> > I don't know whether strings are allowed to wrap around the address
> > space.
>
> No. And by calling mbrtowc(pwc,s,n,ps) you are allowing the function
> to inspect the bytes s[0], ..., s[n-1]. Quoting ISO C 99:
>
> "If s is not a null pointer, the mbrtowc function inspects at most
> n bytes beginning with the byte pointed to by s to determine the
> number of bytes needed ..."
>
> Thus the value for n that you pass here is invalid.
You might be right, but the sentence you quoted doesn't on its own
imply that mbrtowc is permitted to look beyond the terminating '\0' of
a string.

Does ISO C 99 state that strncpy will never look beyond the '\0'? If
it does, then maybe the absence of such a statement about mbrtowc
could be taken to imply that mbrtowc might look beyond the '\0'.

Anyway, it wasn't looking beyond the '\0'. The problem (if you want to
call it a problem) is that there is an inconsistency between the way
mbrtowc treats a single-byte character and a multibyte character. If
you use (size_t)(-1) for n, it works fine for ASCII but breaks for
non-ASCII.

I don't claim it's a bug, but it doesn't seem ideal either.
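For what it's worth, here is a minimal sketch of how a caller can avoid
the question altogether by passing the real number of readable bytes
for n instead of (size_t)(-1). The helper name count_mb_chars is made
up for illustration and is not part of glibc or of anything discussed
above; it assumes s is a NUL-terminated string and that the caller has
already selected a locale with setlocale.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not from glibc): counts the multibyte
   characters in the NUL-terminated string s, or returns (size_t)-1
   if mbrtowc reports an invalid or incomplete sequence. */
size_t count_mb_chars(const char *s)
{
    assert(s != NULL);

    mbstate_t ps;
    memset(&ps, 0, sizeof ps);          /* initial conversion state */

    /* Bytes the caller really owns, including the terminating '\0'. */
    size_t remaining = strlen(s) + 1;
    size_t count = 0;

    for (;;) {
        wchar_t wc;
        /* n is the true number of readable bytes, never (size_t)-1. */
        size_t r = mbrtowc(&wc, s, remaining, &ps);

        if (r == 0)
            return count;               /* reached the terminating '\0' */
        if (r == (size_t)-1 || r == (size_t)-2)
            return (size_t)-1;          /* invalid or incomplete sequence */

        s += r;                         /* skip the bytes just consumed */
        remaining -= r;
        count++;
    }
}
```

Because n always includes the terminating '\0', the loop ends at the
null character without relying on how the implementation treats a huge
n. In a UTF-8 locale, count_mb_chars("\302\240") would be expected to
count one character, since those two bytes encode U+00A0.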
Edmund