Bug 9674 - mbtowc keeps internal state even for stateless encodings
Status: RESOLVED INVALID
Product: glibc
Classification: Unclassified
Component: libc
Version: 2.8
Importance: P2 normal
Assignee: Ulrich Drepper
 
Reported: 2008-12-19 01:08 UTC by Bruno Haible
Modified: 2014-07-02 07:48 UTC
Host: i686-suse-linux
Target: i686-suse-linux
Build: i686-suse-linux
fweimer: security-

Description Bruno Haible 2008-12-19 01:08:45 UTC
Run the following program on a system with a fr_FR.UTF-8 locale.
===================================================================
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

int main ()
{
  int locale_found = setlocale (LC_ALL, "fr_FR.UTF-8") != NULL;
  printf ("%d\n", locale_found);

  printf ("%d\n", mbtowc (NULL, NULL, 0));

  static const char input[4] = { 195, 188, 195, 159 };
  wchar_t wc;

  int res2a = mbtowc (&wc, input, 4);
  printf ("%d\n", res2a);

  int res1 = mbtowc (&wc, input, 1);
  printf ("%d\n", res1);

  int res2b = mbtowc (&wc, input, 4);
  printf ("%d\n", res2b);

  return 0;
}
===================================================================
$ gcc -O -Wall foo.c
$ ./a.out 
1
0
2
-1
-1

Expected output:

1
0
2
-1
2

Rationale:
The first line shows that the locale was correctly set. So the locale encoding
is UTF-8.
The second line shows that the UTF-8 encoding is not state-dependent.
The third and fifth lines show that the same call produces different results,
that is, the result must depend on hidden state.
But the mbtowc specification says that "For a state-dependent encoding ...
Subsequent calls with s as other than a null pointer shall cause the internal 
state of the function to be altered as necessary."
However, the encoding in use here is not state dependent. Hence the function's
results should not depend on hidden state.

Reference:
POSIX:2008 specification of mbtowc:
<http://www.opengroup.org/onlinepubs/9699919799/functions/mbtowc.html>
Comment 1 Paolo Bonzini 2008-12-22 10:14:26 UTC
For completeness, I will add that this requirement is also present in ISO C at
the beginning of 7.20.7 (outside 7.20.7.2 which is where mbtowc is defined).

OTOH, the fact that "For a state-dependent encoding, each function is placed
into its initial state by a call for which its character pointer argument, s, is
a null pointer" does not imply that an implementation cannot do the same for
state-independent encodings too...
Comment 2 Ulrich Drepper 2008-12-26 18:54:54 UTC
The state does not only contain the shift state but also incomplete input.  That
explains the behavior and it is a correct implementation according to the
wording of the specification.

You'll have to get the standard body to explicitly confirm your reading before
anything will be changed.
Comment 3 Ulrich Drepper 2011-05-16 00:10:38 UTC
I'm closing this now.  The "state object" is meant to be used like this.  "State" as in stateful is something else.