This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug libc/20639] Inconsistency in definitions of fputwc(), putwc() and putwchar() in glibc


https://sourceware.org/bugzilla/show_bug.cgi?id=20639

--- Comment #5 from Igor Liferenko <igor.liferenko at gmail dot com> ---
Hi all,

This is proof that the current glibc-2.24 sources are incorrect. Quoting the
libc reference, last paragraph of section 5.2:

  Some of the memory and string functions take single characters as
  arguments. Since a value of type char is automatically promoted into a
  value of type int when used as a parameter, the functions are declared
  with int as the type of the parameter in question. In case of the wide
  character functions the situation is similar: the parameter type for a
  single wide character is wint_t and not wchar_t. This would for many
  implementations not be necessary since wchar_t is large enough to not
  be automatically promoted, but since the ISO C standard does not
  require such a choice of types the wint_t type is used.
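
The promotion described in this paragraph can be observed directly. The
following is a minimal sketch, not taken from glibc; the helper names are
made up for illustration. It relies on the fact that unary plus applies the
integer promotions to its operand, so sizeof (+x) reports the size of the
promoted type.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helpers, for illustration only.  */

/* Size of a char after the integer promotions.  Unary plus promotes
   its operand, so the result is the size of int.  */
size_t promoted_char_size (void)
{
  char c = 'a';
  return sizeof (+c);
}

/* Size of a wchar_t after the integer promotions.  If wchar_t is
   already at least as wide as int, nothing changes and this equals
   sizeof (wchar_t).  */
size_t promoted_wchar_size (void)
{
  wchar_t wc = L'a';
  return sizeof (+wc);
}
```

Where wchar_t is 32 bits wide the second helper returns sizeof (wchar_t),
which is the case the manual calls "large enough to not be automatically
promoted".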

Also, in section 6.1 it is said:

  wint_t is a data type used for parameters and variables that contain a
  single wide character. As the name suggests this type is the equivalent
  of int when using the normal char strings. The types wchar_t and wint_t
  often have the same representation if their size is 32 bits wide but if
  wchar_t is defined as char the type wint_t must be defined as int due
  to the parameter promotion.

From the quote above it follows that if "wchar_t" is represented as "char"
and "wint_t" is represented as "int", the functions fputwc(), putwc() and
putwchar() must compile the same way as fputc(), putc() and putchar():
without warnings. With the current glibc sources, however, there will be
warnings in that case.

If we look at the definitions of the analogous non-wide character functions
fputc(), putc() and putchar(), we see that they all take an argument of type
"int", although according to comment #1 they could reasonably be defined with
an argument of type "char", because we need to check for EOF first anyway.
The reason the type is "int" in the non-wide character functions is _exactly_
to get rid of type conversion warnings.

Therefore, I suggest fixing the function interfaces this way:

- wint_t fputwc (wchar_t wc, _IO_FILE *fp) {
+ wint_t fputwc (wint_t wc, _IO_FILE *fp) {

- wint_t putwc (wchar_t wc, _IO_FILE *fp) {
+ wint_t putwc (wint_t wc, _IO_FILE *fp) {

- wint_t putwchar (wchar_t wc) {
+ wint_t putwchar (wint_t wc) {
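
To illustrate what a wint_t parameter looks like from the caller's side
without touching glibc, here is a minimal sketch of a wrapper. The name
put_wide is hypothetical, and rejecting WEOF up front is an assumption of
this sketch, not something the existing functions are documented to do.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical wrapper, not a glibc function: takes the character as
   wint_t, the way fputc() takes int, and forwards to the real
   fputwc().  */
wint_t put_wide (wint_t wc, FILE *fp)
{
  /* Assumption of this sketch: WEOF is not a member of the extended
     character set, so treat it as an error instead of writing it.  */
  if (wc == WEOF)
    return WEOF;
  return fputwc ((wchar_t) wc, fp);
}
```

A caller can then pass the result of fgetwc(), which is a wint_t, straight
to put_wide() without an intermediate conversion.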

These fixes will not break any existing code. Explanation follows:

+++++++++++

The libc reference says that GNU libc uses UCS-4:

(section 6.1)

    in the GNU C Library wchar_t is always 32 bits

(section 6.3.1)

    wide character set is always UCS-4 in the GNU C Library

(section 6.5):

    in the case of the GNU C Library it is
    always UCS-4 encoded ISO 10646

The reference also explains what UCS-4 is:

(section 6.1)

    UCS-4 is a 32-bit word

(section 6.5.4.2)

    UCS-4 value consists of four bytes

Also, in section 6.1 it is written:

    ISO 10646 was designed to be a 31-bit large code space

Finally, in section 6.1 it is written:

    The macro WEOF evaluates to a constant expression
    of type wint_t whose value is different from any
    member of the extended character set.

From the above it follows that wchar_t can hold 2^31 extra values that differ
from any member of the extended character set, namely those with the first
bit set to "1":

    1xxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx

Any of these extra values can be used to represent WEOF.

There is no problem if wint_t is signed (and WEOF is -1), since both -1 and
0xffffffff have the first bit set to "1", in any representation.
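
The bit argument above can be written down as a small check. This is only a
sketch, assuming a 32-bit value and a 31-bit code space as quoted from the
reference; the helper name is made up.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if a 32-bit value lies outside the
   31-bit ISO 10646 code space, i.e. its most significant bit is set.  */
int outside_31bit_space (uint32_t v)
{
  return (v >> 31) != 0;
}
```

Both outside_31bit_space (0xffffffffu) and outside_31bit_space ((uint32_t) -1)
are nonzero, while any value up to 0x7fffffff gives zero.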

+++++++++++

In fact, the right type for wint_t is already stated in the libc reference
(section 12.15):

    With the GNU C Library, WEOF is -1.

(because for comparison with -1, wint_t must be signed)

But in the current glibc-2.24 sources (wctype.h and wchar.h), WEOF is unsigned:

    # define WEOF (0xffffffffu)

This is an inconsistency.
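
What a given installation actually does can be checked with a few lines of C.
This is a minimal sketch with made-up helper names; it only reports what the
headers define and does not decide which definition is the right one.

```c
#include <wchar.h>

/* Hypothetical helpers, for illustration only.  */

/* Nonzero if wint_t is a signed type on this platform.  */
int wint_is_signed (void)
{
  return (wint_t) -1 < (wint_t) 0;
}

/* Nonzero if WEOF has the same value as -1 converted to wint_t.  */
int weof_is_minus_one (void)
{
  return WEOF == (wint_t) -1;
}
```

With the definition quoted above, and assuming wint_t is a 32-bit unsigned
type, wint_is_signed () returns 0 and weof_is_minus_one () returns 1.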



Regards,
Igor

