2173 – Unable to read UCS-4 chars with fgetws() using fopen(..., "w,ccs=UCS-4LE")

Summary: Unable to read UCS-4 chars with fgetws() using fopen(..., "w,ccs=UCS-4LE")

Status:	RESOLVED FIXED

Alias:	None

Product:	glibc
Classification:	Unclassified
Component:	libc
Version:	2.3.5

Importance:	P2 normal
Target Milestone:	---
Assignee:	Ulrich Drepper

URL:
Keywords:

Depends on:
Blocks:

Reported:	2006-01-18 11:00 UTC by Hrvoje Niksic
Modified:	2018-04-19 14:02 UTC
CC List:	1 user

See Also:
Host:
Target:
Build:
Last reconfirmed:

Flags:	fweimer: security-

Description Hrvoje Niksic 2006-01-18 11:00:39 UTC

Summary: reading wide chars directly from an input stream seems to be
impossible, failing silently without any meaningful diagnostic.

I need to read raw wide chars from an input stream.  At first glance, fgetws()
seemed like the function for that purpose.  However, the fgetws man page says,
under NOTES:
       In  the  absence  of  additional information passed to the fopen()
       call, it is reasonable to expect that fgetws() will actually  read
       a  multibyte  string from the stream and then convert it to a wide
       character string.

This is a problem, as a "multibyte stream" cannot be expected to be composed of
raw wide chars.  However, GNU libc *does* allow "additional information" to be
passed to fopen, using the ",ccs=CODING" extension.  Unfortunately, that doesn't
seem to work -- only the first narrow character is read from the stream.  This
program demonstrates the problem:

// a.c:
#include <stdio.h>
#include <wchar.h>
#define countof(x) (sizeof(x) / sizeof(*(x)))
// Read one line of wide chars from "fl" and print it out.
int main(void)
{
  wchar_t buf[128];
  FILE *fp = fopen("fl", "r,ccs=UCS-4LE");
  fgetws(buf, countof(buf), fp);
  if (ferror(fp)) perror("fgetws");
  fclose(fp);
  printf("%ls", buf);
  return 0;
}

$ gcc a.c
$ printf 'a\0\0\0b\0\0\0' > fl
$ ./a.out
a
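
Where a fixed glibc is not available, one possible workaround is to bypass the
stream's charset conversion and decode the UCS-4LE units by hand.  The sketch
below is only an illustration, not the change that went into CVS:
read_ucs4le_line is a hypothetical helper, it expects a stream opened in binary
mode ("rb"), and it assumes wchar_t is wide enough to hold any UCS-4 value (as
on glibc, where wchar_t is 32 bits).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper (not part of glibc): read one line of raw UCS-4LE
   code units from fp into buf, stopping after '\n', at EOF, or when the
   buffer is full.  Returns the number of wide chars stored, not counting
   the terminating L'\0'.  Assumes wchar_t can hold any UCS-4 value. */
size_t read_ucs4le_line(wchar_t *buf, size_t cap, FILE *fp)
{
  size_t n = 0;
  unsigned char b[4];

  assert(buf != NULL && cap > 0);
  while (n + 1 < cap && fread(b, 1, sizeof b, fp) == sizeof b) {
    /* Assemble the code point from four little-endian bytes. */
    uint32_t cp = (uint32_t)b[0] | (uint32_t)b[1] << 8
                | (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
    buf[n++] = (wchar_t)cp;
    if (cp == '\n')
      break;
  }
  buf[n] = L'\0';
  return n;
}
```

With the file "fl" created above and opened via fopen("fl", "rb"), this helper
stores L"ab" in buf and returns 2.  A trailing partial unit (fewer than four
bytes) is silently dropped in this sketch.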

How to repeat:
  Run the provided program, as shown above.

Expected result:
  The characters "ab" are printed.

Actual result:
  The character "a" is printed.

iconv -l shows that UCS-4LE is a known encoding (when the encoding is changed to
an unknown one, fopen fails and the program crashes).
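
The crash mentioned above most likely comes from handing the NULL returned by
fopen straight to fgetws.  A defensive variant of the open call might look like
the following sketch; open_with_ccs is a hypothetical helper name, and the
",ccs=" mode suffix it builds is the glibc extension discussed in this report,
so other C libraries may ignore or reject it.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open path for reading with a glibc ",ccs=" suffix
   and report failure instead of passing a NULL FILE * on to the caller's
   wide-character reads. */
FILE *open_with_ccs(const char *path, const char *ccs)
{
  char mode[64];
  int len;

  assert(path != NULL && ccs != NULL);
  len = snprintf(mode, sizeof mode, "r,ccs=%s", ccs);
  if (len < 0 || (size_t)len >= sizeof mode) {
    fprintf(stderr, "charset name too long: %s\n", ccs);
    return NULL;
  }

  FILE *fp = fopen(path, mode);
  if (fp == NULL)
    fprintf(stderr, "fopen(\"%s\", \"%s\"): %s\n", path, mode, strerror(errno));
  return fp;
}
```

A caller would then test the result before reading, for example
"if ((fp = open_with_ccs("fl", "UCS-4LE")) == NULL) return 1;".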

If the documentation is wrong, and it is in fact not possible to use ,ccs=CODING
to "hint" to fgetws (and other wide char functions) to directly read wide
characters, then fopen(..., "w,ccs=UCS-4LE") should probably fail.  Also, the
documentation should be amended not to imply that it is possible to read wide
chars by passing "additional information" to fopen.

Comment 1 Ulrich Drepper 2006-01-19 01:25:08 UTC

Fixed in CVS.

Comment 2 Srikrishna Erra 2009-12-29 08:51:28 UTC

(In reply to comment #1)
> Fixed in CVS.

Hello all,
   How can I get this fix?
I am facing the same problem.

Please let me know where I can get this fix.

Thanks

regards,
Srikrishna Erra.