This is the mail archive of the
glibc-bugs@sourceware.org
mailing list for the glibc project.
[Bug libc/2173] New: Unable to read UCS-4 chars with fgetws() using fopen(..., "w,ccs=UCS-4LE")
- From: "hniksic at xemacs dot org" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sources dot redhat dot com
- Date: 18 Jan 2006 11:00:42 -0000
- Subject: [Bug libc/2173] New: Unable to read UCS-4 chars with fgetws() using fopen(..., "w,ccs=UCS-4LE")
- Reply-to: sourceware-bugzilla at sourceware dot org
Summary: reading wide chars directly from an input stream seems to be
impossible, failing silently without any meaningful diagnostic.
I need to read raw wide chars from an input stream. At first glance, fgetws()
seemed like the function for that purpose. However, the fgetws man page says,
under NOTES:
    In the absence of additional information passed to the fopen()
    call, it is reasonable to expect that fgetws() will actually read
    a multibyte string from the stream and then convert it to a wide
    character string.
This is a problem, as a "multibyte stream" cannot be expected to be composed of
raw wide chars. However, GNU libc *does* allow "additional information" to be
passed to fopen, using the ",ccs=CODING" extension. Unfortunately, that doesn't
seem to work -- only the first narrow character is read from the stream. This
program demonstrates the problem:
// a.c:
#include <stdio.h>
#include <wchar.h>

#define countof(x) (sizeof(x) / sizeof(*(x)))

// Read one line of wide chars from "fl" and print it out.
int main(void)
{
    wchar_t buf[128];
    FILE *fp = fopen("fl", "r,ccs=UCS-4LE");  // NULL if the encoding is unknown
    fgetws(buf, countof(buf), fp);
    if (ferror(fp)) perror("fgetws");
    fclose(fp);
    printf("%ls", buf);
    return 0;
}
    $ gcc a.c
    $ printf 'a\0\0\0b\0\0\0' > fl
    $ ./a.out
    a
How to repeat:
Run the provided program, as shown above.
Expected result:
The characters "ab" are printed.
Actual result:
The character "a" is printed.
iconv -l shows that UCS-4LE is a known encoding (when the encoding is changed to
an unknown one, fopen fails and the program crashes).
If the documentation is wrong, and it is in fact not possible to use ,ccs=CODING
to "hint" to fgetws (and other wide char functions) to directly read wide
characters, then fopen(..., "r,ccs=UCS-4LE") should probably fail. Also, the
documentation should be amended not to imply that it is possible to read wide
chars by passing "additional information" to fopen.
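
For comparison, here is a minimal sketch of reading the same file without
relying on ",ccs=" at all: open the stream as a plain byte stream and assemble
each group of four little-endian bytes into one wide character. The helper
name read_ucs4le_line is hypothetical (it is not a glibc function), and the
sketch assumes that wchar_t holds UCS-4 code points, as it does with glibc.
It is meant only to show what the expected result looks like, not to suggest
how fgetws() should be implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper, not part of glibc: read UCS-4LE characters from a
   byte-oriented stream until a newline, end of input, or n - 1 characters
   have been stored, then NUL-terminate buf.  Returns the number of wide
   characters stored.  Assumes wchar_t can hold a UCS-4 code point. */
static size_t read_ucs4le_line(wchar_t *buf, size_t n, FILE *fp)
{
    unsigned char b[4];
    size_t len = 0;

    assert(buf != NULL && n > 0 && fp != NULL);

    /* A trailing group of fewer than four bytes is silently dropped. */
    while (len + 1 < n && fread(b, 1, sizeof b, fp) == sizeof b) {
        /* Assemble the little-endian bytes into one code point. */
        unsigned long cp = (unsigned long)b[0]
                         | ((unsigned long)b[1] << 8)
                         | ((unsigned long)b[2] << 16)
                         | ((unsigned long)b[3] << 24);
        buf[len++] = (wchar_t)cp;
        if (cp == 0x0A)  /* stop after a newline, as fgetws() would */
            break;
    }
    buf[len] = L'\0';
    return len;
}
```

With the file "fl" created as shown above and opened with fopen("fl", "rb"),
calling read_ucs4le_line(buf, countof(buf), fp) stores the two characters
"ab" in buf. The stream must stay byte-oriented, so no wide-character
function should be called on it.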
--
Summary: Unable to read UCS-4 chars with fgetws() using
fopen(..., "w,ccs=UCS-4LE")
Product: glibc
Version: 2.3.5
Status: NEW
Severity: normal
Priority: P2
Component: libc
AssignedTo: drepper at redhat dot com
ReportedBy: hniksic at xemacs dot org
CC: glibc-bugs at sources dot redhat dot com
http://sourceware.org/bugzilla/show_bug.cgi?id=2173