This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug libc/2173] New: Unable to read UCS-4 chars with fgetws() using fopen(..., "w,ccs=UCS-4LE")


Summary: reading wide chars directly from an input stream seems to be
impossible, failing silently without any meaningful diagnostic.

I need to read raw wide chars from an input stream.  At first glance, fgetws()
seemed like the function for that purpose.  However, the fgetws man page claims,
under NOTES:
       In  the  absence  of  additional information passed to the fopen()
       call, it is reasonable to expect that fgetws() will actually  read
       a  multibyte  string from the stream and then convert it to a wide
       character string.

This is a problem, as a "multibyte stream" cannot be expected to be composed of
raw wide chars.  However, GNU libc *does* allow "additional information" to be
passed to fopen, using the ",ccs=CODING" extension.  Unfortunately, that doesn't
seem to work -- only the first narrow character is read from the stream.  This
program demonstrates the problem:

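// a.c:
#include <stdio.h>
#include <wchar.h>
#define countof(x) (sizeof(x) / sizeof(*(x)))
// read one line of wide chars from "fl" and print it out.
int main(void)
{
  wchar_t buf[128];
  FILE *fp = fopen("fl", "r,ccs=UCS-4LE");
  // fopen returns NULL if the file or the requested encoding is unusable.
  if (fp == NULL) { perror("fopen"); return 1; }
  // fgetws returns NULL on error or end of file; buf is then not usable.
  if (fgetws(buf, countof(buf), fp) == NULL) {
    if (ferror(fp)) perror("fgetws");
    fclose(fp);
    return 1;
  }
  fclose(fp);
  printf("%ls", buf);
  return 0;
}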

$ gcc a.c
$ printf 'a\0\0\0b\0\0\0' > fl
$ ./a.out
a

How to repeat:
  Run the provided program, as shown above.

Expected result:
  The characters "ab" are printed.

Actual result:
  The character "a" is printed.

iconv -l shows that UCS-4LE is a known encoding (when the encoding is changed to
an unknown one, fopen fails and returns NULL; without a check on that return
value the program crashes).
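
The same availability check can be made from C rather than by reading the
output of iconv -l.  The following is only a sketch: encoding_is_known() is a
hypothetical helper, not a glibc function, and it assumes that fopen's ,ccs=
extension accepts the same encoding names as iconv_open(), which this report
suggests but does not establish.

```c
#include <iconv.h>

/* Hypothetical helper (not part of glibc): returns 1 if iconv can convert
   from the named encoding to the wchar_t representation, 0 otherwise.
   "WCHAR_T" is the iconv name for the platform's wchar_t encoding. */
static int encoding_is_known(const char *name)
{
  iconv_t cd = iconv_open("WCHAR_T", name);
  if (cd == (iconv_t) -1)
    return 0;            /* unknown encoding or unsupported conversion */
  iconv_close(cd);
  return 1;
}
```

Calling encoding_is_known("UCS-4LE") before fopen() separates "the encoding
name is unknown" from "the encoding is known but the read misbehaves", which is
the case described here.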

If the documentation is wrong, and it is in fact not possible to use ,ccs=CODING
to tell fgetws (and the other wide char functions) to read wide characters
directly, then fopen(..., "r,ccs=UCS-4LE") should probably fail.  Also, the
documentation should be amended so that it no longer implies that wide chars can
be read by passing "additional information" to fopen.
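
Until this is resolved, one possible workaround is to bypass the wide char
stream functions and decode the bytes by hand.  This is a sketch, not a fix for
fgetws: read_ucs4le_line() is a hypothetical helper, it expects a stream that
has only been used with byte-oriented functions, and it assumes that wchar_t
can hold any UCS-4 value (true on glibc, where wchar_t is 32 bits wide).

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: read up to max-1 UCS-4LE code units from fp,
   stopping after a newline, at end of file, or at a truncated unit.
   Stores them in out, terminated by L'\0', and returns the number of
   characters stored (not counting the terminator). */
static size_t read_ucs4le_line(FILE *fp, wchar_t *out, size_t max)
{
  size_t n = 0;
  unsigned char b[4];

  if (max == 0)
    return 0;
  while (n + 1 < max && fread(b, 1, sizeof b, fp) == sizeof b) {
    /* Assemble one little-endian 32-bit code point. */
    unsigned long cp = (unsigned long) b[0]
                     | ((unsigned long) b[1] << 8)
                     | ((unsigned long) b[2] << 16)
                     | ((unsigned long) b[3] << 24);
    out[n++] = (wchar_t) cp;
    if (cp == '\n')
      break;
  }
  out[n] = L'\0';
  return n;
}
```

With the file "fl" from the example opened as fopen("fl", "rb"), this helper
stores the two characters "ab" in the buffer.  It performs no validation of the
code points it reads.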

-- 
           Summary: Unable to read UCS-4 chars with fgetws() using
                    fopen(..., "w,ccs=UCS-4LE")
           Product: glibc
           Version: 2.3.5
            Status: NEW
          Severity: normal
          Priority: P2
         Component: libc
        AssignedTo: drepper at redhat dot com
        ReportedBy: hniksic at xemacs dot org
                CC: glibc-bugs at sources dot redhat dot com


http://sourceware.org/bugzilla/show_bug.cgi?id=2173


