This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



BUG REPORT: transliteration breaks wcrtomb and printf


BUG REPORT: glibc 2.1.94 fails the attached conformance test for ISO/IEC
9899:1999(E) section 7.19.3 paragraphs 11-14 and section 7.19.6.1
paragraph 8. The ISO C99 standard requires all the wide i/o functions,
as well as the %lc and %ls format specifiers in fprintf(), to convert
wide characters into the external multibyte representation exactly "as
if by a call to the wcrtomb function". However, in the present
implementation, transliteration is applied only by the wide output
functions (fputwc, etc.), but not by wcrtomb() and printf("%ls", ...).
In other words, the wide output functions currently do not convert "as
if by a call to the wcrtomb function", as the standard requires.
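
The requirement can be checked for a single character with a much
smaller program than the attached test. The following is only a minimal
sketch: the helper fputwc_matches_wcrtomb() and the scratch file name
"wcheck.tmp" are made up for illustration and are not part of glibc or
of wtest4.c. It writes one wide character with fputwc(), reads the file
back as bytes, and compares them with what wcrtomb() returns under the
current locale.

```c
#include <limits.h>   /* MB_LEN_MAX */
#include <stdio.h>    /* fopen, fputwc, fread, fclose, remove */
#include <string.h>   /* memset, memcmp */
#include <wchar.h>    /* wcrtomb, mbstate_t, WEOF */

/* Hypothetical scratch file name, chosen only for this sketch. */
#define SCRATCH_FILE "wcheck.tmp"

/* Hypothetical helper. Returns 1 if fputwc() writes exactly the bytes
   that wcrtomb() produces for wc (or if both reject wc), 0 if the two
   disagree, and -1 if the scratch file could not be used. */
int fputwc_matches_wcrtomb(wchar_t wc)
{
  char expected[MB_LEN_MAX];
  char actual[MB_LEN_MAX + 1];   /* one extra byte to detect longer output */
  mbstate_t mbs;
  size_t n, got;
  int failed;
  FILE *f;

  memset(&mbs, 0, sizeof mbs);   /* zeroed state = initial conversion state */
  n = wcrtomb(expected, wc, &mbs);

  f = fopen(SCRATCH_FILE, "wb");
  if (f == NULL)
    return -1;
  failed = (fputwc(wc, f) == WEOF);
  if (fclose(f) != 0)            /* conversion may only happen on flush */
    failed = 1;
  if (failed) {
    remove(SCRATCH_FILE);
    return n == (size_t) -1;     /* agree only if wcrtomb() failed too */
  }
  if (n == (size_t) -1) {
    remove(SCRATCH_FILE);
    return 0;                    /* fputwc() wrote something, wcrtomb() did not */
  }

  /* Reopen as a fresh byte-oriented stream to read the raw bytes. */
  f = fopen(SCRATCH_FILE, "rb");
  if (f == NULL) {
    remove(SCRATCH_FILE);
    return -1;
  }
  got = fread(actual, 1, sizeof actual, f);
  fclose(f);
  remove(SCRATCH_FILE);

  return got == n && memcmp(expected, actual, n) == 0;
}
```

The file is closed and reopened because a stream that has been used
with fputwc() is wide-oriented and must not be read with fread(). Call
setlocale() before the helper to test a locale other than "C".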

Reproducing the problem:

  - Compile the attached wtest4.c, which is a test program that
    outputs a wchar_t value (argv[1]) under a specified locale (argv[2])
    in eight different ways into eight files. In a conforming implementation,
    the resulting eight files must all be identical.
  - Run the test with the attached shell script wtest4-run. You will
    note that

      ./wtest4-run 0x00fc C
      ./wtest4-run 0x201c en_GB

    both fail because transliteration is used, while

      ./wtest4-run 0x0061 C
      ./wtest4-run 0x201c en_GB.UTF-8

    both pass because no transliteration is used.
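
When a run fails, it can help to look at the bytes that wcrtomb()
itself produces for the character under test and compare them with the
contents of the wtest4-? files. A minimal sketch, assuming nothing
beyond ISO C; wc_bytes_hex() is a hypothetical helper and not part of
wtest4.c:

```c
#include <limits.h>   /* MB_LEN_MAX */
#include <stdio.h>    /* sprintf */
#include <string.h>   /* memset */
#include <wchar.h>    /* wcrtomb, mbstate_t */

/* Hypothetical helper. Converts wc with wcrtomb() under the current
   locale and stores the resulting bytes in out as lowercase hex digits.
   Returns the number of bytes produced, or (size_t)-1 if the conversion
   fails or out (outlen bytes) is too small. */
size_t wc_bytes_hex(wchar_t wc, char *out, size_t outlen)
{
  char buf[MB_LEN_MAX];
  mbstate_t mbs;
  size_t n, i;

  memset(&mbs, 0, sizeof mbs);   /* start in the initial conversion state */
  n = wcrtomb(buf, wc, &mbs);
  if (n == (size_t) -1 || outlen < 2 * n + 1)
    return (size_t) -1;

  /* Two hex digits per byte; sprintf() adds the terminating NUL. */
  for (i = 0; i < n; i++)
    sprintf(out + 2 * i, "%02x", (unsigned char) buf[i]);
  return n;
}
```

In the "C" locale on an ASCII system, the character 0x0061 yields the
string "61". After a successful setlocale(LC_ALL, "en_GB.UTF-8"), and
assuming wchar_t holds UCS code points as it does in glibc, 0x201c
should yield "e2809c".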

Recommended fix:
wcrtomb() and printf("%ls", ...) have to be fixed to perform
transliteration, exactly as it is already implemented correctly for the
wide output functions. MB_CUR_MAX has to indicate the maximum number of
bytes that a call to fputwc() or equivalently wcrtomb() can output
in the presence of transliteration under the selected locale.
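
The reason MB_CUR_MAX matters here is that callers commonly size their
output buffers by it. The following sketch shows such a caller; the
function name wcs_to_mbs_bounded() is hypothetical, and the code assumes
a stateless encoding (it does not emit a final shift sequence). If
wcrtomb() could ever return more than MB_CUR_MAX bytes because of
transliteration, the space check in the loop would no longer protect the
buffer.

```c
#include <stdlib.h>   /* MB_CUR_MAX */
#include <string.h>   /* memset */
#include <wchar.h>    /* wcrtomb, mbstate_t */

/* Hypothetical helper. Converts the wide string src into dst (dstlen
   bytes) and NUL-terminates it. Returns the number of bytes written
   before the terminator, or (size_t)-1 if the buffer is too small or a
   character cannot be converted. */
size_t wcs_to_mbs_bounded(char *dst, size_t dstlen, const wchar_t *src)
{
  mbstate_t mbs;
  size_t used = 0;

  memset(&mbs, 0, sizeof mbs);   /* initial conversion state */
  for (; *src != L'\0'; src++) {
    size_t n;

    /* Reserve MB_CUR_MAX bytes for this character plus one for the
       NUL. This is only safe if wcrtomb() never exceeds MB_CUR_MAX. */
    if (dstlen - used < (size_t) MB_CUR_MAX + 1)
      return (size_t) -1;
    n = wcrtomb(dst + used, *src, &mbs);
    if (n == (size_t) -1)
      return (size_t) -1;
    used += n;
  }
  if (used >= dstlen)            /* no room left for the terminator */
    return (size_t) -1;
  dst[used] = '\0';
  return used;
}
```

In the "C" locale, converting L"abc" into a 16-byte buffer returns 3
and leaves "abc" in the buffer.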

Urgency:
If this is not fixed, transliteration will remain practically useless,
and there will be much confusion, e.g. about whether wcwidth() relates
to the output of printf() or wprintf(). In addition, locales that use
transliteration will violate the standard.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

//
// Test conformance of C library wide-character/multi-byte
// implementation to ISO/IEC 9899:1999(E) section 7.19.3 paragraphs
// 11-14 and section 7.19.6.1 paragraph 8, which requires that all the
// wide i/o functions as well as the %lc and %ls format specifiers in
// fprintf() have to convert wide characters into the external
// multibyte representation exactly "as if by a call to the wcrtomb
// function".
//
// Conformance test:
// After a run of this program, all produced files "wtest4-?" must
// have exactly identical byte content and no assertion must have
// failed, or the C library is in violation of the standard.
//
// Markus.Kuhn@cl.cam.ac.uk -- 2000-10-08
//

// #define _ISOC99_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <assert.h>
#include <wchar.h>
#include <locale.h>

int main(int argc, char **argv) {
  FILE *f;
  wchar_t c = 0x00fc;   // LATIN SMALL LETTER U WITH DIAERESIS
  wchar_t s[2];
  wchar_t *ss;
  char buf[256];
  wint_t wr;
  size_t sr;
  int r;
  mbstate_t mbs = {0};   // zero-valued state is an initial conversion state

  assert(sizeof(buf) > MB_CUR_MAX);

  if (argc > 1)
    c = strtol(argv[1], (char **) NULL, 0);
  if (argc > 2)
    if (!setlocale(LC_ALL, argv[2]))
      fprintf(stderr,"Couldn't set locale to '%s'!\n", argv[2]);

  s[0] = c;
  s[1] = L'\0';

  // test 0: wcrtomb
  f = fopen("wtest4-0", "w"); assert(f);
  sr = wcrtomb(NULL, L'\0', &mbs);
  assert(sr > 0);
  sr = wcrtomb(buf, c, &mbs);
  if (sr == (size_t)(-1)) {
    assert(errno == EILSEQ);
  } else {
    fwrite(buf, sr, 1, f);
    assert(sr > 0);
    assert(sr <= MB_CUR_MAX);
  }
  fclose(f);

  // test 1: wcsrtombs
  f = fopen("wtest4-1", "w"); assert(f);
  sr = wcrtomb(NULL, L'\0', &mbs);
  assert(sr > 0);
  ss = s;
  sr = wcsrtombs(buf, &ss, sizeof(buf), &mbs);
  if (sr == (size_t)(-1)) {
    assert(errno == EILSEQ);
  } else {
    assert(sr > 0);
    assert(sr <= MB_CUR_MAX);
    assert(ss == NULL);
    fwrite(buf, sr, 1, f);
  }
  fclose(f);

  // test 2: fputwc
  f = fopen("wtest4-2", "w"); assert(f);
  wr = fputwc(c, f);
  if (wr == WEOF) {
    assert(errno == EILSEQ);
  } else {
    assert(wr == (wint_t) c);
  }
  fclose(f);

  // test 3: fputws
  f = fopen("wtest4-3", "w"); assert(f);
  r = fputws(s, f);
  if (r == EOF) {
    assert(errno == EILSEQ);
  } else {
    assert(r >= 0);
  }
  fclose(f);

  // test 4: fprintf %ls
  f = fopen("wtest4-4", "w"); assert(f);
  r = fprintf(f, "%ls", s);
  if (r < 0) {
    assert(errno == EILSEQ);
  } else {
    assert(r <= (int) MB_CUR_MAX);
  }
  fclose(f);

  // test 5: fprintf %lc
  f = fopen("wtest4-5", "w"); assert(f);
  r = fprintf(f, "%lc", c);
  if (r < 0) {
    assert(errno == EILSEQ);
  } else {
    assert(r <= (int) MB_CUR_MAX);
  }
  fclose(f);

  // test 6: fwprintf %ls
  f = fopen("wtest4-6", "w"); assert(f);
  r = fwprintf(f, L"%ls", s);
  if (r < 0) {
    assert(errno == EILSEQ);
  } else {
    assert(r == 1);
  }
  fclose(f);

  // test 7: fwprintf %lc
  f = fopen("wtest4-7", "w"); assert(f);
  r = fwprintf(f, L"%lc", c);
  if (r < 0) {
    assert(errno == EILSEQ);
  } else {
    assert(r == 1);
  }
  fclose(f);

  return 0;
}

#!/bin/bash
make wtest4
rm -f wtest4-?
./wtest4 "$@"
ls -l wtest4-?
cmp wtest4-0 wtest4-1 && \
cmp wtest4-0 wtest4-2 && \
cmp wtest4-0 wtest4-3 && \
cmp wtest4-0 wtest4-4 && \
cmp wtest4-0 wtest4-5 && \
cmp wtest4-0 wtest4-6 && \
cmp wtest4-0 wtest4-7 && echo Test PASSED
