This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Complete GB18030 charmap


On Wed, May 9, 2012 at 5:08 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
> "Carlos O'Donell" <carlos@systemhalted.org> writes:
>
>> How did you develop the patch?
>
> From ICU
> (http://source.icu-project.org/repos/icu/data/trunk/charset/source/gb18030
> and
> http://source.icu-project.org/repos/icu/icu/trunk/source/data/mappings/gb18030.ucm)

Thanks for this pointer. I've added a reference to ICU in the wiki
section on locales.

>> What testing did you do with this patch?
>
> tst-tables.sh tests for consistency.

Does the truncation of the GB18030 table in iconvdata/tst-table.sh
still allow every Unicode scalar value to be tested for conversion,
as required?
~~~
...
# When the charset is GB18030, truncate this table because for this encoding,
# the tst-table-from and tst-table-to programs scan the Unicode BMP only.
if test ${charset} = GB18030; then
  grep '0x....$' < ${objpfx}tst-${charset}.charmap.table \
    > ${objpfx}tst-${charset}.truncated.table
  mv ${objpfx}tst-${charset}.truncated.table \
    ${objpfx}tst-${charset}.charmap.table
fi
...
~~~

My worry is that our testing doesn't test everything that is required
to verify GB18030 is correct.

Given the grep above, I think we skip testing everything outside the
BMP, i.e. the range 0x10000-0x10FFFF.
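
To make the effect of that grep concrete, here is a minimal sketch. It
assumes the table rows have the form "0x<bytes><TAB>0x<code point>"; the
four sample rows are illustrative and are not taken from the real
generated table. The pattern '0x....$' only matches rows whose last field
has exactly four characters after "0x", so any row for a code point above
0xFFFF is dropped.

```shell
#!/bin/sh
# Illustrative sample rows (assumed layout: 0x<bytes><TAB>0x<code point>).
# Two rows are in the BMP, two are in the supplementary planes.
table=$(printf '0x41\t0x0041\n0x8140\t0x4E02\n0x90308130\t0x10000\n0xE3329A35\t0x10FFFF\n')

# Same pattern as the truncation step: keep rows ending in "0x" + 4 chars.
kept=$(printf '%s\n' "$table" | grep -c '0x....$')

# Rows the pattern rejects, i.e. code points with more than 4 hex digits.
dropped=$(printf '%s\n' "$table" | grep -vc '0x....$')

echo "kept=$kept dropped=$dropped"
```

With these sample rows, the two BMP rows are kept and the two
supplementary-plane rows are dropped, which is the coverage gap I am
worried about.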

Removing the grep I get:
~~~
This might take a while
Testing GB18030 *** FAILED ***
~~~

I'm not an expert *at all*, but I don't get a warm and fuzzy feeling
that we are testing everything for GB18030.

Cheers,
Carlos.

