This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] Complete GB18030 charmap
On Wed, May 9, 2012 at 5:08 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
> "Carlos O'Donell" <carlos@systemhalted.org> writes:
>
>> How did you develop the patch?
>
> From ICU
> (http://source.icu-project.org/repos/icu/data/trunk/charset/source/gb18030
> and
> http://source.icu-project.org/repos/icu/icu/trunk/source/data/mappings/gb18030.ucm)
Thanks for this pointer. I've added a reference to ICU in the wiki
section on locales.
>> What testing did you do with this patch?
>
> tst-tables.sh tests for consistency.
Given that iconvdata/tst-table.sh truncates the GB18030 table, are all
Unicode scalar values still tested for conversion, as required?
~~~
...
# When the charset is GB18030, truncate this table because for this encoding,
# the tst-table-from and tst-table-to programs scan the Unicode BMP only.
if test ${charset} = GB18030; then
  grep '0x....$' < ${objpfx}tst-${charset}.charmap.table \
    > ${objpfx}tst-${charset}.truncated.table
  mv ${objpfx}tst-${charset}.truncated.table \
    ${objpfx}tst-${charset}.charmap.table
fi
...
~~~
My worry is that our testing doesn't test everything that is required
to verify GB18030 is correct.
Given the grep above, I think we skip testing everything outside the
BMP, i.e. the range 0x10000-0x10FFFF.
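To illustrate why the pattern drops those entries, here is a minimal sketch
run against a few sample rows. It assumes each table line ends with the
Unicode value written as 0x plus hex digits, separated by a tab from the
byte sequence; that layout and the sample rows are assumptions for
illustration, not copied from the generated table.

```shell
#!/bin/sh
# Hypothetical sample rows in an assumed "0x<bytes><TAB>0x<UCS>" layout.
sample=$(printf '0x41\t0x0041\n0x8140\t0x4E02\n0x90308130\t0x10000\n0xE3329A35\t0x10FFFF\n')

# Same pattern as in tst-table.sh: "0x", exactly four characters, end of line.
kept=$(printf '%s\n' "$sample" | grep '0x....$')
dropped=$(printf '%s\n' "$sample" | grep -v '0x....$')

# Count the rows on each side of the filter.
kept_count=$(printf '%s\n' "$sample" | grep -c '0x....$')
dropped_count=$(printf '%s\n' "$sample" | grep -vc '0x....$')

echo "kept ${kept_count} rows:"
printf '%s\n' "$kept"
echo "dropped ${dropped_count} rows:"
printf '%s\n' "$dropped"
```

In this sketch, any row whose last field has five or more hex digits cannot
match, because the pattern allows exactly four characters between the 0x and
the end of the line.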
Removing the grep I get:
~~~
This might take a while
Testing GB18030 *** FAILED ***
~~~
I'm not an expert *at all*, but I don't get a warm and fuzzy feeling
that we are testing everything for GB18030.
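To put a rough number on that concern, the following sketch counts Unicode
scalar values inside and outside the BMP. It only does the arithmetic; it
assumes a complete test would need to touch every scalar value, and says
nothing about how the real table is laid out.

```shell
#!/bin/sh
# Unicode scalar values are U+0000..U+10FFFF minus the surrogates
# U+D800..U+DFFF.
surrogates=$(( 0xDFFF - 0xD800 + 1 ))
total=$(( 0x10FFFF + 1 - surrogates ))

# Scalar values inside the BMP (U+0000..U+FFFF, surrogates excluded).
bmp=$(( 0xFFFF + 1 - surrogates ))

# Scalar values in the supplementary planes (U+10000..U+10FFFF).
supplementary=$(( 0x10FFFF - 0x10000 + 1 ))

echo "scalar values in total:      ${total}"
echo "scalar values in the BMP:    ${bmp}"
echo "scalar values outside BMP:   ${supplementary}"
```

Under that assumption, a BMP-only scan would leave the large majority of
scalar values unexercised.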
Cheers,
Carlos.