Bug 18399 - Review / update charmap data
Summary: Review / update charmap data
Status: NEW
Alias: None
Product: glibc
Classification: Unclassified
Component: localedata
Version: 2.21
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-05-12 06:59 UTC by Marko Myllynen
Modified: 2015-05-15 12:46 UTC

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Description Marko Myllynen 2015-05-12 06:59:15 UTC
Judging by the comments in them, the localedata/charmaps/* files are probably based on the corresponding standards and are thus fine as-is. However, there are at least a few charmaps which should be reviewed and updated, or even added or removed, if deemed appropriate.

- ISO_10646 was not updated as part of the bug 14094 work and thus looks outdated. Its use of mnemonics could also be reconsidered.

- A POSIX Portable Character Set charmap is not available; should it be?

http://pubs.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap06.html
http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html#tag_07_02

- ISO_8859-1,GL looks like ISO-8859-1 written in mnemonics, but its name is derived from its author, so it doesn't look like a general-purpose character map that should be part of glibc. Perhaps it could be removed?

Thanks.
Comment 1 Marko Myllynen 2015-05-12 13:11:53 UTC
Actually, charmaps using mnemonics won't work, since the mnemonics files have been removed:

2000-07-07  Ulrich Drepper  <drepper@redhat.com>

        * locales/POSIX: Remove repertoire map reference.
        * locales/ar_SA: Remove repertoire map reference.

        * repertoiremaps/charids.894: Removed.
        * repertoiremaps/mnemonics.ds: Removed.
        * repertoiremaps/mnemonics.ja: Removed.
        * repertoiremaps/mnemonics.ko: Removed.
        * repertoiremaps/mnemonics.zh: Removed.

A quick grep indicates that the following charmaps are still using mnemonics:

ISO_10646
ISO_8859-1,GL
JIS_C6220-1969-JP
JIS_C6229-1984-A
JIS_C6229-1984-B-ADD
JIS_C6229-1984-HAND
JIS_C6229-1984-HAND-ADD
JIS_C6229-1984-KANA
NATS-DANO-ADD
NATS-SEFI-ADD
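
A check like the grep above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the command that was actually run: it assumes that a mnemonic is any symbolic name in the CHARMAP body that is not of the <Uxxxx> form, that "%" is the comment character, and that names contain no escaped ">" characters. The function name non_ucs_names and the sample data are made up for this example.

```python
import re

# Assumption: a UCS-style symbolic name looks like <U0041> (4 to 8 hex digits).
UCS_NAME = re.compile(r"<U[0-9A-Fa-f]{4,8}>")
# Assumption: symbolic names contain no whitespace and no escaped ">".
SYMBOL = re.compile(r"<[^>\s]+>")


def non_ucs_names(charmap_text, comment_char="%"):
    """Return symbolic names in the CHARMAP body that are not <Uxxxx>."""
    found = []
    in_body = False
    for line in charmap_text.splitlines():
        stripped = line.strip()
        if stripped == "CHARMAP":
            in_body = True
            continue
        if stripped == "END CHARMAP":
            break
        if not in_body or not stripped or stripped.startswith(comment_char):
            continue
        # The first whitespace-separated field holds the name, or a range
        # written as <first>..<last>; later fields are encoding and comment.
        first_field = stripped.split()[0]
        for name in SYMBOL.findall(first_field):
            if not UCS_NAME.fullmatch(name):
                found.append(name)
    return found


# Hypothetical sample data, not copied from any real charmap file.
sample = """<code_set_name> EXAMPLE
<comment_char> %
<escape_char> /
CHARMAP
<NU>     /x00   NULL (NUL)
<U0041>  /x41   LATIN CAPITAL LETTER A
<U3400>..<U343F> /x81/x40 <CJK>
END CHARMAP
"""

print(non_ucs_names(sample))
```

Running this over each file under localedata/charmaps/ and reporting the files with a non-empty result would give a list comparable to the one above, provided the assumptions hold for those files.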

Also, when making changes to charmaps, it's probably a good idea to check whether any related changes are needed in the corresponding iconv modules, if any, under iconvdata/.

Thanks.
Comment 2 keld@keldix.com 2015-05-13 16:07:01 UTC
On Tue, May 12, 2015 at 01:11:53PM +0000, myllynen at redhat dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=18399
> 
> --- Comment #1 from Marko Myllynen <myllynen at redhat dot com> ---
> Actually charmaps using mnemonics won't work since the mnemonics files have
> been removed:
> 
> 2000-07-07  Ulrich Drepper  <drepper@redhat.com>
> 
>         * locales/POSIX: Remove repertoire map reference.
>         * locales/ar_SA: Remove repertoire map reference.
> 
>         * repertoiremaps/charids.894: Removed.
>         * repertoiremaps/mnemonics.ds: Removed.
>         * repertoiremaps/mnemonics.ja: Removed.
>         * repertoiremaps/mnemonics.ko: Removed.
>         * repertoiremaps/mnemonics.zh: Removed.
> 
> A quick grep indicates that the following charmaps are still using mnemonics:
> 
> ISO_10646
> ISO_8859-1,GL
> JIS_C6220-1969-JP
> JIS_C6229-1984-A
> JIS_C6229-1984-B-ADD
> JIS_C6229-1984-HAND
> JIS_C6229-1984-HAND-ADD
> JIS_C6229-1984-KANA
> NATS-DANO-ADD
> NATS-SEFI-ADD
> 
> Also, if doing changes for charmaps, it's probably a good idea to check whether
> any related changes are needed for possibly corresponding iconv modules under
> iconvdata/.

Some mnemonics are mandated by the POSIX standard, ISO/IEC 9945.

I think it was an error to remove the more extensive mnemonics; the hex
codes make the locales much harder to proofread.

Best regards
Keld
Comment 3 Marko Myllynen 2015-05-15 12:23:33 UTC
(In reply to keld@keldix.com from comment #2)
> Some mnemonics are mandated by the POSIX standard, ISO/IEC 9945.

Checking for commonly used mnemonics like "dollar-sign" ("grep -ri dollar-sign glibc.git") shows that they are unimplemented. If you think this is important, perhaps filing a separate bug would be best.
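
For reference, a case-insensitive recursive search like the grep command above could be sketched in Python as shown here. It is a rough equivalent only: the helper name files_mentioning is made up for this example, files are read as UTF-8 with undecodable bytes replaced, and unreadable files are skipped silently.

```python
import os


def files_mentioning(root, needle):
    """Return sorted paths under root whose text contains needle, ignoring case."""
    needle = needle.lower()
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                # Assumption: decoding as UTF-8 with replacement is good
                # enough for spotting an ASCII name such as "dollar-sign".
                with open(path, "r", encoding="utf-8", errors="replace") as handle:
                    if needle in handle.read().lower():
                        hits.append(path)
            except OSError:
                # Skip files that cannot be opened or read.
                continue
    return sorted(hits)
```

Calling files_mentioning("glibc.git", "dollar-sign") would list every file in which the name appears. Note that, like grep -r, this also walks the .git directory of a checkout.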

> I think it was an error to remove the more extensive mnemonics; the hex
> codes make the locales much harder to proofread.

The upstream locale(1) manual page now includes some tips that help with checking locales (see http://man7.org/linux/man-pages/man1/locale.1.html).

Thanks.