This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.
wrong charmap name for Shift_JIS
- To: libc-alpha at sources dot redhat dot com
- Subject: wrong charmap name for Shift_JIS
- From: Bruno Haible <haible at ilog dot fr>
- Date: Mon, 28 May 2001 16:36:08 +0200 (CEST)
Hi,
For the Shift_JIS encoding, glibc uses the name "SJIS", but the IANA
charset registry has no "SJIS" entry, only "Shift_JIS" (the preferred MIME
name) and "MS_Kanji". Passing the standard name "Shift_JIS" to localedef
does not produce a working locale:
# localedef -c -f SHIFT_JIS -i ja_JP ja_JP.SJIS
<lots of error messages>
# LC_ALL=ja_JP.SJIS locale charmap
ANSI_X3.4-1968
Using "SJIS" instead makes nl_langinfo(CODESET) return a nonstandard name:
# localedef -c -f SJIS -i ja_JP ja_JP.SJIS
character map `SJIS' is not ASCII compatible, locale not ISO C compliant
# LC_ALL=ja_JP.SJIS locale charmap
SJIS
The fact that GNU gettext expects PO files labelled with "charset=SJIS",
a choice which was made for consistency with glibc, has been reported as
a bug in GNU gettext.
To fix this, here is a patch that follows what we did for GB2312
(previously named EUC-CN in glibc). With it, both names are accepted and
the locale reports the standard name:
# localedef -c -f SHIFT_JIS -i ja_JP ja_JP.SJIS
character map `SHIFT_JIS' is not ASCII compatible, locale not ISO C compliant
# LC_ALL=ja_JP.SJIS locale charmap
SHIFT_JIS
# localedef -c -f SJIS -i ja_JP ja_JP.SJIS
character map `SHIFT_JIS' is not ASCII compatible, locale not ISO C compliant
# LC_ALL=ja_JP.SJIS locale charmap
SHIFT_JIS
localedata/ChangeLog:
2001-05-26 Bruno Haible <haible@clisp.cons.org>
* charmaps/SHIFT_JIS: Renamed from charmaps/SJIS. Change code_set_name
to SHIFT_JIS. Add SJIS as alias.
* Makefile (CHARMAPS): For SJIS locale, use SHIFT_JIS charmap.
* gen-locale.sh: Likewise.
ChangeLog:
2001-05-26 Bruno Haible <haible@clisp.cons.org>
* iconvdata/tst-tables.sh: For SJIS module, use SHIFT_JIS charmap.
* manual/charset.texi: Write Shift_JIS, not Shift-JIS.
Please rename localedata/charmaps/SJIS to localedata/charmaps/SHIFT_JIS
before applying the patch.
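
For readers who want to see what the CHARMAPS change in localedata/Makefile
computes, here is a minimal sketch that runs the same two sed expressions
outside of make. The function name charmaps_for is hypothetical and only
used for illustration; the locale list is a shortened sample, not the full
LOCALES value.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): derive charmap names from
# locale names with the same sed expressions as the patched CHARMAPS line.
charmaps_for() {
  # First expression: keep only the part after the dot of each locale name.
  # Second expression: the SJIS locale must load the SHIFT_JIS charmap file.
  echo "$1" | sed -e 's/[^ .]*[.]\([^ ]*\)/\1/g' -e 's/SJIS/SHIFT_JIS/g'
}

# Sample input; prints: EUC-JP SHIFT_JIS ISO-8859-1
charmaps_for "ja_JP.EUC-JP ja_JP.SJIS fr_FR.ISO-8859-1"
```

Note that the locale name itself stays ja_JP.SJIS; only the charmap file
name derived from it changes.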
--- glibc-20010430/localedata/charmaps/SJIS.bak Mon Dec 4 19:53:45 2000
+++ glibc-20010430/localedata/charmaps/SHIFT_JIS Sat May 26 16:22:11 2001
@@ -1,9 +1,10 @@
-<code_set_name> SJIS
+<code_set_name> SHIFT_JIS
<comment_char> %
<escape_char> /
<mb_cur_min> 1
<mb_cur_max> 2
+% alias SJIS
CHARMAP
<U0000> /x00 NULL (NUL)
<U0001> /x01 START OF HEADING (SOH)
--- glibc-20010430/localedata/Makefile.bak Tue Feb 6 14:39:11 2001
+++ glibc-20010430/localedata/Makefile Sat May 26 16:40:33 2001
@@ -125,7 +125,8 @@
en_US.ISO-8859-1 ja_JP.EUC-JP da_DK.ISO-8859-1 \
hr_HR.ISO-8859-2 sv_SE.ISO-8859-1 ja_JP.SJIS fr_FR.ISO-8859-1
LOCALE_SRCS := $(shell echo "$(LOCALES)"|sed 's/\([^ .]*\)[^ ]*/\1/g')
-CHARMAPS := $(shell echo "$(LOCALES)"|sed 's/[^ .]*[.]\([^ ]*\)/\1/g')
+CHARMAPS := $(shell echo "$(LOCALES)" | \
+ sed -e 's/[^ .]*[.]\([^ ]*\)/\1/g' -e s/SJIS/SHIFT_JIS/g)
CTYPE_FILES = $(addsuffix /LC_CTYPE,$(LOCALES))
generated-dirs += $(LOCALES)
--- glibc-20010430/localedata/gen-locale.sh.bak Thu Jul 13 19:45:51 2000
+++ glibc-20010430/localedata/gen-locale.sh Sat May 26 16:45:21 2001
@@ -1,6 +1,6 @@
#! /bin/sh
# Generate test locale files.
-# Copyright (C) 2000 Free Software Foundation, Inc.
+# Copyright (C) 2000-2001 Free Software Foundation, Inc.
# This file is part of the GNU C Library.
#
# The GNU C Library is free software; you can redistribute it and/or
@@ -43,4 +43,5 @@
charmap=`echo $locfile|sed 's|[^.]*[.]\(.*\)/LC_CTYPE|\1|'`
echo "Generating locale $locale.$charmap: this might take a while..."
-generate_locale $charmap $locale $locale.$charmap
+generate_locale `echo $charmap | sed -e s/SJIS/SHIFT_JIS/` $locale \
+ $locale.$charmap
--- glibc-20010430/iconvdata/tst-tables.sh.bak Sat Oct 28 01:18:54 2000
+++ glibc-20010430/iconvdata/tst-tables.sh Sat May 26 16:58:54 2001
@@ -1,5 +1,5 @@
#!/bin/sh
-# Copyright (C) 2000 Free Software Foundation, Inc.
+# Copyright (C) 2000-2001 Free Software Foundation, Inc.
# This file is part of the GNU C Library.
# Contributed by Bruno Haible <haible@clisp.cons.org>, 2000.
#
@@ -184,7 +184,7 @@
#
# Multibyte encodings come here
#
- SJIS
+ SJIS SHIFT_JIS
EUC-KR
CP949
JOHAB
--- glibc-20010430/manual/charset.texi.bak Mon Apr 30 22:26:42 2001
+++ glibc-20010430/manual/charset.texi Sat May 26 16:28:18 2001
@@ -247,6 +247,7 @@
bytes.
@cindex EUC
+@cindex Shift_JIS
@cindex SJIS
In most uses of @w{ISO 2022} the defined character sets do not allow
state changes which cover more than the next character. This has the
@@ -254,7 +255,7 @@
sequence of a character one can interpret a text correctly. Examples of
character sets using this policy are the various EUC character sets
(used by Sun's operations systems, EUC-JP, EUC-KR, EUC-TW, and EUC-CN)
-or SJIS (Shift-JIS, a Japanese encoding).
+or Shift_JIS (SJIS, a Japanese encoding).
But there are also character sets using a state which is valid for more
than one character and has to be changed by another byte sequence.
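
As a rough illustration of the gen-locale.sh hunk above, the following
sketch separates the two names involved: the charmap part taken from the
locale file path, and the charmap file passed to generate_locale. The helper
names charmap_of and charmap_file_for are hypothetical, and the sample path
is an assumption about what $locfile looks like.

```shell
#!/bin/sh
# Sketch of the name handling in the patched gen-locale.sh.
# charmap_of and charmap_file_for are hypothetical helper names.

# Extract the charmap part from a path such as "ja_JP.SJIS/LC_CTYPE".
charmap_of() {
  echo "$1" | sed 's|[^.]*[.]\(.*\)/LC_CTYPE|\1|'
}

# Map that charmap part to the name of the charmap file to load.
charmap_file_for() {
  echo "$1" | sed -e 's/SJIS/SHIFT_JIS/'
}

locfile="ja_JP.SJIS/LC_CTYPE"   # assumed sample value
charmap=$(charmap_of "$locfile")
echo "charmap file: $(charmap_file_for "$charmap"), locale suffix: $charmap"
```

The generated locale keeps its SJIS suffix, so existing test references to
ja_JP.SJIS continue to work while the renamed charmap file is used.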