This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.



Character classifications and language-dependence


Hi,

Currently, many locale definition files that come with glibc (actually
mostly those of western languages) include the "i18n" FDCC-set under
their `LC_CTYPE' category.

However, the "i18n" FDCC-set contains a very broad character
classification: it considers at least all Latin, Greek and Cyrillic
letters as part of the `alpha' character class (as seen in Section 4.3.2
of ISO 14652 [0] and in glibc's version of "i18n").  Thus, every
language whose locale includes "i18n" ends up with many more letters in
its `alpha' character class than actually exist in the language.

For instance, while `ê' (`e' circumflex) is a letter in French, it is
not a letter in Castilian Spanish; likewise, `ñ' is a letter in
Castilian Spanish, but not in French.  But since glibc's locale
definitions for `fr_FR' and `es_ES' both include "i18n", `isalpha(3)'
returns true for both letters in both locales.
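To make this easy to check, here is a minimal sketch of a helper that
asks whether a character is in the `alpha' class of a given locale.  The
function name `alpha_in_locale' is hypothetical, it uses `iswalpha(3)'
so that it also works in multibyte (e.g., UTF-8) locales, and it assumes
that `wchar_t' holds Unicode code points, as is the case with glibc.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper, not part of glibc.
   Returns 1 if WC is in the `alpha' class of the locale LOCALE_NAME,
   0 if it is not, and -1 if that locale cannot be selected (for
   instance because it is not installed).  */
int
alpha_in_locale (const char *locale_name, wchar_t wc)
{
  char saved[256];
  const char *current;
  int result;

  /* Remember the current LC_CTYPE setting so it can be restored.  The
     string returned by setlocale may be overwritten by later calls, so
     copy it.  */
  current = setlocale (LC_CTYPE, NULL);
  if (current == NULL || strlen (current) >= sizeof saved)
    return -1;
  strcpy (saved, current);

  /* Switch to the requested locale; fail if it is unavailable.  */
  if (setlocale (LC_CTYPE, locale_name) == NULL)
    return -1;

  /* Assumption: wchar_t values are Unicode code points (true for
     glibc).  */
  result = iswalpha ((wint_t) wc) != 0;

  /* Restore the previous LC_CTYPE setting.  */
  setlocale (LC_CTYPE, saved);
  return result;
}
```

Assuming locales named `es_ES.UTF-8' and `fr_FR.UTF-8' are installed
(the names vary between systems), I would expect both
`alpha_in_locale ("es_ES.UTF-8", 0x00EA)' (`ê') and
`alpha_in_locale ("fr_FR.UTF-8", 0x00F1)' (`ñ') to return 1, given the
shared "i18n" classification described above.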

Section 4 of ISO 14652 reads:

  This Technical Report also defines an FDCC-set named "i18n" with
  values for some of the above categories in order to simplify FDCC-set
  descriptions for a number of cultures.  The contents of "i18n"
  categories should not necessarily be considered as the most commonly
  accepted values, while in many cases it could be the recommended
  values.

Thus, my understanding is that glibc's heavy use of "i18n" for character
classifications is acceptable, though not representative of "the most
commonly accepted values".  Therefore, one could for instance refine the
`fr_FR' character classification so that only French letters (e.g., not
`ñ') are found under its `alpha' class.

Is this correct?  If so, are there plans to actually refine (some of)
these character classifications?

Thanks,
Ludovic.

[0] http://www.open-std.org/jtc1/sc22/wg20/docs/projects#14652

