This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



Re: Default installation of UTF-8 locales


Ulrich Drepper wrote on 2000-11-08 17:11 UTC:
> Markus Kuhn <Markus.Kuhn@cl.cam.ac.uk> writes:
> > I do fully understand that UTF-8 locale files consume a bit more disk
> > space, but I think it is very important symbolically that at least the
> > following ones are installed by default and considered to be officially
> > supported in 2.2
> 
> No.  And exactly for the reason you got a reply: every
> "i18n-enthusiast" in every country will jump up and say "me too".  I
> already get complaints about the size of all the unused data installed
> and the time to make the installation (it really takes quite some time
> to process UTF-8 locales on slow and small systems).

My short list was meant to be roughly equivalent to what Solaris, for
instance, installs by default. It probably also covers almost half of
all Linux users worldwide.

Sun is even smarter here, because they factored the country
identifiers out of many locales. In most language variants the country
identifier affects only the monetary formatting (which NOBODY ever
uses!), so dropping it dramatically reduces the number of locales. For
English, for instance, I really don't see why more than "en" (generic
Commonwealth English, A4 paper, metric units, unspecified monetary
formatting, etc.) and "en_US" (US English, US paper, US units, USD) are
necessary.

I think we should factor most countries out of the locales in glibc
2.2.1, so that installing all locales in UTF-8 becomes feasible. That
would leave a far more manageable set (57) of locales:

POSIX af ar be bg ca cs da de el en es et eu fa fi fo fr ga gl gv he hi
hr hu i18n id is it iw ja kl ko kw lt lv mk mr mt nl nn no pl pt ro ru
sk sl sq sr sr@cyrillic sv th tr uk vi zh

instead of the current, ridiculously large set (136):

POSIX af_ZA ar_AE ar_BH ar_DZ ar_EG ar_IQ ar_JO ar_KW ar_LB ar_LY ar_MA
ar_OM ar_QA ar_SA ar_SD ar_SY ar_TN ar_YE be_BY bg_BG ca_ES ca_ES@euro
cs_CZ da_DK de_AT de_AT@euro de_BE de_BE@euro de_CH de_DE de_DE@euro
de_LU de_LU@euro el_GR en_AU en_BW en_CA en_DK en_GB en_IE en_IE@euro
en_NZ en_US en_ZA en_ZW es_AR es_BO es_CL es_CO es_CR es_DO es_EC es_ES
es_ES@euro es_GT es_HN es_MX es_NI es_PA es_PE es_PR es_PY es_SV es_US
es_UY es_VE et_EE eu_ES eu_ES@euro fa_IR fi_FI fi_FI@euro fo_FO fr_BE
fr_BE@euro fr_CA fr_CH fr_FR fr_FR@euro fr_LU fr_LU@euro ga_IE
ga_IE@euro gl_ES gl_ES@euro gv_GB he_IL hi_IN hr_HR hu_HU i18n id_ID
is_IS iso14651_t1 it_CH it_IT it_IT@euro iw_IL ja_JP kl_GL ko_KR kw_GB
lt_LT lv_LV mk_MK mr_IN mt_MT nl_BE nl_BE@euro nl_NL nl_NL@euro nn_NO
no_NO pl_PL pt_BR pt_PT pt_PT@euro ro_RO ru_RU ru_UA sk_SK sl_SI sq_AL
sr_YU sr_YU@cyrillic sv_FI sv_FI@euro sv_SE th_TH tr_TR uk_UA vi_VN
zh_CN zh_HK zh_TW
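
As a rough illustration of the reduction described above, the following
Python sketch collapses country-specific locale names to their language
part. The function names and the rule set are assumptions made for this
example (drop a two-letter territory, drop the @euro modifier, keep any
other modifier such as @cyrillic, leave names like POSIX or i18n
untouched); it is not how glibc itself organises its locale sources.

```python
import re

# Matches names such as "de_AT", "de_AT@euro" or "sr_YU@cyrillic".
# Names without a two-letter territory (POSIX, i18n, iso14651_t1) do not match.
_LOCALE_RE = re.compile(r"^([a-z]{2,3})_[A-Z]{2}(?:@(.+))?$")


def strip_country(name):
    """Return the locale name with its territory part removed."""
    match = _LOCALE_RE.match(name)
    if match is None:
        return name  # not a language_TERRITORY name; keep as-is
    language, modifier = match.groups()
    # Assumption: @euro only changes monetary data, so it is dropped too.
    if modifier and modifier != "euro":
        return f"{language}@{modifier}"
    return language


def factor_out(locales):
    """Collapse a list of locale names, preserving first-seen order."""
    reduced = []
    for name in locales:
        short = strip_country(name)
        if short not in reduced:
            reduced.append(short)
    return reduced


sample = ["POSIX", "de_AT", "de_AT@euro", "de_DE", "en_GB", "en_US",
          "sr_YU", "sr_YU@cyrillic", "i18n", "iso14651_t1"]
print(factor_out(sample))
```

Note that this simple rule also folds en_US into "en"; keeping a
separate US variant, as suggested earlier, would need an explicit
exception.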

I understand that the Li18nux specification requires lots of UTF-8
locales to be present in distributions, so I guess many distributors
will generate them all anyway. In contrast to my short list, that
really will waste a noticeable amount of memory: about 0.4 MB per
European UTF-8 locale, mostly due to the LC_COLLATE data (0.3 MB). I
would prefer Li18nux to also factor out some of the country variants to
save disk space.
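
To put the numbers side by side, here is a back-of-the-envelope
calculation in Python. It assumes, purely for illustration, that every
locale costs about as much as the 0.4 MB quoted for a European UTF-8
locale; real sizes will differ (CJK locales in particular), so treat
the results as an order-of-magnitude estimate only.

```python
PER_LOCALE_MB = 0.4  # figure quoted for one European UTF-8 locale


def estimate_mb(locale_count, per_locale_mb=PER_LOCALE_MB):
    """Rough total size if every locale cost per_locale_mb megabytes."""
    return round(locale_count * per_locale_mb, 1)


full_set = estimate_mb(136)     # all country variants
reduced_set = estimate_mb(57)   # countries factored out
print(full_set, reduced_set, round(full_set - reduced_set, 1))
```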

I agree that on-demand creation of locales (what TeX distributions have
been doing for ages with fonts) also seems like a very good idea.
Perhaps there is a way to extend localedef so that it can safely be
made a setuid (s-bit) program, allowing users to install system-wide
locales securely.
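
A minimal sketch of what on-demand creation could look like from a
wrapper script, using the existing localedef options -i (locale source)
and -f (charmap). The helper names and the optional output directory
are assumptions for this example; it does not address the setuid
question at all and simply runs with the caller's privileges.

```python
import subprocess


def localedef_command(language, territory, charmap="UTF-8", out_dir=None):
    """Build the argv for compiling one locale with localedef."""
    source = f"{language}_{territory}"   # e.g. "de_DE"
    name = f"{source}.{charmap}"         # e.g. "de_DE.UTF-8"
    # An output path containing a slash is written to that directory
    # instead of the system-wide locale location.
    target = f"{out_dir.rstrip('/')}/{name}" if out_dir else name
    return ["localedef", "-i", source, "-f", charmap, target]


def compile_locale(language, territory, charmap="UTF-8", out_dir=None):
    """Run localedef; raises CalledProcessError if it fails."""
    cmd = localedef_command(language, territory, charmap, out_dir)
    subprocess.run(cmd, check=True)


# Only show the command here; call compile_locale() to actually run it.
print(" ".join(localedef_command("de", "DE")))
```

With out_dir set to a directory the user owns, no special privileges
are needed, though programs then have to be told where to look (for
example via the LOCPATH environment variable).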

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

