1015 – be_BY@tarask: new locale

Status:	REOPENED

Alias:	None

Product:	glibc
Classification:	Unclassified
Component:	localedata
Version:	unspecified

Importance:	P2 enhancement
Target Milestone:	---
Assignee:	GNU C Library Locale Maintainers

URL:
Keywords:

Duplicates (2):	4020 7014
Depends on:
Blocks:

Reported:	2005-06-16 15:11 UTC by Alexander Mikhailian
Modified:	2017-11-28 22:41 UTC
CC List:	8 users

See Also:
Host:
Target:
Build:
Last reconfirmed:

Attachments
be_BY@classic belarusian locale (1.03 KB, text/plain) 2005-06-22 00:27 UTC, Alexander Mikhailian
be_BY@tarask locale definition for glibc (1.02 KB, text/plain) 2010-01-17 17:11 UTC, Hleb Valoshka

Description Alexander Mikhailian 2005-06-16 15:11:28 UTC

I am listed as a contributor for the be_BY locale.

We had a heated discussion almost a year ago on the Belarusian i18n mailing list
<i18n@mova.org> about Belarusian locales.

Many Belarusian translators of FOSS software are on the list, and we invited
Bruno Haible to help us with the topic.

Based on the results of that discussion, I suggest introducing a new locale named
be_BY@classic for the Belarusian classic orthography, which is the only productive
(in the linguistic sense) orthography in Belarus nowadays.

This matches Bruno's advice. Whoever commits this change should contact
him if more details are required. Petter Reinholdtsen was also included in
the discussion but dropped out fairly quickly.

The contents of the locale definition follow:


comment_char %
escape_char  /
%
% Belarusian Language Locale for Belarus
% Contact: Alexander Mikhailian
% Email: mikhailian@altern.org
% Language: be
% Territory: BY
% Revision: 0.5
% Date: 2004-08-24
% Application: general
% Users: general
% Repertoiremap: mnemonic.ds
% Charset: CP1251, UTF-8
% Distribution and use is free, also
% for commercial purposes.

LC_IDENTIFICATION
title      "Belarusian locale for Belarus, traditional spelling"
source     "Belarusian i18n mailing list"
address    "i18n@mova.org"
contact    "Alexander Mikhailian"
email      "mikhailian@altern.org"
tel        "+32 494 60 91 31"
fax        ""
language   "Belarusian"
territory  "Belarus"
revision   "1.0"
date       "2000-06-29"
audience   ""
application ""
abbreviation "taraskievica"
%
category  "be_BY:2000";LC_IDENTIFICATION
category  "be_BY:2000";LC_CTYPE
category  "be_BY:2000";LC_COLLATE
category  "be_BY:2000";LC_TIME
category  "be_BY:2000";LC_NUMERIC
category  "be_BY:2000";LC_MONETARY
category  "be_BY:2000";LC_MESSAGES
category  "be_BY:2000";LC_PAPER
category  "be_BY:2000";LC_TELEPHONE
category  "be_BY:2000";LC_MEASUREMENT
category  "be_BY:2000";LC_NAME
category  "be_BY:2000";LC_ADDRESS

END LC_IDENTIFICATION

LC_COLLATE
copy "iso14651_t1"

% iso14651_t1 is missing Ukrainian ghe
collating-symbol <UKR-GHE>

reorder-after <CYR-GZHE>
<UKR-GHE>

reorder-after <U0453>
<U0491> <UKR-GHE>;<BAS>;<MIN>;IGNORE

reorder-after <U0403>
<U0490> <UKR-GHE>;<BAS>;<CAP>;IGNORE

reorder-end
END LC_COLLATE

LC_CTYPE
copy "i18n"
END LC_CTYPE

LC_MESSAGES
yesexpr "<U005E><U005B><U0414><U0434><U0059><U0079><U005D><U002E><U002A>"
noexpr  "<U005E><U005B><U041D><U043D><U004E><U006E><U005D><U002E><U002A>"
END LC_MESSAGES

LC_MONETARY
int_curr_symbol           "<U0042><U0059><U0052><U0020>"
currency_symbol           "<U0440><U0443><U0431>"
mon_decimal_point         "<U002E>"
mon_thousands_sep         "<U0020>"
mon_grouping              3;3
positive_sign             ""
negative_sign             "<U002D>"
int_frac_digits           2
frac_digits               2
p_cs_precedes             0
p_sep_by_space            1
n_cs_precedes             0
n_sep_by_space            1
p_sign_posn               1
n_sign_posn               1
END LC_MONETARY

LC_NUMERIC
decimal_point             "<U002C>"
thousands_sep             "<U002E>"
grouping                  3;3
END LC_NUMERIC

LC_TIME
day     "<U041D><U044F><U0434><U0437><U0435><U043B><U044F>";/
        "<U041F><U0430><U043D><U044F><U0434><U0437><U0435><U043B><U0430><U043A>";/
        "<U0410><U045E><U0442><U043E><U0440><U0430><U043A>";/
        "<U0421><U0435><U0440><U0430><U0434><U0430>";/
        "<U0427><U0430><U0446><U044C><U0432><U0435><U0440>";/
        "<U041F><U044F><U0442><U043D><U0456><U0446><U0430>";/
        "<U0421><U0443><U0431><U043E><U0442><U0430>"
abday   "<U041D><U044F><U0434>";/
        "<U041F><U0430><U043D>";/
        "<U0410><U045E><U0442>";/
        "<U0421><U0440><U0434>";/
        "<U0427><U0446><U0432>";/
        "<U041F><U044F><U0442>";/
        "<U0421><U0443><U0431>"
first_weekday 2
first_workday 2
mon     "<U0421><U0442><U0443><U0434><U0437><U0435><U043D><U044C>";/
        "<U041B><U044E><U0442><U044B>";/
        "<U0421><U0430><U043A><U0430><U0432><U0456><U043A>";/
        "<U041A><U0440><U0430><U0441><U0430><U0432><U0456><U043A>";/
        "<U0422><U0440><U0430><U0432><U0435><U043D><U044C>";/
        "<U0427><U044D><U0440><U0432><U0435><U043D><U044C>";/
        "<U041B><U0456><U043F><U0435><U043D><U044C>";/
        "<U0416><U043D><U0456><U0432><U0435><U043D><U044C>";/
        "<U0412><U0435><U0440><U0430><U0441><U0435><U043D><U044C>";/
        "<U041A><U0430><U0441><U0442><U0440><U044B><U0447><U043D><U0456><U043A>";/
        "<U041B><U0456><U0441><U0442><U0430><U043F><U0430><U0434>";/
        "<U0421><U044C><U043D><U0435><U0436><U0430><U043D><U044C>"
abmon   "<U0421><U0442><U0434>";/
        "<U041B><U044E><U0442>";/
        "<U0421><U0430><U043A>";/
        "<U041A><U0440><U0441>";/
        "<U0422><U0440><U0430>";/
        "<U0427><U044D><U0440>";/
        "<U041B><U0456><U043F>";/
        "<U0416><U043D><U0432>";/
        "<U0412><U0440><U0441>";/
        "<U041A><U0441><U0442>";/
        "<U041B><U0456><U0441>";/
        "<U0421><U043D><U0436>"
d_t_fmt "<U0025><U0061><U0020><U0025><U0064><U0020><U0025><U0062>/
<U0020><U0025><U0059><U0020><U0025><U0054>"
d_fmt   "<U0025><U0064><U002E><U0025><U006D><U002E><U0025><U0059>"
t_fmt   "<U0025><U0054>"
am_pm   "";""
t_fmt_ampm  ""
date_fmt    "<U0025><U0061><U0020><U0025><U0062><U0020><U0025><U0065>/
<U0020><U0025><U0048><U003A><U0025><U004D><U003A><U0025><U0053><U0020>/
<U0025><U005A><U0020><U0025><U0059>"
END LC_TIME

LC_PAPER
% FIXME
height   297
% FIXME
width    210
END LC_PAPER

LC_TELEPHONE
tel_int_fmt    "<U002B><U0025><U0063><U0020><U0025><U0061><U0020><U0025>/
<U006C>"
int_prefix     "<U0033><U0037><U0035>"
int_select     "<U0038><U007E><U0031><U0030>"
END LC_TELEPHONE

LC_MEASUREMENT
% FIXME
measurement    1
END LC_MEASUREMENT

LC_NAME
name_mr     "<U0441><U043F><U0430><U0434><U0430><U0440>"
name_ms     "<U0441><U043F><U0430><U0434><U0430><U0440><U044B><U043D><U044F>"
name_mrs    "<U0441><U043F><U0430><U0434><U0430><U0440><U044B><U043D><U044F>"
name_miss   ""
name_gen    ""
name_fmt    "<U0025><U0064><U0025><U0074><U0025><U0067><U0025><U0074>/
<U0025><U006D><U0025><U0074><U0025><U0066>"
END LC_NAME

LC_ADDRESS
postal_fmt  "<U0025><U0066><U0025><U004E><U0025><U0061><U0025><U004E>/
<U0025><U0064><U0025><U004E><U0025><U0062><U0025><U004E><U0025><U0073>/
<U0020><U0025><U0068><U0020><U0025><U0065><U0020><U0025><U0072><U0025>/
<U004E><U0025><U0025><U007A><U0020><U0025><U0054><U0025>/
<U004E><U0025><U0063><U0025><U004E>"
country_name "<U0411><U0435><U043B><U0430><U0440><U0443><U0441><U044C>"
country_post ""
country_car "<U0042><U0059>"
country_ab2 "<U0042><U0059>"
country_ab3 "<U0042><U004C><U0052>"
country_num 112
country_isbn 5
lang_name   "<U0411><U0435><U043B><U0430><U0440><U0443><U0441><U043A>/
<U0430><U044F><U0020><U043C><U043E><U0432><U0430>"
END LC_ADDRESS

Comment 1 Denis Barbier 2005-06-16 20:49:24 UTC

Alexander, you should consider replacing the LC_* sections with
  copy "be_BY"
when they are identical to be_BY (i.e. all except LC_TIME,
from what I've seen, and of course LC_IDENTIFICATION).
This would help future maintenance of your locale files.
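
As a rough illustration of this suggestion, the Python sketch below prints the reduced sections. The helper `copy_stanzas` is hypothetical, and the list of categories assumed to be identical to be_BY is taken from the remark above (everything except LC_TIME and LC_IDENTIFICATION); it has not been checked against the actual be_BY file.

```python
# Categories assumed identical to be_BY, per the remark above.
# This list is an assumption, not verified against the be_BY source.
SHARED = [
    "LC_CTYPE", "LC_COLLATE", "LC_MESSAGES", "LC_MONETARY", "LC_NUMERIC",
    "LC_PAPER", "LC_TELEPHONE", "LC_MEASUREMENT", "LC_NAME", "LC_ADDRESS",
]


def copy_stanzas(categories, base="be_BY"):
    """Return locale-source text in which each category just copies `base`."""
    blocks = [f'{cat}\ncopy "{base}"\nEND {cat}\n' for cat in categories]
    return "\n".join(blocks)


if __name__ == "__main__":
    print(copy_stanzas(SHARED))
```

Each printed section has the form `LC_NUMERIC`, `copy "be_BY"`, `END LC_NUMERIC`, so only LC_IDENTIFICATION and LC_TIME would still need to be spelled out in full.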

Comment 2 Alexander Mikhailian 2005-06-22 00:27:23 UTC

Created attachment 529
be_BY@classic belarusian locale

OK, here comes the modified version. Sorry for the delay.

Comment 3 Yury Tarasievich 2005-07-12 09:20:46 UTC

I think the word "classic" would be misleading here.

This qualifier for the aforementioned orthography is self-awarded and inaccurate by any definition, as neither the
qualifier nor the orthography is used or recognized as such by anybody except a rather minor minority.

I'd suggest naming this be_BY@alternative. This would be 100% accurate by all definitions of "classic" and
"alternative", would accurately reflect the goals of the mentioned minority movement, and there is a precedent for
using the term "alternative" in this part of the world (DOS Cyrillic code pages in the late 1980s).

Also, you won't have a problem with a multitude of "alternatives", as there now seems to exist some kind of standard
for this alternative.

Yes, I participated in the aforementioned discussion, and no, I won't restart the discussion *here* unless asked
for further explanations. However, feel free to contact me privately via e-mail if needed or interested.

Comment 4 Wad V Mashckoff 2005-08-03 13:26:00 UTC

A Belarusian locale is very much needed!

+1 !

Comment 5 Kirill A. Shutemov 2005-08-03 13:30:52 UTC

> I think the word "classic" would be misleading here.
See the linguistic literature: http://www.knihi.net/index.php?productID=224
"classic" is the name of the spelling.

Comment 6 Aliaksei 2005-08-03 13:55:16 UTC

We need a Belarusian locale.

Comment 7 Yury Tarasievich 2005-08-04 09:35:22 UTC

I didn't make myself clear, then. This isn't about linguistics at all.

In the Belarusian language community there exists a certain interest group promoting the use of several (long obsolete)
orthography rules. This group calls its variant of Belarusian orthography "classic".
Alexander Mikhailian proposes creating an additional be_BY@... branch, which would assert usage of the mentioned
orthography variant. And that's perfectly okay! It's just that the qualifier isn't well chosen.
I wouldn't put "classic" there but rather, e.g., "alternative", because:

The term "classic", by every definition, denotes something well-recognized and widely or traditionally used.
However, virtually nobody in the Belarusian community outside of the interest group (which isn't numerous and/or
popular!) recognizes the mentioned variant as "classic", neither by knowing or referring to the name, nor by usage
tradition -- as the variant's key orthography features were made obsolete about 70 years ago!
Even the group-promoted usage of the name "classic" started, it seems, between 1992 and 1994 (judging by two big
publications on Belarusian orthography by one of the group leaders).

On the other hand, the term "alternative" here would be immediately recognizable, both by popular understanding and
by the group's self-image.

P.S. The book Kirill A. Shutemov pointed me to contains one of the editions of the mentioned orthography variant,
published by the interest group and supervised, even authored, it seems, by one of the interest group leaders.

Comment 8 Ulrich Drepper 2005-10-14 18:05:17 UTC

I really have no interest in getting in the middle of all this.  The extension
@classic does indeed seem wrong to me from what I read.  And there is already
a Belarusian locale.

Unless this second language variant is the official one (which I doubt it is) it
is best to just collect a tarball with all the appropriate files and distribute
it separately.  There is nothing a separately distributed locale source file
cannot do if it is compiled using localedef upon installation.
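
As a rough illustration of the "compile upon installation" route, the Python sketch below only assembles a localedef command line and does not run it. The function name and the file paths are hypothetical; `-i` (input source file) and `-f` (charmap) are standard localedef options, and an output path containing a slash is treated as the directory in which to store the compiled locale.

```python
import shlex


def localedef_command(source, output_dir, charmap="UTF-8"):
    """Build (but do not run) the argv for compiling a locale source file."""
    return ["localedef", "-i", source, "-f", charmap, output_dir]


if __name__ == "__main__":
    # Hypothetical paths: a locale source shipped in a tarball, compiled
    # into a private directory at install time.
    argv = localedef_command("./be_BY@tarask", "./locale/be_BY.UTF-8@tarask")
    print(shlex.join(argv))
    # To actually compile: subprocess.run(argv, check=True)
```

Programs can then be pointed at such a private directory through glibc's LOCPATH environment variable, so no change to the system locale set is needed.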

Adding variants like this (as opposed to Latin vs Cyrillic, for instance) would
mean we open ourselves to all kinds of fights like this.

So, unless I get some really convincing arguments I'll close this as WONTFIX.

Comment 9 Yury Tarasievich 2005-10-18 08:25:10 UTC

(In reply to comment #8)
...
> So, unless I get some really convincing arguments I'll close this as WONTFIX.

If nothing else comes up, I'm supporting this as it goes.

Comment 10 Siarhej Shupa 2005-12-08 13:45:31 UTC

Yury Tarasievich says:

"I think the word "classic" would be misleading here. This qualifier for the
aforementioned orthography is self-awarded and inaccurate by any definition,
as neither the qualifier nor the orthography is used or recognized as such by anybody
except a rather minor minority."

I wouldn't like to restart the old discussion with Mr Tarasievich, who has some
unexplained repugnance for that other orthography and has been the only one to
fight it vehemently in all relevant net discussions.

However, I want to draw your attention to an inaccuracy in his comment, upon
which all his argumentation is built:

Those who follow that "other" orthography are in a slight MAJORITY (not a minor
minority) on the Net. You can easily verify it: just google any pair of words
spelled differently in the two orthographies.

Comment 11 Yury Tarasievich 2005-12-13 23:09:09 UTC

Let me reiterate (and bring this back on topic):

I am, generally, *in* *favour* of this separation of locales.

The way I see it, if folks want their very own sub-locale, then okay and good riddance.
There's already an approved Latin-script sub-locale, created, I hear, for a userbase that is yet to
emerge one day. So why not one extra?

But then, the initially proposed "classic" qualifier is inappropriate and unmerited, whether measured by
popular support or by usage tradition.
And Google hits aren't relevant at all to this exact question of being or not being classic.
Other things, and quite material ones at that, are.

Let it be noted that I do *not* accept even the general quality of Mr. Shupa's arguments.
But that kind of discussion would be well out of scope for this issue.

Comment 12 Viktar Siarheichyk 2005-12-14 08:35:04 UTC

If the word 'classic' is a controversial point, then let's name it e.g.
'alternative'. Let's not allow our purist discussions to lead us nowhere. But we
need the locale.

Comment 13 Yury Tarasievich 2005-12-26 12:24:56 UTC

Just for the record: I've nothing against naming this branch "alternative".

Comment 14 booxter 2006-11-09 11:47:00 UTC

So what's the result?
Maybe, it's time to register be_BY@alternative and to move/copy the existing
*.po files of GNOME/coreutils etc. into this "namespace"?

Also, I think we should change the be_BY locale to follow the norms of standard Belarusian.

Comment 15 booxter 2006-11-13 10:41:28 UTC

BTW, the Debian be-locale-data supports the be_BY@alternative extension quite a
lot. Please add this extension upstream.

Comment 16 booxter 2007-12-19 21:02:19 UTC

IANA has already approved the official name of the alternative
Belarusian orthography variant: be-tarask
Here is the link: http://www.iana.org/assignments/language-subtag-registry
Can we register this locale then?

Comment 17 Hleb Valoshka 2008-05-17 05:32:38 UTC

We really need an additional Belarusian locale (be@tarask, as approved by IANA),
simply because most translations are made for be@tarask, not for be, and you can't
ignore this fact.

Comment 18 booxter 2009-10-23 20:29:30 UTC

Dear Ulrich: Is there any chance we can get the bug fixed in glibc?

Comment 19 Yauhen Kharuzhy 2009-10-23 20:42:19 UTC

We want to use the alternative Belarusian locale in our project (openinkpot.org),
but we don't want to create yet another (216th) patch for glibc.

Comment 20 Hleb Valoshka 2010-01-17 17:11:50 UTC

Created attachment 4525
be_BY@tarask locale definition for glibc

This is a be_BY@tarask locale definition for glibc. Please accept it at last.

Comment 21 Petr Baudis 2010-11-06 12:46:22 UTC

I believe Ulrich's aim is to avoid all kinds of fringe variations plaguing the glibc locale database. If only a small minority uses be_BY@tarask, it should be distributed separately. If only a small minority would use the current be_BY, be_BY@tarask should maybe just be entered as be_BY. If there is a rough equilibrium between the two groups (which you seem to indicate), I would say there is value in having this. (The only real argument in this bug seemed to be about the naming, which seems to have been resolved in the IANA scope.)

Can you somehow show that such an equilibrium exists? E.g. are there (pre-existing) Wikipedia articles about this, other notable sources (e.g. major newspaper articles) or such?

Comment 22 booxter 2010-11-06 13:15:25 UTC

Ok, let me show you that this language variant is strong enough to have its own locale.

1. There are 2 (two) Belarusian Wikipedias: be.wikipedia.org (IANA: be_BY-1959acad, be_BY in glibc) and be-x-old.wikipedia.org (IANA: be_BY-tarask) with quite similar article counts: 24914 (be) vs. 29004 (be-x-old).

2. As for localisation, various open- and closed-source software has translations in one or the other language variant. E.g. OpenOffice, Mozilla Suite, Firefox, Thunderbird, KDE work with the academic language variant (be_BY-1959acad), whereas GNOME, Gimp, Xfce4 have tarashkevitsa (be_BY-tarask) translations (the latter have been forced to use the be_BY locale until now because there is no proper place for their contributions). Mediawiki and some other software packages have translations in both language versions.

3. As for real life, the Belarusian Academy of Science, schools, state publishers and media work in the -1959acad version, while some other popular media work in the alternative, -tarask version (mainly Radio Liberty for Belarus, Radio Racyja, ARCHE, and some private publishers).

I don't know of any official statistics on the percentage of usage of each variant, but I think every Belarusian language user would agree that this percentage varies (from 80/20 to 20/80 in different spheres, with overall figures of about 75/25 for -1959acad and -tarask respectively).

You can read a bit more on the roots of the two language norm variants at: http://en.wikipedia.org/wiki/Taraškievica

So this is not a case of "fringe variations plaguing the glibc locale database." :)

Comment 23 Petr Baudis 2012-04-04 15:48:51 UTC

*** Bug 4020 has been marked as a duplicate of this bug. ***

Comment 24 Petr Baudis 2012-04-04 15:49:02 UTC

*** Bug 7014 has been marked as a duplicate of this bug. ***

Comment 25 Hleb Valoshka 2012-12-28 19:11:04 UTC

Hey, guys, how many more years do you need to accept this trivial patch and finally close this annoying bug?

Comment 26 Rafal Luzynski 2017-11-21 06:50:41 UTC

This is what I've received from a native speaker:

"I don't think this change should be done. Since the time
the bug was reported, usage of the alternative spelling has
significantly decreased both in localization projects as well as in
real/web life, and I don't believe there are a lot of people who would
contribute in this specific locale variant. Having all those variants
at this point is just confusing for users. If I were you, I would just
close the bug without a fix.

Ihar"

Therefore I'm closing this.

Comment 27 Hleb Valoshka 2017-11-22 21:47:48 UTC

I absolutely disagree with such bug treatment. Although Ihar does not use it this does not mean that nobody else uses it.

Of course it is not easy to work properly with the @tarask locale variant (incorrect spelling of some strings etc.) because the bug was opened 12 (TWELVE!) years ago without any action from the glibc maintainers.

This is a trivial case; this is not porting glibc to a brand new kernel. It's just a locale definition, and it's attached. What prevents the glibc maintainers from simply copying it into the source tree?

Is this how communication with community should be handled?

Comment 28 Chris Leonard 2017-11-22 23:38:43 UTC

If a Unicode CLDR locale variant exists, I think that glibc will absolutely follow suit.  The simple reality is that the glibc project is primarily a collection of C-hackers and not linguists.  The Unicode CLDR effort has a deeper bench of linguistic experience by virtue of developing Unicode representation.  I personally volunteer my own efforts to assist in developing the CLDR locale (at least the translated bits) in support of a minority language community, but I don't think making glibc intervene in internecine conflicts is very productive.  I can offer Pootle hosting of PO files representing CLDR regions, languages and scripts to assist a team in developing the core bits of a CLDR locale.  Actions will speak much louder than ticket comments.

Comment 29 Chris Leonard 2017-11-22 23:53:28 UTC

To start on a CLDR locale for be@tarask

Register as a translator here:

https://translate.sugarlabs.org/accounts/register/

Work on these three CLDR-related PO files, which conveniently include Wikipedia links for those who are not geographers, linguists or orthographers.

https://translate.sugarlabs.org/be@tarask/

When that is done we'll work on getting the rest of the Unicode CLDR Survey tool completed (plural forms, etc.)


You have my personal commitment of support in trying to develop a CLDR locale for be@tarask. As Sugar Labs Translation Team coordinator, I work with many digitally disadvantaged languages and firmly believe in linguistic self-determination.

Does that sound like a fair alternative to making C-hackers get involved in an internal Belarusian issue?

Comment 30 Mike FABIAN 2017-11-23 10:34:24 UTC

(In reply to Hleb Valoshka from comment #27)
> I absolutely disagree with such bug treatment. Although Ihar does not use it
> this does not mean that nobody else uses it.

If there are people who really want to use it today, we will add it.

> Of course it is not easy to work properly with the @tarask locale variant (incorrect
> spelling of some strings etc.) because the bug was opened 12 (TWELVE!) years
> ago without any action from the glibc maintainers.

We are trying to improve. Rafał and I are currently going through the list
of open bugs related to locales and trying to work through that backlog.

> This is a trivial case; this is not porting glibc to a brand new kernel. It's
> just a locale definition, and it's attached. What prevents the glibc maintainers
> from simply copying it into the source tree?

Adding locales is not without cost, each locale needs about 2 MB in
the binary. Some distributions still install all available locales
always by default (for example openSUSE and Fedora do
this). Therefore, having more locales will make the default install
larger.  That is OK for locales which are used by some people, but
adding stuff which nobody uses makes no sense.

And it is sometimes hard for us to figure out whether there are
really any users or not, especially for old bug reports where there
was no activity for a few years.

As Chris Leonard writes, if a locale exists in CLDR, this is also
an indication that it is really used by somebody.

Recently I added a ca_ES.utf8@valencia locale; there, too, I initially had
some doubts about whether there are people really interested in using it.
But when I saw that ca_ES_VALENCIA.xml exists in CLDR, I thought:
“OK, this proves that there is real user interest in that locale”.

> Is this how communication with community should be handled?

We want to be nice to the community, but it is sometimes hard for
us to find out which language communities are really active
and which are not.

Comment 31 Rafal Luzynski 2017-11-28 22:41:47 UTC

Thank you Mike and Chris for replying while I was traveling. Indeed, we feel more comfortable adding or modifying locale data if they are copied from CLDR. Actually, our long-term goal is to import locale data from CLDR automatically. Adding locales which are not present in CLDR is possible but always tricky: how do we verify that a locale is correct? How do we tell whether a community willing to use the locale really exists? In this case I was told that it does not exist.

So, Hleb, please prove that the community (at least one person) exists and file a ticket asking to add the locale to CLDR. I suggest adding this locale to glibc (that means: I or someone else will add it) as soon as it draws some reasonable attention from CLDR maintainers. I don't need to wait until it is closed and published. For now I am reopening this bug report.