38 – Serbian locales (sr_CS and sr_CS@Latn) incorrect in GNU libc

Summary: Serbian locales (sr_CS and sr_CS@Latn) incorrect in GNU libc

Status:	RESOLVED FIXED

Alias:	None

Product:	glibc
Classification:	Unclassified
Component:	localedata
Version:	unspecified

Importance:	P2 normal
Target Milestone:	---
Assignee:	Petter Reinholdtsen

URL:
Keywords:

Depends on:	40 549
Blocks:	39 libc236

Reported:	2004-02-27 17:21 UTC by Danilo Segan
Modified:	2019-04-10 12:28 UTC (History)
CC List:	0 users

See Also:
Host:
Target:
Build:
Last reconfirmed:

Flags:	fweimer: security-

Attachments
Serbian locale for Serbia and Montenegro (2.54 KB, text/plain) 2004-02-27 17:22 UTC, Danilo Segan
Serbian locale for Serbia and Montenegro in Latin script (1.59 KB, text/plain) 2004-02-27 17:26 UTC, Danilo Segan
Updated Serbian Latin locale for Serbia and Montenegro (1.58 KB, text/plain) 2004-08-08 23:00 UTC, Danilo Segan
Updated Serbian locale for Serbia and Montenegro (2.53 KB, text/plain) 2004-08-08 23:01 UTC, Danilo Segan
Serbian Latin locale for Serbia and Montenegro (1.58 KB, text/plain) 2004-08-08 23:08 UTC, Danilo Segan
Serbian locale for Serbia and Montenegro (2.53 KB, text/plain) 2004-11-28 19:15 UTC, Danilo Segan
Serbian Latin locale for Serbia and Montenegro (1.57 KB, text/plain) 2004-11-28 19:15 UTC, Danilo Segan

Description Danilo Segan 2004-02-27 17:21:18 UTC

The sr_YU and sr_YU@cyrillic locales contain a lot of incorrect data.  First of
all, the base Serbian locale should use the Cyrillic script (the constitution
names Cyrillic as the primary script, with Latin being reserved for minorities).
The date and currency formats and the week start are all either incorrect or
missing.  The country codes are also incorrect.

They also cannot be installed with localedef without the "-c" option, because
LC_COLLATE uses both the "copy" keyword and other keywords which cannot be used
together with "copy".  This means that collation doesn't work for Serbian in
Latin script (ISO 14651 is correct for Cyrillic).
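
The LC_COLLATE problem described above can be spotted mechanically. The following is only a rough sketch in Python, not a tool from glibc: the function name `categories_mixing_copy`, the assumption that `%` is the comment character, and the made-up `collating-symbol` line in the sample are all illustrative. It treats every non-comment line as starting with a keyword, so continuation lines would be miscounted.

```python
def categories_mixing_copy(locale_source, comment_char="%"):
    """Return names of LC_* categories that use `copy` alongside other keywords.

    Simplified model: one keyword per line, no continuation lines.
    """
    offenders = []
    current = None      # name of the category currently being scanned
    has_copy = False
    has_other = False
    for raw in locale_source.splitlines():
        line = raw.strip()
        if not line or line.startswith(comment_char):
            continue
        words = line.split()
        if current is None:
            if words[0].startswith("LC_"):
                current, has_copy, has_other = words[0], False, False
        elif words[:2] == ["END", current]:
            if has_copy and has_other:
                offenders.append(current)
            current = None
        elif words[0] == "copy":
            has_copy = True
        else:
            has_other = True
    return offenders


# Hypothetical locale fragment, written for this example only.
sample = """\
LC_COLLATE
copy "iso14651_t1"
collating-symbol <example-symbol>
END LC_COLLATE
LC_CTYPE
copy "i18n"
END LC_CTYPE
"""
print(categories_mixing_copy(sample))
```

On the sample, only LC_COLLATE is reported, because LC_CTYPE contains nothing besides its `copy` line.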

I've corrected all of these issues except for collation (I just removed the
parts that don't work, so no need to use "-c" anymore), and put the updated
locales at http://srpski.org/locale/.  Files sr_CS and sr_CS@Latn provide
Serbian (in Cyrillic) and Serbian Latin locales.  sr_CS@Latn depends on sr_CS.

Comment 1 Danilo Segan 2004-02-27 17:22:45 UTC

Created attachment 12
Serbian locale for Serbia and Montenegro

Comment 2 Danilo Segan 2004-02-27 17:26:40 UTC

Created attachment 13
Serbian locale for Serbia and Montenegro in Latin script

Comment 3 Danilo Segan 2004-03-21 16:57:04 UTC

Some data to support claims made here:
(From Serbian government page at http://www.serbia.sr.gov.yu/cms/view.php?id=1015):
The official language in Serbia is Serbian and the alphabet in official use is
Cyrillic, while Latin script is also used. In the areas inhabited by national
minorities, the languages and alphabets of the minorities are in official use,
as provided by law.

If you check the page at http://www.srbija.sr.gov.yu/cms/view.php?id=1014 (in
Serbian), you'll notice dates of the form "28. decembra 1990.", which agrees with
my definition in the locales (apart from declension: the nominative is
"decembar" in Serbian in Latin script, whereas the page uses "decembra").

If you take a look at official Belgrade web page (Belgrade is capital of Serbia
and Montenegro) http://www.beograd.org.yu/, you can notice dates of the form
"Петак, 19. 03. 2004." and "Недеља, 21. 03. 2004 - 16:11:33" (you may come
across different dates, they're related to time of visit and when news item was
sent), which agrees with the long forms I used.

It's hard to back any of these with real specifications, since they're not
available on the web.  Also, there are some "stylistic" differences, like the
use of "-" (hyphen) on beograd.org.yu above (first time I've seen that, it's more
commonly a comma), and you can notice that they drop the dot after the year in
one place and keep it in the other (so they do make mistakes).  Also, using
uppercase day names is very rare, even when the name stands on its own.
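
To make the date forms quoted above concrete, here is a small Python sketch. It is an illustration only: the name tables are written by hand for this example and are not copied from the attached locale files, and the format strings are assumptions chosen to reproduce the cited web page samples, not necessarily the d_fmt values in the attachments. Day names are kept lowercase, in line with the remark that uppercase day names are rare.

```python
import datetime

# Hand-written tables for illustration; not taken from the attached locales.
DAYS_CYRILLIC = ["понедељак", "уторак", "среда", "четвртак",
                 "петак", "субота", "недеља"]          # Monday first
MONTHS_LATIN_DECLINED = ["januara", "februara", "marta", "aprila", "maja",
                         "juna", "jula", "avgusta", "septembra", "oktobra",
                         "novembra", "decembra"]


def numeric_date(d):
    """Numeric form with a dot after each field, e.g. '19. 03. 2004.'"""
    return d.strftime("%d. %m. %Y.")


def long_date(d):
    """Day name followed by the numeric form, separated by a comma."""
    return f"{DAYS_CYRILLIC[d.weekday()]}, {numeric_date(d)}"


def declined_date(d):
    """Form with the declined month name, e.g. '28. decembra 1990.'"""
    return f"{d.day}. {MONTHS_LATIN_DECLINED[d.month - 1]} {d.year}."


print(long_date(datetime.date(2004, 3, 19)))
print(declined_date(datetime.date(1990, 12, 28)))
```

The tables are indexed directly instead of relying on `locale.setlocale`, so the sketch does not depend on which locales happen to be installed.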


Also, I've fixed the yesexpr and noexpr as well in my local copies, and I may
submit them if you want.

Comment 4 Bruno Haible 2004-04-02 19:38:28 UTC

Regarding the naming of the locales: By convention, in Unix, we use lower case 
and non-mutilated names. So, "sr_CS@latin" would be logical. The name 
"sr_CS@Latn" comes from Mono/.NET. The .NET conventions also use a "-" 
instead of "_", therefore the .NET locale names and the Unix locale names cannot 
be the same anyway.

Comment 5 Danilo Segan 2004-04-03 19:13:44 UTC

Hi Bruno,
The choice of sr_CS@Latn has nothing to do with .NET (heck, I don't even know
what codes they use, and only recently I found out about ietf "language tags"
which seem to be what you describe). It was decided over a year ago in a
discussion on Gnome I18N list that this was the way to go (at least for Gnome at
the time, which used "sr" for Latin and "sp" for Cyrillic, which is not even a
proper ISO 639 code), because of standardized nature of ISO 15924 script codes,
which allows it to scale better to similar cases in the future (that sort-of
happened with Uzbek).

My message to bug-glibc in 2003-06 has some pointers to the discussion:
http://sources.redhat.com/ml/bug-glibc/2003-06/msg00012.html

Since it's been almost a full year, sr@Latn.po files are present in many
packages now, and it would be a real hassle to rename them all.  Just as an
example, most of the packages in Gnome CVS (and all packages in Gnome, which is a
GNU project) have sr@Latn.po files.  Other external projects use them too
(like gToDo, or Gaim), and so do other major projects like KDE (in their 3.2
releases).

(Btw, "US" in "en_US" doesn't look that "lower case" to me :)

Also, GNU libc has only four or five "modifiers" in use, so I wouldn't take that
as sufficient data to describe a "convention".  Along the same lines, sr@Latn is
already widely used in practice (my guess is at least 200 packages, probably
even more).  The catch is that GNU libc's findlocale() accepts even things like
"sr_YU@Latn" now, and works suitably since the "sr_YU" locale is currently
(incorrectly) Latin-based.  When the default Serbian locale is switched to
Cyrillic (and sr_CS@latin is used for Latin transcription), sr_CS@Latn would
"resolve" to sr_CS, and we'd get a mixup of Cyrillic and Latin.
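
The fallback described above can be modelled with a toy resolver. This Python sketch is not glibc's actual lookup code: the function `resolve`, the regular expression, and the rule "try the exact name, then drop codeset and modifier" are simplifying assumptions made only to show why an unknown modifier silently lands on the base locale.

```python
import re

# language[_territory][.codeset][@modifier]
LOCALE_NAME = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


def resolve(requested, installed):
    """Return the installed locale a request would end up using, or None.

    Toy rule: exact match first, otherwise fall back to language_territory.
    """
    match = LOCALE_NAME.match(requested)
    if match is None:
        return None
    if requested in installed:
        return requested
    base = match.group("language")
    if match.group("territory"):
        base += "_" + match.group("territory")
    return base if base in installed else None


old_set = {"sr_YU", "sr_YU@cyrillic"}   # sr_YU is Latin-based here
new_set = {"sr_CS", "sr_CS@latin"}      # sr_CS is Cyrillic here

print(resolve("sr_YU@Latn", old_set))   # falls back to the Latin sr_YU
print(resolve("sr_CS@Latn", new_set))   # falls back to the Cyrillic sr_CS
print(resolve("sr_CS@latin", new_set))  # exact match
```

Under this model the request for "sr_CS@Latn" succeeds but yields the Cyrillic base locale, which is the mixup the comment warns about.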

Still, since my main goal is to get these into GNU libc, and I'm more worried
about the "default" locale (sr_CS) which needs to be Cyrillic and correct, I
won't (strongly) oppose any other choice, even if it means a lot of work for
many others (since I consider the Latin transcription to be of lower priority).

Comment 6 Petter Reinholdtsen 2004-08-08 18:09:41 UTC

The locales sr_YU and sr_YU@cyrillic have been removed from glibc CVS.

I had a look at the sr_CS locale, and discovered a few issues.

 - YU and sr_YU are mentioned in several places in the header and
   comments.  I assume these should change to CS and sr_CS?

 - LC_IDENTIFICATION:category is missing quotes (") around the
   standard reference text.  I'm not sure if this is mandatory,
   but almost all other locales use the quotes.  I recommend
   inserting quotes.

 - Both yesexpr and noexpr are missing '^' at the start of the
   regex.  They should probably include it.

 - Both yesexpr and noexpr contain '.*' at the end of the regex.
   The default POSIX locale regexes do not contain such an ending,
   and I recommend removing this part to make sure the regexes
   are compatible with the POSIX ones.

I also had a look at the sr_CS@Latn locale.

 - It also mentions YU in the comments.

 - It also lacks quotes in LC_IDENTIFICATION.

 - Both yesexpr and noexpr lack '^' and include '.*'.

 - Both yesexpr and noexpr include the digits '0' and '1'.  The
   POSIX locale does not include zero and one in its regexes,
   so I recommend removing them to make sure the regexes are
   comparable to those of the POSIX locale.
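
The effect of the anchoring and digit remarks above can be tried out with a short Python sketch. Two caveats: Python's `re` module is not a POSIX regex engine (though these simple patterns behave the same way in both), and `LOOSE_YES` and `DIGIT_YES` are hypothetical patterns written for this example, not the expressions from the attached locale files. `POSIX_YES` and `POSIX_NO` are the POSIX locale's yesexpr and noexpr.

```python
import re

POSIX_YES = r"^[yY]"      # yesexpr of the POSIX locale
POSIX_NO = r"^[nN]"       # noexpr of the POSIX locale
LOOSE_YES = r"[yY].*"     # hypothetical: no '^', trailing '.*'
DIGIT_YES = r"^[yY1]"     # hypothetical: also accepts the digit 1


def matches(reply, pattern):
    """True if the pattern matches somewhere in the reply (search semantics)."""
    return re.search(pattern, reply) is not None


for reply in ["yes", "nay", "1"]:
    print(reply,
          matches(reply, POSIX_YES),
          matches(reply, LOOSE_YES),
          matches(reply, DIGIT_YES))
```

Without the leading '^', the reply "nay" counts as affirmative because the pattern finds the "y" in the middle of the word. The trailing '.*' never changes whether a reply matches, which is consistent with treating its removal as a harmless cleanup.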

There might be other problems with the locale as well.  I am working
on a script to check the content of locales, and it is still very
sketchy. :)

The naming issue is an open question.  I do not yet understand
which modifier strings are accepted by the glibc maintainers.

You might find some useful information from
<URL: http://www.student.uit.no/~pere/linux/glibc/ > about
writing glibc locales.  There isn't much there, but it is
all I got at the moment. :)

Comment 7 Danilo Segan 2004-08-08 22:58:09 UTC

Petter, thanks for checking locales out.

"YU" is mentioned only as a source reference (such as: "Source: sr_YU locale"),
since these locales are based on earlier sr_YU locales.  There're no
unintentional references to "YU".

I've added the quotes around LC_IDENTIFICATION.category references in my local copy.

yesexpr and noexpr have been fixed in my local copy since you announced that on
libc-locales list (late March I believe, based on my ChangeLog in
http://cvs.kvota.net/viewcvs/viewcvs.cgi/locale/?cvsroot=i18n).  Now I've
removed the ".*" from yesexpr and noexpr as well, though that shouldn't cause
any problems.

I'll attach the latest files here.

Comment 8 Danilo Segan 2004-08-08 23:00:40 UTC

Created attachment 158
Updated Serbian Latin locale for Serbia and Montenegro

Comment 9 Danilo Segan 2004-08-08 23:01:41 UTC

Created attachment 159
Updated Serbian locale for Serbia and Montenegro

Comment 10 Danilo Segan 2004-08-08 23:08:10 UTC

Created attachment 160
Serbian Latin locale for Serbia and Montenegro

Whoops, I forgot to remove 0 and 1 from noexpr and yesexpr.

Comment 11 Danilo Segan 2004-10-19 22:53:46 UTC

Petter, with the sr_YU locales removed from GNU libc, is there any chance of
getting this considered soon?

Comment 12 Danilo Segan 2004-11-15 15:46:22 UTC

I'll post updated locale files later which fix the international currency
symbol in accordance with bug #549.

Comment 13 Danilo Segan 2004-11-28 19:15:02 UTC

Created attachment 293
Serbian locale for Serbia and Montenegro

Updated to use CSD instead of YUM.

Comment 14 Danilo Segan 2004-11-28 19:15:53 UTC

Created attachment 294
Serbian Latin locale for Serbia and Montenegro

Updated to use CSD instead of YUM.

Comment 15 Sourceware Commits 2005-07-18 01:50:41 UTC

Subject: Bug 38

CVSROOT:	/cvs/glibc
Module name:	libc
Branch: 	glibc-2_3-branch
Changes by:	roland@sources.redhat.com	2005-07-18 01:50:36

Modified files:
	localedata     : SUPPORTED 

Log message:
	2005-02-27  Denis Barbier  <barbier@debian.org>
	
	[BZ #38]
	* locales/sr_CS: New file.
	Contributed by Danilo Segan <dsegan@gmx.net>
	* SUPPORTED: Add sr_CS/ISO-8859-5 and sr_CS.UTF-8/UTF-8.

Patches:
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/localedata/SUPPORTED.diff?cvsroot=glibc&only_with_tag=glibc-2_3-branch&r1=1.72.2.1&r2=1.72.2.2

Comment 16 Sourceware Commits 2005-07-18 01:51:04 UTC

Subject: Bug 38

CVSROOT:	/cvs/glibc
Module name:	libc
Branch: 	glibc-2_3-branch
Changes by:	roland@sources.redhat.com	2005-07-18 01:51:00

Added files:
	localedata/locales: sr_CS 

Log message:
	2005-02-27  Denis Barbier  <barbier@debian.org>
	
	[BZ #38]
	* locales/sr_CS: New file.
	Contributed by Danilo Segan <dsegan@gmx.net>
	* SUPPORTED: Add sr_CS/ISO-8859-5 and sr_CS.UTF-8/UTF-8.

Patches:
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/localedata/locales/sr_CS.diff?cvsroot=glibc&only_with_tag=glibc-2_3-branch&r1=NONE&r2=1.1.4.1

Comment 17 Roland McGrath 2005-07-19 03:30:41 UTC

This fix is now in the 2.3 branch as well as the trunk, and the problem should
be resolved as of the 2.3.6 release.

Comment 18 Danilo Segan 2005-10-30 14:24:53 UTC

This dialect is the sole dialect used inside Montenegro part of "Serbia and
Montenegro" (thus, sr_CS@jekavian). Ekavian is hardly used there, just as
Jekavian is hardly used in Serbia.

Also, Serbian in "Bosnia and Herzegovina" (BA) is also most commonly in Jekavian
dialect.  So, the population using each is (estimates):
  - sr_CS, basic Ekavian: 7-8M in Serbia
  - sr_CS@jekavian, Jekavian: 750k in Montenegro, >2M in Bosnia, total ~3M

Jekavian is official dialect of Montenegro as well.

I, naturally, disagree with your stance (since this is an official dialect as
used in a single state [equivalent of US "states"] in the state union of Serbia
and Montenegro), and I still hope you'll be for including this one as well.

Comment 19 Danilo Segan 2005-10-30 14:26:48 UTC

Woops, sent to the wrong bug :)

Comment 20 Danilo Segan 2006-01-27 20:11:03 UTC

Comment on attachment 294
Serbian Latin locale for Serbia and Montenegro

Changing filename: please use sr_CS@latin if that will help (this is all that
needs doing to close this bug)