This is sources Bugzilla
Bugzilla Version 2.17.5
Bugzilla Bug 3405
  sort order on pt_BR Last modified: 2010-06-27 16:02
Bug#: 3405   Reporter: Walter Cruz <walter.php@gmail.com>
Status: WAITING   Priority:  
Assigned To: GNU C Library Locale Maintainers <libc-locales@sources.redhat.com>

Description:   Last confirmed: 0000-00-00 00:00 Opened: 2006-10-21 01:39
Hi all.

In the pt_BR locale, glibc ignores spaces when sorting.

For example, take this list:

GABRIELA HELEDA DE SOUZA
GABRIEL ALCIDES KLIM PERONDI
GABRIELA LETICIA BATISTA NUNES
GABRIELA JACOBY NOS
GABRIEL ALEXANDRE DA SILVA MANICA
GÁBRIEL ALCIDES KLIM PERONDI
GÁBRIELA JACOBY NOS 

But the right order is:

GABRIEL ALCIDES KLIM PERONDI
GÁBRIEL ALCIDES KLIM PERONDI
GABRIEL ALEXANDRE DA SILVA MANICA
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GÁBRIELA JACOBY NOS
GABRIELA LETICIA BATISTA NUNES


I found that I can fix this in the pt_BR file under /usr/share/i18n/locales by adding:

reorder-after <U00A0>
<U0020><CAP>;<CAP>;<CAP>;<U0020>
reorder-end

to the LC_COLLATE section. After regenerating the locale, I get the right
sort order.
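
Editorial illustration: the two orderings in the description can be reproduced
with a minimal Python sketch. This is not glibc's algorithm (glibc uses weight
tables from the locale source); the function name collation_key and its
space_significant flag are assumptions made only for this example. Dropping
spaces from the primary key gives the order that "sort" currently prints, and
keeping them gives the order the reporter expects.

```python
import unicodedata

def collation_key(name, space_significant):
    """Build a two-part sort key: base letters first, accents as tie-breaker."""
    decomposed = unicodedata.normalize("NFD", name)
    # Primary level: letters without accents, compared case-insensitively.
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    if not space_significant:
        # Dropping spaces approximates the behaviour reported in this bug.
        base = base.replace(" ", "")
    # Crude tie-break: the decomposed string, so "A" sorts before "A" + accent.
    return (base.casefold(), decomposed)

names = [
    "GABRIELA HELEDA DE SOUZA",
    "GABRIEL ALCIDES KLIM PERONDI",
    "GABRIELA LETICIA BATISTA NUNES",
    "GABRIELA JACOBY NOS",
    "GABRIEL ALEXANDRE DA SILVA MANICA",
    "GÁBRIEL ALCIDES KLIM PERONDI",
    "GÁBRIELA JACOBY NOS",
]

spaces_ignored = sorted(names, key=lambda n: collation_key(n, False))
word_by_word = sorted(names, key=lambda n: collation_key(n, True))

print("\n".join(spaces_ignored))
print("===========")
print("\n".join(word_by_word))
```

In the word-by-word case the space simply compares lower than any letter, which
is the effect the proposed reorder-after rule is meant to achieve.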

------- Additional Comment #1 From eduardo 2007-01-30 16:17 -------
Using the "sort" command gives this wrongly sorted list:
~$ sort list.txt
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GÁBRIELA JACOBY NOS
GABRIEL ALCIDES KLIM PERONDI
GÁBRIEL ALCIDES KLIM PERONDI
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALEXANDRE DA SILVA MANICA

Tested on Ubuntu 6.06, Fedora Core 3, Red Hat 9 and openSUSE 10.2 (all i386);
all give the same wrong sort order.

------- Additional Comment #2 From Petter Reinholdtsen 2007-01-30 16:23 -------
Can you provide any references specifying that space should be handled
as a letter when sorting in Brazilian Portuguese?  Because if not, I suspect
you are mistaken when you believe space should be sorted that way.

------- Additional Comment #3 From Walter Cruz 2007-01-30 17:18 -------
(In reply to comment #2)
> Can you provide any references specifying that space should be handled
> as a letter when sorting in Brazilian Portuguese?  Because if not, I suspect
> you are mistaken when you believe space should be sorted that way.

The rules are defined by ABNT (Associação Brasileira de Normas Técnicas) in a
standard called NBR 6033, but the document isn't publicly available.

But, as edurbs and I are native speakers, I think that you should believe us :D

[]'s
- Walter

------- Additional Comment #4 From keld@dkuug.dk 2007-01-30 18:45 -------
Subject: Re:  sort order on pt_BR

On Tue, Jan 30, 2007 at 04:23:22PM -0000, pere at hungry dot com wrote:
> 
> ------- Additional Comments From pere at hungry dot com  2007-01-30 16:23 -------
> Can you provide any references specifying that space should be handled
> as a letter when sorting in Brazilian Portuguese?  Because if not, I suspect
> you are mistaken when you believe space should be sorted that way.

In most languages using a script with letters, you have two ordering
schemes, the standard one, and the word-by-word one. In the latter, space
is significant on the first level. So both are correct, culturally.

I don't know of an easy way to make both schemes available to the user,
other than providing two locales, with a small delta (reorder-after) to
make the word-by-word locale, plus a general naming scheme so the user
can choose easily, like the @euro variants.

best regards
Keld

------- Additional Comment #5 From Daniel Cristian Cruz 2007-03-09 13:00 -------
(In reply to comment #0)
> I found that I can fix this in the pt_BR file under /usr/share/i18n/locales by adding:
> 
> reorder-after <U00A0>
> <U0020><CAP>;<CAP>;<CAP>;<U0020>
> reorder-end
> 
> to the LC_COLLATE section. After regenerating the locale, I get the right
> sort order.

It didn't work on Fedora 5. After changing the pt_BR file and running
the following command, I still have the same problem...
localedef -i pt_BR -c -f ISO-8859-1 -A /usr/share/locale/locale.alias pt_BR

Did I do something wrong?

Kind regards...

------- Additional Comment #6 From Daniel Cristian Cruz 2007-03-22 19:23 -------
(In reply to comment #5)
> Did I do something wrong?

Yes, I did. I put a space between <U0020> and <CAP>.

But the ordering is still strange: 'a', 'á', 'ã' and 'à' are the same
character, yet they are ordered as if they were different.

Sorry...

------- Additional Comment #7 From Luiz K. Matsumura 2007-04-19 06:09 -------
Hi Daniel

How is it ordering them?
I ran some tests, and the behavior with and without the proposed change is the
same when ordering these characters.
Maybe this is another bug?

(In reply to comment #6)
> (In reply to comment #5)
> > Did I do something wrong?
> 
> Yes, I did. I put a space between <U0020> and <CAP>.
> 
> But the ordering is still strange: 'a', 'á', 'ã' and 'à' are the same
> character, yet they are ordered as if they were different.
> 
> Sorry...
> 


------- Additional Comment #8 From Pierre Habouzit 2007-04-25 23:01 -------
(In reply to comment #0)
> Hi all.
> 
> In pt_BR, the glibc doesn't count spaces in the sort order.

FWIW, fr_FR is affected as well, as are many other locales.

cat a; echo "==========="; LC_ALL=fr_FR sort a
GABRIELA HELEDA DE SOUZA
GABRIEL ALCIDES KLIM PERONDI
GABRIELA LETICIA BATISTA NUNES
GABRIELA JACOBY NOS
GABRIEL ALEXANDRE DA SILVA MANICA
GÁBRIEL ALCIDES KLIM PERONDI
GÁBRIELA JACOBY NOS 
===========
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GÁBRIELA JACOBY NOS 
GABRIEL ALCIDES KLIM PERONDI
GÁBRIEL ALCIDES KLIM PERONDI
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALEXANDRE DA SILVA MANICA

> I found that I can fix this in the pt_BR file under /usr/share/i18n/locales by adding:
> 
> reorder-after <U00A0>
> <U0020><CAP>;<CAP>;<CAP>;<U0020>
> reorder-end
> 
> to the LC_COLLATE section. After regenerating the locale, I get the right
> sort order.


------- Additional Comment #9 From Guilherme de S. Pastore 2007-11-03 12:49 -------
Petter,

I can assure you that the proposed one is the behaviour any Brazilian would
expect since the age of 6, when they learn how to sort at school, right after
learning the alphabet.

If it is *really* necessary, I can pay for web access to the already mentioned
lousy 5-page document from ABNT which defines the technical norm for sorting
just to show you, but you may guess I'm not eager to :)

------- Additional Comment #10 From Daniel Henrique 2010-06-27 14:19 -------
Hi, everybody. First of all, I apologize for my poor writing skills. English is
not my native language.

The pt_BR sort order seems odd to me. If this behavior is not a bug, I agree
with Keld's suggestion: define a new locale, like pt_BR@abnt, that uses the
"right" sort order.

Can the sample reorder statement handle lowercase and uppercase properly? The
result of a sort without the suggested change in the locale definition file
can't:

LC_ALL=pt_BR LANG=pt_BR LANGUAGE=pt_BR sort a.txt 
gabriela heleda de souza
GABRIELA HELEDA DE SOUZA
gabriela jacoby nos
GABRIELA JACOBY NOS
gábriela jacoby nos
GÁBRIELA JACOBY NOS 
gabriel alcides klim perondi
GABRIEL ALCIDES KLIM PERONDI
gábriel alcides klim perondi
GÁBRIEL ALCIDES KLIM PERONDI
gabriela leticia batista nunes
GABRIELA LETICIA BATISTA NUNES
gabriel alexandre da silva manica
GABRIEL ALEXANDRE DA SILVA MANICA


The expected output:
gabriel alcides klim perondi
gábriel alcides klim perondi
gabriel alexandre da silva manica
gabriela heleda de souza
gabriela jacoby nos
gábriela jacoby nos
gabriela leticia batista nunes
GABRIEL ALCIDES KLIM PERONDI
GÁBRIEL ALCIDES KLIM PERONDI
GABRIEL ALEXANDRE DA SILVA MANICA
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GÁBRIELA JACOBY NOS
GABRIELA LETICIA BATISTA NUNES


This is "tricky" because we don't just perform a lexicographic comparison of
each character (a Portuguese Java user will be happy to know that
String.compareTo is not enough to produce the sorted result they expect, for
several reasons).
We first sort ignoring accents, then we use the accents as a tie-breaker
between otherwise equal full names. In the first step, a = á, but in the
latter step, a < á.


Well, that is all I know. I will try to get a copy of the Norma NBR 6033:1989
(NB 106) from ABNT to confirm (or not :-)) these examples.

Thanks.

------- Additional Comment #11 From Daniel Henrique 2010-06-27 14:25 -------
And I don't know whether the Norma is "case sensitive" or "case insensitive".

------- Additional Comment #12 From keld@keldix.com 2010-06-27 15:51 -------
Subject: Re:  sort order on pt_BR

On Sun, Jun 27, 2010 at 02:25:55PM -0000, email_daniel_h at yahoo dot com dot br wrote:
> 
> ------- Additional Comments From email_daniel_h at yahoo dot com dot br  2010-06-27 14:25 -------
> And I don't know whether the Norma is "case sensitive" or "case insensitive".

All the European language sorting standards I know of are case insensitive on
the first level; case only counts on the 3rd level. I expect this also to be
true for Portuguese. That is: the most important distinction is the base
letter, the second is the accent, the third is case.

best regards
keld
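
Editorial illustration: the three levels described above (base letter, then
accent, then case) can be sketched in Python as a tuple key. This is a
simplified model, not glibc's weight tables; the name three_level_key, its
ignore_spaces parameter, and the choice to put lowercase before uppercase on
the third level are assumptions made for this example.

```python
import unicodedata

def three_level_key(text, ignore_spaces=True):
    """Return (base letters, accent flags, case flags) for use as a sort key."""
    if ignore_spaces:
        # Approximates the current pt_BR behaviour, where spaces do not count.
        text = text.replace(" ", "")
    base, accents, case = [], [], []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            # A combining mark flags the preceding base letter as accented.
            if accents:
                accents[-1] = 1
            continue
        base.append(ch.casefold())             # level 1: base letter
        accents.append(0)                      # level 2: accent (0 = none)
        case.append(1 if ch.isupper() else 0)  # level 3: case
    return ("".join(base), tuple(accents), tuple(case))

words = ["Á", "b", "A", "á", "a"]
print(sorted(words, key=three_level_key))
# → ['a', 'A', 'á', 'Á', 'b']
```

Tuples compare element by element, so accents only matter when the base
letters are equal, and case only matters when both base letters and accents
are equal.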

------- Additional Comment #13 From Daniel Henrique 2010-06-27 15:58 -------
For those interested in a workaround, for a CentOS 5.5 box (use at your own risk):

1. Copy the base locale definition file

cp /usr/share/i18n/locales/pt_BR pt_BR\@abnt\.src

2. Edit pt_BR@abnt.src and add

reorder-after <U00A0>
<U0020><CAP>;<CAP>;<CAP>;<U0020>
reorder-end

before END LC_COLLATE

3. Create new directories

mkdir /usr/lib/locale/pt_BR\@abnt
mkdir /usr/lib/locale/pt_BR\.utf8\@abnt

4. Compile the new locales

localedef --verbose -c -i pt_BR\@abnt.src -f ISO-8859-1 /usr/lib/locale/pt_BR\@abnt
localedef --verbose -c -i pt_BR\@abnt.src -f UTF-8 /usr/lib/locale/pt_BR\.utf8\@abnt

5. Check the new locales

locale -a | grep pt_BR


I don't know if this is the best way, but it is one way.

The directories may be different in other Linux distributions.

I think it would be better to create a new pt_BR@abnt.src with a "copy"
statement for each section than to copy the whole source from
/usr/share/i18n/locales/pt_BR
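
Editorial illustration: once a locale has been compiled as in the steps above,
its collation can be checked from Python with the standard locale module. This
is only a sketch: the helper name sort_with_locale is hypothetical, and the
locale name "pt_BR.utf8@abnt" is assumed to match the directory created in
step 3; it may need adjusting on other systems. The helper returns None when
the locale is not installed.

```python
import locale

def sort_with_locale(names, locale_name):
    """Sort names with the collation of locale_name; None if it is not installed."""
    previous = locale.setlocale(locale.LC_COLLATE)  # remember the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # strxfrm turns each string into a key that follows LC_COLLATE rules.
        return sorted(names, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)

names = [
    "GABRIELA HELEDA DE SOUZA",
    "GABRIEL ALCIDES KLIM PERONDI",
    "GABRIEL ALEXANDRE DA SILVA MANICA",
]
result = sort_with_locale(names, "pt_BR.utf8@abnt")
print(result if result is not None else "locale not installed")
```

If the workaround took effect, the two GABRIEL entries should come before
GABRIELA HELEDA DE SOUZA.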

------- Additional Comment #14 From Daniel Henrique 2010-06-27 16:02 -------
(In reply to http://sources.redhat.com/bugzilla/show_bug.cgi?id=3405#c12)

Thanks, Keld.
