I was looking into using number grouping for a project and realized that the formats used are not consistent. For reference, here is the documentation: https://sourceware.org/glibc/manual/html_node/General-Numeric.html

These are the two issues I've found:

* Many locales repeat the same digit, e.g., en_US https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/locales/en_US;h=5cc518dff2fc1309e5cddd86950d6e9898a2d7e1;hb=refs/heads/master#l75
  As far as I can tell, a single 3 should be enough there, as is the case for, e.g., en_HK https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/locales/en_HK;h=5f797e076099c4972d3c74fe92e5a6607c3bae95;hb=refs/heads/master#l84

* Some locales have 0;0 as grouping, e.g., el_GR https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/locales/el_GR;h=285e1e009276476f2aa2d2745177944c7b34a78b;hb=HEAD
  I'm not sure what this is supposed to mean, but, e.g., POSIX has -1 to indicate "no grouping" https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/locales/POSIX;h=7ec7f1c5774ab1fb011c08e2e17d435923e48fe2;hb=refs/heads/master#l262
  Note that "The last member is either 0, in which case the previous member is used over and over again for all the remaining groups...", i.e., a 0 acts as the string terminator. Here, however, the string consists of three string termination characters, so there is no previous member to repeat.
  To some extent this also applies to mon_grouping, at least for the first case.

I guess the impact depends on the situation. The first issue just wastes a few bytes (and causes confusion), but the second may lead to weird results, at least in code that uses the raw localedata information without noticing this.

If people agree that this should be made consistent and fixed (it's not obvious what to replace 0;0 with, probably -1?), I'd be happy to provide a patch. (I'd be even happier to do that using standard git access; I can provide credentials showing that I know how to use it, etc.)
Direct link for the 0;0 case: https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/locales/el_GR;h=285e1e009276476f2aa2d2745177944c7b34a78b;hb=HEAD#l92
https://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html says:

7.3.4 LC_NUMERIC
...
grouping
Define the size of each group of digits in formatted non-monetary quantities. The operand is a sequence of integers separated by semicolons. Each integer specifies the number of digits in each group, with the initial integer defining the size of the group immediately preceding the decimal delimiter, and the following integers defining the preceding groups. If the last integer is not -1, then the size of the previous group (if any) shall be repeatedly used for the remainder of the digits. If the last integer is -1, then no further grouping shall be performed.
So in the el_GR locale, one could use grouping -1 instead of grouping 0;0. But it does not seem to matter; both behave the same:

mfabian@hathi:/local/mfabian/src/glibc/localedata/locales (master $%) $ grep -E "grouping.*(0;0|-1)" *
C:mon_grouping -1
C:grouping -1
POSIX:mon_grouping -1
POSIX:grouping -1
aa_DJ:grouping 0;0
ar_SA:mon_grouping -1
ar_SA:grouping -1
bs_BA:grouping 0;0
el_CY:grouping 0;0
el_GR:grouping 0;0
eo:grouping 0;0
es_CU:grouping 0;0
gl_ES:grouping 0;0
i18n:mon_grouping -1
i18n:grouping -1
mg_MG:grouping 0;0
pap_AW:grouping 0;0
pap_CW:grouping 0;0
pt_PT:grouping 0;0
rw_RW:grouping -1
sl_SI:grouping 0;0
sr_RS:grouping 0;0
ti_ER:grouping 0;0
wo_SN:grouping 0;0
mfabian@hathi:/local/mfabian/src/glibc/localedata/locales (master $%) $ LC_ALL=rw_RW.UTF-8 /usr/bin/printf "%'f\n" 12345678.9
12345678,900000
mfabian@hathi:/local/mfabian/src/glibc/localedata/locales (master $%) $ LC_ALL=el_GR.UTF-8 /usr/bin/printf "%'f\n" 12345678.9
12345678,900000
Also, grouping 3 and grouping 3;3 behave the same:

mfabian@hathi:/local/mfabian/src/glibc/localedata/locales (master $%) $ grep grouping en_US en_PH
en_US:mon_grouping 3;3
en_US:grouping 3;3
en_PH:mon_grouping 3
en_PH:grouping 3
mfabian@hathi:/local/mfabian/src/glibc/localedata/locales (master $%) $ LC_ALL=en_US.UTF-8 /usr/bin/printf "%'f\n" 12345678.9
12,345,678.900000
mfabian@hathi:/local/mfabian/src/glibc/localedata/locales (master $%) $ LC_ALL=en_PH.UTF-8 /usr/bin/printf "%'f\n" 12345678.9
12,345,678.900000
Thanks for the reply. Yes, they behave the same, but for consistency reasons I believe one of them should be chosen. Two reasons:

* When trying to understand how to specify these strings, the mix of formats (and redundant information) is rather confusing.
* There are other tools relying on these files, and it would be better to have fewer corner cases to handle and fewer optimizations to make.

I've since learned that -1 is translated into "" by localeconv. Hence, one may suspect that 0;0 works because it translates into three(?) string termination characters. While this clearly works, one can hardly argue that it makes sense.

For the 3;3 case, user code may reasonably check whether the grouping consists of a single digit and take a fast path in that case. 3;3 will never trigger that fast path.

Or put another way: what is the benefit of having inconsistent data that may lead to redundant storage and additional computation?
OK, then I'll change 0;0 ➡️ -1 and 3;3 ➡️ 3.
This test case needs to be adapted: https://sourceware.org/git/?p=glibc.git;a=blob;f=stdio-common/tst-grouping_iterator.c;h=79cc9f4e7a168fb732af29afd25f194d310384fb;hb=HEAD
(In reply to Oscar Gustafsson from comment #5)
> * There are other tools relying on these files and it would be better if
> there are fewer corner cases to handle/optimizations to be done.

These other tools nevertheless need to be able to parse '3;3' and '0;0', since both forms remain possible.
https://patchwork.sourceware.org/project/glibc/patch/20240122142005.993598-1-mfabian@redhat.com/
The master branch has been updated by Mike Fabian <mfabian@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=5176a830e70140cb3390c62b7d41f75dbbf33c7c

commit 5176a830e70140cb3390c62b7d41f75dbbf33c7c
Author: Mike FABIAN <mfabian@redhat.com>
Date:   Thu Jan 18 16:52:03 2024 +0100

    localedata: Use consistent values for grouping and mon_grouping

    Resolves: BZ # 31205

    Adapt test cases in test-grouping_iterator.c
Fixed in glibc master.