Bug 13140 - no-break space as thousands_sep causes inconsistent output
Status: RESOLVED INVALID
Alias: None
Product: glibc
Classification: Unclassified
Component: libc
Version: 2.18
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2011-08-30 07:48 UTC by Marko Myllynen
Modified: 2014-06-27 12:12 UTC
fweimer: security-


Description Marko Myllynen 2011-08-30 07:48:09 UTC
localhost:~> cat test-grouping.c 
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

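int main(int argc, char **argv)
{
  /* Require a locale name; argv[1] would be NULL otherwise. */
  if (argc < 2)
    exit(EXIT_FAILURE);
  /* Select the locale whose thousands_sep the ' flag will use. */
  if (setlocale(LC_ALL, argv[1]) == NULL)
    exit(EXIT_FAILURE);
  /* Right-align the grouped number in a field of width 9. */
  printf("%s:%'9d\n", argv[1], 123456);
  exit(EXIT_SUCCESS);
}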
localhost:~> gcc test-grouping.c 
localhost:~> for l in en_US de_DE fi_FI ; do ./a.out $l.UTF-8 ; done
en_US.UTF-8:  123,456
de_DE.UTF-8:  123.456
fi_FI.UTF-8: 123 456

I think the expected output would be:

en_US.UTF-8:  123,456
de_DE.UTF-8:  123.456
fi_FI.UTF-8:  123 456
Comment 1 Ulrich Drepper 2011-09-05 17:01:40 UTC
The width specification specifies the number of bytes, not screen columns.

Besides, the narrow stream functions have no idea what the encoding is.  The data is a sequence of bytes, and the fact that two bytes may occupy only one column is simply not known.  There is nothing which can be done at this level.

If you want aligned output in all cases you have to do the legwork yourself.
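
As a rough illustration of what that legwork could look like, the sketch below pads by UTF-8 code points instead of bytes. It assumes the separator is U+00A0 (bytes 0xC2 0xA0 in UTF-8), as the bug title suggests, and that the input is valid UTF-8. The helper names utf8_codepoints and pad_left_utf8 are hypothetical and not part of glibc. Code points are only an approximation of screen columns; wide and combining characters would need wcwidth-style handling.

```c
#include <stdio.h>
#include <string.h>

/* Count UTF-8 code points by skipping continuation bytes (10xxxxxx).
   Assumes s is valid UTF-8. */
static size_t utf8_codepoints(const char *s)
{
  size_t n = 0;
  for (; *s != '\0'; s++)
    if (((unsigned char) *s & 0xC0) != 0x80)
      n++;
  return n;
}

/* Hypothetical helper: right-align s in a field of `width` code points.
   Returns the number of bytes written to out, or -1 if out is too small. */
static int pad_left_utf8(char *out, size_t outsize, const char *s, size_t width)
{
  size_t len = utf8_codepoints(s);
  size_t pad = len < width ? width - len : 0;
  int n = snprintf(out, outsize, "%*s%s", (int) pad, "", s);
  return (n < 0 || (size_t) n >= outsize) ? -1 : n;
}
```

With "123\xC2\xA0" "456" as input (8 bytes, 7 code points) and a width of 9, this emits two leading spaces, the same as for "123,456", whereas a byte-based width of 9 leaves room for only one.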
Comment 2 Marko Myllynen 2012-04-16 08:27:39 UTC
Reopening in case the needed legwork could be considered as a task for upcoming releases.
Comment 3 Rich Felker 2013-10-10 16:45:42 UTC
The "legwork" needed is on the part of the application, not glibc. glibc is doing exactly the right thing here; the field widths of the non-wide printf-family functions are all in terms of bytes, not characters. Even if there were no standard (ISO C) dictating the current behavior, there would be no flexibility to change it, because some uses of sprintf may rely on the width being in bytes to avoid buffer overflow.

If you want field widths in characters, you have to use the wide printf-family functions. This is problematic of course because stdio streams have an orientation (byte or wide) and you cannot mix byte and wide functions on them. So, for practical purposes, you probably have to use swprintf or open_wmemstream and fwprintf, then print the resulting string using fprintf with the %ls specifier.
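
A minimal sketch of the swprintf route might look like the following. The helper names format_grouped and print_grouped are hypothetical, the 64-element buffer is an arbitrary choice, and the ' flag is a POSIX extension rather than ISO C. The number is formatted into a wide buffer, where the width counts wide characters, and then written to a byte-oriented stream with %ls.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: format value with locale grouping into out,
   right-aligned in a field of `width` wide characters.
   Returns the number of wide characters written, or a negative value on error. */
static int format_grouped(wchar_t *out, size_t n, int width, int value)
{
  return swprintf(out, n, L"%'*d", width, value);
}

/* Hypothetical helper: print "label:<number>" on a byte-oriented stream.
   %ls converts the wide string back to the locale's multibyte encoding. */
static int print_grouped(FILE *fp, const char *label, int width, int value)
{
  wchar_t wbuf[64];
  if (format_grouped(wbuf, sizeof wbuf / sizeof wbuf[0], width, value) < 0)
    return -1;
  return fprintf(fp, "%s:%ls\n", label, wbuf);
}
```

A caller would first select the locale with setlocale(LC_ALL, ...) and then call, for example, print_grouped(stdout, "fi_FI.UTF-8", 9, 123456). Since the padding is computed on wide characters, a multibyte separator should count as one position, assuming the locale's wide thousands separator is a single wide character.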

Simply avoiding thousands separators at the printf level is probably an easier solution.

In any case, I don't think there's anything glibc can do to make this easier.
Comment 4 Marko Myllynen 2013-10-15 13:39:27 UTC
Hi Rich,

Thanks for the clarification, much appreciated. It seems that the right thing to do here is indeed just to close this one.

Thanks.