Handling numbers input/output in glibc

Behdad Esfahbod behdad@cs.toronto.edu
Tue Mar 2 08:22:00 GMT 2004


On Mon, 2 Feb 2004, Bruno Haible wrote:

> Behdad Esfahbod wrote on 2004-01-10:
> > Problem statement:  In Persian (fa_IR) locale, we like to read
> > and write numbers with Persian numerals (U+06F0..U+06F9).
>
> To this I'd like to add the important additional explanation that you
> made on 2004-01-06:
>
> > The border between which numbers should be written with local
> > digits, which with latin digits, is not quite clear.  For example
> > in Persian we write every number with Persian digits, but I can
> > see how we may write a price with US dollar currency sign with
> > Latin digits.  Or Arab people may have their own desires about
> > which numbers they would like to see in their local digits, which
> > not.  So the decision better be left to each translation team,
>
> The solution that is implemented for this is:
>   - The application developer uses gettext() around all format strings that
>     contain "%d".
>   - gettext() looks up the translation in the Persian message catalog. It
>     may contain "%Id" instead of "%d".
>   - printf substitutes outdigits for those numbers that are output with "%Id".
>
> This should be sufficient, isn't it?

Yes.  That solves the problem effectively.  So the digit output
problem is solved now.
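
For the record, the flow described above can be sketched roughly
as follows.  This is only an illustration, not code from glibc or
from any real application: the helper names and the "items"
message are made up, and the catalog lookup is only simulated by
writing the translated format string out by hand.

```c
#include <assert.h>
#include <libintl.h>
#include <stdio.h>

/* Step 1: the developer wraps the format string in gettext().
   With no message catalog loaded, gettext() returns the original
   string "%d items" unchanged. */
int format_items(char *buf, size_t size, int count)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, gettext("%d items"), count);
}

/* Steps 2 and 3: what printf effectively receives once a translator
   has put the glibc-specific "I" flag into the translated string
   (a real translation would also translate the word "items").
   In a locale that defines outdigits the number is written with
   them; in the "C" locale the flag changes nothing. */
int format_items_translated(char *buf, size_t size, int count)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%Id items", count);
}
```

Both helpers give the same ASCII result in the "C" locale; the
difference would only show after setlocale() selects a locale
with outdigits, such as fa_IR, assuming it is installed.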

Just one more thing, about the digit input problem:

Right now iswdigit and scanf("%Id") both consult the "digit"
tag in the locale definition.  So if you define two sets of
decimal digits under the "digit" tag in your locale, scanf("%Id")
(but not scanf("%d")) would parse both as numerical data.  But
since the C99 standard defines iswdigit to accept only ASCII
digits, the glibc locales define only ASCII digits under the
"digit" tag.  So all the internationalization code in
scanf("%Id") is useless right now.
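
A small check of the classification side, assuming a glibc whose
locales list only ASCII digits under "digit" as described above.
The helper names are hypothetical; only iswdigit() is a real
library call.

```c
#include <assert.h>
#include <wchar.h>
#include <wctype.h>

/* U+06F4 EXTENDED ARABIC-INDIC DIGIT FOUR, written as a code point
   so the example does not depend on the source file encoding. */
#define PERSIAN_DIGIT_FOUR ((wint_t) 0x06F4)

/* Returns 1 if iswdigit() classifies wc as a digit, 0 otherwise. */
int is_digit_for_libc(wint_t wc)
{
    return iswdigit(wc) != 0;
}

/* The behaviour described above: ASCII '4' is a digit for
   iswdigit(), the Persian four is not. */
void check_digit_class(void)
{
    assert(is_digit_for_libc(L'4'));
    assert(!is_digit_for_libc(PERSIAN_DIGIT_FOUR));
}
```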

What I propose is either:

  * Change the code for iswdigit (and probably isdigit) so that it
follows the C99 standard and accepts only ASCII digits, whatever
the locale's "digit" tag contains.  Then we can define all Unicode
digit sets under the "digit" tag in glibc locales, and
scanf("%Id") would work as expected.

Or:

  * If the above is not acceptable, define another tag parallel
to "digit", to be used by scanf("%Id").

So the point is to make scanf("%Id") work as expected
(internationalized) without breaking the standard compliance of
iswdigit.
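
To make the separation concrete, here is a minimal sketch of the
idea, not of glibc's scanf code.  The table is a hypothetical
stand-in for a tag parallel to "digit", and the function names
are invented for the example.  Classification (iswdigit) is not
touched at all; only the parser looks at the extra digit sets.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-in for the proposed tag: the code point of
   digit zero in each decimal digit set accepted on input. */
static const wchar_t input_digit_zero[] = {
    0x0030, /* ASCII 0..9 */
    0x06F0  /* Persian digits, U+06F0..U+06F9 */
};

/* Returns 0..9 if wc belongs to one of the sets above, else -1. */
int input_digit_value(wchar_t wc)
{
    size_t n = sizeof input_digit_zero / sizeof input_digit_zero[0];

    for (size_t i = 0; i < n; i++) {
        if (wc >= input_digit_zero[i] && wc <= input_digit_zero[i] + 9)
            return (int) (wc - input_digit_zero[i]);
    }
    return -1;
}

/* Parses a non-empty run of such digits into *out.  Returns 1 on
   success, 0 on empty input, a non-digit character, or overflow. */
int parse_input_digits(const wchar_t *s, unsigned long *out)
{
    unsigned long value = 0;

    assert(s != NULL && out != NULL);
    if (*s == L'\0')
        return 0;
    for (; *s != L'\0'; s++) {
        int d = input_digit_value(*s);

        if (d < 0 || value > (ULONG_MAX - (unsigned long) d) / 10)
            return 0;
        value = value * 10 + (unsigned long) d;
    }
    *out = value;
    return 1;
}
```

With this table the string U+06F1 U+06F2 U+06F3 parses as 123.
The sketch also accepts a mix of ASCII and Persian digits in one
number; whether that should be allowed is a policy question it
does not try to answer.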

behdad


[snip]
> Bruno


