Variable length date strings in glibc locales?

Marko Myllynen myllynen@redhat.com
Tue May 27 18:24:00 GMT 2014


Hi,

On 2014-05-27 16:37, Carlos O'Donell wrote:
> On 05/27/2014 02:58 AM, Marko Myllynen wrote:
>>
>> in some languages dates are written without leading zeroes so that May 3
>> would be "3.5.". The same for time, 08:07:00 would be "8.07.00".
>>
>> In glibc locales it would be possible to write dates and times in such
>> fashion but do we know how that would affect existing applications? Are
>> they expecting dates and times to be fixed length and would variable
>> length date strings cause formatting or layout issues? Looking at
>> existing locales, almost all of them use fixed length strings for
>> d_fmt/t_fmt/date_fmt/d_t_fmt.
>>
>> Ideally of course it would be nice to change certain locales to use date
>> and time formats according to their cultural conventions and national
>> recommendations but if that would lead to wonky layout in applications
>> then it's probably better to be pragmatic and use fixed length dates.
>>
>> I could add a few words about this to our Locales wiki page if someone
>> happens to know what's the best approach here.
> 
> I know of no guarantee that date strings have a constant length.
> 
> Therefore I believe that applications will have to put up with
> variable length dates if that is what the locale specifies.
> 
> The guiding principle is that we want to represent dates as
> expected by the native speaker. If the application wants a constant
> length they will need to arrange that by breaking up the string
> and spacing it out themselves?
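Here is a minimal sketch of what that could look like on the application side. It assumes GNU `date` and GNU coreutils `wc -L`, which reports display width rather than bytes. `pad_to_width` is a hypothetical helper and is not part of glibc or coreutils. The idea is to measure the rendered string in display columns and pad it to a fixed width:

```shell
#!/bin/bash
# Hypothetical helper (not a glibc/coreutils API): print TEXT followed by
# enough spaces to fill WIDTH display columns; longer text is left as-is.
pad_to_width() {
  local text=$1 width=$2
  local cols
  # GNU wc -L reports the display width of the longest line
  cols=$(printf '%s' "$text" | wc -L)
  printf '%s' "$text"
  if (( cols < width )); then
    # '%*s' with an empty argument prints exactly (width - cols) spaces
    printf '%*s' $(( width - cols )) ''
  fi
  printf '\n'
}

# Render a fixed timestamp with the current locale's d_fmt, padded to 25 columns
fmt=$(locale d_fmt)
pad_to_width "$(date --date='2007-05-03 08:07:00' +"$fmt")" 25
```

Measuring columns rather than bytes matters here because many locales render day and month names with multibyte UTF-8 characters.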

Indeed, that seems a very reasonable expectation. A quick check of the
current locales shows rather large variation, so omitting a few leading
zeroes in a locale wouldn't make any difference at all:

localhost:~> cat t.sh
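#!/bin/bash

# For each format keyword, report the narrowest and the widest rendering
# of one fixed timestamp across all locale sources (display width, wc -L).
for f in date_fmt d_t_fmt d_fmt t_fmt ; do
  echo $f:
  # skip @modifier variants, shared i18n/collation/translit sources, and POSIX
  for l in $(ls -1 /usr/share/i18n/locales/* | grep -Ev '(@|i18n$|iso14651|translit|POSIX)') ; do
    echo -n "$(LC_ALL=$(basename $l.UTF-8) date --date="2007-05-03 08:07:00" +"$(LC_ALL=$(basename $l.UTF-8) locale $f)" | wc -L)" ; echo -e "\t$(basename $l).UTF-8"
  done | sort -un | sed -n '1p;$p'
done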
localhost:~> unset LC_ALL
localhost:~> bash ./t.sh 2>/dev/null
date_fmt:
21      ku_TR.UTF-8
55      so_ET.UTF-8
d_t_fmt:
19      tk_TM.UTF-8
56      km_KH.UTF-8
d_fmt:
5       or_IN.UTF-8
25      mt_MT.UTF-8
t_fmt:
8       aa_DJ.UTF-8
20      bo_CN.UTF-8
localhost:~> l=ku_TR.UTF-8
localhost:~> LC_ALL=$(basename $l) date --date="2007-05-03 08:07:00" +"$(LC_ALL=$(basename $l) locale date_fmt)"
pêncsêm 03 Gulan 2007
localhost:~> l=en_US.UTF-8
localhost:~> LC_ALL=$(basename $l) date --date="2007-05-03 08:07:00" +"$(LC_ALL=$(basename $l) locale date_fmt)"
Thu May  3 08:07:00 EEST 2007
localhost:~> l=so_ET.UTF-8
localhost:~> LC_ALL=$(basename $l) date --date="2007-05-03 08:07:00" +"$(LC_ALL=$(basename $l) locale date_fmt)"
Khamiis, Bisha Shanaad  3,  8:07:00 subaxnimo EEST 2007
localhost:~>

So I think I'll add a note to the wiki page that although in a few places
the resulting string is guaranteed to be the same size in all locales
(like int_curr_symbol), in many cases the resulting string can vary in
length quite considerably.
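The sketch below illustrates the zero-less style from the original question. It assumes GNU `date`, whose `-` flag (also accepted by glibc's strftime) suppresses padding of a numeric field. `zero_less` and its pattern are illustrative only and are not taken from any shipped locale. Even within a single locale, such a format produces strings of different lengths depending on the date:

```shell
#!/bin/bash
# Illustrative helper: render a timestamp in a zero-less "day.month. H.MM.SS"
# style; "%-d", "%-m" and "%-H" drop the leading zero (GNU extension).
zero_less() {
  LC_ALL=C date --date="$1" +'%-d.%-m. %-H.%M.%S'
}

# Print the length of each rendering next to the rendering itself
for ts in "2007-05-03 08:07:00" "2007-12-31 23:59:00"; do
  out=$(zero_less "$ts")
  printf '%2d  %s\n' "${#out}" "$out"
done
# 12  3.5. 8.07.00
# 15  31.12. 23.59.00
```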

Thanks,

-- 
Marko Myllynen



More information about the Libc-locales mailing list