Variable length date strings in glibc locales?
Marko Myllynen
myllynen@redhat.com
Tue May 27 18:24:00 GMT 2014
Hi,
On 2014-05-27 16:37, Carlos O'Donell wrote:
> On 05/27/2014 02:58 AM, Marko Myllynen wrote:
>>
>> in some languages dates are written without leading zeroes so that May 3
>> would be "3.5.". The same for time, 08:07:00 would be "8.07.00".
>>
>> In glibc locales it would be possible to write dates and times in such
>> fashion but do we know how that would affect existing applications? Are
>> they expecting dates and times to be fixed length and would variable
>> length date strings cause formatting or layout issues? Looking at
>> existing locales, almost all of them use fixed length strings for
>> d_fmt/t_fmt/date_fmt/d_t_fmt.
>>
>> Ideally of course it would be nice to change certain locales to use date
>> and time formats according to their cultural conventions and national
>> recommendations but if that would lead to wonky layout in applications
>> then it's probably better to be pragmatic and use fixed length dates.
>>
>> I could add few words about this to our Locales wiki page if someone
>> happens to know what's the best approach here.
>
> I know of no guarantees given about constant length date string.
>
> Therefore I believe that applications will have to put up with
> variable length dates if that is what the locale specifies.
>
> The guiding principle is that we want to represent dates as
> expected by the native speaker. If the application wants a constant
> length they will need to arrange that by breaking up the string
> and spacing it out themselves?
Indeed, that seems a very reasonable expectation. A quick check of the
current locales shows wide variation in length already, so omitting a
few leading zeroes in a locale would hardly make a difference:
localhost:~> cat t.sh
#!/bin/bash
for f in date_fmt d_t_fmt d_fmt t_fmt ; do
  echo $f:
  for l in $(ls -1 /usr/share/i18n/locales/* |
      grep -Ev '(@|i18n$|iso14651|translit|POSIX)') ; do
    echo -n "$(LC_ALL=$(basename $l.UTF-8) \
      date --date="2007-05-03 08:07:00" \
      +"$(LC_ALL=$(basename $l.UTF-8) locale $f)" | wc -L)"
    echo -e "\t$(basename $l).UTF-8"
  done | sort -un | sed -n '1p;$p'
done
localhost:~> unset LC_ALL
localhost:~> bash ./t.sh 2>/dev/null
date_fmt:
21 ku_TR.UTF-8
55 so_ET.UTF-8
d_t_fmt:
19 tk_TM.UTF-8
56 km_KH.UTF-8
d_fmt:
5 or_IN.UTF-8
25 mt_MT.UTF-8
t_fmt:
8 aa_DJ.UTF-8
20 bo_CN.UTF-8
localhost:~> l=ku_TR.UTF-8
localhost:~> LC_ALL=$(basename $l) date --date="2007-05-03 08:07:00" \
    +"$(LC_ALL=$(basename $l) locale date_fmt)"
pêncsêm 03 Gulan 2007
localhost:~> l=en_US.UTF-8
localhost:~> LC_ALL=$(basename $l) date --date="2007-05-03 08:07:00" \
    +"$(LC_ALL=$(basename $l) locale date_fmt)"
Thu May 3 08:07:00 EEST 2007
localhost:~> l=so_ET.UTF-8
localhost:~> LC_ALL=$(basename $l) date --date="2007-05-03 08:07:00" \
    +"$(LC_ALL=$(basename $l) locale date_fmt)"
Khamiis, Bisha Shanaad 3, 8:07:00 subaxnimo EEST 2007
localhost:~>
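For scale, here is a small sketch of what omitting the leading zeroes
amounts to. It assumes GNU date, whose "-" flag (a GNU extension, also
understood by glibc strftime) suppresses padding of a numeric field.
The two format strings are only illustrations and are not taken from
any existing locale:

```shell
#!/bin/bash
# Render the same instant with zero-padded and with unpadded fields.
# TZ and LC_ALL are pinned so the result does not depend on the host.
export TZ=UTC LC_ALL=C
when="2007-05-03 08:07:00"

# %d, %m and %H are always two digits wide.
padded=$(date --date="$when" +"%d.%m.%Y %H.%M.%S")
# The "-" flag drops the padding, so the width varies with the value.
unpadded=$(date --date="$when" +"%-d.%-m.%Y %-H.%M.%S")

echo "padded:   $padded (${#padded} characters)"
echo "unpadded: $unpadded (${#unpadded} characters)"
```

With these settings the unpadded form comes out three characters
shorter, which is small next to the spread between locales seen above.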
So I think I'll add a note to the wiki page that although in a few
places the resulting string is guaranteed to be the same length in all
locales (like int_curr_symbol), in many cases the resulting string can
vary in length quite considerably.
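If an application really needs aligned columns, the approach Carlos
describes (spacing the string out itself) could look roughly like the
sketch below. pad_right is a made-up helper name, not something from
glibc or coreutils, and the sample strings are just the two spellings
of the same date:

```shell
#!/bin/bash
# pad_right WIDTH STRING (hypothetical helper): print STRING followed by
# enough spaces to fill WIDTH positions. Longer strings pass unchanged.
# Note: ${#s} counts characters in a UTF-8 locale and bytes in the C
# locale, and it ignores double-width and combining characters.
pad_right() {
    local width="$1" s="$2"
    local fill=$(( width - ${#s} ))
    if [ "$fill" -lt 0 ]; then
        fill=0
    fi
    # '%*s' with an empty argument emits exactly $fill spaces.
    printf '%s%*s' "$s" "$fill" ''
}

# Usage: both spellings end up occupying 12 positions.
for s in "3.5.2007" "03.05.2007"; do
    printf '[%s]\n' "$(pad_right 12 "$s")"
done
```

This only handles the simple case. Proper column alignment for scripts
with wide or combining characters would need a display-width
calculation rather than a character count.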
Thanks,
--
Marko Myllynen