This is the mail archive of the mailing list for the GNU libc locales project.


Re: Variable length date strings in glibc locales?


On 2014-05-27 16:37, Carlos O'Donell wrote:
> On 05/27/2014 02:58 AM, Marko Myllynen wrote:
>> in some languages dates are written without leading zeroes so that May 3
>> would be "3.5.". The same for time, 08:07:00 would be "8.07.00".
>> In glibc locales it would be possible to write dates and times in such
>> fashion but do we know how that would affect existing applications? Are
>> they expecting dates and times to be fixed length and would variable
>> length date strings cause formatting or layout issues? Looking at
>> existing locales, almost all of them use fixed length strings for
>> d_fmt/t_fmt/date_fmt/d_t_fmt.
>> Ideally of course it would be nice to change certain locales to use date
>> and time formats according to their cultural conventions and national
>> recommendations but if that would lead to wonky layout in applications
>> then it's probably better to be pragmatic and use fixed length dates.
>> I could add a few words about this to our Locales wiki page if someone
>> happens to know what the best approach is here.
> I know of no guarantees given about constant-length date strings.
> Therefore I believe that applications will have to put up with
> variable length dates if that is what the locale specifies.
> The guiding principle is that we want to represent dates as
> expected by the native speaker. If the application wants a constant
> length they will need to arrange that by breaking up the string
> and spacing it out themselves?

Indeed, that seems to be a very reasonable expectation. A quick check
of the current locales shows that output lengths already vary a great
deal, so omitting a few leading zeroes in a locale wouldn't make any
difference:

localhost:~> cat

# For each format keyword, print the shortest and the longest output
# (display width per wc -L) over all installed locale sources.
for f in date_fmt d_t_fmt d_fmt t_fmt ; do
  echo $f:
  for l in $(ls -1 /usr/share/i18n/locales/* | grep -Ev '(@|i18n$|iso14651|translit|POSIX)') ; do
    echo -n "$(LC_ALL=$(basename $l.UTF-8) date --date="2007-05-03 08:07:00" +"$(LC_ALL=$(basename $l.UTF-8) locale $f)" | wc -L)" ; echo -e "\t$(basename $l).UTF-8"
  done | sort -un | sed -n '1p;$p'
done
localhost:~> unset LC_ALL
localhost:~> bash ./ 2>/dev/null
21      ku_TR.UTF-8
55      so_ET.UTF-8
19      tk_TM.UTF-8
56      km_KH.UTF-8
5       or_IN.UTF-8
25      mt_MT.UTF-8
8       aa_DJ.UTF-8
20      bo_CN.UTF-8
localhost:~> l=ku_TR.UTF-8
localhost:~> LC_ALL=$(basename $l) date --date="2007-05-03 08:07:00" +"$(LC_ALL=$(basename $l) locale date_fmt)"
pêncsêm 03 Gulan 2007
localhost:~> l=en_US.UTF-8
localhost:~> LC_ALL=$(basename $l) date --date="2007-05-03 08:07:00" +"$(LC_ALL=$(basename $l) locale date_fmt)"
Thu May  3 08:07:00 EEST 2007
localhost:~> l=so_ET.UTF-8
localhost:~> LC_ALL=$(basename $l) date --date="2007-05-03 08:07:00" +"$(LC_ALL=$(basename $l) locale date_fmt)"
Khamiis, Bisha Shanaad  3,  8:07:00 subaxnimo EEST 2007

So I think I'll add a note to the wiki page saying that although in a
few places the resulting string is guaranteed to be the same size in
all locales (like int_curr_symbol), in many cases its length can vary
quite considerably.
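
For an application that really does want fixed-width columns, the
"spacing it out themselves" approach Carlos mentions could look roughly
like the sketch below. The pad_to_width helper is hypothetical (it is
not part of glibc or coreutils), and the %-d/%-m conversions rely on
the GNU date "-" flag, which drops the padding. It pads by character
count only, so wide or combining characters would need extra handling.

```shell
# pad_to_width STRING WIDTH  (hypothetical helper, for illustration only)
# Right-pads STRING with spaces up to WIDTH; longer strings pass through
# unchanged. ${#1} counts characters (bytes in some shells), not display
# columns.
pad_to_width() {
  pad=$(( $2 - ${#1} ))
  if [ "$pad" -gt 0 ]; then
    # Print the string followed by an empty field that is $pad wide.
    printf "%s%${pad}s" "$1" ""
  else
    printf '%s' "$1"
  fi
}

# Variable-length dates without leading zeroes (GNU date assumed),
# padded to a common width of six characters.
for d in 2007-05-03 2007-12-24 ; do
  s=$(date --date="$d" +"%-d.%-m.")
  printf '[%s]\n' "$(pad_to_width "$s" 6)"
done
```

With GNU date this should print "[3.5.  ]" and "[24.12.]", so the
column stays aligned even though the locale-style strings differ in
length.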


Marko Myllynen
