This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [RFC][PATCH v4 06/11] Provide backward compatibility for strftime family (bug 10871).


10.11.2016 13:41 Florian Weimer <fweimer@redhat.com> wrote:
>
>
> On 11/10/2016 01:33 AM, Rafal Luzynski wrote:
>
> > I've discussed all possible solutions in [1], including what you
> > have proposed here. In short, no solution is perfect and each has
> > its advantages and disadvantages. Your solution has these pros:
> >
> > - does not cause any backward compatibility issues;
> > - does not break any existing application where the current solution
> > is correct.
> >
> > At the same time it has the following cons:
> >
> > - introduces incompatibilities with *BSD family (including OS X) and
> > with the probable future POSIX specification which will remain
> > forever - please read below why I find it important;
>
> Even the FreeBSD situation is in support of my proposal because
> implementing it would improve date formatting:
>
> [root@bsd ~]# uname -a
> FreeBSD bsd 11.0-RELEASE-p1 FreeBSD 11.0-RELEASE-p1 #0 r306420: Thu Sep
> 29 01:43:23 UTC 2016
> root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
> [root@bsd ~]# LC_ALL=ca_ES.UTF-8 date -j 201604141000
> dijous, 14 de d’abril de 2016, 10:00:00 UTC

I investigated these cases long ago, and on Linux only, so I'm sorry
if I'm inaccurate; please tell me if a fresh investigation is needed.
I don't know whether FreeBSD uses the same GNU coreutils as Linux does,
but if it does then it's no surprise that some bugs are common.

So, AFAIR, date does not even call strftime() except when expanding
"%c" and "%x". Instead it reimplements the same algorithm as strftime()
but writes the result directly to stdout. The aim is to support
arbitrarily long format strings without risking a buffer overflow.
The same function is then reused by du. Both belong to the coreutils
package. I'm aware that coreutils will have to be fixed the same
way as we would fix glibc.

In this particular case, it looks like date uses whatever "%c" expands
to, which for Catalan seems to contain something like
"%a, %-d de %B de...". That is an attempt to work around an issue
which does not exist in FreeBSD. The locale data probably need to be
fixed; removing "de" and one space should be sufficient in this case.
Actually this workaround is not even correct, because neither
"de d’abril" nor "de abril" is correct.

May I ask you what is the result of this command in that system?

    LC_ALL=ca_ES.UTF-8 locale date_fmt

(I hope the command is correct and displays the format for "%c".)

> [root@bsd ~]# LC_ALL=ca_ES.UTF-8 cal
> De novembre 2016
> dg dl dt dc dj dv ds
>        1  2  3  4  5
>  6  7  8  9 10 11 12
> 13 14 15 16 17 18 19
> 20 21 22 23 24 25 26
> 27 28 29 30

Again I'm not sure if this is the same cal as in Linux but it looks
like it uses strftime("%B") or nl_langinfo(MON_1) where it should
use strftime("%OB") or nl_langinfo(ALTMON_1), respectively. As I said
above, I'm aware of this issue and cal is one of these apps that would
get broken and would have to be fixed.
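
To make the two FreeBSD examples concrete, here is a minimal sketch (in
Python, only for brevity). It is not how any libc implements this: the
month tables and the expand() helper are hypothetical, and it assumes the
BSD convention where "%B" yields the form used inside a full date and
"%OB" yields the standalone form.

```python
# Hypothetical Catalan month tables; only the two months from the examples
# are filled in. Real data would come from the locale definition.
GENITIVE = {4: "d’abril", 11: "de novembre"}   # form used inside a full date
NOMINATIVE = {4: "abril", 11: "novembre"}      # standalone form


def expand(fmt, day, month, year):
    """Expand a tiny subset of strftime conversions, assuming BSD semantics."""
    return (fmt.replace("%OB", NOMINATIVE[month])   # standalone month name
               .replace("%B", GENITIVE[month])      # month name within a date
               .replace("%-d", str(day))
               .replace("%Y", str(year)))


# Locale format with the extra "de": the preposition is doubled.
print(expand("%-d de %B de %Y", 14, 4, 2016))   # 14 de d’abril de 2016
# Same format with "de" and one space removed.
print(expand("%-d %B de %Y", 14, 4, 2016))      # 14 d’abril de 2016
# Calendar header built with "%B" versus "%OB".
print(expand("%B %Y", 1, 11, 2016))             # de novembre 2016
print(expand("%OB %Y", 1, 11, 2016))            # novembre 2016
```

The third line corresponds to the "De novembre 2016" header above (cal
capitalizes it); the fourth is what a standalone header would need.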

> In the date case, this is not even a third-party application using a
> hard-coded strftime argument, it's right in the base operating system,
> in the locale data.

That's good IMHO because we know how to reach it upstream. The
situation is worse with software we are not aware of.

> I couldn't test Thunderbird because it did not pick up the Catalan
> language pack for some reason, but the sources use "de %B" as a date
> format, so I expect that they are broken on FreeBSD, too.

I would expect Thunderbird to be a good example of an application
which is broken now and would get fixed; hopefully it's already
working correctly in FreeBSD.

I'm pretty sure it's a translation file (*.po, *.mo, *.gmo) which
provides "de %B" rather than the source code (*.c, *.cpp). Again,
this is an attempt to work around the situation, and even this does
not work correctly because it generates "de abril" on Linux and
"de d'abril" on BSD. Nothing better can be done until we fix this bug.

> I think this shows that whatever is currently proposed for POSIX has
> plenty of unintended consequences.
>
> > - does not actually solve the problem for any existing application
> > until the authors or translators change %B to %OB (in case of open
> > source programs we can reach the upstreams and suggest solution).
>
> Yes, but this has to be weighed against all the applications which are
> broken after the change.

That's what I'm trying to estimate, and so far my guess is that more
apps are broken now than will get broken.

I'm not sure if there is a way to grep through as much source code
as possible and check how it uses strftime() and nl_langinfo().
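
A rough idea of what such a scan could look like, as a sketch only. The
function name and both patterns are my own assumptions; they would
certainly produce false positives and miss format strings built at run
time, so every hit still needs a manual look.

```python
import re

# A call to strftime() or nl_langinfo() somewhere on the line.
CALL_RE = re.compile(r'\b(strftime|nl_langinfo)\s*\(')
# A double-quoted string containing %B, e.g. in a *.po translation file.
MONTH_FMT_RE = re.compile(r'"[^"\n]*%B[^"\n]*"')


def find_month_uses(text):
    """Return (line_number, line) pairs that deserve a manual review."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if CALL_RE.search(line) or MONTH_FMT_RE.search(line):
            hits.append((number, line.strip()))
    return hits
```

Running it over every *.c, *.cpp and *.po file of a source tree (for
example with os.walk) would give a first, very approximate list of
candidates.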

> > My solution has these pros:
> >
> > - will automagically solve the problems for all applications where
> > it is broken;
>
> Not true, see the FreeBSD example, where the full-format date string is
> still incorrect.

In the case of the date utility it's because they provide an incorrect
default format string for the Catalan locale. If you use a correct format
string like this:

    date +"%-d %B de %Y"

the result would be correct. I'm not sure why the Catalan locale in FreeBSD
provides that additional "de". Is it a remnant of the times when they
had the same bug (pre-1999)? Is it copied from Linux?

> > - will remain compatible with an existing *BSD solution and possible
> > future POSIX specification.
>
> FreeBSD has to fix things anyway, so changing the approach would not
> create additional work for them.

Well, if you convinced *BSD (and Apple) to swap their meaning of "%B"
and "%OB" it would make it possible to implement the same in Linux.
I'm afraid they wouldn't agree.

> > Also has this disadvantage:
> >
> > - will break some existing applications where current solution is correct;
>
> Some? Most existing applications (which use date formats) for some
> locales, I would say.

No, only those which list months standalone. Apart from calendars and some
applications which group objects by month (e.g., documents, including some
blog managers), I can't imagine any software doing so. All other software
would get fixed. The Catalan examples which you have provided are
caused by an attempt to fix the problem by putting "de" before the month
name. Even this workaround is not perfect. But, again, if Catalan speakers
prefer, they may choose to stay with the current locale settings and the
current workaround. As with German, they will not see any difference if
they leave the locale data unchanged.

>
> > but:
> >
> > - in case of open source software we can reach the upstreams and suggest
> > solution;
> > - in case of closed source software distributed in a binary form we can
> > provide a backward compatible ABI which will provide the old behaviour
> > for older programs.
>
> I think we should, if at all possible, avoid situations where mere
> recompilation of an application introduces subtle changes. Software is
> increasingly bundled and compiled by downstream developers and not
> distributions.
>
> > I believe there are fewer cases where a month name is displayed
> > standalone than those where it is displayed with a day number therefore
> > I believe that a fallout caused by applications broken by my solution
> > is smaller than the fallout caused by the applications broken now.
>
> This might be true for affected Slavic locales (but I haven't investigated).

I'm counting (or rather trying to guess) the number of applications
here, not the number of languages (locales). I mean the number of apps
which are broken now and will get fixed vs. those which are working
correctly now and will get broken.

>
> > And the severity of the new bugs is equal to the severity of the current
> > bugs.
>
> I think for Romance languages with elision, the slightly incorrect “de”
> is preferred to the “de de” or “de d'” we'd get with your approach (and
> as the FreeBSD example shows, these situations are hardly temporary, but
> the bugs stick around for quite some time).

Definitely, "de de" or "de d'" is incorrect, but anyone who touches the
locale data for some language should remove these additional "de"
from "%c" and "%x" at the same time as providing "alternative"
month names. As for software where "de" is provided by
translators, it's a task for the translators to fix it.

Here I think that maybe we should reach out to some local communities,
including translators, and ask which solution they would prefer: would
they like nothing to change until they change every "de %B" to "%OB",
or would they like "de de" to appear suddenly until they change "de %B"
to "%B"? Is trans@lists.fedoraproject.org a good place to discuss it?
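
To spell out the two options for a translator, here is a simplified
sketch (Python, for brevity). The labels "keep-B" and "bsd", the helper
names and the single-month data are all hypothetical; the only point is
to show what an existing "de %B" string turns into under each proposal.

```python
NOMINATIVE = "abril"     # standalone form
GENITIVE = "d’abril"     # form used inside a full date

# Meaning of each conversion under each proposal (simplified).
PROPOSALS = {
    # %B stays as it is today; a new %OB gives the full-date form.
    "keep-B": {"%B": NOMINATIVE, "%OB": GENITIVE},
    # BSD-compatible: %B gives the full-date form, %OB the standalone one.
    "bsd": {"%B": GENITIVE, "%OB": NOMINATIVE},
}


def expand(fmt, table):
    """Substitute the month conversions according to one proposal."""
    return fmt.replace("%OB", table["%OB"]).replace("%B", table["%B"])


def compare(formats):
    """Map (proposal, format) to the expanded text for every combination."""
    return {(name, fmt): expand(fmt, table)
            for name, table in PROPOSALS.items()
            for fmt in formats}


for key, text in compare(["de %B", "%B", "%OB"]).items():
    print(key, "->", text)
```

With "keep-B", an untouched "de %B" keeps producing "de abril" and only
"%OB" gives "d’abril"; with "bsd", an untouched "de %B" becomes
"de d’abril" until it is changed to "%B".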

Regards,

Rafal

