This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC][PATCH v9 2/6] Implement alternative month names (bug 10871).


On Tue, Sep 19, 2017 at 6:42 AM, Rafal Luzynski
<digitalfreak@lingonborough.com> wrote:
> Some languages (Slavic, Baltic, etc.) require a genitive case of the
> month name when formatting a full date (with the day number) while
> they require a nominative case when referring to the month standalone.
> This requirement cannot be fulfilled without providing two forms for
> each month name.  From now it is precised that nl_langinfo(MON_1)
> series (up to MON_12) and strftime("%B") generate the month names in
> the grammatical form used when the month forms part of a complete date.
> If the grammatical form used when the month is named by itself is needed,
> the new values nl_langinfo(ALTMON_1) (up to ALTMON_12) and
> strftime("%OB") are supported.

I am reviewing this patch on the assumption that we have consensus for
the addition of this feature, with the semantics described above, and
without any provision for old binaries (that is, the strings produced
by %B will unconditionally change when locales that need two forms are
updated).  Last call for objections.

The code changes look good to me, except for a few small concerns
listed below.  Also, before landing I would like Joseph Myers (cc:ed)
to positively confirm that this patch introduces no new ISO C or POSIX
conformance issues, and will not tie our hands if this or a similar
feature is standardized in the future.

I do not see any new tests anywhere in this patch series, and I found
a bug in the support for %OB in strptime (see below).  To test this
feature we need at least one locale that actually uses it, so rather
than gate *these* patches on the addition of tests, I am asking you to
write comprehensive tests and include them with the first patch you
submit that adds altmon names to a locale. (This locale will have to
be one of the locales currently listed in localedata/Makefile as
available to tests (the test-input variable, I think); if none of them
need this feature, you may add one.)

>         [BZ #10871]
>         * locale/C-time.c: Add alternative month names, define them as the
>         same as mon explicitly.
>         * locale/categories.def: alt_mon and wide-alt_mon added.
>         * locale/langinfo.h: ALTMON_1 .. ALTMON_12 and similar contants
>         defined.
>         * locale/programs/ld-time.c: Alternative month names support
>         added, they are a copy of mon if not specified explicitly.
>         * locale/programs/locfile-kw.gperf: alt_mon defined.
>         * locale/programs/locfile-token.h: tok_alt_mon defined.
>         * localedata/tst-langinfo.c: Add tests for the new constants
>         ALTMON_1 .. ALTMON_12.
>         * time/strftime_l.c: %OB format for alternative month names
>         added.
>         * time/strptime_l.c: Alternative month names also recognized.

A note on writing ChangeLog entries: we really do want you to list
every single new symbol that's part of the public API, not an
abbreviated list -- that is:

        * locale/langinfo.h [__USE_GNU]: New public symbols ALTMON_1,
        ALTMON_2, ALTMON_3, ALTMON_4, ALTMON_5, ALTMON_6, ALTMON_7,
        ALTMON_8, ALTMON_9, ALTMON_10, ALTMON_11, ALTMON_12,
        _NL_WALTMON_1, _NL_WALTMON_2, _NL_WALTMON_3, _NL_WALTMON_4,
        _NL_WALTMON_5, _NL_WALTMON_6, _NL_WALTMON_7, _NL_WALTMON_8,
        _NL_WALTMON_9, _NL_WALTMON_10, _NL_WALTMON_11, _NL_WALTMON_12.

We want it spelled out like this because the ChangeLog exists to be
grepped.  The idea is that someone five years from now who is having a
weird problem involving _NL_WALTMON_9 should be able to search for
_NL_WALTMON_9 and find the commit where it was introduced.  If you
abbreviate the list, that won't work.

The official instructions for writing ChangeLogs
<https://www.gnu.org/prep/standards/html_node/Change-Logs.html> would
have you mention *every* new, changed, or deleted symbol for *every*
file where it appears, but I find that that bulks up the log entries
without adding much benefit.  It is enough, I think, to mention every
file, and (independently) every new, changed, or deleted symbol at
least once.  That's enough for the hypothetical future person to find
the commit, and then they're going to look at the diffs.

> diff --git a/ChangeLog b/ChangeLog
> index 3b8e6c5..b0636ff 100644
> --- a/ChangeLog
> +++ b/ChangeLog

Another minor procedural thing: don't include diffs to the file
ChangeLog in commits sent for review, because if a reviewer wants to
apply the patch and tinker with it, they don't want to have to clean
up the inevitable merge botch in the ChangeLog.

> @@ -402,6 +404,20 @@ __strptime_internal (const char *rp, const char *fmt,
> struct tm *tmp,
>               if (s.decided !=raw)
>                 {
>                   trp = rp;
> +#ifdef _LIBC
> +                 /* First check the alt month.  */

Here you are checking the alt-month name before the month name ...

> @@ -428,6 +444,10 @@ __strptime_internal (const char *rp, const char *fmt,
> struct tm *tmp,
>               if (s.decided != loc
>                   && (((trp = rp, match_string (month_name[cnt], trp))
>                        && trp > rp_longest)
> +#ifdef _LIBC
> +                     || ((trp = rp, match_string (alt_month_name[cnt], trp))
> +                         && trp > rp_longest)
> +#endif
>                       || ((trp = rp, match_string (ab_month_name[cnt], trp))
>                           && trp > rp_longest)))

... and here you are checking the alt-month name after the month name.
Please make this consistent one way or the other.  (I *think* the
order of checks would only ever matter if the alt-month name for month
X were the same as the month name for month Y, which would be a nasty
ambiguity in the relevant natural language -- but nasty ambiguities do
actually happen in natural languages, so we need to be careful.)

I wonder whether we actually need the #ifdef _LIBC here -- but that's
an independent question (is this code _really_ still shared with
gnulib, or have they diverged to the point where we should stop trying
to keep the files in sync?) and you are not on the hook to answer it.

> @@ -1015,6 +1035,10 @@ __strptime_internal (const char *rp, const char *fmt,
> struct tm *tmp,
>         case 'O':
>           switch (*fmt++)
>             {
> +           case 'B':
> +             /* Undo the increment and continue.  */
> +             fmt--;
> +             break;

This is subtly wrong, and I had to read a big chunk of the entire
function to understand what it was meant to do and how it doesn't
actually work.  Please change to

]         case 'O':
]           switch (*fmt++)
]             {
] +           case 'B':
] +             /* Match month name.  Reprocess as plain 'B'.  */
] +             fmt--;
] +             goto start_over;

You will also need to remove the `#ifndef _NL_CURRENT` wrapping the
definition of the `start_over` label, and change its comment:

] -#ifndef _NL_CURRENT
] -      /* We need this for handling the `E' modifier.  */
] +      /* In some cases, modifiers are handled by adjusting state and
] +         then restarting the switch statement below.  */
]     start_over:
] -#endif
]
]       /* Make back up of current processing pointer.  */

(Am I seriously telling you to use `goto`?  Yes, I am.  This is the
kind of abnormal control flow situation where a judicious `goto` is
_easier_ to read than the alternative - what you had was meant to do
the same thing, but it did it by cycling the `while (*fmt != '\0')`
loop, which meant that all of the processing _above_ the outer switch
statement was repeated, which was both wrong (note the unconditional
`++fmt` above "We discard strftime modifiers") and more fragile and
complicated to understand than "goto start_over", which requires the
reader only to search for the "start_over" label.)

zw


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]