This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [RFC][PATCH v9 2/6] Implement alternative month names (bug 10871).
- From: Rafal Luzynski <digitalfreak at lingonborough dot com>
- To: Zack Weinberg <zackw at panix dot com>
- Cc: GNU C Library <libc-alpha at sourceware dot org>
- Date: Wed, 15 Nov 2017 11:32:38 +0100 (CET)
- Subject: Re: [RFC][PATCH v9 2/6] Implement alternative month names (bug 10871).
- References: <742475879.1094767.1505817734249@poczta.nazwa.pl> <CAKCAbMhaqZJnunsVgsUrcg5=GjRJ6Oyh2kWLJjpUBgZxpTmoNg@mail.gmail.com> <505927862.453253.1510007773186@poczta.nazwa.pl>
- Reply-to: Rafal Luzynski <digitalfreak at lingonborough dot com>
6.11.2017 23:36 Rafal Luzynski <digitalfreak@lingonborough.com> wrote:
> [...]
> Actually we don't need the updated locale data containing the actual
> alternative month names. In most cases the %B/%OB format specifiers
> should work correctly in strptime() no matter if the input string contains
> the nominative or genitive case because the algorithm searches for the
> best match rather than for the equal string.
>
> A longer explanation: The algorithm selects the month name which has
> the longest matching initial substring with the input string, including
> the terminating '\0' character. Since in eastern European languages the
> grammatical cases are made by appending or changing suffixes while the
> stems (usually) remain the same, this algorithm is still able to
> recognize the month name correctly. [...]
I ran some tests locally and it turns out I was wrong here. The
genitive forms (in eastern European languages) will not be recognized
by the current locales just because they look similar to the nominative
forms. For example, in Polish (I chose the simplest word) the word for
May is "maj" and its genitive form is "maja". When we try to parse
"maja" with strptime() and the "%B" format, the substring "maj" is
correctly recognized as the 5th month, but the letter "a" remains
unparsed, which eventually raises an error. I incorrectly thought
that strptime() matches the whole word and returns the index of the
word from the repository (from the nl_langinfo() results) which has the
longest matching substring (with a whole-string match being the best).
Actually it matches the longest substring and leaves the rest of the
word unparsed.
In short, only the words which are actually present in the repository
are recognized as month names, or their initial substrings.
Words which match only partially raise errors.
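To illustrate the behaviour described above, here is a minimal sketch of
the matching as I understand it. It is a simplified model, not the actual
glibc code: the table month_names and the helpers match_month_prefix() and
parse_month_only() are hypothetical names made up for this example, the
names are hard-coded instead of being read from the locale, and details
such as case folding and abbreviated names are ignored.

```c
#include <stddef.h>
#include <string.h>

/* Nominative Polish month names, used as a stand-in for the locale data.
   Hypothetical table: strptime() takes the real names from the locale.  */
static const char *const month_names[12] = {
  "styczeń", "luty", "marzec", "kwiecień", "maj", "czerwiec",
  "lipiec", "sierpień", "wrzesień", "październik", "listopad", "grudzień"
};

/* Return the 0-based index of the longest table entry that is a prefix
   of INPUT, or -1 if no entry is.  *REST is set to point at the first
   byte of INPUT that was not consumed.  */
static int
match_month_prefix (const char *input, const char **rest)
{
  int best = -1;
  size_t best_len = 0;

  for (int i = 0; i < 12; i++)
    {
      size_t len = strlen (month_names[i]);
      /* The whole table entry must match the beginning of INPUT.  */
      if (len > best_len && strncmp (input, month_names[i], len) == 0)
        {
          best = i;
          best_len = len;
        }
    }

  *rest = input + best_len;
  return best;
}

/* Model of parsing a string that must consist of a month name only:
   return the 1-based month number, or -1 if nothing matched or if
   unparsed characters remain after the match.  */
static int
parse_month_only (const char *input)
{
  const char *rest;
  int idx = match_month_prefix (input, &rest);

  if (idx < 0 || *rest != '\0')
    return -1;
  return idx + 1;
}
```

In this model parse_month_only ("maj") yields 5, while
parse_month_only ("maja") fails: match_month_prefix() still finds "maj",
but the trailing "a" is left over, which is the situation described above.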
On the other hand, the "%OB" format specifier should work fine
(as well as "%B") as long as we use only the nominative case
in the input strings.
Regards,
Rafal