This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH v11 1/5] Implement alternative month names (bug 10871).


On 01/08/2018 03:59 PM, Rafal Luzynski wrote:
> Some languages (Slavic, Baltic, etc.) require a genitive case of the
> month name when formatting a full date (with the day number) while
> they require a nominative case when referring to the month standalone.
> This requirement cannot be fulfilled without providing two forms for
> each month name.  From now it is precised that nl_langinfo(MON_1)
> series (up to MON_12) and strftime("%B") generate the month names in
> the grammatical form used when the month forms part of a complete date.
> If the grammatical form used when the month is named by itself is needed,
> the new values nl_langinfo(ALTMON_1) (up to ALTMON_12) and
> strftime("%OB") are supported.  This new feature is optional so the
> languages which do not need it or do not yet provide the updated
> locales simply do not use it and their behaviour is unchanged.

High level:

Overall the idea you have proposed makes perfect sense. I'm excited to see
someone pushing a new interface design like this to solve real language
problems with the interfaces. I look forward to the POSIX standardization.
My obvious worry remains that POSIX does not standardize %OB/%Ob, but all
we can do is lead by example and solve user problems. Thank you for this
work.

Design:

By the very definition of the %O* variants, if the locale does not 
provide ALTMON_* then it must revert back to the non-%O* variant e.g. %B. 
I see this implemented in the parser which is good.

I also see that strftime is relaxed in what it accepts and that is also
good and required.

One oddity I noticed in patch 3 is the missing ABALTMON_* definitions
for the _GNU_SOURCE case, is that on purpose and why? It's not needed,
but it seems like a missing useful feature that completes the API.

Implementation:

My only request here is additional tests that verify, for locales
that don't have ALTMON or ABALTMON specified, thus verifying the fallback
works. With that added I think you can commit this change squashed together
with the pl_PL and ru_RU changes and the test cases fixed up to pass.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

> 	[BZ #10871]
> 	* locale/C-time.c: Add alternative month names, define them as the
> 	same as mon explicitly.
> 	* locale/categories.def: alt_mon and wide-alt_mon added.
> 	* locale/langinfo.h [__USE_GNU]: New public symbols ALTMON_1,
> 	ALTMON_2, ALTMON_3, ALTMON_4, ALTMON_5, ALTMON_6, ALTMON_7,
> 	ALTMON_8, ALTMON_9, ALTMON_10, ALTMON_11, ALTMON_12,
> 	_NL_WALTMON_1, _NL_WALTMON_2, _NL_WALTMON_3, _NL_WALTMON_4,
> 	_NL_WALTMON_5, _NL_WALTMON_6, _NL_WALTMON_7, _NL_WALTMON_8,
> 	_NL_WALTMON_9, _NL_WALTMON_10, _NL_WALTMON_11, _NL_WALTMON_12.
> 	* locale/programs/ld-time.c: Alternative month names support
> 	added, they are a copy of mon if not specified explicitly.
> 	* locale/programs/locfile-kw.gperf: alt_mon defined.
> 	* locale/programs/locfile-token.h: tok_alt_mon defined.
> 	* localedata/tst-langinfo.c: Add tests for the new constants
> 	ALTMON_1 .. ALTMON_12.
> 	* time/Makefile (LOCALES): Add pl_PL.UTF-8 for tests.
> 	* time/strftime_l.c: %OB format for alternative month names
> 	added.
> 	* time/strptime_l.c: Alternative month names also recognized.
> 	* time/tst-strptime.c: Add tests to parse different forms of
> 	month names including the new %OB format specifier.
> ---
>  locale/C-time.c                  | 28 ++++++++++++++++++++--
>  locale/categories.def            |  2 ++
>  locale/langinfo.h                | 50 ++++++++++++++++++++++++++++++++++++++--
>  locale/programs/ld-time.c        | 21 +++++++++++++++++
>  locale/programs/locfile-kw.gperf |  1 +
>  locale/programs/locfile-token.h  |  1 +
>  localedata/tst-langinfo.c        | 12 ++++++++++
>  time/Makefile                    |  2 +-
>  time/strftime_l.c                | 11 +++++++--
>  time/strptime_l.c                | 32 +++++++++++++++++++++----
>  time/tst-strptime.c              |  6 +++++
>  11 files changed, 155 insertions(+), 11 deletions(-)
> 
> diff --git a/locale/C-time.c b/locale/C-time.c
> index 1f1ee01..73bc700 100644
> --- a/locale/C-time.c
> +++ b/locale/C-time.c
> @@ -30,7 +30,7 @@ const struct __locale_data _nl_C_LC_TIME attribute_hidden =
>    { NULL, },			/* no cached data */
>    UNDELETABLE,
>    0,
> -  111,
> +  135,

OK.

>    {
>      { .string = "Sun" },
>      { .string = "Mon" },
> @@ -142,6 +142,30 @@ const struct __locale_data _nl_C_LC_TIME attribute_hidden =
>      { .string = "" },
>      { .string = "%a %b %e %H:%M:%S %Z %Y" },
>      { .wstr = (const uint32_t *) L"%a %b %e %H:%M:%S %Z %Y" },
> -    { .string = _nl_C_codeset }
> +    { .string = _nl_C_codeset },
> +    { .string = "January" },
> +    { .string = "February" },
> +    { .string = "March" },
> +    { .string = "April" },
> +    { .string = "May" },
> +    { .string = "June" },
> +    { .string = "July" },
> +    { .string = "August" },
> +    { .string = "September" },
> +    { .string = "October" },
> +    { .string = "November" },
> +    { .string = "December" },
> +    { .wstr = (const uint32_t *) L"January" },
> +    { .wstr = (const uint32_t *) L"February" },
> +    { .wstr = (const uint32_t *) L"March" },
> +    { .wstr = (const uint32_t *) L"April" },
> +    { .wstr = (const uint32_t *) L"May" },
> +    { .wstr = (const uint32_t *) L"June" },
> +    { .wstr = (const uint32_t *) L"July" },
> +    { .wstr = (const uint32_t *) L"August" },
> +    { .wstr = (const uint32_t *) L"September" },
> +    { .wstr = (const uint32_t *) L"October" },
> +    { .wstr = (const uint32_t *) L"November" },
> +    { .wstr = (const uint32_t *) L"December" }

OK.

>    }
>  };
> diff --git a/locale/categories.def b/locale/categories.def
> index 47947f7..3cbb4e6 100644
> --- a/locale/categories.def
> +++ b/locale/categories.def
> @@ -249,6 +249,8 @@ DEFINE_CATEGORY
>    DEFINE_ELEMENT (_DATE_FMT,                "date_fmt",            opt, string)
>    DEFINE_ELEMENT (_NL_W_DATE_FMT,           "wide-date_fmt",       opt,
> wstring)
>    DEFINE_ELEMENT (_NL_TIME_CODESET,	    "time-codeset",	   std, string)
> +  DEFINE_ELEMENT (ALTMON_1,       "alt_mon",       opt, stringarray, 12, 12)
> +  DEFINE_ELEMENT (_NL_WALTMON_1,  "wide-alt_mon",  opt, wstringarray, 12, 12)

OK.

>    ), NO_POSTLOAD)
>  
>  
> diff --git a/locale/langinfo.h b/locale/langinfo.h
> index 28a0a73..0fbd838 100644
> --- a/locale/langinfo.h
> +++ b/locale/langinfo.h
> @@ -100,7 +100,8 @@ enum
>    ABMON_12,
>  #define ABMON_12		ABMON_12
>  
> -  /* Long month names.  */
> +  /* Long month names, in the grammatical form used when the month
> +     forms part of a complete date.  */

OK.

>    MON_1,			/* January */
>  #define MON_1			MON_1
>    MON_2,
> @@ -189,7 +190,8 @@ enum
>    _NL_WABMON_11,
>    _NL_WABMON_12,
>  
> -  /* Long month names.  */
> +  /* Long month names, in the grammatical form used when the month
> +     forms part of a complete date.  */

OK.

>    _NL_WMON_1,		/* January */
>    _NL_WMON_2,
>    _NL_WMON_3,
> @@ -231,6 +233,50 @@ enum
>  
>    _NL_TIME_CODESET,
>  
> +  /* Long month names, in the grammatical form used when the month
> +     is named by itself.  */
> +  __ALTMON_1,			/* January */
> +  __ALTMON_2,
> +  __ALTMON_3,
> +  __ALTMON_4,
> +  __ALTMON_5,
> +  __ALTMON_6,
> +  __ALTMON_7,
> +  __ALTMON_8,
> +  __ALTMON_9,
> +  __ALTMON_10,
> +  __ALTMON_11,
> +  __ALTMON_12,

OK.

> +#ifdef __USE_GNU
> +# define ALTMON_1		__ALTMON_1
> +# define ALTMON_2		__ALTMON_2
> +# define ALTMON_3		__ALTMON_3
> +# define ALTMON_4		__ALTMON_4
> +# define ALTMON_5		__ALTMON_5
> +# define ALTMON_6		__ALTMON_6
> +# define ALTMON_7		__ALTMON_7
> +# define ALTMON_8		__ALTMON_8
> +# define ALTMON_9		__ALTMON_9
> +# define ALTMON_10		__ALTMON_10
> +# define ALTMON_11		__ALTMON_11
> +# define ALTMON_12		__ALTMON_12

OK. Requires _GNU_SOURCE, which is good because this is an extension not
yet defined by POSIX.

> +#endif
> +
> +  /* Long month names, in the grammatical form used when the month
> +     is named by itself.  */
> +  _NL_WALTMON_1,			/* January */
> +  _NL_WALTMON_2,
> +  _NL_WALTMON_3,
> +  _NL_WALTMON_4,
> +  _NL_WALTMON_5,
> +  _NL_WALTMON_6,
> +  _NL_WALTMON_7,
> +  _NL_WALTMON_8,
> +  _NL_WALTMON_9,
> +  _NL_WALTMON_10,
> +  _NL_WALTMON_11,
> +  _NL_WALTMON_12,

OK.

> +
>    _NL_NUM_LC_TIME,	/* Number of indices in LC_TIME category.  */
>  
>    /* LC_COLLATE category: text sorting.
> diff --git a/locale/programs/ld-time.c b/locale/programs/ld-time.c
> index 67d055a..4186448 100644
> --- a/locale/programs/ld-time.c
> +++ b/locale/programs/ld-time.c
> @@ -91,6 +91,9 @@ struct locale_time_t
>    const char *date_fmt;
>    const uint32_t *wdate_fmt;
>    int alt_digits_defined;
> +  const char *alt_mon[12];
> +  const uint32_t *walt_mon[12];
> +  int alt_mon_defined;

OK.

>    unsigned char week_ndays;
>    uint32_t week_1stday;
>    unsigned char week_1stweek;
> @@ -639,6 +642,15 @@ time_output (struct localedef_t *locale, const struct
> charmap_t *charmap,
>    add_locale_string (&file, time->date_fmt);
>    add_locale_wstring (&file, time->wdate_fmt);
>    add_locale_string (&file, charmap->code_set_name);
> +
> +  /* The alt'mons.  */
> +  for (n = 0; n < 12; ++n)
> +    add_locale_string (&file, time->alt_mon[n] ?: "");
> +
> +  /* The wide character alt'mons.  */
> +  for (n = 0; n < 12; ++n)
> +    add_locale_wstring (&file, time->walt_mon[n] ?: empty_wstr);

OK.

> +
>    write_locale_data (output_path, LC_TIME, "LC_TIME", &file);
>  }
>  
> @@ -782,6 +794,7 @@ time_read (struct linereader *ldfile, struct localedef_t
> *result,
>  	  STRARR_ELEM (mon, 12, 12);
>  	  STRARR_ELEM (am_pm, 2, 2);
>  	  STRARR_ELEM (alt_digits, 0, 100);
> +	  STRARR_ELEM (alt_mon, 12, 12);

OK.

>  
>  	case tok_era:
>  	  /* Ignore the rest of the line if we don't need the input of
> @@ -934,6 +947,14 @@ time_read (struct linereader *ldfile, struct localedef_t
> *result,
>  	    lr_error (ldfile, _("\
>  %1$s: definition does not end with `END %1$s'"), "LC_TIME");
>  	  lr_ignore_rest (ldfile, now->tok == tok_lc_time);
> +
> +	  /* If alt_mon was not specified, make it a copy of mon.  */
> +	  if (!ignore_content && !time->alt_mon_defined)
> +	    {
> +	      memcpy (time->alt_mon, time->mon, sizeof (time->mon));
> +	      memcpy (time->walt_mon, time->wmon, sizeof (time->wmon));
> +	      time->alt_mon_defined = 1;
> +	    }

OK. Good, fall back to %B.

>  	  return;
>  
>  	default:
> diff --git a/locale/programs/locfile-kw.gperf b/locale/programs/locfile-kw.gperf
> index c74c1f2..dad7f21 100644
> --- a/locale/programs/locfile-kw.gperf
> +++ b/locale/programs/locfile-kw.gperf
> @@ -148,6 +148,7 @@ first_workday,          tok_first_workday,          0
>  cal_direction,          tok_cal_direction,          0
>  timezone,               tok_timezone,               0
>  date_fmt,               tok_date_fmt,               0
> +alt_mon,                tok_alt_mon,                0

OK.

>  LC_MESSAGES,            tok_lc_messages,            0
>  yesexpr,                tok_yesexpr,                0
>  noexpr,                 tok_noexpr,                 0
> diff --git a/locale/programs/locfile-token.h b/locale/programs/locfile-token.h
> index f02325d..d49da5e 100644
> --- a/locale/programs/locfile-token.h
> +++ b/locale/programs/locfile-token.h
> @@ -186,6 +186,7 @@ enum token_t
>    tok_cal_direction,
>    tok_timezone,
>    tok_date_fmt,
> +  tok_alt_mon,

OK.

>    tok_lc_messages,
>    tok_yesexpr,
>    tok_noexpr,
> diff --git a/localedata/tst-langinfo.c b/localedata/tst-langinfo.c
> index 8c3667c..0d33e75 100644
> --- a/localedata/tst-langinfo.c
> +++ b/localedata/tst-langinfo.c
> @@ -50,6 +50,18 @@ struct map
>    VAL (ABMON_8),
>    VAL (ABMON_9),
>    VAL (ALT_DIGITS),
> +  VAL (ALTMON_1),
> +  VAL (ALTMON_10),
> +  VAL (ALTMON_11),
> +  VAL (ALTMON_12),
> +  VAL (ALTMON_2),
> +  VAL (ALTMON_3),
> +  VAL (ALTMON_4),
> +  VAL (ALTMON_5),
> +  VAL (ALTMON_6),
> +  VAL (ALTMON_7),
> +  VAL (ALTMON_8),
> +  VAL (ALTMON_9),

OK.

>    VAL (AM_STR),
>    VAL (CRNCYSTR),
>    VAL (CURRENCY_SYMBOL),
> diff --git a/time/Makefile b/time/Makefile
> index 264eed9..91adcd0 100644
> --- a/time/Makefile
> +++ b/time/Makefile
> @@ -48,7 +48,7 @@ tests	:= test_time clocktest tst-posixtz tst-strptime
> tst_wcsftime \
>  include ../Rules
>  
>  ifeq ($(run-built-tests),yes)
> -LOCALES := de_DE.ISO-8859-1 en_US.ISO-8859-1 ja_JP.EUC-JP
> +LOCALES := de_DE.ISO-8859-1 en_US.ISO-8859-1 ja_JP.EUC-JP pl_PL.UTF-8

OK.

>  include ../gen-locales.mk
>  
>  $(objpfx)tst-ftime_l.out: $(gen-locales)
> diff --git a/time/strftime_l.c b/time/strftime_l.c
> index 18651ff..ac5d28f 100644
> --- a/time/strftime_l.c
> +++ b/time/strftime_l.c
> @@ -492,6 +492,9 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T
> *format,
>  # define f_month \
>    ((const CHAR_T *) (tp->tm_mon < 0 || tp->tm_mon > 11			     \
>  		     ? "?" : _NL_CURRENT (LC_TIME, NLW(MON_1) + tp->tm_mon)))
> +# define f_altmonth \
> +  ((const CHAR_T *) (tp->tm_mon < 0 || tp->tm_mon > 11			     \
> +		     ? "?" : _NL_CURRENT (LC_TIME, NLW(ALTMON_1) + tp->tm_mon)))

OK.

>  # define ampm \
>    ((const CHAR_T *) _NL_CURRENT (LC_TIME, tp->tm_hour > 11		      \
>  				 ? NLW(PM_STR) : NLW(AM_STR)))
> @@ -507,6 +510,7 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T
> *format,
>  		   ? "?" : month_name[tp->tm_mon])
>  #  define a_wkday f_wkday
>  #  define a_month f_month
> +#  define f_altmonth f_month

OK.

>  #  define ampm (L_("AMPM") + 2 * (tp->tm_hour > 11))
>  
>    size_t aw_len = 3;
> @@ -785,7 +789,7 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T
> *format,
>  #endif
>  
>  	case L_('B'):
> -	  if (modifier != 0)
> +	  if (modifier == L_('E'))
>  	    goto bad_format;

OK. Accept %O but not %E.

>  	  if (change_case)
>  	    {
> @@ -793,7 +797,10 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const
> CHAR_T *format,
>  	      to_lowcase = 0;
>  	    }
>  #if defined _NL_CURRENT || !HAVE_STRFTIME
> -	  cpy (STRLEN (f_month), f_month);
> +	  if (modifier == L_('O'))
> +	    cpy (STRLEN (f_altmonth), f_altmonth);

OK.

> +	  else
> +	    cpy (STRLEN (f_month), f_month);
>  	  break;
>  #else
>  	  goto underlying_strftime;
> diff --git a/time/strptime_l.c b/time/strptime_l.c
> index 7d4758e..39cf38d 100644
> --- a/time/strptime_l.c
> +++ b/time/strptime_l.c
> @@ -124,6 +124,8 @@ extern const struct __locale_data _nl_C_LC_TIME
> attribute_hidden;
>    (&_nl_C_LC_TIME.values[_NL_ITEM_INDEX (ABDAY_1)].string)
>  # define month_name (&_nl_C_LC_TIME.values[_NL_ITEM_INDEX (MON_1)].string)
>  # define ab_month_name (&_nl_C_LC_TIME.values[_NL_ITEM_INDEX (ABMON_1)].string)
> +# define alt_month_name \
> +  (&_nl_C_LC_TIME.values[_NL_ITEM_INDEX (ALTMON_1)].string)

OK.

>  # define HERE_D_T_FMT (_nl_C_LC_TIME.values[_NL_ITEM_INDEX (D_T_FMT)].string)
>  # define HERE_D_FMT (_nl_C_LC_TIME.values[_NL_ITEM_INDEX (D_FMT)].string)
>  # define HERE_AM_STR (_nl_C_LC_TIME.values[_NL_ITEM_INDEX (AM_STR)].string)
> @@ -319,10 +321,9 @@ __strptime_internal (const char *rp, const char *fmt,
> struct tm *tmp,
>        while (*fmt >= '0' && *fmt <= '9')
>  	++fmt;
>  
> -#ifndef _NL_CURRENT
> -      /* We need this for handling the `E' modifier.  */
> +      /* In some cases, modifiers are handled by adjusting state and
> +         then restarting the switch statement below.  */

OK.

>      start_over:
> -#endif
>  
>        /* Make back up of current processing pointer.  */
>        rp_backup = rp;
> @@ -423,13 +424,32 @@ __strptime_internal (const char *rp, const char *fmt,
> struct tm *tmp,
>  				     ab_month_name[cnt]))
>  			decided_longest = loc;
>  		    }
> +#ifdef _LIBC
> +		  /* Now check the alt month.  */
> +		  trp = rp;
> +		  if (match_string (_NL_CURRENT (LC_TIME, ALTMON_1 + cnt), trp)
> +		      && trp > rp_longest)
> +		    {
> +		      rp_longest = trp;
> +		      cnt_longest = cnt;
> +		      if (s.decided == not
> +			  && strcmp (_NL_CURRENT (LC_TIME, ALTMON_1 + cnt),
> +				     alt_month_name[cnt]))
> +			decided_longest = loc;
> +		    }
> +#endif

OK. Look for alternative month name.

>  		}
>  #endif
>  	      if (s.decided != loc
>  		  && (((trp = rp, match_string (month_name[cnt], trp))
>  		       && trp > rp_longest)
>  		      || ((trp = rp, match_string (ab_month_name[cnt], trp))
> -			  && trp > rp_longest)))
> +			  && trp > rp_longest)
> +#ifdef _LIBC
> +		      || ((trp = rp, match_string (alt_month_name[cnt], trp))
> +			  && trp > rp_longest)

OK.

> +#endif
> +	      ))
>  		{
>  		  rp_longest = trp;
>  		  cnt_longest = cnt;
> @@ -1015,6 +1035,10 @@ __strptime_internal (const char *rp, const char *fmt,
> struct tm *tmp,
>  	case 'O':
>  	  switch (*fmt++)
>  	    {
> +	    case 'B':
> +	      /* Match month name.  Reprocess as plain 'B'.  */
> +	      fmt--;
> +	      goto start_over;

OK.

>  	    case 'd':
>  	    case 'e':
>  	      /* Match day of month using alternate numeric symbols.  */
> diff --git a/time/tst-strptime.c b/time/tst-strptime.c
> index 34ad797..bbc1390 100644
> --- a/time/tst-strptime.c
> +++ b/time/tst-strptime.c
> @@ -51,6 +51,12 @@ static const struct
>      6, 0, 0, 1 },
>    { "ja_JP.EUC-JP", "2001 20 \xb7\xee", "%Y %U %a", 1, 140, 4, 21 },
>    { "ja_JP.EUC-JP", "2001 21 \xb7\xee", "%Y %W %a", 1, 140, 4, 21 },
> +  { "pl_PL.UTF-8", "21 lis 2017", "%d %b %Y", 2, 324, 10, 21 },
> +  { "pl_PL.UTF-8", "22 LIS 2017", "%d %B %Y", 3, 325, 10, 22 },
> +  /* TODO: Use the genitive case here as soon as it is added to localedata.  */
> +  { "pl_PL.UTF-8", "23 listopad 2017", "%d %B %Y", 4, 326, 10, 23 },
> +  /* The nominative case is incorrect here but it is parseable.  */
> +  { "pl_PL.UTF-8", "24 listopad 2017", "%d %OB %Y", 5, 327, 10, 24 },

Here we need examples of %OB for langauges that do *not* have genitive names
in order to test the conversion of %OB for locales that would have this
definition missing (tests that %OB does fall back to %B).

>  };
>  
>  
> 


-- 
Cheers,
Carlos.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]