Bug 24293 - Missing Minguo calendar support for TW locales
Summary: Missing Minguo calendar support for TW locales
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: localedata
Version: unspecified
Importance: P2 normal
Target Milestone: 2.30
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-03-02 16:54 UTC by Felix Yan
Modified: 2019-04-05 20:44 UTC

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Description Felix Yan 2019-03-02 16:54:59 UTC
The Minguo calendar is the official calendar system in Taiwan and is very widely used there. It would be nice to have support for it in glibc.

Some background information: the government website (www.gov.tw) uses it, and popular public services like Taiwan HSR also use this calendar system.

Link to wikipedia: https://en.wikipedia.org/wiki/Minguo_calendar
Comment 1 Wei-Lun Chao 2019-03-07 07:47:03 UTC
(In reply to Felix Yan from comment #0)
> Minguo calendar is the official calendar system, and very widely used in
> Taiwan, it would be nice to have the support in glibc.
Confirmed. The "Minguo" calendar is widely used in official and educational contexts.

> Some background information: The government website (www.gov.tw) uses it,
> popular public services like Taiwan HSR also uses this calendar system.
In most cases today the two calendars coexist, but the trend is toward the Gregorian calendar.

> Link to wikipedia: https://en.wikipedia.org/wiki/Minguo_calendar
According to Chinese National Standard "CNS 7648", the Gregorian calendar is the main calendar, while the "Minguo" calendar is a secondary representation.

Referring to the reporter's blog: https://blog.felixc.at/2018/11/add-minguo-calendar-support-as-glibc-localedata-era/
Three extra lines could be proposed (untested):

era "+:2:1913//01//01:+*:<U6C11><U570B>:%EC%Ey<U5E74>";/
    "+:1:1912//01//01:1912//12//31:<U6C11><U570B>:%EC<U5143><U5E74>";/
    "+:1:1911//12//31:-*:<U6C11><U570B><U524D>:%EC%Ey<U5E74>"
Comment 2 Wei-Lun Chao 2019-03-15 07:11:21 UTC
% Tested entries:
era "+:2:1913//01//01:+*:<U6C11><U570B>:%EC%Ey";/
    "+:1:1912//01//01:1912//12//31:<U6C11><U570B>:%EC<U5143>";/
    "+:1:1911//12//31:-*:<U6C11><U570B><U524D>:%EC%Ey"
era_d_fmt       "%EY<U5E74>%b%-d<U65E5>"
era_d_t_fmt     "%EY<U5E74>%b%-d<U65E5> (%A) %H<U6642>%M<U5206>%S<U79D2>"
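
For readers unfamiliar with the era syntax: each era string packs six colon-separated fields (direction, offset, start date, end date, era name, era format), and "//" is an escaped "/" when the locale file declares "/" as its escape character. The following is only a rough Python sketch of how one entry splits into those fields; the names EraEntry and parse_era are made up for illustration and are not part of glibc.

```python
from typing import NamedTuple


class EraEntry(NamedTuple):
    direction: str  # "+" or "-"
    offset: int     # year number given to the year closest to the start date
    start: str      # start date as "YYYY/MM/DD"
    end: str        # end date as "YYYY/MM/DD", or "+*" / "-*" for open-ended
    name: str       # era name, printed by %EC
    fmt: str        # format string used for %EY


def parse_era(entry: str, escape_char: str = "/") -> EraEntry:
    # Assumption: the locale file uses "/" as escape_char, so "//" is a literal "/".
    text = entry.replace(escape_char * 2, escape_char)
    # Split into exactly six fields; the last field keeps any further colons.
    direction, offset, start, end, name, fmt = text.split(":", 5)
    return EraEntry(direction, int(offset), start, end, name, fmt)


if __name__ == "__main__":
    print(parse_era("+:2:1913//01//01:+*:<U6C11><U570B>:%EC%Ey"))
```

Read this way, the first tested entry says that the era starts on 1913/01/01, has no end ("+*"), and that 1913 is numbered year 2.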
Comment 3 Sourceware Commits 2019-03-15 09:13:05 UTC
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  238d60a1fb5081450ca57d3e20f6c1c27df9afb5 (commit)
      from  5b06f538c5aee0389ed034f60d90a8884d6d54de (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=238d60a1fb5081450ca57d3e20f6c1c27df9afb5

commit 238d60a1fb5081450ca57d3e20f6c1c27df9afb5
Author: Felix Yan <felixonmars@archlinux.org>
Date:   Thu Mar 7 17:40:02 2019 +0800

    localedata: Add Minguo calendar support to Taiwanese locales [BZ #24293]
    
    Minguo calendar is the official calendar system, and very widely used in
    Taiwan. This commit adds its support into glibc.
    
    Some background information: The government website (www.gov.tw) uses it,
    popular public services like Taiwan HSR also use this calendar system.
    
    Link to Wikipedia: https://en.wikipedia.org/wiki/Minguo_calendar
    
            [BZ #24293]
            * localedata/locales/zh_TW (era): Add, support Minguo calendar.
            * localedata/locales/cmn_TW (era): Likewise.
            * localedata/locales/hak_TW (era): Likewise.
            * localedata/locales/lzh_TW (era): Likewise.
            * localedata/locales/nan_TW (era): Likewise.

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                 |    9 +++++++++
 localedata/locales/cmn_TW |    4 ++++
 localedata/locales/hak_TW |    4 ++++
 localedata/locales/lzh_TW |    4 ++++
 localedata/locales/nan_TW |    4 ++++
 localedata/locales/zh_TW  |    4 ++++
 6 files changed, 29 insertions(+), 0 deletions(-)
Comment 4 Rafal Luzynski 2019-03-15 09:25:58 UTC
(In reply to Wei-Lun Chao from comment #2)
> % Tested entries:
> era "+:2:1913//01//01:+*:<U6C11><U570B>:%EC%Ey";/
>     "+:1:1912//01//01:1912//12//31:<U6C11><U570B>:%EC<U5143>";/
>     "+:1:1911//12//31:-*:<U6C11><U570B><U524D>:%EC%Ey"
> era_d_fmt       "%EY<U5E74>%b%-d<U65E5>"
> era_d_t_fmt     "%EY<U5E74>%b%-d<U65E5> (%A) %H<U6642>%M<U5206>%S<U79D2>"

This bug has just been fixed.  Your proposal differs from Felix Yan's patch, which has just been pushed to master; I think that the <U5E74> (年) character is missing from your proposal.  Feel free to continue discussing or providing more patches if the current content needs updates.
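
To make the <UXXXX> notation used throughout this thread easier to read, here is a small Python sketch that replaces each symbolic name with the character it denotes. The helper name decode_ucs_names is hypothetical, and the sketch assumes every name has the plain <U + hex digits + > form seen in these entries.

```python
import re


def decode_ucs_names(text: str) -> str:
    """Replace each <UXXXX> symbolic name with the code point it names."""
    return re.sub(
        r"<U([0-9A-Fa-f]{4,8})>",
        lambda match: chr(int(match.group(1), 16)),
        text,
    )


if __name__ == "__main__":
    # Print the proposed era_d_fmt with the symbolic names expanded.
    print(decode_ucs_names("%EY<U5E74>%b%-d<U65E5>"))
```

Text that is not a <UXXXX> name, such as the conversion specifiers, is passed through unchanged.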
Comment 5 Wei-Lun Chao 2019-03-15 09:56:13 UTC
My proposal would be more well-formed:
1. Without definition of era_d_t_fmt, the output of command "date +%Ec" could be undefined.
2. The character <U5E74> should be included in "era_d_fmt" and "era_d_t_fmt", not in "era"(%EY). Just like <U5E74> is included in "d_fmt" and "d_t_fmt", not in %Y.
3. Years before the first year should be prefixed with <U6C11><U570B><U524D>, not with <U6C11><U524D>.
Comment 6 Felix Yan 2019-03-15 10:20:22 UTC
(In reply to Wei-Lun Chao from comment #5)
> 1. Without definition of era_d_t_fmt, the output of command "date +%Ec"
> could be undefined.
> 2. The character <U5E74> should be included in "era_d_fmt" and
> "era_d_t_fmt", not in "era"(%EY). Just like <U5E74> is included in "d_fmt"
> and "d_t_fmt", not in %Y.
I see, thanks for the info. I guess I did not understand the d_fmt and d_t_fmt parts before.

> 3. Years before the first year should be prefixed with
> <U6C11><U570B><U524D>, not with <U6C11><U524D>.
I believe <U6C11><U524D> is correct according to multiple sources (government websites, Wikipedia, etc.). Some links:

https://land.gov.taipei/cp.aspx?n=1C18A16DCE9C4260
https://zh.wikipedia.org/wiki/%E6%B0%91%E5%9C%8B%E7%B4%80%E5%B9%B4#%E6%A6%82%E8%A6%81

I did not find a source for <U6C11><U570B><U524D>, on the other hand. There is also confusion in this usage, as <U6C11><U570B><U524D>3<U5E74> could mean either <U6C11><U524D>3<U5E74> or the three-year range of <U6C11><U570B>1~3<U5E74>.
Comment 7 Felix Yan 2019-03-15 10:39:38 UTC
I just took a look at the era parts of other locales. In ja_JP, the character <U5E74> is also defined in era. In my understanding <U5E74> needs to be present for %EY to be complete.

I guess I'll need to add era_d_fmt and era_d_t_fmt lines but keep <U5E74> in era, so %Ec would work as expected.

@Rafal

One thing I would like to ask is, should we use %-EY instead of %EY here in the era_d_fmt/era_d_t_fmt lines to get rid of the zero padding added in glibc 2.29? If so I'll submit a patch for ja_JP too.
Comment 8 Rafal Luzynski 2019-03-15 11:39:46 UTC
(In reply to Felix Yan from comment #7)
> @Rafal
> 
> One thing I would like to ask is, should we use %-EY instead of %EY here in
> the era_d_fmt/era_d_t_fmt lines to get rid of the zero padding added in
> glibc 2.29?

Yes if you want to get rid of the zero padding.

> If so I'll submit a patch for ja_JP too.

ja_JP wants the zero padding and that's why it has been introduced in bug 23758.  That means, ja_JP does not want the patch you suggest.  Also please read their arguments, maybe you will agree not to remove the zero padding in *_TW either.
Comment 9 Felix Yan 2019-03-15 16:05:31 UTC
(In reply to Rafal Luzynski from comment #8)
> (In reply to Felix Yan from comment #7)
> > If so I'll submit a patch for ja_JP too.
> 
> ja_JP wants the zero padding and that's why it has been introduced in bug
> 23758.  That means, ja_JP does not want the patch you suggest.  Also please
> read their arguments, maybe you will agree not to remove the zero padding in
> *_TW either.

I see. They want the zero padding for fixed width text, but this is not the case for Minguo calendar as far as I see. It's already three digits now (108, just over 100), so I believe it doesn't make sense to pad to 2 digits.
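
As a quick check of the "108" figure, here is a Python sketch of the year arithmetic implied by the era entries in this bug: 1912 is year 1, and earlier years count upward going backward in time. It works on whole Gregorian years only and is not glibc's implementation; the function name minguo_year is made up for illustration.

```python
from typing import Tuple


def minguo_year(gregorian_year: int) -> Tuple[bool, int]:
    """Return (is_before_era, year_number) for a Gregorian year.

    Assumes the era boundaries from the entries in this bug:
    1912 is year 1, and 1911 is year 1 before the era.
    """
    if gregorian_year >= 1912:
        return (False, gregorian_year - 1911)
    return (True, 1912 - gregorian_year)


if __name__ == "__main__":
    for year in (1910, 1911, 1912, 1913, 2019):
        print(year, minguo_year(year))
```

The sketch deliberately returns a flag instead of a prefix string, since which prefix to use for years before the era is still under discussion here.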
Comment 10 Rafal Luzynski 2019-03-15 20:32:27 UTC
Zero padding (or space padding or no padding at all) will do nothing for the numbers larger than 9 so you don't have to worry about it at all.  There was the same explanation why that change was not destructive for lo_LA and th_TH locales.
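
The point about padding can be illustrated with a short Python sketch that mimics a two-digit minimum width under the "0", "_" and "-" flags. This only models the behaviour described in this comment and does not call strftime; format_era_year is a hypothetical helper.

```python
def format_era_year(year: int, flag: str = "0") -> str:
    """Format an era year with a minimum width of two.

    flag "0" pads with zeros, "_" pads with spaces, "-" disables padding.
    """
    if flag == "-":
        return str(year)
    fill = " " if flag == "_" else "0"
    # rjust only adds fill characters when the number is shorter than two digits.
    return str(year).rjust(2, fill)


if __name__ == "__main__":
    for year in (8, 108):
        print([format_era_year(year, flag) for flag in ("0", "_", "-")])
```

For 108 all three flags give the same string; only single-digit years are affected.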
Comment 11 Wei-Lun Chao 2019-03-17 01:11:36 UTC
(In reply to Felix Yan from comment #6)
> > 3. Years before the first year should be prefixed with
> > <U6C11><U570B><U524D>, not with <U6C11><U524D>.
> I believe <U6C11><U524D> is correct according to multiple sources
> (government websites, wikipedia, etc). Some links:
> 
> https://land.gov.taipei/cp.aspx?n=1C18A16DCE9C4260
> https://zh.wikipedia.org/wiki/%E6%B0%91%E5%9C%8B%E7%B4%80%E5%B9%B4#%E6%A6%82%E8%A6%81
AFAIK, that is for something that does not exist, like:
abera "+:2:1913//01//01:+*:<U6C11>:%EC%Ey";/
     "+:1:1912//01//01:1912//12//31:<U6C11>:%EC<U5143>";/
     "+:1:1911//12//31:-*:<U6C11><U524D>:%EC%Ey"

> I did not find a source for <U6C11><U570B><U524D>, on the other hand. There
> is also confusion in this usage, as <U6C11><U570B><U524D>3<U5E74> could mean
> either <U6C11><U524D>3<U5E74> or the three-year range of
> <U6C11><U570B>1~3<U5E74>.
For the range usage, we use <U6C11><U521D>.
Comment 12 Sourceware Commits 2019-04-02 07:49:54 UTC

The branch, master has been updated
       via  466afec30896585b60c2106df7a722a86247c9f3 (commit)
       via  84aea16929f310625a52bf9c3db3341f56970ab0 (commit)
       via  2f1d61552d429c4e1cbcec115f3cc3dcaf91400d (commit)
       via  2c7e704b7e590e9895155529f1480e2cdea5424f (commit)
      from  62449176e035e47adc543c1b8cf2075edaaf4742 (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=466afec30896585b60c2106df7a722a86247c9f3

commit 466afec30896585b60c2106df7a722a86247c9f3
Author: TAMUKI Shoichi <tamuki@linet.gr.jp>
Date:   Tue Apr 2 16:46:55 2019 +0900

    ja_JP locale: Add entry for the new Japanese era [BZ #22964]
    
    The Japanese era name will be changed on May 1, 2019.  The Japanese
    government made a preliminary announcement on April 1, 2019.
    
    The glibc ja_JP locale must be updated to include the new era name for
    strftime's alternative year format support.
    
    Checked on x86_64-linux-gnu.
    
    Reviewed-by: Carlos O'Donell <carlos@redhat.com>
    
    ChangeLog:
    
    	[BZ #22964]
    	* localedata/locales/ja_JP (LC_TIME): Add entry for the new Japanese
    	era.
    	* time/tst-strftime2.c (dates): Add 2019-04-30 and 2019-05-01.
    	(mkreftable): Add rules for the new Japanese era and the new dates.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=84aea16929f310625a52bf9c3db3341f56970ab0

commit 84aea16929f310625a52bf9c3db3341f56970ab0
Author: TAMUKI Shoichi <tamuki@linet.gr.jp>
Date:   Tue Apr 2 16:42:04 2019 +0900

    time: Add tests for Minguo calendar [BZ #24293]
    
    Co-authored-by: Rafal Luzynski <digitalfreak@lingonborough.com>
    Reviewed-by: Carlos O'Donell <carlos@redhat.com>
    
    ChangeLog:
    
    	[BZ #24293]
    	* time/Makefile (LOCALES): Add zh_TW.UTF-8, cmn_TW.UTF-8,
    	hak_TW.UTF-8, nan_TW.UTF-8, and lzh_TW.UTF-8.
    	* time/tst-strftime2.c (locales): Likewise.
    	(dates): Add 1910-04-01, 1911-12-31, 1912-01-01, 1913-04-01,
    	2010-04-01, and 2011-04-01.
    	(mkreftable): Add rules for the new locales and the new dates.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=2f1d61552d429c4e1cbcec115f3cc3dcaf91400d

commit 2f1d61552d429c4e1cbcec115f3cc3dcaf91400d
Author: TAMUKI Shoichi <tamuki@linet.gr.jp>
Date:   Tue Apr 2 16:37:03 2019 +0900

    time/tst-strftime2.c: Make the file easier to maintain
    
    Express the years as full Gregorian years (e.g., 1988 instead of 88)
    and months with natural numbers (1-12 rather than 0-11).
    
    Compare actual dates rather than indexes when selecting the era name.
    
    Declare the local variable era as a string character pointer rather
    than an array of chars where the actual string is copied which might
    lead to potential buffer overflows in future.
    
    Co-authored-by: Rafal Luzynski <digitalfreak@lingonborough.com>
    Reviewed-by: Carlos O'Donell <carlos@redhat.com>
    
    ChangeLog:
    
    	* time/tst-strftime2.c (date_t): Explicitly define the type.
    	(dates): Use natural month and year numbers to express a date.
    	(is_before): New function to compare dates.
    	(mkreftable): Minor improvements to simplify maintenance.
    	(do_test): Reflect the changes in dates array.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=2c7e704b7e590e9895155529f1480e2cdea5424f

commit 2c7e704b7e590e9895155529f1480e2cdea5424f
Author: TAMUKI Shoichi <tamuki@linet.gr.jp>
Date:   Tue Apr 2 16:25:35 2019 +0900

    NEWS: Mention Minguo calendar support added [BZ #24293]
    
    Co-authored-by: Rafal Luzynski <digitalfreak@lingonborough.com>
    Reviewed-by: Carlos O'Donell <carlos@redhat.com>

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                |   25 ++++++++
 NEWS                     |    6 ++
 localedata/locales/ja_JP |    6 +-
 time/Makefile            |    4 +-
 time/tst-strftime2.c     |  139 +++++++++++++++++++++++++++++++++-------------
 5 files changed, 139 insertions(+), 41 deletions(-)
Comment 13 Wei-Lun Chao 2019-04-02 08:56:51 UTC
(In reply to Felix Yan from comment #6)
> I did not find a source for <U6C11><U570B><U524D>
See https://www.ris.gov.tw/app/portal/219
The character <U524D> means "before" and is not part of the era name.