Bug 21370 - RFE: strftime() needs a "convert to titlecase" flag
Status: UNCONFIRMED
Alias: None
Product: glibc
Classification: Unclassified
Component: time
Version: 2.26
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-04-10 23:52 UTC by Rafal Luzynski
Modified: 2020-01-31 11:05 UTC

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Description Rafal Luzynski 2017-04-10 23:52:45 UTC
As suggested in https://bugzilla.gnome.org/show_bug.cgi?id=658807 some applications need strftime() to support another flag.  The flag should convert the string being output to titlecase.  By "titlecase" I mean:

- if the first Unicode character is a digraph (or a ligature) then it should be converted to its titlecase counterpart (the first character is uppercase, the second is lowercase);
- if the first Unicode character is a regular single letter then it should be converted to uppercase;
- if it is not a letter or not a lowercase letter then it should not be modified;
- no other character will be modified.
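
The rules above can be sketched on a wide-character string as follows. This is only an illustration under stated assumptions: titlecase_first() is a hypothetical helper, not a glibc function, and it uses towupper(), so it covers the "regular single letter" case but not the digraph case (a digraph needs a real titlecase mapping, which differs from its uppercase form).

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper, not part of glibc.
   Uppercases the first character only if it is a lowercase letter
   in the current locale; every other character is left untouched.
   Limitation: digraphs/ligatures would need a titlecase mapping,
   which towupper() does not provide. */
void titlecase_first(wchar_t *s)
{
    if (s[0] != L'\0' && iswlower((wint_t)s[0]))
        s[0] = (wchar_t)towupper((wint_t)s[0]);
}
```

Which characters count as lowercase depends on the LC_CTYPE locale selected with setlocale(); in the default "C" locale only ASCII letters are affected.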

It has been suggested that the flag character should be "*" (for example "%*A") but I can't tell why this character has been chosen.  It may be this or another character.

Currently strftime() supports these similar flags:

- "^" converts all characters to uppercase;
- "#" swaps the case of all characters;
- "^#" should convert all characters to lowercase but it does not - see bug 15527.
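
For comparison, this is roughly how the existing "^" flag is used today. A minimal sketch assuming glibc (the flag is a GNU extension and may be missing elsewhere); format_example_date() is a hypothetical helper that formats a fixed date, Wednesday 12 April 2017, with a caller-supplied format.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: formats Wednesday, 12 April 2017 with the
   given strftime() format.  Returns what strftime() returns, i.e.
   the number of bytes written, or 0 if the buffer is too small. */
size_t format_example_date(char *buf, size_t size, const char *fmt)
{
    struct tm tm = {0};

    tm.tm_year = 2017 - 1900;  /* years since 1900 */
    tm.tm_mon  = 3;            /* April (months count from 0) */
    tm.tm_mday = 12;
    tm.tm_wday = 3;            /* Wednesday (0 is Sunday) */
    return strftime(buf, size, fmt, &tm);
}
```

Calling it with "%A" and then with "%^A" shows the difference: the second format uppercases the whole day name, which is more than a title or sentence start needs.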

This new flag will be useful when formatting dates in probably most European languages, except English, German, Greek, and probably a few more.
Comment 1 Florian Weimer 2017-04-12 07:55:04 UTC
This proposal conflicts with the ALTMON proposal (bug 10871).  Some locales currently use "%d of %B" as the date format (with a language-specific genitive indicator instead of “of”), but they would switch to "%d %B" (or perhaps "%d %OB") to support elision in the genitive indicator.  Applying titlecase conversion  to the ALTMON string would likely not give expected results.
Comment 2 Rafal Luzynski 2017-04-12 10:38:48 UTC
No, I can't see any conflict and I'm pretty sure there isn't any.  I didn't say that all ALTMON locale data should be converted to the titlecase.  What I meant is that a new flag should be introduced and it would be optionally used by the applications only if developers (or translators) want it.

The problem is that in many languages there is no rule which says that month names must always start with uppercase, so they start with lowercase in the locale database.  But sometimes they must start with uppercase for other reasons, for example because they are at the beginning of a sentence, appear as a title, header, standalone, etc.  So, for example:

"%B" - would produce "april" (I know this is against English rules but let's assume for this example that English does not always want uppercase)
"%*B" - would produce "April"
"%A, %B %d" - would produce "wednesday, april 12" (doesn't it look kinda weird?)
"%*A, %B %d" - would produce "Wednesday, april 12" (doesn't it look better?)
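
Until such a flag exists, an application can approximate "%*A, %B %d" by post-processing the strftime() output. The sketch below is an assumption-laden workaround, not the proposed implementation: capitalize_first_mb() and strftime_capitalized() are hypothetical names, the first character is uppercased with towupper() (so digraphs are not titlecased), and the change is skipped when the uppercase form has a different byte length.

```c
#include <limits.h>
#include <string.h>
#include <time.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: uppercases the first multibyte character of s
   in place.  Returns 1 if the string was changed, 0 otherwise. */
int capitalize_first_mb(char *s)
{
    mbstate_t in_state, out_state;
    char encoded[MB_LEN_MAX];
    wchar_t wc;
    size_t in_len, out_len;

    memset(&in_state, 0, sizeof in_state);
    memset(&out_state, 0, sizeof out_state);

    /* Decode the first character using the current LC_CTYPE locale. */
    in_len = mbrtowc(&wc, s, strlen(s), &in_state);
    if (in_len == 0 || in_len == (size_t)-1 || in_len == (size_t)-2)
        return 0;               /* empty or undecodable input */
    if (!iswlower((wint_t)wc))
        return 0;               /* not a lowercase letter: keep as is */

    /* Re-encode the uppercase form; only replace it if it occupies
       exactly the same number of bytes. */
    out_len = wcrtomb(encoded, (wchar_t)towupper((wint_t)wc), &out_state);
    if (out_len == (size_t)-1 || out_len != in_len)
        return 0;
    memcpy(s, encoded, out_len);
    return 1;
}

/* Hypothetical wrapper: strftime() followed by capitalizing the first
   character of the result. */
size_t strftime_capitalized(char *buf, size_t size, const char *fmt,
                            const struct tm *tm)
{
    size_t n = strftime(buf, size, fmt, tm);

    if (n > 0)
        capitalize_first_mb(buf);
    return n;
}
```

This only matches the proposed flag when the conversion to be capitalized is the first thing in the format; a real "%*" flag would apply to a single conversion anywhere in the string.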

This can be combined with "O" flag from bug 10871 to achieve multiple results in multiple languages: "Abril", "abril", "d’abril", "D’abril", "Avril", "avril", "De avril", "de avril", "April", "april", "Aprila", "aprila", "Kwiecień", "kwiecień", "kwietnia".

Similarly, the ALTMON proposal does not conflict with the existing flags: "^" and "#".
Comment 3 Florian Weimer 2017-04-12 11:07:55 UTC
(In reply to Rafal Luzynski from comment #2)
> This can be combined with "O" flag from bug 10871 to achieve multiple
> results in multiple languages: "Abril", "abril", "d’abril", "D’abril",
> "Avril", "avril", "De avril", "de avril", "April", "april", "Aprila",
> "aprila", "Kwiecień", "kwiecień", "kwietnia".

My concern is that “De avril” and “D'abril” may or may not be proper titlecase (as would be needed in this context).  “de Avril” or “d'Abril” might be required in a calendar context.
Comment 4 Rafal Luzynski 2017-04-13 01:02:51 UTC
I'd like people from specific language communities to come and speak here because I suspect such languages may not exist. :-)  I'm aware that it may take many years and I'm not in a hurry with this bug.

But let's assume that such a language exists.  Still, it looks to me more like an incomplete solution or another missing feature than a conflict.  It shouldn't hurt anyone if an _optional_ feature to capitalize the first letter is added.

If they need something like "de Avril" or "d'Abril" then probably besides "capitalize the first letter" flag they would also need:

- capitalize the first letter of every word,
- capitalize the first letter of the second word (or more precisely: of the main word).

Or maybe it would be easier to:

- fix bug 15527,
- change all locale data to the proper titlecase so they will be always titlecased by default (like in English),
- those who want lowercase would use "^#" explicitly,
- those who want all uppercase would use "^" etc.
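
As long as bug 15527 is open, an application following this alternative would have to lowercase the formatted text itself. A minimal sketch of that step, assuming the text has already been converted to wide characters; lowercase_all() is a hypothetical helper, not a glibc function.

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: lowercases every character of s in place,
   which is what "%^#B" is expected to do once bug 15527 is fixed.
   Characters without a lowercase form are left unchanged. */
void lowercase_all(wchar_t *s)
{
    for (; *s != L'\0'; s++)
        *s = (wchar_t)towlower((wint_t)*s);
}
```

Like the flag itself, this is a blunt tool: it lowercases everything, so it cannot preserve a capital that must stay capital inside a name.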

So for example (Catalan):

"%A, %d %B" -> "Dimecres, 12 d’Abril" (or "…D’abril"?)
"%A, %d %^#B" -> "Dimecres, 12 d’abril" (this is probably correct)

This would make this bug report obsolete.

I proposed this idea here: https://sourceware.org/ml/libc-alpha/2016-12/msg00303.html but it seems to me it wasn't liked.
Comment 5 Michael Bauer 2018-05-05 10:33:00 UTC
Irish and Gaelic (and Cree I think) have special cases where the first letter has to remain lowercase regardless, e.g. gd-GB:

An t-Samhain (Nov)
21 dhen t-Samhain / 21 DHEN t-SAMHAIN

and Irish (ga-IE) An tSamhain (though I'm unsure about the inflected form, copying in Kevin)
Comment 6 Rafal Luzynski 2018-05-05 18:08:46 UTC
I am aware of this and I am also aware of another language whose name I don't remember which treats a reverse apostrophe: "`" as a letter. Titlecasing in that language means that the second letter is uppercased, e.g., "`abc" -> "`Abc".

This may mean that implementing this feature will be very complex or impossible.  Maybe we'll instead move to another workaround: all locale data must be titlecased and converted to lowercase/uppercase only on explicit demand.
Comment 7 Michael Bauer 2018-05-05 18:19:21 UTC
>e.g., "`abc" -> "`Abc".
It's probably not the one you're remembering, but Juǀ'hoan has a rule like that: after an initial click, the first Latin letter is capped (even if it's the 3rd symbol if an apostrophe is there as well), i.e. ǂ'Aun or ǁXaǀoba
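
A rule of this kind ("skip leading symbols, capitalize the first cased letter") could be sketched as below. This is a hypothetical variant for illustration only: capitalize_first_cased() is not a glibc function, and the rule is deliberately different from the original proposal, which leaves a string alone when its first character is not a lowercase letter.

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: skips leading characters that have no case
   (punctuation, click letters, digits) and uppercases the first
   lowercase letter found.  Stops without changes if an uppercase
   letter comes first. */
void capitalize_first_cased(wchar_t *s)
{
    for (; *s != L'\0'; s++) {
        wint_t c = (wint_t)*s;

        if (iswupper(c))
            return;             /* already capitalized */
        if (iswlower(c)) {
            *s = (wchar_t)towupper(c);
            return;
        }
    }
}
```

It still gets Gaelic wrong, since "t-Samhain" would become "T-Samhain", which suggests that a purely algorithmic rule cannot cover every language without locale-specific data.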

Wouldn't sentence case be a better default than Title Case? I suspect it would produce fewer groans than Title Case, which would look hideous in Gaelic.

Perhaps this is an enhancement for CLDR that someone (I don't have the skills) could consider, defining case and relevant exceptions?
Comment 8 Jan Slaný 2019-11-24 13:06:37 UTC
I definitely support this idea. I have come across this while considering capitalization issues in the Czech locale and I think that, from the translators' point of view, providing a title-case flag as Rafal originally suggested is the simplest solution.

I acknowledge that the second proposed approach (i.e. specifying the locale data properly capitalized and using the combination of flags "^#" to convert to lowercase) would better handle the more complex cases present in some languages. However, requiring translators to use e.g. "^#" everywhere except for titles is cumbersome and unnecessary in (possibly many) languages, where the simple "capitalize first letter" approach would be sufficient.

For these reasons, I would like to see the title-case flag implemented in glibc. Being an optional flag, it wouldn't break any existing functionality. The "^#" approach (once fixed) could remain an alternative for languages with more complex capitalization rules.