[PATCH v3] Set width of JUNGSEONG/JONGSEONG characters from UD7B0 to UD7FB to 0 [BZ #26120]
Mike FABIAN
mfabian@redhat.com
Thu Jun 25 08:32:06 GMT 2020
Carlos O'Donell <carlos@redhat.com> wrote:
> On 6/23/20 5:30 AM, Mike FABIAN via Libc-alpha wrote:
>> I skipped unassigned characters and ended the range at U+D7FF even
>> though U+D7FC..U+D7FF are currently unassigned. Because the script
>> now skips unassigned characters, it is OK to end the Hangul Jamo
>> range at U+D7FF. If these characters are ever assigned, they will
>> probably be Hangul Jamo, because Blocks.txt places them in the
>> Hangul Jamo Extended-B block.
>>
>> After each Unicode update, manual checking is good anyway, but ending
>> the range in the script at U+D7FF seems more likely to do the right
>> thing already if these characters ever get assigned.
>>
>
> You change the generator but all the files that are generated by the
> generator do not appear regenerated in your patch.
> Can you please post exactly what you plan to commit, that way we can
> review the results?
The patch did contain everything.
> I'm expecting:
> - generator change.
This part is the generator change:
diff --git a/localedata/unicode-gen/utf8_gen.py b/localedata/unicode-gen/utf8_gen.py
index 17b99ee88d..11c906b92f 100755
--- a/localedata/unicode-gen/utf8_gen.py
+++ b/localedata/unicode-gen/utf8_gen.py
@@ -258,7 +258,13 @@ def process_width(outfile, ulines, elines, plines):
             if key in width_dict:
                 del width_dict[key] # default width is 1
     for key in list(range(0x1160, 0x1200)):
-        width_dict[key] = 0
+        # Hangul jungseong and jongseong:
+        if key in unicode_utils.UNICODE_ATTRIBUTES:
+            width_dict[key] = 0
+    for key in list(range(0xD7B0, 0xD800)):
+        # Hangul jungseong and jongseong:
+        if key in unicode_utils.UNICODE_ATTRIBUTES:
+            width_dict[key] = 0
     for key in list(range(0x3248, 0x3250)):
         # These are “A” which means we can decide whether to treat them
         # as “W” or “N” based on context:
@@ -327,6 +333,7 @@ if __name__ == "__main__":
         help='The Unicode version of the input files used.')
     ARGS = PARSER.parse_args()
+    unicode_utils.fill_attributes(ARGS.unicode_data_file)
     with open(ARGS.unicode_data_file, mode='r') as UNIDATA_FILE:
         UNICODE_DATA_LINES = UNIDATA_FILE.readlines()
     with open(ARGS.east_asian_with_file, mode='r') as EAST_ASIAN_WIDTH_FILE:
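To illustrate why the range can safely end at U+D7FF, here is a minimal sketch of the "skip unassigned code points" check. It is not the generator itself: it assumes Python's standard `unicodedata` module as a stand-in for `unicode_utils.UNICODE_ATTRIBUTES`, and the helper name `assigned_zero_width` is hypothetical.

```python
import unicodedata


def assigned_zero_width(start, stop):
    """Return {code_point: 0} for every assigned code point in [start, stop).

    Hypothetical helper. The general category 'Cn' marks unassigned code
    points; testing for it stands in for the generator's
    `key in unicode_utils.UNICODE_ATTRIBUTES` check.
    """
    widths = {}
    for code_point in range(start, stop):
        if unicodedata.category(chr(code_point)) != 'Cn':
            widths[code_point] = 0
    return widths


# Same bounds as the new loop in the patch: 0xD800 is exclusive, so the
# last code point examined is U+D7FF.
hangul_jamo_ext_b = assigned_zero_width(0xD7B0, 0xD800)
```

With the Unicode data shipped in current Python releases, the unassigned gaps U+D7C7..U+D7CA and U+D7FC..U+D7FF are left out of the result, which matches the two ranges in the regenerated WIDTH section.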
> - all files updated with date changes.
The UTF-8 file in localedata/charmaps is the only generated file that
changed, and only in its WIDTH section:
diff --git a/localedata/charmaps/UTF-8 b/localedata/charmaps/UTF-8
index 14c5d4fa33..8cce47cd97 100644
--- a/localedata/charmaps/UTF-8
+++ b/localedata/charmaps/UTF-8
@@ -48920,6 +48920,8 @@ WIDTH
 <UABE8> 0
 <UABED> 0
 <UAC00>...<UD7A3> 2
+<UD7B0>...<UD7C6> 0
+<UD7CB>...<UD7FB> 0
 <UF900>...<UFA6D> 2
 <UFA70>...<UFAD9> 2
 <UFB1E> 0
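The two added lines are what you get when consecutive code points with the same width are collapsed into ranges. The following is only a sketch of that idea, not the code in utf8_gen.py: the function names are hypothetical, and it assumes code points in the Basic Multilingual Plane so that four hex digits suffice.

```python
def format_width_line(start, end, width):
    """Format one WIDTH entry; BMP code points only (four hex digits)."""
    if start == end:
        return '<U{:04X}> {}'.format(start, width)
    return '<U{:04X}>...<U{:04X}> {}'.format(start, end, width)


def width_lines(width_dict):
    """Collapse a {code_point: width} dict into WIDTH section lines.

    Hypothetical helper: adjacent code points with equal width are
    merged into a single '<Uxxxx>...<Uyyyy> w' range line.
    """
    lines = []
    run_start = run_end = run_width = None
    for code_point in sorted(width_dict):
        width = width_dict[code_point]
        if (run_start is not None
                and code_point == run_end + 1
                and width == run_width):
            run_end = code_point  # extend the current run
            continue
        if run_start is not None:
            lines.append(format_width_line(run_start, run_end, run_width))
        run_start = run_end = code_point
        run_width = width
    if run_start is not None:
        lines.append(format_width_line(run_start, run_end, run_width))
    return lines


# Assigned jungseong/jongseong in U+D7B0..U+D7FF, with the two unassigned
# gaps (U+D7C7..U+D7CA and U+D7FC..U+D7FF) left out.
example_widths = {cp: 0 for cp in range(0xD7B0, 0xD7C7)}
example_widths.update({cp: 0 for cp in range(0xD7CB, 0xD7FC)})
example_lines = width_lines(example_widths)
```

Because the unassigned code points never enter the dictionary, the gap at U+D7C7..U+D7CA splits the block into two range lines instead of one.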
> - some files have more than date changes.
No other files changed, and the UTF-8 file in charmaps does not
contain a generation date, so there are no date-only changes either.
> This way we keep the generated files consistent.
--
Mike FABIAN <mfabian@redhat.com>
Lack of sleep is the enemy of good work.