29506 – UTF-8 HANGUL SYLLABLE bugs

Status:	RESOLVED FIXED

Alias:	None

Product:	glibc
Classification:	Unclassified
Component:	localedata
Version:	2.38

Importance:	P2 normal
Target Milestone:	2.39
Assignee:	Mike FABIAN

Reported:	2022-08-19 13:54 UTC by Jakub Jelinek
Modified:	2024-01-14 17:02 UTC
CC List:	2 users

Description Jakub Jelinek 2022-08-19 13:54:00 UTC

localedata/unicode-gen/utf8_gen.py
lists the 6th element of JAMO_FINAL_SHORT_NAME as NI, but according to Unicode (in every version I've checked, and Unicode guarantees that character names are immutable) it should be NJ.
See https://www.unicode.org/Public/4.1.0/ucd/Jamo.txt
11AC; NJ  # HANGUL JONGSEONG NIEUN-CIEUC
or
https://www.unicode.org/Public/14.0.0/ucd/Jamo.txt
11AC; NJ  # HANGUL JONGSEONG NIEUN-CIEUC

This means that the generated UTF-8 charmap contains entries like:
<UAD8D>     /xea/xb6/x8d HANGUL SYLLABLE GWEONI
that my Unicode name-to-codepoint function can't recognize, while
it does map "HANGUL SYLLABLE GWEONJ" to U+AD8D.
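
For reference, here is a minimal sketch of how a Hangul syllable name is derived from its code point, following the arithmetic decomposition described in the Unicode Standard and the short names in Jamo.txt. It is not the code from utf8_gen.py: the tuple names, constants and the function hangul_syllable_name are made up for this illustration, and the standard unicodedata module is imported only to cross-check the result.

```python
import unicodedata

# Jamo short names as listed in Unicode's Jamo.txt.
# These tuple names are illustrative, not the ones used in utf8_gen.py.
INITIAL_SHORT_NAMES = (
    'G', 'GG', 'N', 'D', 'DD', 'R', 'M', 'B', 'BB', 'S',
    'SS', '', 'J', 'JJ', 'C', 'K', 'T', 'P', 'H')
MEDIAL_SHORT_NAMES = (
    'A', 'AE', 'YA', 'YAE', 'EO', 'E', 'YEO', 'YE', 'O', 'WA',
    'WAE', 'OE', 'YO', 'U', 'WEO', 'WE', 'WI', 'YU', 'EU', 'YI', 'I')
FINAL_SHORT_NAMES = (
    '', 'G', 'GG', 'GS', 'N', 'NJ', 'NH', 'D', 'L', 'LG',
    'LM', 'LB', 'LS', 'LT', 'LP', 'LH', 'M', 'B', 'BS', 'S',
    'SS', 'NG', 'J', 'C', 'K', 'T', 'P', 'H')

S_BASE = 0xAC00   # first precomposed Hangul syllable
V_COUNT = len(MEDIAL_SHORT_NAMES)   # 21 medial vowels
T_COUNT = len(FINAL_SHORT_NAMES)    # 28 finals, including "no final"
S_COUNT = len(INITIAL_SHORT_NAMES) * V_COUNT * T_COUNT   # 11172 syllables


def hangul_syllable_name(code_point):
    """Return the Unicode name of a precomposed Hangul syllable."""
    s_index = code_point - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError('not a precomposed Hangul syllable: U+%04X' % code_point)
    # Split the syllable index into initial, medial and final indices.
    l_index, rest = divmod(s_index, V_COUNT * T_COUNT)
    v_index, t_index = divmod(rest, T_COUNT)
    return ('HANGUL SYLLABLE '
            + INITIAL_SHORT_NAMES[l_index]
            + MEDIAL_SHORT_NAMES[v_index]
            + FINAL_SHORT_NAMES[t_index])


if __name__ == '__main__':
    # U+AD8D decomposes to initial 0 (G), medial 14 (WEO), final 5 (NJ).
    print(hangul_syllable_name(0xAD8D))
    print(unicodedata.name('\uad8d'))
```

For U+AD8D the syllable index is 0xAD8D - 0xAC00 = 397, which splits into final index 5, the 6th element of the final list. So a wrong entry at that position affects every syllable whose index is 5 modulo 28.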

Comment 1 Sourceware Commits 2024-01-14 17:00:54 UTC

The master branch has been updated by Mike Fabian <mfabian@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=064c708c78cc2a6b5802dce73108fc0c1c6bfc80

commit 064c708c78cc2a6b5802dce73108fc0c1c6bfc80
Author: Mike FABIAN <mfabian@redhat.com>
Date:   Sun Jan 14 11:42:28 2024 +0100

    localedata/unicode-gen/utf8_gen.py: fix Hangul syllable name
    
    Resolves: BZ # 29506

Comment 2 Mike FABIAN 2024-01-14 17:02:04 UTC

Fixed in glibc master.