Bug 31859 - Transliteration rules with two input characters like "ḌḌ" "DDH" do not work.
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: locale
Version: 2.39
Importance: P2 normal
Target Milestone: 2.41
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-06-07 13:44 UTC by Mike FABIAN
Modified: 2024-08-16 12:57 UTC
CC List: 3 users

See Also:
Host:
Target:
Build:
Last reconfirmed:
carlos: security-


Description Mike FABIAN 2024-06-07 13:44:01 UTC
See: https://sourceware.org/pipermail/libc-alpha/2024-May/156769.html

If transliteration rules like this:

translit_start
"ḌḌ" "DDH"
"ḍḍ" "ddh"
"Ḍḍ" "Ddh"
translit_end

are used in the LC_CTYPE section of a locale, they don’t work.

These are in our new scn_IT locale, but commented out for the moment because they do not work.

If localedata/locales/translit_combining is left unchanged, the rules for the single characters Ḍ (U+1E0C) and ḍ (U+1E0D) from translit_combining always won when I tested; the longer input sequences "ḌḌ", "ḍḍ", and "Ḍḍ" were never used.

But when I commented out these short single-character transliteration rules in translit_combining like this:

diff --git a/localedata/locales/translit_combining b/localedata/locales/translit_combining
index ce2f19eee1..6f879d9caf 100644
--- a/localedata/locales/translit_combining
+++ b/localedata/locales/translit_combining
@@ -2486,9 +2486,9 @@ translit_start
 % LATIN SMALL LETTER D WITH DOT ABOVE
 <U1E0B> <U0064>
 % LATIN CAPITAL LETTER D WITH DOT BELOW
-<U1E0C> <U0044>
+%<U1E0C> <U0044>
 % LATIN SMALL LETTER D WITH DOT BELOW
-<U1E0D> <U0064>
+%<U1E0D> <U0064>
 % LATIN CAPITAL LETTER D WITH LINE BELOW
 <U1E0E> <U0044>
 % LAT


then

bash-5.2# echo 'ḌḌ'|iconv -f UTF-8 -t ASCII//translit
^C
bash-5.2#

uses 100% CPU and never stops until I stop it with Control-C.
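
For anyone who wants to exercise the same conversion from a program rather than from the iconv command line, here is a minimal sketch using the standard iconv(3) interface. The helper name translit_to_ascii is hypothetical and not part of glibc. The transliteration rules come from the LC_CTYPE category of the active locale, so a caller would first need something like setlocale (LC_ALL, "") with the locale under test installed. On an unfixed glibc with the rules above, the iconv call may hang just as the command does.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert the UTF-8 string IN to ASCII with
   transliteration and store a NUL-terminated result in OUT, which has
   room for OUT_SIZE bytes.  Returns 0 on success, -1 on failure.  */
int
translit_to_ascii (const char *in, char *out, size_t out_size)
{
  if (out_size == 0)
    return -1;

  /* Same conversion as "iconv -f UTF-8 -t ASCII//TRANSLIT".  */
  iconv_t cd = iconv_open ("ASCII//TRANSLIT", "UTF-8");
  if (cd == (iconv_t) -1)
    return -1;

  char *inp = (char *) in;          /* iconv takes a non-const pointer */
  size_t in_left = strlen (in);
  char *outp = out;
  size_t out_left = out_size - 1;   /* reserve room for the terminator */

  size_t rc = iconv (cd, &inp, &in_left, &outp, &out_left);
  iconv_close (cd);
  if (rc == (size_t) -1)
    return -1;

  *outp = '\0';
  return 0;
}
```

The sketch converts the whole string in one call and treats any error, including a full output buffer, as failure. That is enough for a reproducer but not for production use.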
Comment 2 Carlos O'Donell 2024-08-16 12:41:04 UTC
commit 1b0a2062c8938c7333cd118d85d9976c4e7c92af
Author: Andreas Schwab <schwab@suse.de>
Date:   Mon Jun 10 12:19:17 2024 +0200

    iconv: Fix matching of multi-character transliterations (bug 31859)
    
    Only return __GCONV_INCOMPLETE_INPUT for a partial match when the end of
    the input buffer is reached.  Otherwise it is a non-match, and other
    patterns should be tried.
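
The rule stated in the commit message can be illustrated with a small stand-alone sketch. This is not the glibc code: the names match_rule, MATCH_OK, MATCH_NONE and MATCH_INCOMPLETE are hypothetical, and plain bytes stand in for the wide characters used by the real transliteration tables. It only models the three outcomes the message distinguishes, assuming a rule is compared character by character against the input.

```c
#include <stddef.h>

/* Hypothetical result codes; the real converter uses __GCONV_* values.  */
enum match_result { MATCH_OK, MATCH_NONE, MATCH_INCOMPLETE };

/* Compare the source sequence FROM of one rule (FROM_LEN units) against
   the IN_LEN units of input available at IN.  */
enum match_result
match_rule (const char *from, size_t from_len,
            const char *in, size_t in_len)
{
  size_t i = 0;

  /* Advance while rule and input agree and neither is exhausted.  */
  while (i < from_len && i < in_len && from[i] == in[i])
    ++i;

  if (i == from_len)
    return MATCH_OK;            /* the whole rule matched */

  if (i > 0 && i == in_len)
    return MATCH_INCOMPLETE;    /* prefix matched, then the input ended */

  /* Either nothing matched, or a prefix matched and the next input unit
     differs.  More input cannot help, so other rules should be tried.  */
  return MATCH_NONE;
}
```

With the rule "DD", the input "D" alone yields MATCH_INCOMPLETE, while "Dx" yields MATCH_NONE. Reporting the second case as incomplete would correspond to the pre-fix behaviour the commit message describes.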
Comment 3 Carlos O'Donell 2024-08-16 12:50:28 UTC
In general it might have been possible to cause service breakage by building a custom locale with these transliterations, enabling the locale on a server, and then attempting to process these conversions with the locale enabled. However, since glibc didn't ship such a locale, this would be a failure in testing for the developer using the custom locale. There is no actual, concrete, non-synthetic scenario reported here, so I'm marking this security- for the hang in the converter.