Bug 26383 - bind_textdomain_codeset doesn't accept //TRANSLIT anymore
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: locale
Version: 2.32
Importance: P2 normal
Target Milestone: 2.33
Assignee: Arjun Shankar
URL: https://bugs.debian.org/968260
Reported: 2020-08-13 13:08 UTC by Aurelien Jarno
Modified: 2020-10-20 15:55 UTC

Description Aurelien Jarno 2020-08-13 13:08:50 UTC
bind_textdomain_codeset used to accept a charset ending with //TRANSLIT, as described in the documentation:
"The bind_textdomain_codeset function can be used to specify the output character set for message catalogs for domain domainname. The codeset argument must be a valid codeset name which can be used for the iconv_open function, or a null pointer."

However, this is no longer the case in glibc 2.32, more precisely since commit 91927b7c76437db860cd86a7714476b56bb39d07. This can be seen with the following testcase:

#include <libintl.h>
#include <locale.h>
#include <stdio.h>

int main(void)
{
        setlocale(LC_ALL, "fr_FR.UTF-8");

        bind_textdomain_codeset("libc", "utf-8//TRANSLIT");
        printf("translation of NAME into French: %s\n", dgettext("libc", "NAME"));
        return 0;
}

With glibc 2.31, it prints:
translation of NAME into French: NOM

With glibc 2.32, it prints:
translation of NAME into French: NAME
Comment 1 Arjun Shankar 2020-08-25 10:01:22 UTC
Thanks for triaging this, Aurelien. I'll look into it.
Comment 2 Carlos O'Donell 2020-08-25 19:45:46 UTC
(In reply to Aurelien Jarno from comment #0)
>         bind_textdomain_codeset("libc", "utf-8//TRANSLIT");

The specifier is "STANDARD/CHARSET/ERROR-HANDLER" e.g. "ISO-10646/UTF-8/TRANSLIT". This needs fixing in the man pages to spell this out.

We have aliases for this though, e.g. "UTF-8//" and "UTF8//" (case-insensitive as part of normalization), so that should work.

We need a regression test for this, and what you've provided is probably good enough.
Comment 3 Arjun Shankar 2020-10-15 11:31:02 UTC
Fixed in master via 7d4ec75e111291851620c6aa2c4460647b7fd50d and will make it to glibc-2.33.