26383 – bind_textdomain_codeset doesn't accept //TRANSLIT anymore

Status:	RESOLVED FIXED

Alias:	None

Product:	glibc
Classification:	Unclassified
Component:	locale
Version:	2.32

Importance:	P2 normal
Target Milestone:	2.33
Assignee:	Arjun Shankar

URL:	https://bugs.debian.org/968260

Reported:	2020-08-13 13:08 UTC by Aurelien Jarno
Modified:	2020-10-20 15:55 UTC
CC List:	3 users

Description Aurelien Jarno 2020-08-13 13:08:50 UTC

bind_textdomain_codeset used to accept a charset ending with //TRANSLIT, as described in the documentation:
"The bind_textdomain_codeset function can be used to specify the output character set for message catalogs for domain domainname. The codeset argument must be a valid codeset name which can be used for the iconv_open function, or a null pointer."

However, this is no longer the case in glibc 2.32, more precisely since commit 91927b7c76437db860cd86a7714476b56bb39d07. This can be seen with the following testcase:

#include <libintl.h>
#include <locale.h>
#include <stdio.h>

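int main(void)
{
        /* Select a French locale so that dgettext() looks up French translations. */
        setlocale(LC_ALL, "fr_FR.UTF-8");

        /* Request UTF-8 output with transliteration for the "libc" domain. */
        bind_textdomain_codeset("libc", "utf-8//TRANSLIT");
        printf("translation of NAME into French: %s\n", dgettext("libc", "NAME"));
        return 0;
}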

With glibc 2.31, it prints:
translation of NAME into French: NOM

With glibc 2.32, it prints:
translation of NAME into French: NAME
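
Since the documentation ties the codeset argument to what iconv_open accepts, one way to narrow this down is to pass the same string to iconv_open directly. The following is only a minimal sketch: the helper name iconv_accepts is made up for illustration, and converting from UTF-8 is an arbitrary assumption.

```c
#include <iconv.h>

/* Hypothetical helper: returns 1 if iconv_open() accepts TOCODE as the
   target codeset for a conversion from UTF-8, and 0 otherwise.  */
int
iconv_accepts (const char *tocode)
{
  iconv_t cd = iconv_open (tocode, "UTF-8");
  if (cd == (iconv_t) -1)
    return 0;
  iconv_close (cd);
  return 1;
}
```

If iconv_accepts ("utf-8//TRANSLIT") returns 1 while the testcase still prints the untranslated string, the string is valid for iconv_open and the difference lies in how the gettext code handles it.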

Comment 1 Arjun Shankar 2020-08-25 10:01:22 UTC

Thanks for triaging this, Aurelien. I'll look into it.

Comment 2 Carlos O'Donell 2020-08-25 19:45:46 UTC

(In reply to Aurelien Jarno from comment #0)
>         bind_textdomain_codeset("libc", "utf-8//TRANSLIT");

The specifier has the form "STANDARD/CHARSET/ERROR-HANDLER", e.g. "ISO-10646/UTF-8/TRANSLIT". The man pages need fixing to spell this out.

We do have aliases for this, though, e.g. "UTF-8//" and "UTF8//" (matched case-insensitively as part of normalization). So that should work.
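
To make the three-field layout concrete, here is a rough sketch that splits a specifier at its first two slashes. It is purely positional and is not how glibc parses or normalizes these strings; the struct, the function names and the 32-byte field size are all invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration, not glibc's parser: the three positional
   fields of a "STANDARD/CHARSET/ERROR-HANDLER" specifier.  */
struct codeset_spec
{
  char standard[32];
  char charset[32];
  char error_handler[32];
};

/* Copy the bytes in [BEGIN, END) into DST, truncating to DSTSIZE - 1
   bytes and always NUL-terminating.  */
static void
copy_field (char *dst, size_t dstsize, const char *begin, const char *end)
{
  size_t len = (size_t) (end - begin);
  if (len >= dstsize)
    len = dstsize - 1;
  memcpy (dst, begin, len);
  dst[len] = '\0';
}

/* Split SPEC at its first two slashes.  Fields that are absent or
   empty come back as empty strings.  */
void
split_codeset_spec (const char *spec, struct codeset_spec *out)
{
  const char *end = spec + strlen (spec);
  const char *s1 = strchr (spec, '/');
  const char *s2 = s1 != NULL ? strchr (s1 + 1, '/') : NULL;

  copy_field (out->standard, sizeof out->standard,
              spec, s1 != NULL ? s1 : end);
  copy_field (out->charset, sizeof out->charset,
              s1 != NULL ? s1 + 1 : end, s2 != NULL ? s2 : end);
  copy_field (out->error_handler, sizeof out->error_handler,
              s2 != NULL ? s2 + 1 : end, end);
}
```

With this split, "ISO-10646/UTF-8/TRANSLIT" fills all three fields, while "utf-8//TRANSLIT" leaves the middle field empty: the name before the double slash is the alias form, and "TRANSLIT" still lands in the error-handler position.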

We need a regression test for this, and what you've provided is probably good enough.
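
As a rough idea of what such a check could look like (a sketch only, not the test that was eventually committed), the reporter's testcase can be turned into a function with a pass/fail result. The name check_translation and its return convention are assumptions made for this example, and a meaningful run needs the fr_FR.UTF-8 locale and the French libc message catalog to be installed.

```c
#include <libintl.h>
#include <locale.h>
#include <string.h>

/* Hypothetical check: returns 1 if MSGID from DOMAIN comes back
   translated after binding CODESET, 0 if it comes back unchanged,
   and -1 if LOCALE cannot be selected.  */
int
check_translation (const char *locale, const char *domain,
                   const char *codeset, const char *msgid)
{
  if (setlocale (LC_ALL, locale) == NULL)
    return -1;
  bind_textdomain_codeset (domain, codeset);
  return strcmp (dgettext (domain, msgid), msgid) != 0;
}
```

A regression test along these lines would expect check_translation ("fr_FR.UTF-8", "libc", "utf-8//TRANSLIT", "NAME") to return 1 on a fixed glibc, and would treat -1 as "unsupported" rather than as a failure.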

Comment 3 Arjun Shankar 2020-10-15 11:31:02 UTC

Fixed in master via commit 7d4ec75e111291851620c6aa2c4460647b7fd50d; the fix will be included in glibc 2.33.