This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



fewer open() calls done by gettext()


Hi Ulrich,

Since the beginning, gettext()'s lookup of message catalogs has
searched the paths
  $LOCALEDIR/$ll_$CC.$CHARSET/LC_MESSAGES/$domain.mo
  $LOCALEDIR/$ll_$CC/LC_MESSAGES/$domain.mo
  $LOCALEDIR/$ll.$CHARSET/LC_MESSAGES/$domain.mo
  $LOCALEDIR/$ll/LC_MESSAGES/$domain.mo
if the locale is specified as $ll_$CC.$CHARSET.

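In a typical program (attached below), this leads to 6 open() system
calls, because each path containing $CHARSET is tried both with the
charset name as written (UTF-8) and in normalized form (utf8). The .mo
file is usually found only at the last of these 6 calls: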

$ strace ./prog 2>&1 | grep ^open | grep prog.mo
open("/tmp/./fr_FR.UTF-8/LC_MESSAGES/prog.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/tmp/./fr_FR.utf8/LC_MESSAGES/prog.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/tmp/./fr_FR/LC_MESSAGES/prog.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/tmp/./fr.UTF-8/LC_MESSAGES/prog.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/tmp/./fr.utf8/LC_MESSAGES/prog.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/tmp/./fr/LC_MESSAGES/prog.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
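The current search order can be sketched in C as follows. This is a
simplified illustration under stated assumptions, not the code in
intl/l10nflist.c: current_candidates and normalize_codeset are
hypothetical names, modifiers such as @euro are ignored, and the
normalization only lowercases and drops non-alphanumeric characters,
which is enough to turn UTF-8 into utf8.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified codeset normalization: keep only letters and
   digits, lowercased, so "UTF-8" becomes "utf8".  */
void
normalize_codeset (const char *codeset, char *out, size_t outlen)
{
  size_t j = 0;
  for (size_t i = 0; codeset[i] != '\0' && j + 1 < outlen; i++)
    if (isalnum ((unsigned char) codeset[i]))
      out[j++] = (char) tolower ((unsigned char) codeset[i]);
  out[j] = '\0';
}

/* Hypothetical helper: fill PATHS with the candidate .mo paths, in the
   order seen in the strace output, for a locale ll_CC.CHARSET.
   Returns the number of paths written (at most MAX).  */
int
current_candidates (const char *localedir, const char *lang,
                    const char *territory, const char *codeset,
                    const char *domain, char paths[][256], int max)
{
  char norm[64];
  normalize_codeset (codeset, norm, sizeof norm);

  /* Codeset variants: as given, normalized, and none at all.  */
  const char *cs[3] = { codeset, norm, "" };
  int n = 0;

  for (int t = 0; t < 2; t++)           /* with, then without, _CC */
    for (int c = 0; c < 3; c++)
      {
        /* Skip the normalized form if it equals the original.  */
        if (c == 1 && strcmp (norm, codeset) == 0)
          continue;
        if (n >= max)
          return n;
        snprintf (paths[n], 256, "%s/%s%s%s%s%s/LC_MESSAGES/%s.mo",
                  localedir, lang,
                  t == 0 ? "_" : "", t == 0 ? territory : "",
                  cs[c][0] != '\0' ? "." : "", cs[c], domain);
        n++;
      }
  return n;
}
```

Called as current_candidates ("/tmp/.", "fr", "FR", "UTF-8", "prog",
paths, 8), this yields the same 6 paths, in the same order, as the
strace output above.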

I would suggest reducing this to 2 calls:

$ strace ./prog 2>&1 | grep ^open | grep prog.mo
open("/tmp/./fr_FR/LC_MESSAGES/prog.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/tmp/./fr/LC_MESSAGES/prog.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
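The proposed lookup, which ignores the charset part of the locale name,
might look roughly like this. Again this is only a sketch:
proposed_candidates is a hypothetical name, and dropping any @modifier
is an assumption of the sketch, since the proposal does not discuss
modifiers.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build candidate .mo paths for a locale name such
   as "fr_FR.UTF-8" while ignoring the codeset, so only $ll_$CC and $ll
   are tried.  Returns the number of paths written (at most MAX).  */
int
proposed_candidates (const char *localedir, const char *locale,
                     const char *domain, char paths[][256], int max)
{
  /* Length of "ll": up to the first '_', '.', '@' or end of string.  */
  size_t lang_len = strcspn (locale, "_.@");
  /* Length of "ll_CC": up to the first '.', '@' or end of string.  */
  size_t full_len = strcspn (locale, ".@");
  int n = 0;

  /* Try ll_CC first, but only if a territory is actually present.  */
  if (full_len > lang_len && n < max)
    {
      snprintf (paths[n], 256, "%s/%.*s/LC_MESSAGES/%s.mo",
                localedir, (int) full_len, locale, domain);
      n++;
    }
  /* Then fall back to the bare language.  */
  if (n < max)
    {
      snprintf (paths[n], 256, "%s/%.*s/LC_MESSAGES/%s.mo",
                localedir, (int) lang_len, locale, domain);
      n++;
    }
  return n;
}
```

For "fr_FR.UTF-8" this produces the two paths shown above; for a locale
name without a territory, such as "fr.UTF-8", only the bare language
directory is tried.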

Rationale:

The use case for storing different .mo files in
  fr/LC_MESSAGES/prog.mo and fr.UTF-8/LC_MESSAGES/prog.mo
or
  fr/LC_MESSAGES/prog.mo and fr.ISO-8859-1/LC_MESSAGES/prog.mo
or
  fr.UTF-8/LC_MESSAGES/prog.mo and fr.ISO-8859-1/LC_MESSAGES/prog.mo
is when translators want to use different kinds of characters
(quotation marks, for example), i.e. maintain one PO file for the UTF-8
locale and a different PO file for the more restricted character set.
Another use case was Japanese translators who did not trust the
conversion between the JIS X character sets and Unicode and therefore
wanted to maintain a separate PO file for EUC-JP.

But
  1. Translators never did this.
  2. In the future, translators will need it even less than in the past.
     Nowadays most PO files (even Japanese ones) are submitted in UTF-8
     encoding, and most users are in UTF-8 locales. It will therefore
     no longer make sense to have a PO file specialized for a
     non-Unicode locale charset.

Do you think this optimization is worth doing?

If this is OK with you, I can prepare a patch for intl/l10nflist.c
(taking care, of course, not to modify the behaviour of
locale/findlocale.c).

Bruno


How to reproduce:
$ gcc -Wall prog.c -o prog
$ strace ./prog 2>&1 | grep ^open | grep prog.mo

============================== prog.c ================================
#include <libintl.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

int main (void)
{
  int n = 2;

  setenv ("LC_ALL", "fr_FR.UTF-8", 1);
  if (setlocale (LC_ALL, "") == NULL)
    /* Couldn't set locale.  */
    exit (77);

  textdomain ("prog");
  bindtextdomain ("prog", ".");

  printf (gettext ("'Your command, please?', asked the waiter."));
  printf ("\n");

  printf (ngettext ("a piece of cake", "%d pieces of cake", n), n);
  printf ("\n");

  printf (gettext ("%s is replaced by %s."), "FF", "EUR");
  printf ("\n");

  exit (0);
}
======================================================================

