10948 – libX11 in a UTF-8 locale doesn't behave well if we don't have CJK fonts installed

Status:	RESOLVED FIXED

Alias:	None

Product:	cygwin
Classification:	Unclassified
Component:	Cygwin/X
Version:	unspecified

Importance:	P2 normal
Target Milestone:	---
Assignee:	Yaakov Selkowitz

URL:
Keywords:

Depends on:
Blocks:

Reported:	2009-11-12 22:56 UTC by Jon Turney
Modified:	2013-12-16 13:39 UTC
CC List:	0 users

See Also:
Host:
Target:
Build:
Last reconfirmed:

Attachments
Don't try so hard to find a matching font with the given encoding (637 bytes, patch) 2010-08-09 19:57 UTC, Jon Turney

Description Jon Turney 2009-11-12 22:56:01 UTC

libX11 XCreateFontSet() will search for fonts with all the encodings given in
/usr/share/X11/locale/en_US.UTF-8/XLC_LOCALE to populate the fontset.

When there are no fonts available for some encodings (typically when CJK fonts,
which aren't part of font-misc-misc, aren't installed), this burns a lot of CPU
and causes a noticeable startup delay, because the code around line 845 of
http://cgit.freedesktop.org/xorg/lib/libX11/tree/modules/om/generic/omGeneric.c
queries the X server up to 12 times for each missing encoding.

For this FontSet to have any benefit, Xutf8DrawString() needs to be capable of
determining that a glyph is not available in the ISO-10646-1 font in the
fontset but is available in another, differently encoded font in the fontset,
and of performing the appropriate substitution; and that situation must
actually arise in practice.  I don't know about the first point; the second
seems quite likely, as ISO-10646-1 fonts often don't contain CJK glyphs.

Perhaps we should arrange for font-isas-misc, font-jis-misc and font-daewoo-misc
to be installed if font-misc-misc is.

Comment 1 Yaakov Selkowitz 2009-11-15 02:44:56 UTC

Sigh.  Why does it seem that C.UTF-8 causes another problem on a daily basis?

But back to the question at hand, what about the comment on line 845: 

/* This may not be needed anymore as XListFonts() takes care of this */

Could this be handled differently, in a way that avoids all the round trips?

Comment 2 Yaakov Selkowitz 2009-12-14 04:37:49 UTC

In bug 9839 we discussed making a bunch of fonts dependencies of font-alias.  
Would that make more sense?

Comment 3 Jon Turney 2009-12-16 21:33:05 UTC

I think the suggestion to set the default resources for xterm so it doesn't
do this stupid thing (and so starts up faster) is probably a good one:

http://sourceware.org/ml/cygwin-xfree/2009-11/msg00175.html

Comment 4 Yaakov Selkowitz 2009-12-17 06:10:00 UTC

(In reply to comment #3)
> I think the suggestion to set the default resources for xterm so it doesn't
> do this stupid thing (and so starts up faster) is probably a good one

But this issue affects other Xt/Xaw programs as well, so bandaiding xterm isn't 
really the correct solution.

I think I'm going to take this to xorg-devel.

Comment 5 Yaakov Selkowitz 2010-02-26 05:22:56 UTC

According to winsup/doc/new-features.sgml, the C/POSIX locale is returning to 
ASCII in 1.7.2.  That should mean that only those who have explicitly set a UTF-8 
locale/codeset should be affected by this.  If so, is documenting the need to 
install the CJK fonts enough?

Comment 6 Yaakov Selkowitz 2010-03-24 19:17:12 UTC

Correction: in 1.7.2 (released today), if LC_*/LANG are unset, the default locale 
is still C.UTF-8, but 'C' is now C.ASCII (again) and most locales are not UTF-8 by 
default (e.g. 'en_US' uses charset ISO-8859-1 unless otherwise specified).

Since users are recommended to set a proper LC_*/LANG with 1.7, is it enough to 
document that they should do so and *if* they select a UTF-8 charset then they 
will need to install CJK fonts?
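
As a rough illustration of who would be affected, a program can check whether
the locale it was started in uses a UTF-8 codeset. This is only a sketch using
the POSIX setlocale() and nl_langinfo(CODESET) calls; the helper names
codeset_is_utf8() and current_locale_is_utf8() are made up for this example,
and this is not how libX11 itself selects its XLC_LOCALE file.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical helper: does this codeset name denote UTF-8?
 * Accepts the common spellings "UTF-8" and "utf8", case-insensitively. */
int codeset_is_utf8(const char *codeset)
{
    return codeset != NULL &&
           (strcasecmp(codeset, "UTF-8") == 0 ||
            strcasecmp(codeset, "UTF8") == 0);
}

/* Hypothetical helper: adopt the locale from LC_ALL/LC_CTYPE/LANG and report
 * whether its codeset is UTF-8. Returns 0 if the locale cannot be set. */
int current_locale_is_utf8(void)
{
    if (setlocale(LC_CTYPE, "") == NULL)
        return 0;
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```

With LANG=en_US.UTF-8 one would expect current_locale_is_utf8() to return
non-zero, and with plain LANG=en_US (ISO-8859-1 per the above) to return 0.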

Comment 7 Jon Turney 2010-03-24 21:51:54 UTC

(In reply to comment #6)
> is it enough to 
> document that they should do so and *if* they select a UTF-8 charset then they 
> will need to install CJK fonts?

It's certainly what we should do until and unless a better solution occurs.

Some numbers to quantify the scope of the problem:

uninstalling font-isas-misc, font-jis-misc and font-daewoo-misc, the time for
xterm -e true goes up from 1.5s to 3.5s
uninstalling font-isas-misc, font-jis-misc and font-daewoo-misc, the time to
start twm goes up from about 1s to 30s !

Comment 8 Jon Turney 2010-03-29 15:45:05 UTC

I'm actually kind of tempted to remove that whole while loop from
parse_omit_name(), which is replacing elements of the XLFD with * in a desperate
attempt to get a match, which avails us nothing if there is no font with that
encoding...

This is kind of pointless as uses of XCreateFontSet() (at least in Xterm)
specify the wildcard font at the end of the base_font_name_list, to get this
kind of fallback behaviour anyhow.

Alternatively, I guess the logic could be made more sophisticated, e.g. if the
XLFD "-*-*-*-*-*-*-*-*-*-*-*-*-<required encoding>" doesn't match any font, we
can skip the while loop, as it's not going to match anything.
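
A minimal sketch of that pre-check idea, simulated without an X server.
Everything here is hypothetical: installed_fonts stands in for the server's
font list, list_fonts() for an XListFonts() round trip, and glob_match()
assumes plain glob semantics ('*' matches any run of characters, '?' exactly
one). It is not the omGeneric.c code, just the shape of the optimization.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical font list of a server with no CJK fonts installed. */
const char *const installed_fonts[] = {
    "-misc-fixed-medium-r-normal--13-120-75-75-c-70-iso8859-1",
    "-misc-fixed-medium-r-normal--13-120-75-75-c-70-iso10646-1",
};
int server_queries = 0;          /* number of simulated round trips */

/* Case-insensitive glob match with single-point backtracking. */
int glob_match(const char *pat, const char *name)
{
    const char *star = NULL, *resume = NULL;

    while (*name) {
        if (*pat == '*') {
            star = ++pat;        /* remember where to restart the pattern */
            resume = name;       /* ... and how much the '*' has swallowed */
        } else if (*pat == '?' ||
                   (*pat && tolower((unsigned char)*pat) ==
                            tolower((unsigned char)*name))) {
            pat++;
            name++;
        } else if (star) {
            pat = star;          /* let the last '*' swallow one more char */
            name = ++resume;
        } else {
            return 0;
        }
    }
    while (*pat == '*')
        pat++;
    return *pat == '\0';
}

/* Simulated server query: one round trip, returns the number of matches. */
int list_fonts(const char *pattern)
{
    size_t i;
    int hits = 0;

    server_queries++;
    for (i = 0; i < sizeof installed_fonts / sizeof installed_fonts[0]; i++)
        hits += glob_match(pattern, installed_fonts[i]);
    return hits;
}

/* Pre-check: does any font at all exist with this encoding? */
int encoding_available(const char *encoding)
{
    char pattern[256];
    size_t used = 0;
    int i;

    for (i = 0; i < 12; i++)     /* twelve wildcarded fields */
        used += (size_t)snprintf(pattern + used, sizeof pattern - used, "-*");
    snprintf(pattern + used, sizeof pattern - used, "-%s", encoding);
    return list_fonts(pattern) > 0;
}

/* Try each candidate prefix, but only if the pre-check succeeded. */
int find_font(const char *const *prefixes, int nprefixes, const char *encoding)
{
    char pattern[256];
    int i;

    if (!encoding_available(encoding))
        return 0;                /* one query spent, retry loop skipped */
    for (i = 0; i < nprefixes; i++) {
        snprintf(pattern, sizeof pattern, "%s-%s", prefixes[i], encoding);
        if (list_fonts(pattern) > 0)
            return 1;
    }
    return 0;
}
```

In this simulation a missing encoding such as jisx0208.1983-0 costs a single
query however many candidate prefixes are passed to find_font(), at the price
of one extra query when the encoding does exist.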

Comment 9 Jon Turney 2010-08-03 13:40:30 UTC

(In reply to comment #8)
> Alternatively, I guess the logic could be made more sophisticated, e.g. if the
> XLFD "-*-*-*-*-*-*-*-*-*-*-*-*-<required encoding>" doesn't match any font, we
> can skip the while loop, as it's not going to match anything.

The other possible optimization to parse_omit_name() is to notice when the XLFD
component is already a '*', so there is no point requerying the server, as
nothing has changed.
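
Roughly, that second optimization might look like the sketch below. It is not
taken from parse_omit_name(): the fields array, the query_fn callback and the
count_and_fail() stub are all made up here, and the loop simply assumes the
behaviour described in comment #8 (wildcarding one XLFD field at a time and
asking the server again).

```c
#include <string.h>

/* Hypothetical callback standing in for one server round trip.
 * Returns non-zero if some font matches the current fields. */
typedef int (*query_fn)(const char *const *fields, int nfields, void *ctx);

/* Wildcard the fields one at a time, re-querying after each change.
 * A field that is already "*" leaves the pattern unchanged, so the
 * server would give the same answer as before: skip the query. */
int relax_fields(const char **fields, int nfields, query_fn query, void *ctx)
{
    int i;

    for (i = 0; i < nfields; i++) {
        if (strcmp(fields[i], "*") == 0)
            continue;            /* nothing changed, no point asking again */
        fields[i] = "*";
        if (query(fields, nfields, ctx))
            return 1;
    }
    return 0;
}

/* Stub query for experimenting: counts calls in *ctx and never matches. */
int count_and_fail(const char *const *fields, int nfields, void *ctx)
{
    (void)fields;
    (void)nfields;
    ++*(int *)ctx;
    return 0;
}
```

For example, calling relax_fields() on { "misc", "*", "medium", "*", "normal" }
with count_and_fail() makes three queries rather than five.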

Comment 10 Jon Turney 2010-08-09 19:57:06 UTC

Created attachment 4920
Don't try so hard to find a matching font with the given encoding

Looking at this code again, I'm not so sure the above correctly states what it
tries to do (although this is odd because I recall my observations being based
on some dynamic inspection).

In any case, it seems that we repeatedly add '*-' as a prefix to the name if it
has fewer than 12 fields after removing the encoding, which I'm pretty sure is
unnecessary since a * can match more than one component anyhow.

So the attached patch should be safe, and avoids making the server
unnecessarily and repeatedly search the font list, which makes application
startup quicker in the case where no font of the desired encoding exists.
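
To illustrate the reasoning (not the patch itself): assuming plain glob
semantics, where '*' matches any run of characters including '-', any name
matched by a pattern padded with extra '*-' prefixes is also matched by the
unpadded pattern, so the padded retries cannot find anything new. The
glob_match() and pad_pattern() helpers below are hypothetical, and the real
server-side matching may differ in detail.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Case-insensitive glob match: '*' matches any run of characters
 * (including '-'), '?' matches exactly one character. */
int glob_match(const char *pat, const char *name)
{
    const char *star = NULL, *resume = NULL;

    while (*name) {
        if (*pat == '*') {
            star = ++pat;        /* remember where to restart the pattern */
            resume = name;       /* ... and how much the '*' has swallowed */
        } else if (*pat == '?' ||
                   (*pat && tolower((unsigned char)*pat) ==
                            tolower((unsigned char)*name))) {
            pat++;
            name++;
        } else if (star) {
            pat = star;          /* let the last '*' swallow one more char */
            name = ++resume;
        } else {
            return 0;
        }
    }
    while (*pat == '*')
        pat++;
    return *pat == '\0';
}

/* Write `extra` copies of "*-" followed by `pattern` into out,
 * truncating safely if out is too small. */
void pad_pattern(char *out, size_t outsz, const char *pattern, int extra)
{
    size_t used = 0;

    if (outsz == 0)
        return;
    out[0] = '\0';
    while (extra-- > 0 && used + 2 < outsz) {
        memcpy(out + used, "*-", 2);
        used += 2;
        out[used] = '\0';
    }
    snprintf(out + used, outsz - used, "%s", pattern);
}
```

Under these assumptions "*-iso8859-1" already matches a full name such as
"-misc-fixed-medium-r-normal--13-120-75-75-c-70-iso8859-1", and padding a
pattern for an encoding that is not installed still matches nothing.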

Comment 11 Jon Turney 2010-10-28 19:03:05 UTC

When packaging libX11 1.4, please include this patch.

Comment 12 Yaakov Selkowitz 2011-03-09 08:50:02 UTC

(In reply to comment #11)
> When packaging libX11 1.4, please include this patch.

Done.  Any remaining issues in this bug?

Comment 13 Yaakov Selkowitz 2012-01-01 04:15:23 UTC

Anything else here?

Comment 14 Jon Turney 2012-01-06 15:58:25 UTC

This patch still needs upstreaming, but bug can be closed.

Comment 15 Jon Turney 2013-12-16 13:39:37 UTC

(In reply to Jon TURNEY from comment #14)
> This patch still needs upstreaming, but bug can be closed.

Applied upstream, will be in libX11 >1.6.3