This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: A per-user or per-application ld.so.cache?
- From: Florian Weimer <fweimer at redhat dot com>
- To: Ben Woodard <woodard at redhat dot com>, "Carlos O'Donell" <carlos at redhat dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Tue, 9 Feb 2016 08:18:05 +0100
- Subject: Re: A per-user or per-application ld.so.cache?
- Authentication-results: sourceware.org; auth=none
- References: <56B8E105 dot 8030906 at redhat dot com> <56B8E810 dot 1040609 at redhat dot com> <56B8F860 dot 6060707 at redhat dot com> <6C935651-66F1-411B-AAC7-59B8E4383EC4 at redhat dot com>
On 02/08/2016 11:29 PM, Ben Woodard wrote:
> I just talked to one of the developers to get a good sense of the current problem.
> The sum of the on-disk ELF files including debuginfo for one app that we looked at is around 3GB, but when we just look at the text in all the ELF files it is 100-200MB depending on architecture, spread across about 1400 DSOs.
This means that copying the text of all the ELF files together into a single file would be feasible, at least in terms of size.
> Except for the fact that the process is starting on literally thousands of nodes simultaneously and its libraries are scattered around about 15 non-system project directories. This leads to a phenomenal number of NFS operations as the compute nodes search through 20 or so directories for all their components. That brings even very powerful NFS servers to their knees.
Okay, this is the critical bit which was missing so far. I think Linux
has pretty good caching for lookup failures, so the whole performance
issue was a bit puzzling. If the whole thing runs on many nodes against
storage which lacks such caching, then I can see that this could turn
into a problem.
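To see the failed lookups in question, glibc's loader can report every path it tries while resolving dependencies; each "trying file=" line before the final hit is a lookup that would turn into an ENOENT round trip on a network filesystem. A minimal illustration (using /bin/true as an arbitrary dynamically linked binary):

```shell
# LD_DEBUG=libs makes ld.so print its search activity to stderr.
# Lines of the form "trying file=..." show each candidate path probed;
# all but the successful one are failed lookups.
LD_DEBUG=libs /bin/true 2>&1 | grep 'trying file=' | head
```

On a local filesystem these misses are absorbed by the dentry cache; the problem described above arises because NFS servers see each one.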
The main question is: Will the storage be able to cope with millions of
file opens if they magically pick the right file name (avoiding ENOENT)?
If not, the only viable optimization seems to be the single file approach.
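For system libraries this is exactly what /etc/ld.so.cache already provides: a library listed in the cache is opened directly at its recorded path, with no directory search at all. A quick way to inspect it:

```shell
# ldconfig -p prints the contents of /etc/ld.so.cache: each entry maps
# a soname straight to a full path, so the loader issues one open()
# per library instead of probing every search directory.
ldconfig -p | head -n 5
```

The question above is whether the storage copes even with those single, correctly targeted opens at scale.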
How will the storage react to parallel read operations on those 15
directories from many nodes?
I'm worried a bit that this turns into a request to tune ld.so to very
peculiar storage stack behavior.
Depending on what they do with Python, the Python module importer will
still cause a phenomenal amount of ENOENT traffic, and there is nothing
we can do about that because it's not related to dlopen.
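The scale of that importer traffic is easy to estimate: for every import, CPython probes each sys.path entry (and several file-name candidates per entry) before failing over to the next, so the number of misses grows with the path length. A small illustration:

```shell
# Each failed import candidate costs a stat/open on every sys.path
# entry, so a long path multiplies ENOENT traffic per import.
python3 -c 'import sys; print(len(sys.path))'
```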