This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: An idea on library loading and symbol lookup


Robert Schweikert <rjschwei at abaqus dot com> writes:

> Hi all,
>
> I have been trying to get smart about shared library loading on Linux
> and wanted to float an idea to get an understanding of the related
> issues and if the idea is feasible at all. From reading Ulrich's paper
> (http://people.redhat.com/drepper/dsohowto.pdf) it appears that the good
> people that came up with the C++ ABI standard threw performance concerns
> out the window when coming up with the name mangling convention.
>
> namespace - class name - function name - arguments
>
> Since namespace - class name will be the same quite often and in many
> cases even the function name will be the same between 2 symbols.
>
> Taking for example a symbol from a template function which looks as
> follows:
>
> _ZNK7cow_COWI16bas_ShortcutImplI23kamC_BaselineCorrectionE11cow_VirtualIS2_EE8ConstGetEv
>
> and then another function from the same template:
>
> _ZNK7cow_COWI16bas_ShortcutImplI23kamC_BaselineCorrectionE11cow_VirtualIS2_EE6IsNullEv
>
> Now if you put these two symbols next to each other and consider that
> the loader has to walk the string and do a character comparison for each
> char, at leat that is my understanding of what's happening, you can see 
> and count that there are 77 characters that are equal. Thus, the loader
> compares 77 characters before discarding the "wrong" symbol. Now you do
> that a few times and I think there should not be an argument on why it
> takes quite a while to load shared objects. If I would have a namespace
> then the number of characters that are equal in these 2 symbols would be
> even bigger since the namespace name comes first in the mangled name.

Symbol lookup is done via a hash table.  Only entries added to the
same hash bucket will be compared character by character.  The
expectation is that such strcmps happen quite seldom

> Now the idea:
>
> Lets say I do a reverse lookup (I'm sure I'm not the first one to come
> up with this idea but I couldn't find any discussions on it), i.e. I
> start at the back of the string then I only compare 2 characters before
> I discard the symbol. I just improved my lookup performance by a
> gazillion. 

You need to know the end of the string.  I don't think we have that
information directly available and therefore you would first need to
scan to the end - and if you scan, you can do the compare at nearly
the same time.

> I know there are other issues to deal with such as symbol versioning
> etc. and these are the issues I'd like to understand. Overall I think 
> there's got to be a better way then comparing all these characters from
> the beginning.

Hashing ;-)

Andreas
-- 
 Andreas Jaeger
  SuSE Labs aj at suse dot de
   private aj at arthur dot inka dot de
    http://www.suse.de/~aj


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]