This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.



Fwd: Interface to resolve SONAMES, ld.so.cache format


Hello, Carlos!!

Your response is... wonderful! I did not remotely expect anything like
that, so thank you very much for it. You went beyond what I needed to
know and explained what I was curious to know. I'm on the verge of
tears, really. No one had helped me at all until now, not even those
who should have. :))


> This is the right place to ask questions.


Glad to hear that :) I was kind of lost.


> The answer depends on what you want to know.
>
> (1) Know all mappings, but have no semantic information about them.
>
> If you want to account for all mappings you need to provide your
> own implementation of mmap, log those calls, and call the real mmap
> under the hood. You would know all of the mappings but have no idea
> what they were for.
>
> (2) Know all shared mappings with semantic information about them.
>
> There is a probe-based debugger interface to the runtime dynamic
> loader described in elf/rtld-debugger-interface.txt. That interface
> provides a debug agent with all of the information about the modules
> as they are being loaded. However you have to be "watching" from the
> very start of the application to catch all of the events for all
> loaded objects.
>
> (3) Know some shared mappings (missing alternate namespaces) using _r_debug.
>
> You can know some of the mappings by using the classic _r_debug
> rendezvous structure in the dynamic loader to walk the list of loaded
> shared objects. You can cast _r_debug to the full structure used by
> the loader if you don't care about being portable to any other version
> of glibc or support different structures as glibc changes version
> (using the right one for the right version).


At times like this I realise I've got a lot to learn, I'll only say that :)


> The format of the cache on a given machine is going to be constant. We
> haven't changed the format in years. You should not have to worry
> about the old pre-glibc-2.2 format.


I expected that, but the code looked a bit daunting, and there could
be more to it than meets the eye. My eye, at least.


> To understand the format you must read and understand all of
> elf/dl-cache.c. There is no public documentation for this, but I would
> be very grateful if you want to document your findings on the
> community wiki (https://sourceware.org/glibc/wiki/).
>
>
> To edit the wiki you have to (a) register and then (b) get someone to
> vouch for you and add you to EditorGroup, just ask on #glibc and
> someone should help you. This process prevents 100% of spam because
> you have to talk to a real human.


I cannot promise it, but if it's within my reach I'd like to do at
least that much to repay you for your help.


> It depends on what you mean by clean. I described 3 possible ways. The
> easiest solution is using the dynamic loader in trace mode since it's
> the most accurate reflection of reality and doesn't duplicate any
> code.


Indeed, and really well described, I have to say. Thanks again.
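
A minimal sketch of the trace-mode approach, assuming the loader is
glibc's ld.so and that setting LD_TRACE_LOADED_OBJECTS=1 (the variable
ldd relies on) makes it print the resolved objects instead of running
the program. The function names parse_trace and trace_loaded_objects
are hypothetical, and the line format is inferred from typical ldd
output rather than from a documented contract, so treat the regular
expression as an assumption. As with ldd, this should not be pointed at
untrusted binaries.

```python
import os
import re
import subprocess

# Typical lines printed by the loader in trace mode (assumed format):
#   libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f...)
#   linux-vdso.so.1 (0x00007ffc...)
#   libmissing.so.1 => not found
_LINE = re.compile(
    r"^\s*(?P<name>\S+)"
    r"(?:\s+=>\s+(?P<path>not found|\S+))?"
    r"(?:\s+\((?P<addr>0x[0-9a-fA-F]+)\))?\s*$"
)


def parse_trace(text):
    """Turn trace output into a list of (name, path, load_address) tuples."""
    entries = []
    for line in text.splitlines():
        match = _LINE.match(line)
        if not match:
            continue  # e.g. "statically linked" or blank lines
        name, path, addr = match.group("name", "path", "addr")
        if path == "not found":
            path = None
        elif path is None and name.startswith("/"):
            path = name  # objects listed by absolute path, such as the loader
        entries.append((name, path, int(addr, 16) if addr else None))
    return entries


def trace_loaded_objects(executable):
    """Run `executable` under the loader's trace mode and parse the output."""
    env = dict(os.environ, LD_TRACE_LOADED_OBJECTS="1")
    result = subprocess.run([executable], env=env,
                            capture_output=True, text=True)
    return parse_trace(result.stdout)
```

Calling trace_loaded_objects("/bin/ls") on a glibc system should give
one tuple per object the loader would map; entries whose path is None
are either unresolved or have no backing file (the vDSO).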


> We have been trying to collect use cases for a new tooling interface
> library that would allow introspection into the running
> process/threads etc.
> https://sourceware.org/glibc/wiki/Tools%20Interface%20NG
>
> Your use case might be useful, please add it :-)


I'm not sure I can provide solid reasoning behind my needs. It was a
design decision I took because I was told to filter memory accesses to
certain locations in the VA space of the process, to cope with some
inconsistencies caused by using interposition via LD_PRELOAD. My
project is completely chaotic, and I'm not proud to admit it. Still,
it's the most I can manage since I lack proper guidance... I should
have chosen to implement an Android app instead of my "monitoring the
memory accesses of a process and implementing page replacement
algorithms [in userspace]".


> That depends on what you mean by clean. You have to (a) follow the
> standard lookup rules (b) follow the lookup paths in ld.so.conf and
> (c) follow the runtime lookup rules if the application happens to
> preload a library via LD_PRELOAD or (e) open a library via dlopen.
> Otherwise a static analysis of DT_SONAME lookups yields only the
> static results you would expect. You need to do this lookup yourself
> by parsing ELF files.


(a) and (b) seem pretty straightforward, but I did not really
consider options (c) and (e), mostly because I don't think I'll have
the time to address all the issues that could arise.
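
For the static part of that lookup, the DT_NEEDED and DT_SONAME entries
can be read straight from the ELF file. The following is a minimal
sketch that assumes a 64-bit little-endian ELF image and uses only the
program headers; the function name read_dynamic_names is hypothetical,
and it ignores everything else a real loader considers (DT_RPATH,
DT_RUNPATH, LD_LIBRARY_PATH, preloads, dlopen).

```python
import struct

PT_LOAD, PT_DYNAMIC = 1, 2
DT_NULL, DT_NEEDED, DT_STRTAB, DT_SONAME = 0, 1, 5, 14


def read_dynamic_names(data):
    """Return (soname, needed) for the bytes of a 64-bit little-endian ELF."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a 64-bit little-endian ELF image")
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)

    loads, dynamic = [], None
    for i in range(e_phnum):
        p_type, _flags, p_offset, p_vaddr, _paddr, p_filesz = struct.unpack_from(
            "<IIQQQQ", data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            loads.append((p_vaddr, p_offset, p_filesz))
        elif p_type == PT_DYNAMIC:
            dynamic = (p_offset, p_filesz)
    if dynamic is None:
        return None, []  # no dynamic segment, e.g. a static executable

    needed_offsets, soname_offset, strtab_vaddr = [], None, None
    for pos in range(dynamic[0], dynamic[0] + dynamic[1], 16):
        d_tag, d_val = struct.unpack_from("<qQ", data, pos)
        if d_tag == DT_NULL:
            break
        if d_tag == DT_NEEDED:
            needed_offsets.append(d_val)
        elif d_tag == DT_SONAME:
            soname_offset = d_val
        elif d_tag == DT_STRTAB:
            strtab_vaddr = d_val
    if strtab_vaddr is None:
        raise ValueError("dynamic segment has no DT_STRTAB entry")

    # DT_STRTAB is a virtual address; map it to a file offset via PT_LOAD.
    for vaddr, offset, filesz in loads:
        if vaddr <= strtab_vaddr < vaddr + filesz:
            strtab = offset + (strtab_vaddr - vaddr)
            break
    else:
        raise ValueError("DT_STRTAB is not inside any PT_LOAD segment")

    def name_at(string_offset):
        start = strtab + string_offset
        return data[start:data.index(b"\0", start)].decode()

    soname = name_at(soname_offset) if soname_offset is not None else None
    return soname, [name_at(off) for off in needed_offsets]
```

Given the contents of a shared object, for example
read_dynamic_names(open(path, "rb").read()), the first element is the
name to match against other objects' DT_NEEDED entries and the second is
the list to resolve recursively.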


> You have to write your own code to parse ld.so.cache or copy code from
> elf/* with the license and copyright being applied to your project now
> for including that code. The code for parsing ld.so.cache is mostly in
> elf/dl-cache.c. I would just parse the ld.so.conf file and then parse
> the ELF files without relying on the cache, but that's your design
> choice. Simplest of all is to process the output of ld.so in trace
> mode.


The code license could be a problem since, IIRC, my University wants
to hold the rights to the work. Otherwise it could have been a great
option to avoid having to reinvent the wheel. I'll follow your advice
and parse the ELFs in the locations pointed to by ld.so.conf.
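
A minimal sketch of collecting the search directories from ld.so.conf,
under some assumptions: one directory per line, "#" starts a comment,
"include" lines take glob patterns (relative ones resolved against the
including file), and "hwcap" lines can be skipped. The function name
read_ld_so_conf is hypothetical. The loader's built-in default
directories are not listed in the file, so they would have to be added
separately.

```python
import glob
import os


def read_ld_so_conf(path="/etc/ld.so.conf", _seen=None):
    """Return the directories listed in `path`, following include lines."""
    seen = set() if _seen is None else _seen
    real = os.path.realpath(path)
    if real in seen:
        return []  # guard against include loops
    seen.add(real)

    try:
        with open(path, encoding="utf-8", errors="replace") as conf:
            lines = conf.readlines()
    except OSError:
        return []

    dirs = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        words = line.split()
        if words[0] == "include":
            for pattern in words[1:]:
                if not os.path.isabs(pattern):
                    pattern = os.path.join(os.path.dirname(path), pattern)
                for included in sorted(glob.glob(pattern)):
                    dirs.extend(read_ld_so_conf(included, seen))
        elif words[0] == "hwcap":
            continue  # hardware capability hints, not directories
        elif line not in dirs:
            dirs.append(line)
    return dirs
```

Each returned directory can then be scanned for ELF files whose
DT_SONAME matches the wanted name.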


> It is safe to parse ld.so.cache as long as you follow the rules for
> locking the configuration and cache files so you don't see partial
> updates if another process or upgrade is running that modifies the
> conf or cache files.


Yes, consistency was another reason I did not want to use this
approach, so I think I'll get away from it.


> Given that this is an undergraduate assignment I'd just parse the
> output of ld.so in trace mode e.g. ldd.


After reading your explanations, I also think that's the way to go if
I want to do the task without spending a whole week on it.

Regards


2014-04-27 18:25 GMT+02:00 Carlos O'Donell <carlos@systemhalted.org>:
>
> On Sun, Apr 27, 2014 at 5:12 AM, Álvaro Acción Montes
> <alvaroaccionmontes@gmail.com> wrote:
> > Hello,
>
> Hello Alvaro!
>
> > This is a question I asked in the #glibc channel, but someone kindly pointed
> > me to ask it here. I'll try to be as clear as I can with my explanation,
> > given that I'm not really that fluent in English or at explaining myself.
>
> This is the right place to ask questions.
>
> > I'm working on an academic project that has to do with monitoring
> > executables in terms of address space usage, etc. Similar to what Valgrind
> > does, but I was not the one who proposed it :)
> >
> > For a series of reasons, I need to know what objects are going to be mapped
> > into the process address space, similar to what ldd does, but it needs to be
> > programmatically determined (not a requirement, but it would cost me
> > points). I've of course had a look at ldd's source, but it uses environment
> > variables to tell the loader to print "debug" information. I was wondering
> > if there was a well defined interface to obtain that information without
> > having to rely on the method ldd uses, but it seems the simplest answer is
> > the correct one, and there is no such method.
>
> The answer depends on what you want to know.
>
> (1) Know all mappings, but have no semantic information about them.
>
> If you want to account for all mappings you need to provide your
> own implementation of mmap, log those calls, and call the real mmap
> under the hood. You would know all of the mappings but have no idea
> what they were for.
>
> (2) Know all shared mappings with semantic information about them.
>
> There is a probe-based debugger interface to the runtime dynamic
> loader described in elf/rtld-debugger-interface.txt. That interface
> provides a debug agent with all of the information about the modules
> as they are being loaded. However you have to be "watching" from the
> very start of the application to catch all of the events for all
> loaded objects.
>
> (3) Know some shared mappings (missing alternate namespaces) using _r_debug.
>
> You can know some of the mappings by using the classic _r_debug
> rendezvous structure in the dynamic loader to walk the list of loaded
> shared objects. You can cast _r_debug to the full structure used by
> the loader if you don't care about being portable to any other version
> of glibc or support different structures as glibc changes version
> (using the right one for the right version).
>
> > So my current approach is as follows. I get a list of required dependencies
> > by iterating the .dynamic section of the executable and getting all the
> > entries that are tagged as DT_NEEDED. The next step would be to find the
> > shared objects with SONAMEs matching the previous list. At this point, I
> > was left wondering if there was a way to obtain them other than parsing
> > ld.so.cache, but I had no luck. Correct me if I'm wrong, but it seems that
> > the linker is a completely isolated entity, and the only functionality
> > exported is via the dlopen() function family.
>
> You do not need to parse ld.so.cache. You *do* need to parse
> ld.so.conf in order to determine where the dynamic linker will search
> for shared libraries. You also need to read each ELF file and look for
> DT_SONAME to determine the soname of that shared library. You then
> need to follow the normal ELF rules and keep recursively finding all
> the DSOs that would be needed to form the final application image.
>
> > Now, I need to parse ld.so.cache, but it seems I'm not able to figure out its
> > format. There are 2 versions, and there is a note that says that for
> > glibc 2.2 there is a new format added in a compatible way, but I'm inclined
> > to think that a normal file would rely on only one of those, even if the
> > second is kept for compatibility's sake.
>
> The format of the cache on a given machine is going to be constant. We
> haven't changed the format in years. You should not have to worry
> about the old pre-glibc-2.2 format.
>
> To understand the format you must read and understand all of
> elf/dl-cache.c. There is no public documentation for this, but I would
> be very grateful if you want to document your findings on the
> community wiki (https://sourceware.org/glibc/wiki/).
>
> To edit the wiki you have to (a) register and then (b) get someone to
> vouch for you and add you to EditorGroup, just ask on #glibc and
> someone should help you. This process prevents 100% of spam because
> you have to talk to a real human.
>
> > So now my questions are (answering the outermost would render the remaining
> > ones irrelevant):
> >
> > - Is there a clean way to get the functionality of ldd programmatically,
> > i.e. without having to call ldd with pipes?
>
> It depends on what you mean by clean. I described 3 possible ways. The
> easiest solution is using the dynamic loader in trace mode since it's
> the most accurate reflection of reality and doesn't duplicate any
> code.
>
> We have been trying to collect use cases for a new tooling interface
> library that would allow introspection into the running
> process/threads etc.
> https://sourceware.org/glibc/wiki/Tools%20Interface%20NG
>
> Your use case might be useful, please add it :-)
>
> > - Is there a clean way to resolve SONAMEs to the corresponding shared
> > objects in the system?
>
> That depends on what you mean by clean. You have to (a) follow the
> standard lookup rules (b) follow the lookup paths in ld.so.conf and
> (c) follow the runtime lookup rules if the application happens to
> preload a library via LD_PRELOAD or (e) open a library via dlopen.
> Otherwise a static analysis of DT_SONAME lookups yields only the
> static results you would expect. You need to do this lookup yourself
> by parsing ELF files.
>
> > - How can I parse ld.so.cache? More precisely, is it safe? Is it worth
> > doing? What's the format it uses in glibc-2.19 (that's not a real issue, it
> > can be 2.XX)? There's a macro in glibc's source code that uses a binary
> > search IIRC, but I'm not limited to that (in the sense I can use some less
> > efficient alternatives if I'm able to figure out the format in order to progress
> > faster*)
>
> You have to write your own code to parse ld.so.cache or copy code from
> elf/* with the license and copyright being applied to your project now
> for including that code. The code for parsing ld.so.cache is mostly in
> elf/dl-cache.c. I would just parse the ld.so.conf file and then parse
> the ELF files without relying on the cache, but that's your design
> choice. Simplest of all is to process the output of ld.so in trace
> mode.
>
> It is safe to parse ld.so.cache as long as you follow the rules for
> locking the configuration and cache files so you don't see partial
> updates if another process or upgrade is running that modifies the
> conf or cache files.
>
> > Thank you very much for your time, and I'm really sorry for the lengthy
> > email.
>
> No worries.
>
> > * For an undergraduate assignment I think this is more than enough,
> > considering this mail makes for a tenth of the whole thing.
>
> Given that this is an undergraduate assignment I'd just parse the
> output of ld.so in trace mode e.g. ldd.
>
> Cheers,
> Carlos.



