Bug 24488 - ebl_openbackend might use wrong library search path
Status: RESOLVED FIXED
Alias: None
Product: elfutils
Classification: Unclassified
Component: backends
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Mark Wielaard
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-04-26 13:07 UTC by Matthias Maennich
Modified: 2019-05-30 18:52 UTC
CC List: 6 users

See Also:
Host:
Target:
Build:
Last reconfirmed: 2019-04-26 00:00:00


Attachments
Factor out loading of ebl backend library and try multiple times with bin/lib origin paths (1.61 KB, patch)
2019-05-11 15:49 UTC, Mark Wielaard

Description Matthias Maennich 2019-04-26 13:07:47 UTC
When starting to use the new --enable-asan feature, I noticed some odd behaviour when libabigail asks libelf for values from .debug_str. It only happens when the asan runtime is loaded (either by linking with libasan or by preloading it). Please note that I did not need the --enable-asan option for the build, as I could reduce the issue to just preloading the asan runtime. It is best explained with this reproducer:

test.cc:

 void func() {}
 class a {
   int b();
 };
 int a::b() {}

Compile that with

  $ g++ -g -c test.cc     // It worked for me with gcc 6.3.0 and gcc 7.3.0

and analyze the result with abidw (latest master, built with ABIGAIL_DEV=yes and --disable-shared). Without asan, the output is correct:

$ abidw --no-show-locs test.o


<abi-corpus path='test.o' architecture='elf-amd-x86_64'>
  <elf-function-symbols>
    <elf-symbol name='_Z4funcv' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
    <elf-symbol name='_ZN1a1bEv' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
  </elf-function-symbols>
  <abi-instr version='1.0' address-size='64' path='test.cc' comp-dir-path='/tmp' language='LANG_C_plus_plus'>
    <class-decl name='a' size-in-bits='8' visibility='default' id='type-id-1'>
      <member-function access='private'>
        <function-decl name='b' mangled-name='_ZN1a1bEv' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='_ZN1a1bEv'>
          <parameter type-id='type-id-2' is-artificial='yes'/>
          <return type-id='type-id-3'/>
        </function-decl>
      </member-function>
    </class-decl>
    <type-decl name='int' size-in-bits='32' id='type-id-3'/>
    <pointer-type-def type-id='type-id-1' size-in-bits='64' id='type-id-2'/>
    <type-decl name='void' id='type-id-4'/>
    <function-decl name='func' mangled-name='_Z4funcv' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='_Z4funcv'>
      <return type-id='type-id-4'/>
    </function-decl>
  </abi-instr>
</abi-corpus>


When running this with preloaded asan runtime, the result is wrong:

$ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libasan.so.4 abidw --no-show-locs test.o

<abi-corpus path='test.o' architecture='elf-amd-x86_64'>
  <elf-function-symbols>
    <elf-symbol name='_Z4funcv' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
    <elf-symbol name='_ZN1a1bEv' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
  </elf-function-symbols>
  <abi-instr version='1.0' address-size='64' path='func' comp-dir-path='func' language='LANG_C_plus_plus'>
    <class-decl name='a' size-in-bits='8' visibility='default' id='type-id-1'>
      <member-function access='private'>
        <function-decl name='b' mangled-name='func' visibility='default' binding='global' size-in-bits='64'>
          <parameter type-id='type-id-2' is-artificial='yes'/>
          <return type-id='type-id-3'/>
        </function-decl>
      </member-function>
    </class-decl>
    <type-decl name='int' size-in-bits='32' id='type-id-3'/>
    <pointer-type-def type-id='type-id-1' size-in-bits='64' id='type-id-2'/>
  </abi-instr>
</abi-corpus>

Note especially the following lines, extracted from the output above:

   <abi-instr version='1.0' address-size='64' path='func' comp-dir-path='func' language='LANG_C_plus_plus'>
                                                    ^^^^                 ^^^^

   <function-decl name='b' mangled-name='func' visibility='default' binding='global' size-in-bits='64'>
                                         ^^^^

The occurrences of 'func' are wrong. In fact, the values for these should come from .debug_str. It turns out, 'func' is the value at offset 0 in .debug_str.
See $ dwarfdump -s test.o:

.debug_str
name at offset 0x00000000, length    4 is 'func'
name at offset 0x00000005, length   47 is 'GNU C++14 7.3.0 -mtune=generic -march=x86-64 -g'
name at offset 0x00000035, length    9 is '_ZN1a1bEv'
name at offset 0x0000003f, length    4 is 'this'
name at offset 0x00000044, length    7 is 'test.cc'
name at offset 0x0000004c, length    8 is '_Z4funcv'
name at offset 0x00000055, length    4 is '/tmp'

So it appears to me that the offset is wrongly computed as a constant 0.

When attempting to debug that issue, it did look like libelf is not returning the correct values. But I am not sure where the root cause actually is.
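
To make the symptom concrete, here is a minimal sketch of a string table lookup by offset, using the .debug_str contents from the dwarfdump output above. The helper name strtab_at and the hand-built table are illustrative assumptions, not libelf or libdw API. It only shows why every lookup that ends up with offset 0 yields 'func'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hand-built copy of the .debug_str section shown by dwarfdump above.
   Strings are NUL-separated; the last NUL is the literal's terminator.  */
static const char debug_str[] =
  "func\0"
  "GNU C++14 7.3.0 -mtune=generic -march=x86-64 -g\0"
  "_ZN1a1bEv\0"
  "this\0"
  "test.cc\0"
  "_Z4funcv\0"
  "/tmp";

/* Hypothetical helper: return the string starting at OFFSET, or NULL if
   OFFSET is out of range or the string is not terminated inside the table.  */
static const char *
strtab_at (const char *tab, size_t size, size_t offset)
{
  assert (tab != NULL);
  if (offset >= size)
    return NULL;
  if (memchr (tab + offset, '\0', size - offset) == NULL)
    return NULL;
  return tab + offset;
}
```

With this table, strtab_at (debug_str, sizeof debug_str, 0x35) gives '_ZN1a1bEv' and offset 0x55 gives '/tmp', while offset 0 gives 'func', which is exactly the value that shows up in path, comp-dir-path and mangled-name in the broken output.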
Comment 1 Matthias Maennich 2019-04-26 14:43:25 UTC
The issue seems to be that libdw can't load its backends (in this very case libebl_x86_64.so) when run with preloaded asan.

When setting LD_LIBRARY_PATH to explicitly contain said library, the behaviour is restored. That can be used as a workaround.

It appears that libasan is overriding dlopen and does not respect the full lookup path for libdw (i.e. the path containing libebl).

maybe related: https://bugzilla.redhat.com/show_bug.cgi?id=1449604
Comment 2 Mark Wielaard 2019-04-26 22:30:28 UTC
We discussed this on irc a bit and the real bug is indeed in the sanitizer.
When overriding dlopen it doesn't obey the RUNPATH set in libdw.so.

But on Fedora it still works because ebl_openbackend first tries to load from:

#ifndef LIBEBL_SUBDIR
# define LIBEBL_SUBDIR PACKAGE
#endif
#define ORIGINDIR "$ORIGIN/../$LIB/" LIBEBL_SUBDIR "/"

        /* Give it a try.  At least the machine type matches.  First
           try to load the module.  */
        char dsoname[100];
        strcpy (stpcpy (stpcpy (dsoname, ORIGINDIR "libebl_"),
                        machines[cnt].dsoname),
                ".so");

        void *h = dlopen (dsoname, RTLD_LAZY);

This doesn't work on Debian based systems though.
$LIB will expand to "lib" (on 32bit systems) or "lib64" (on 64bit systems).
But on Debian (amd64) everything is installed under /usr/lib/x86_64-linux-gnu

It is not immediately clear why we use ../$LIB/
I think we can just use #define ORIGINDIR "$ORIGIN/" LIBEBL_SUBDIR "/"
Comment 3 Mark Wielaard 2019-04-27 15:49:32 UTC
(In reply to Mark Wielaard from comment #2)
> We discussed this on irc a bit and the real bug is indeed in the sanitizer.
> When overriding dlopen it doesn't obey the RUNPATH set in libdw.so.
> 
> But on Fedora it still works because ebl_openbackend first tries to load
> from:
> 
> #ifndef LIBEBL_SUBDIR
> # define LIBEBL_SUBDIR PACKAGE
> #endif
> #define ORIGINDIR "$ORIGIN/../$LIB/" LIBEBL_SUBDIR "/"
> 
>         /* Give it a try.  At least the machine type matches.  First
>            try to load the module.  */
>         char dsoname[100];
>         strcpy (stpcpy (stpcpy (dsoname, ORIGINDIR "libebl_"),
>                         machines[cnt].dsoname),
>                 ".so");
> 
>         void *h = dlopen (dsoname, RTLD_LAZY);
> 
> This doesn't work on Debian based systems though.
> $LIB will expand to "lib" (on 32bit systems) or "lib64" (on 64bit systems).
> But on Debian (amd64) everything is installed under /usr/lib/x86_64-linux-gnu

Some experiments on Debian seem to point to $LIB expanding to lib/x86_64-linux-gnu. But I cannot find any documentation for that. Debian's own documentation at https://manpages.debian.org/unstable/manpages/ld.so.8.en.html says:

  $LIB (or equivalently ${LIB})
    This expands to lib or lib64 depending on the architecture (e.g., on x86-64, it expands to lib64 and on x86-32, it expands to lib). 

> It is not immediately clear why we use ../$LIB/
> I think we can just use #define ORIGINDIR "$ORIGIN/" LIBEBL_SUBDIR "/"

That won't work. The path is used for both binaries and libraries that rely on ebl backends. When built into a binary (say eu-elflint) that doesn't use libdw, we need the ../$LIB/ part to get from the prefix bin/ dir to the corresponding prefix lib[64]/ dir.

The original idea was that you could find the ebl backend libraries relative to wherever the binary (eu-xxx) or library (libdw.so) was installed.

The problem is that on systems that use deeper library paths we cannot use the same search path for binaries and libraries (../ doesn't get us to the top of the install prefix).
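
The difference can be shown with a toy expansion of the search path. This is only a sketch: expand_tokens and collapse_dotdot are hypothetical helpers that mimic the substitution textually (the real work is done by the dynamic linker), "elfutils" is assumed as the value of LIBEBL_SUBDIR, the origin directories are examples, and the multiarch value of $LIB is the one the experiments above point to.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Replace every "$ORIGIN" and "$LIB" in TMPL and store the result in OUT.
   Returns 0 on success, -1 if OUT is too small.  */
static int
expand_tokens (const char *tmpl, const char *origin, const char *lib,
               char *out, size_t outsz)
{
  size_t used = 0;
  assert (outsz > 0);
  while (*tmpl != '\0')
    {
      const char *piece = tmpl;   /* default: copy one literal character */
      size_t len = 1;
      if (strncmp (tmpl, "$ORIGIN", 7) == 0)
        { piece = origin; len = strlen (origin); tmpl += 7; }
      else if (strncmp (tmpl, "$LIB", 4) == 0)
        { piece = lib; len = strlen (lib); tmpl += 4; }
      else
        tmpl += 1;
      if (used + len >= outsz)
        return -1;
      memcpy (out + used, piece, len);
      used += len;
    }
  out[used] = '\0';
  return 0;
}

/* Lexically collapse "component/.." pairs in an absolute path, in place.
   Symlinks are ignored; this is for illustration only.  */
static void
collapse_dotdot (char *path)
{
  char *p;
  while ((p = strstr (path, "/../")) != NULL)
    {
      char *start = p;
      while (start > path && start[-1] != '/')
        start--;                  /* back to the start of the component */
      if (start > path)
        start--;                  /* and to the slash in front of it */
      memmove (start, p + 3, strlen (p + 3) + 1);
    }
}
```

Expanding "$ORIGIN/../$LIB/elfutils/" with origin /usr/bin and $LIB = lib/x86_64-linux-gnu collapses to /usr/lib/x86_64-linux-gnu/elfutils/, which is where the backends live. With origin /usr/lib/x86_64-linux-gnu (libdw.so itself) the same template collapses to /usr/lib/lib/x86_64-linux-gnu/elfutils/, which does not exist. On a multilib layout (origin /usr/lib64, $LIB = lib64) both origins end up in /usr/lib64/elfutils/.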
Comment 4 Frank Ch. Eigler 2019-04-28 16:11:37 UTC
Is it worth reconsidering the dynamic loading model for libebl?  Each target backend consists of about 30kB of stripped .so content, for half a megabyte in total.
Comment 5 Mark Wielaard 2019-04-28 16:27:11 UTC
(In reply to Frank Ch. Eigler from comment #4)
> Is it worth reconsidering the dynamic loading model for libebl?  Each target
> backend consists of about 30kB of stripped .so content, for half a megabyte
> in total.

Yes, I believe that also makes sense. Certainly for the "native" backend. But maybe for all. Note that DTS does this (to make static linking possible, so things work without the backend shared libraries being installed). But it is a bit of an ugly hack atm. See the mjw/RH-DTS branch. It might make sense to clean that up and maybe make it configurable which backends are built in.

Still it would be helpful to better understand how the dlopen search path and substitutions work on Debian based systems.
Comment 6 Mark Wielaard 2019-05-11 15:49:52 UTC
Created attachment 11770 [details]
Factor out loading of ebl backend library and try multiple times with bin/lib origin paths

I think we just have to try twice. The first time using the $ORIGIN as if it came from an executable (in bin/) and then using the $ORIGIN as if it came from a library (in lib[64]/ or lib/<arch>/). So first time using ../$LIB and second time just with the elfutils LIBEBL_SUBDIR.

The first is what we do now and always works on multilib systems. The second try works when loading relative to a library, whether on a multilib or multiarch system.

Then we use the same fallback (not using any path) we used already (to take advantage of any RPATH or LD_LIBRARY_PATH setting).
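
The lookup order described above can be sketched roughly as follows. This is not the attached patch, just an illustration of the idea: open_backend, loader_fn, backend_dirs and the fake_* test double are hypothetical names, and "elfutils" is assumed as the value of LIBEBL_SUBDIR. The loader is passed in as a function pointer so the order of attempts can be observed without any backend installed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SUBDIR "elfutils"   /* assumed value of LIBEBL_SUBDIR */

/* Same shape as dlopen (const char *filename, int flags).  */
typedef void *(*loader_fn) (const char *name, int flags);

static const char *const backend_dirs[] =
  {
    "$ORIGIN/../$LIB/" SUBDIR "/",  /* 1: origin is an executable in bin/ */
    "$ORIGIN/" SUBDIR "/",          /* 2: origin is a library in its lib dir */
    "",                             /* 3: no path, default search rules */
  };

/* Try each candidate name in order and return the first handle found.  */
static void *
open_backend (const char *machine, loader_fn load, int flags)
{
  char dsoname[256];
  size_t i;
  assert (machine != NULL && load != NULL);
  for (i = 0; i < sizeof backend_dirs / sizeof backend_dirs[0]; i++)
    {
      int n = snprintf (dsoname, sizeof dsoname, "%slibebl_%s.so",
                        backend_dirs[i], machine);
      if (n < 0 || (size_t) n >= sizeof dsoname)
        continue;                   /* name would not fit, skip it */
      void *h = load (dsoname, flags);
      if (h != NULL)
        return h;
    }
  return NULL;
}

/* Test double standing in for dlopen: records each attempted name and
   only "finds" the one stored in fake_installed.  */
static const char *fake_installed;
static char fake_log[3][256];
static int fake_attempts;

static void *
fake_loader (const char *name, int flags)
{
  (void) flags;
  if (fake_attempts < 3)
    snprintf (fake_log[fake_attempts], sizeof fake_log[0], "%s", name);
  fake_attempts++;
  if (fake_installed != NULL && strcmp (name, fake_installed) == 0)
    return &fake_attempts;          /* any non-NULL pointer works as a handle */
  return NULL;
}
```

A real caller would pass dlopen and RTLD_LAZY from <dlfcn.h> instead of fake_loader. With fake_installed set to "$ORIGIN/elfutils/libebl_x86_64.so" (the multiarch library case), the first attempt fails and the second one succeeds. With nothing installed, all three names are tried, ending with the bare "libebl_x86_64.so".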

Could someone try this out on a Debian based system to see if it works as intended?
Comment 7 Matthias Maennich 2019-05-30 10:42:25 UTC
I tried the patch against libdw1-0.168-1 [1] on Debian Stretch. It applied cleanly and solved the very issue I saw.

Thanks!

[1] https://packages.debian.org/stretch/libdw1
Comment 8 Mark Wielaard 2019-05-30 18:52:21 UTC
(In reply to Matthias Maennich from comment #7)
> I tried the patch against libdw1-0.168-1 on Debian Stretch. It cleanly
> applied and solved the very issue I saw.

Thanks for testing. Pushed as follows:

commit bfcf8b1fee8805b42b262baf352c58574df59362 (HEAD -> master, origin/master, origin/HEAD)
Author: Mark Wielaard <mark@klomp.org>
Date:   Sat May 11 16:55:01 2019 +0200

    libebl: Try harder to find backend library in bin and lib origin paths.
    
    eblopenbackend tries to find libraries based on the $ORIGIN/../$LIB/
    path. But depending on whether the system is multilib or multiarch
    this doesn't always work. On multilib systems $LIB is always just one
    directory deep (it is either .../lib or .../lib64) but on multiarch
    systems it can be multiple directories deep (.../lib/x86_64-linux-gnu).
    This means that on multiarch systems $ORIGIN/../$LIB only works for
    binaries (where origin is .../bin/), but not for libraries.
    
    Most of the time it still works because of RPATH which is tried afterwards.
    But RPATH processing does not always work reliable.
    
    So try multiple paths first. The first time using the $ORIGIN as if it
    came from an executable (in bin/) and then using the $ORIGIN as if it
    came from an library (in lib[64]/ or lib/<arch>/). So first time using
    ../$LIB and second time just with the elfutils EBL_SUBDIR.
    
    The first is what we do now and always work on multilib systems. The
    second try works when loading relative to a library whether on a multilib
    or multiarch system.
    
    Then we use the same fallback (not using any path) we used already
    (to take advantage of any RPATH or LD_LIBRARY_PATH setting).
    
    https://sourceware.org/bugzilla/show_bug.cgi?id=24488
    
    Signed-off-by: Mark Wielaard <mark@klomp.org>