This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



LD_HWCAP_MASK failure with tst-env-setuid


Adhemerval,

I tried a bunch of things with LD_HWCAP_MASK, tst-env-setuid and
other programs, and my current conclusion is that the failure may be due
to a stale tst-env-setuid binary.  I've written up below a long-form
description of what I tried, for you or others to poke holes in, but can
you confirm whether a clean build and test run also fails for you?

Here's what I did:

1. To begin with, I simply ran /bin/true with LD_HWCAP_MASK set:

LD_HWCAP_MASK=0xffffffff /bin/true

and sure enough, on one of my boxes it failed with ENOMEM and on
another it took a good 5-6 seconds to finish.  This confirmed that the
issue has been around for a long time but was never really noticed.  At
this point I was going with the assumption that this was a generic bug
and did not bother testing aarch64.

2. Next I tried running elf/ld.so under a debugger and was able to see
the delay, but I was simply unable to break at the point of the delay or
failure.  I could not understand at that point what was going on, so I
moved on to something else.

3. Then I ran /bin/true with testrun.sh and the LD_HWCAP_MASK envvar set
and could see the delay.  I tried attaching to elf/ld.so during that
delay and once again it seemed to be stopped at arbitrary places, and I
could not figure out what was going on.

4. I ran perf and found the place in _dl_important_hwcaps where the
program spent the most time.  I put a bunch of _dl_debug_printf calls
all over the place, and oddly the printfs near the hotspot never even got
invoked; the function was returning well before that point.

5. And then my Alexander Graham Bell moment happened, where I
accidentally ran elf/ld.so directly instead of from within testrun.sh
and the program succeeded immediately, no more delay.  Likewise on the
other box, running the built elf/ld.so directly no longer showed the
ENOMEM failure.

6. Then I formed the hypothesis that the old glibc from the system was
to blame and that trunk glibc was working fine.  This fit all of the
failures perfectly, because all of them involved executing a shell or
another intermediary program through the system dynamic linker, and that
is what was failing, not the test.  gdb could not break at that point
because the delay was in the shell it had invoked to start the program;
the program had not even started yet.

I decided to test this by doing a git bisect.

7. The bisect led to the fix for pr#21391 that HJ Lu pushed, which
appears to have stopped the delays and ENOMEMs in their tracks.  This
led me to conclude that the issue is specific to x86 and does not affect
aarch64.  I tested that hypothesis on my mustang aarch64 machine and,
sure enough, it passed all of the tests that failed on x86.
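For background on why a wide mask hurts so much (this is not something
the experiments above established): as far as I understand it,
_dl_important_hwcaps in glibc of this era builds one search-path entry
for every combination of the hwcap bits that survive LD_HWCAP_MASK, so n
surviving bits cost 2^n entries.  The following is a simplified model of
that growth under that assumption; it is not the glibc code, and
count_bits and hwcap_subdir_count are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical helper: count the set bits in a value. */
unsigned int count_bits(uint64_t v)
{
  unsigned int n = 0;
  while (v != 0)
    {
      v &= v - 1;   /* clear the lowest set bit */
      ++n;
    }
  return n;
}

/* Simplified model (an assumption, not the glibc source): if the loader
   creates one search-path entry for every subset of the n hwcap bits
   left after applying the mask, it needs 2^n entries.  Returns 0 when
   the count does not fit in 64 bits.  */
uint64_t hwcap_subdir_count(uint64_t hwcap, uint64_t mask)
{
  unsigned int n = count_bits(hwcap & mask);
  if (n >= 64)
    return 0;
  return UINT64_C(1) << n;
}
```

With a made-up hwcap value that has 26 bits set, a mask of 0 gives a
single entry while 0xffffffff gives 2^26 (67108864), the kind of blow-up
that would be consistent with both the multi-second delay and the ENOMEM.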

So to conclude, the only way tst-env-setuid could have failed for you
in this case is if the binary was stale, i.e. it somehow failed to get
rebuilt.  Hence my request to test again with a clean build.

Phew.

Thanks,
Siddhesh

