Bug 29176 - run-backtrace-native-biarch.sh seems to fail on Ubuntu Jammy
Summary: run-backtrace-native-biarch.sh seems to fail on Ubuntu Jammy
Status: RESOLVED FIXED
Alias: None
Product: elfutils
Classification: Unclassified
Component: general
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Duplicates: 29206
Depends on:
Blocks:
 
Reported: 2022-05-24 21:14 UTC by Evgeny Vereshchagin
Modified: 2023-11-14 12:00 UTC
CC List: 6 users

See Also:
Host:
Target:
Build:
Last reconfirmed: 2022-05-28 00:00:00


Description Evgeny Vereshchagin 2022-05-24 21:14:49 UTC
I tried to switch to Ubuntu Jammy in https://github.com/evverx/elfutils/pull/83 and the test started failing there with
```
FAIL: run-backtrace-native-biarch.sh
====================================

case 0: expected symname 'raise' got '(null)'
./test-subr.sh: line 84: 23451 Aborted                 (core dumped) LD_LIBRARY_PATH="${built_library_path}${LD_LIBRARY_PATH:+:}$LD_LIBRARY_PATH" $VALGRIND_CMD "$@"
backtrace-child-biarch: no main
FAIL run-backtrace-native-biarch.sh (exit status: 1)

```
It still passes on Ubuntu Focal.

FWIW switching to Ubuntu Jammy somehow "fixed" run-debuginfod-fd-prefetch-caches.sh (which appears to be flaky on Ubuntu Focal and fails more or less consistently when elfutils is built with --enable-gcov: https://github.com/evverx/elfutils/runs/6577995202)
Comment 1 Mark Wielaard 2022-05-27 15:31:21 UTC
(In reply to Evgeny Vereshchagin from comment #0)
> I tried to switch to Ubuntu Jammy in
> https://github.com/evverx/elfutils/pull/83 and the test started failing
> there with
> ```
> FAIL: run-backtrace-native-biarch.sh
> ====================================
> 
> case 0: expected symname 'raise' got '(null)'
> ./test-subr.sh: line 84: 23451 Aborted                 (core dumped)
> LD_LIBRARY_PATH="${built_library_path}${LD_LIBRARY_PATH:+:}$LD_LIBRARY_PATH"
> $VALGRIND_CMD "$@"
> backtrace-child-biarch: no main
> FAIL run-backtrace-native-biarch.sh (exit status: 1)
> 
> ```
> It still passes on Ubuntu Focal.
> 
> FWIW switching to Ubuntu Jammy somehow "fixed"
> run-debuginfod-fd-prefetch-caches.sh (which appears to be flaky on Ubuntu
> Focal and fails more or less consistently when elfutils is built with
> --enable-gcov: https://github.com/evverx/elfutils/runs/6577995202)

Note that GitHub makes logs non-public by default, so it is hard to see what is going on.

Do you have any more information on what changed between "Focal" and "Jammy"? A glibc upgrade, some system settings, a gcc upgrade? That might explain what you are seeing.

Basically the testcase says it cannot find the name associated with the frame. It is NULL while it is expecting the symbol name "raise".

This is a somewhat gnarly test. Best might be to add some extra printfs to tests/backtrace.c (callback_verify) printing the frameno and framename found to see what is going on.
Comment 2 Evgeny Vereshchagin 2022-05-27 16:04:31 UTC
> Do you have any more information on what changed between "Focal" and "Jammy", glibc upgrade? some system settings, gcc upgrade? That might explain what you are seeing?

I think everything was upgraded there. As far as I can see gcc-9.4.0 was replaced with gcc-11.2.0 and glibc was upgraded from 2.31-0ubuntu9.7 to 2.35-0ubuntu3.

> Best might be to add some extra printfs to tests/backtrace.c (callback_verify) printing the frameno and framename found to see what is going on.

I'll try to do that.
Comment 3 Evgeny Vereshchagin 2022-05-27 17:02:15 UTC
I added a printf and here's what it printed on Ubuntu Jammy:
```
FRAMENO: '0', SYMNAME: '__kernel_vsyscall'
FRAMENO: '1', SYMNAME: ''
FRAMENO: '2', SYMNAME: 'raise'
FRAMENO: '3', SYMNAME: 'main'
FRAMENO: '4', SYMNAME: ''
FRAMENO: '5', SYMNAME: '__libc_start_main'
FRAMENO: '6', SYMNAME: '_start'
FRAMENO: '0', SYMNAME: '__kernel_vsyscall'
FRAMENO: '1', SYMNAME: ''
case 0: expected symname 'raise' got '(null)'
```

On Fedora 35 (where the test passes) I got
```
FRAMENO: '0', SYMNAME: '__kernel_vsyscall'
FRAMENO: '1', SYMNAME: '__pthread_kill_implementation'
FRAMENO: '2', SYMNAME: 'raise'
FRAMENO: '3', SYMNAME: 'main'
FRAMENO: '0', SYMNAME: '__kernel_vsyscall'
FRAMENO: '1', SYMNAME: '__pthread_kill_implementation'
FRAMENO: '2', SYMNAME: 'raise'
FRAMENO: '3', SYMNAME: 'sigusr2'
FRAMENO: '4', SYMNAME: 'stdarg'
FRAMENO: '5', SYMNAME: 'backtracegen'
FRAMENO: '6', SYMNAME: 'start'
FRAMENO: '7', SYMNAME: 'start_thread'
FRAMENO: '8', SYMNAME: '__clone3'
FRAMENO: '0', SYMNAME: ''
```
Comment 4 Evgeny Vereshchagin 2022-05-27 20:58:13 UTC
Looks like it's possible to make the test pass there by installing libc6-i386-dbgsym (though I'm not sure why the test passes without that package on Focal). Anyway it doesn't seem to be an elfutils issue. I'll go ahead and close it. Thanks!
Comment 5 Mark Wielaard 2022-05-28 00:01:02 UTC
I hope you don't mind me reopening the bug. I would like the testcase to work even without the dbgsym installed.

Is the dbgsym package for the main (x86_64) libc6 package also installed?

The problem is that the testcase requires all addresses to map to a known symbol name. But what we are really interested in is that we are unwinding through the main or the start_thread symbols. We probably should just skip any unknown/NULL symbols.
Comment 6 Evgeny Vereshchagin 2022-05-28 00:22:18 UTC
> Is the dbgsym package for the main (x86_64) libc6 package also installed?

As far as I can see libc6-dbg is installed there, but even without it, when code is compiled without -m32 and aborts, the backtraces don't contain NULL/unknown symbols:
```
#0  0x00007ffff7e25a7c in pthread_kill () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff7dd1476 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffff7db77f3 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x0000555555555161 in main ()
```

With -m32 and without the 32-bit (i386) debug symbols, backtraces look like
```
#0  0xf7fc4559 in __kernel_vsyscall ()
#1  0xf7e10e37 in ?? () from /lib32/libc.so.6
#2  0xf7dc04c5 in raise () from /lib32/libc.so.6
#3  0xf7da93ac in abort () from /lib32/libc.so.6
#4  0x565561b5 in main ()
```

> We probably should just skip any unknown/NULL symbols.

As far as I understand, that should make the test pass even without the debug symbols.
Comment 7 Letu Ren 2022-05-31 14:14:08 UTC
*** Bug 29206 has been marked as a duplicate of this bug. ***
Comment 8 Jan-Benedict Glaw 2022-09-13 10:06:04 UTC
I see this on my autobuilder as well (for run-backtrace-native-biarch.sh and run-backtrace-native-core-biarch.sh), so keeping an eye on this.
Comment 9 Jan-Benedict Glaw 2023-03-04 11:24:56 UTC
Is there already a decision on whether or not the tests should pass when there's no dbgsym package installed for libc?
Comment 10 Mark Wielaard 2023-03-04 20:35:26 UTC
(In reply to Jan-Benedict Glaw from comment #9)
> Is there already a decision on whether or not the tests should pass when
> there's no dbgsym package installed for libc?

The test should pass even without the dbgsym package. The unwinder should work without any extra debuginfo installed. It (now) also fails on debian-testing in the buildbot:
https://builder.sourceware.org/buildbot/#/builders/elfutils-debian-testing-x86_64

PASS: run-backtrace-native.sh
SKIP: run-backtrace-native-core.sh
FAIL: run-backtrace-native-biarch.sh
SKIP: run-backtrace-native-core-biarch.sh

The skips there are because there are no core files created.

The issue seems to be a testcase issue, where it expects "real" symbol names, even if it doesn't really matter.
Comment 12 Jan-Benedict Glaw 2023-03-05 09:34:24 UTC
Pulling in that patch works for me as well.
Comment 13 Mark Wielaard 2023-03-05 12:16:27 UTC
(In reply to Jan-Benedict Glaw from comment #12)
> Pulling in that patch works for me as well.

Great. Pushed as:

commit a7f65495258933eaf361e82eb325c9d826b455d5 (HEAD -> master)
Author: Mark Wielaard <mark@klomp.org>
Date:   Sat Mar 4 21:55:56 2023 +0100

    tests: skip '(null)' symname frames in backtrace tests
    
    Some setups might have some frames for unknown (null) functions
    in the thread backtrace. Skip these frames instead of failing
    immediately.
    
        * tests/backtrace.c (callback_verify): Check and skip nulls_seen.
    
    https://sourceware.org/bugzilla/show_bug.cgi?id=29176
    
    Signed-off-by: Mark Wielaard <mark@klomp.org>
Comment 14 Bill Scherr 2023-11-14 11:56:47 UTC
elfutils-0.190 fixes
FAIL: run-backtrace-native-biarch.sh
and
FAIL: run-backtrace-native.sh
on Gentoo. Verified that the patch is included.