[PATCH - RFC] WIP: fix backtrace test to work on musl

Érico Nogueira ericonr@disroot.org
Mon Feb 8 23:37:48 GMT 2021


From: Érico Rolim <erico.erc@gmail.com>

---

Currently, the run-backtrace-native.sh test fails on musl systems. As the
test suite log below shows, this appears to happen because the test
expects raise() to be the innermost frame (frame #0) of the stack trace.
On musl it isn't: glibc uses inline functions to set and reset the signal
mask, whereas musl uses an ordinary function, __restore_sigs, which
therefore shows up as frame #0, with raise() as frame #1.

FAIL: run-backtrace-native.sh
=============================

0x55c581dbe000	0x55c581dc3000	/builddir/elfutils-0.183/tests/backtrace-child
0x7fb0cfbf8000	0x7fb0cfca5000	/usr/lib/libc.so
0x7fff865be000	0x7fff865bf000	[vdso: 3035]
TID 3035:
# 0 0x7fb0cfc4e3e7    	__restore_sigs
# 1 0x7fb0cfc4e590 - 1	raise
# 2 0x55c581dbf240 - 1	main
# 3 0x7fb0cfc166fa - 1	libc_start_main_stage2
# 4 0x55c581dbf329 - 1	_start
TID 3057:
# 0 0x7fb0cfc4e3e7    	__restore_sigs
/builddir/elfutils-0.183/tests/backtrace: dwfl_thread_getframes: no matching address range
Assertion failed: symname && strcmp (symname, "raise") == 0 (backtrace.c: callback_verify: 111)
./test-subr.sh: line 84:  3023 Aborted                 (core dumped) LD_LIBRARY_PATH="${built_library_path}${LD_LIBRARY_PATH:+:}$LD_LIBRARY_PATH" $VALGRIND_CMD "$@"
# 2 0x55c581dbf240 - 1	main
backtrace-child: neither empty nor just out of DWARF
FAIL run-backtrace-native.sh (exit status: 1)

=====================

By applying the patch below, which includes some comments on what I
understood of the code, I was able to get the test to fail differently,
but I'm not sure I'm actually any closer to making it pass.

FAIL: run-backtrace-native.sh
=============================

0x5572f0661000	0x5572f0666000	/builddir/elfutils-0.183/tests/backtrace-child
0x7ff1dac68000	0x7ff1dad15000	/usr/lib/libc.so
0x7fff70fc1000	0x7fff70fc2000	[vdso: 10694]
TID 10694:
# 0 0x7ff1dacbe3e7    	__restore_sigs
# 1 0x7ff1dacbe590 - 1	raise
# 2 0x5572f0662240 - 1	main
# 3 0x7ff1dac866fa - 1	libc_start_main_stage2
# 4 0x5572f0662329 - 1	_start
TID 10697:
# 0 0x7ff1dacbe3e7    	__restore_sigs
frameno: 0 symname: __restore_sigs
# 1 0x7ff1dacbe590 - 1	raise
frameno: 0 symname: raise
# 2 0x5572f066251b - 1	sigusr2
frameno: 1 symname: sigusr2
# 3 0x5572f06625bc - 1	stdarg
frameno: 4 symname: stdarg
# 4 0x5572f06625e2 - 1	backtracegen
frameno: 5 symname: backtracegen
# 5 0x5572f06625fb - 1	start
frameno: 6 symname: start
# 6 0x7ff1daccf7ee - 1	start
frameno: 7 symname: start
# 7 0x7ff1dacdc91b - 1	__clone
frameno: 8 symname: __clone
# 8 0x7ff1dacdc91b - 1	__clone
frameno: 9 symname: __clone
# 9 0x7ff1dacdc91b - 1	__clone
frameno: 10 symname: __clone
#10 0x7ff1dacdc91b - 1	__clone
frameno: 11 symname: __clone
#11 0x7ff1dacdc91b - 1	__clone
frameno: 12 symname: __clone
#12 0x7ff1dacdc91b - 1	__clone
frameno: 13 symname: __clone
#13 0x7ff1dacdc91b - 1	__clone
frameno: 14 symname: __clone
#14 0x7ff1dacdc91b - 1	__clone
frameno: 15 symname: __clone
#15 0x7ff1dacdc91b - 1	__clone
frameno: 16 symname: __clone
#16 0x7ff1dacdc91b - 1	__clone
frameno: 17 symname: __clone
/builddir/elfutils-0.183/tests/backtrace: dwfl_thread_getframes: no matching address range
/builddir/elfutils-0.183/tests/backtrace: Too many frames: 17

# 2 0x5572f0662240 - 1	main
backtrace-child: neither empty nor just out of DWARF
FAIL run-backtrace-native.sh (exit status: 1)

=====================

Is my current approach somewhat correct? Have I at least identified the
issue correctly?

I don't think making reduce_frameno a global and resetting it before each
dwfl_thread_getframes call is really necessary (the output doesn't
change), but it was one of the variables that seemed able to affect the
program's output, so I experimented with it a bit.

I think this failure points at the test's expectations rather than at
this part of the library, which seems to be working correctly, but it
would still be nice to get the testsuite passing completely.

I have also thought about looping through stack frames until symname is
"raise", since other libcs (and possibly even glibc?) could one day add
a function call inside raise(), which would break this test. What do you
think of that solution?

The patch doesn't include a Signed-off-by line because it isn't fit for
inclusion (at least not yet).

 tests/backtrace.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/tests/backtrace.c b/tests/backtrace.c
index 36c8b8c4..6bea2859 100644
--- a/tests/backtrace.c
+++ b/tests/backtrace.c
@@ -51,6 +51,7 @@ main (int argc __attribute__ ((unused)), char **argv)
 }
 
 #else /* __linux__ */
+static bool reduce_frameno;
 
 static int
 dump_modules (Dwfl_Module *mod, void **userdata __attribute__ ((unused)),
@@ -94,18 +95,29 @@ callback_verify (pid_t tid, unsigned frameno, Dwarf_Addr pc,
   static bool duplicate_sigusr2 = false;
   if (duplicate_sigusr2)
     frameno--;
-  static bool reduce_frameno = false;
   if (reduce_frameno)
     frameno--;
   if (! use_raise_jmp_patching && frameno >= 2)
     frameno += 2;
   const char *symname2 = NULL;
+  /* it expects:
+   * - 0 raise -> blocking signals in glibc is an inline function with a syscall
+   * - 1 sigusr2
+   *
+   * what it gets:
+   * - 0 __restore_sigs -> in musl it's a proper function
+   * - 1 raise
+   *
+   * it's offset by 1, how to solve?
+   * */
+  printf("frameno: %u symname: %s\n", frameno, symname);
+  //frameno--;
   switch (frameno)
   {
     case 0:
       if (! reduce_frameno && symname
 	       && (strcmp (symname, "__kernel_vsyscall") == 0
-		   || strcmp (symname, "__libc_do_syscall") == 0))
+		   || strcmp (symname, "__libc_do_syscall") == 0 || strcmp (symname, "__restore_sigs") == 0))
 	reduce_frameno = true;
       else
 	assert (symname && strcmp (symname, "raise") == 0);
@@ -117,6 +129,7 @@ callback_verify (pid_t tid, unsigned frameno, Dwarf_Addr pc,
       /* __restore_rt - glibc maybe does not have to have this symbol.  */
       break;
     case 3: // use_raise_jmp_patching
+      /* false */
       if (use_raise_jmp_patching)
 	{
 	  /* Verify we trapped on the very first instruction of jmp.  */
@@ -187,6 +200,7 @@ frame_callback (Dwfl_Frame *state, void *frame_arg)
   printf ("#%2d %#" PRIx64 "%4s\t%s\n", *framenop, (uint64_t) pc,
 	  ! isactivation ? "- 1" : "", symname ?: "<null>");
   pid_t tid = dwfl_thread_tid (thread);
+  /* this is the function called for each stack frame in child */
   callback_verify (tid, *framenop, pc, symname, dwfl);
   (*framenop)++;
 
@@ -198,6 +212,7 @@ thread_callback (Dwfl_Thread *thread, void *thread_arg __attribute__((unused)))
 {
   printf ("TID %ld:\n", (long) dwfl_thread_tid (thread));
   int frameno = 0;
+  reduce_frameno = false;
   switch (dwfl_thread_getframes (thread, frame_callback, &frameno))
     {
     case 0:
-- 
2.30.0


