Bug 19303 - nptl/tst-cancel24-static fails on arm, mips and hppa
Status: RESOLVED DUPLICATE of bug 19826
Alias: None
Product: glibc
Classification: Unclassified
Component: nptl
Version: 2.23
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-11-29 19:28 UTC by Aurelien Jarno
Modified: 2017-07-27 12:06 UTC
CC List: 4 users

See Also:
Host:
Target:
Build:
Last reconfirmed:


Description Aurelien Jarno 2015-11-29 19:28:54 UTC
Starting with version 2.22, the test nptl/tst-cancel24-static fails on at least arm-linux-gnueabi, arm-linux-gnueabihf, hppa-linux-gnu, mips-linux-gnu and mipsel-linux-gnu. However, it doesn't *seem* to fail on aarch64-linux-gnu, i586-linux-gnu, mips64el-linux-gnu, powerpc-linux-gnu, powerpc64le-linux-gnu, s390x-linux-gnu or x86_64-linux-gnu. I say "seem" because the failure is not fully deterministic: I have seen at least one run (out of a few dozen) where the test didn't fail on arm-linux-gnueabihf.

I have been able to track down this regression to the following commit:

commit f8aeae347377f3dfa8cbadde057adf1827fb1d44
Author: Alexandre Oliva <aoliva@redhat.com>
Date:   Tue Mar 17 01:14:11 2015 -0300

    Fix DTV race, assert, DTV_SURPLUS Static TLS limit, and nptl_db garbage

The problem is still reproducible on master, and reverting this commit still fixes the issue. The backtrace shows that the issue happens in __cxa_begin_catch:

#0  0x00017fbc in __cxa_begin_catch ()
#1  0x00010b34 in tf (arg=<optimized out>) at tst-cancel24-static.cc:37
#2  0x0001206c in start_thread (arg=0xb6ffe300) at pthread_create.c:335
#3  0x0003bd7c in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:89
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
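
For readers unfamiliar with the test, the pattern it exercises can be sketched roughly as follows. This is a minimal illustration, not the glibc test source: the names `Monitor`, `worker` and `run_cancel_sketch` are hypothetical, and it assumes glibc with GCC, where thread cancellation is implemented as a forced unwind that passes through `catch (...)` handlers (which is why `__cxa_begin_catch` appears in frame #0).

```cpp
#include <pthread.h>
#include <semaphore.h>

// Hypothetical names throughout; a sketch of the pattern, not the real test.
static volatile bool destructor_ran = false;
static volatile bool unwind_caught = false;
static pthread_barrier_t ready;

struct Monitor {
  Monitor() {}
  ~Monitor() { destructor_ran = true; }  // must run during cancellation
};

static void *worker(void *arg) {
  sem_t *sem = static_cast<sem_t *>(arg);
  try {
    Monitor m;
    pthread_barrier_wait(&ready);
    for (;;)
      sem_wait(sem);  // cancellation point; the semaphore is never posted
  } catch (...) {
    // Entering this handler goes through __cxa_begin_catch.
    unwind_caught = true;
    throw;  // a forced unwind must be rethrown, or the process aborts
  }
  return nullptr;
}

// Returns true if the thread was cancelled and both the catch handler
// and the destructor ran on the way out.
bool run_cancel_sketch() {
  sem_t sem;
  if (sem_init(&sem, 0, 0) != 0)
    return false;
  if (pthread_barrier_init(&ready, nullptr, 2) != 0)
    return false;

  pthread_t th;
  if (pthread_create(&th, nullptr, worker, &sem) != 0)
    return false;

  pthread_barrier_wait(&ready);  // worker is now inside the try block
  pthread_cancel(th);

  void *result = nullptr;
  pthread_join(th, &result);

  pthread_barrier_destroy(&ready);
  sem_destroy(&sem);
  return result == PTHREAD_CANCELED && unwind_caught && destructor_ran;
}
```

Calling `run_cancel_sketch()` from `main` in a dynamically linked build is expected to return true under the assumptions above; the failure reported here concerns the statically linked variant of the test.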
Comment 1 Carlos O'Donell 2015-12-01 16:32:07 UTC
While this bug worries me, what is likely happening is a bad interaction between static linking and dlopen. For cancellation we must dlopen libgcc_s.so, and there are currently lots of problems in glibc with static/dlopen use cases. Someone would have to debug this more deeply to see what's going wrong in the test case.

There could be a race condition in the changes Alex made, but reverting them is not the right answer because it introduces a series of other problems which the patches solved.

A developer needs to step in to debug this thoroughly to determine what's wrong.
Comment 2 Szabolcs Nagy 2016-10-27 08:57:36 UTC
i think the fix for bug 19826 fixed this too.

(__cxa_begin_catch calls __cxa_get_globals which returns a pointer to tls, and most likely that had an uninitialized dtv entry.)
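
To illustrate the point about `__cxa_get_globals`, here is a small sketch showing that the pointer it returns is per-thread, which is consistent with it living in TLS. It assumes a C++ runtime whose `<cxxabi.h>` declares `abi::__cxa_get_globals` (libstdc++ does); the helper names `thread_globals` and `eh_globals_are_per_thread` are hypothetical. It does not reproduce the uninitialized DTV entry, it only shows why a broken TLS lookup would surface inside `__cxa_begin_catch`.

```cpp
#include <cxxabi.h>
#include <pthread.h>

// Hypothetical helper: report the address of this thread's exception
// handling globals as an opaque pointer.
static void *thread_globals(void *) {
  return abi::__cxa_get_globals();
}

// Returns true if the main thread and a second thread see different
// __cxa_eh_globals objects, i.e. the storage is per-thread.
bool eh_globals_are_per_thread() {
  void *main_globals = abi::__cxa_get_globals();

  pthread_t th;
  if (pthread_create(&th, nullptr, thread_globals, nullptr) != 0)
    return false;

  void *other_globals = nullptr;
  pthread_join(th, &other_globals);

  // Only the addresses are compared; the second thread's storage is
  // never dereferenced after the thread has exited.
  return main_globals != nullptr && other_globals != nullptr &&
         main_globals != other_globals;
}
```

If the TLS block backing this pointer were reached through a DTV slot that was never set up, the first access in `__cxa_begin_catch` would be the natural place for a crash, matching the backtrace in the description.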
Comment 3 Szabolcs Nagy 2017-02-07 13:41:41 UTC
please close this as dup of bug 19826
Comment 4 Florian Weimer 2017-07-27 12:06:46 UTC

*** This bug has been marked as a duplicate of bug 19826 ***