Bug 19329 - dl-tls.c assert failure at concurrent pthread_create and dlopen
Summary: dl-tls.c assert failure at concurrent pthread_create and dlopen
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: dynamic-link (show other bugs)
Version: 2.22
: P2 normal
Target Milestone: 2.34
Assignee: Szabolcs Nagy
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-12-04 12:37 UTC by Szabolcs Nagy
Modified: 2021-10-19 12:23 UTC (History)
29 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
test case (main module) (237 bytes, text/x-csrc)
2016-01-08 18:19 UTC, Szabolcs Nagy
Details
test case (build script) (321 bytes, application/x-shellscript)
2016-01-08 18:20 UTC, Szabolcs Nagy
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Szabolcs Nagy 2015-12-04 12:37:03 UTC
(this is a continuation of bug 17918, but it turns out to be a different
issue that was originally reported there.)

failure:

Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= _rtld_local._dl_tls_generation' failed!

caused by dlopen (in _dl_add_to_slotinfo and in dl_open_worker) doing

  listp->slotinfo[idx].gen = GL(dl_tls_generation) + 1;
  //...
  if (any_tls && __builtin_expect (++GL(dl_tls_generation) == 0, 0))

while pthread_create (in _dl_allocate_tls_init) concurrently doing

  assert (listp->slotinfo[cnt].gen <= GL(dl_tls_generation));

so

T1:
  y = x + 1;
  ++x;

T2:
  assert(y <= x);

this is hard to trigger as the race window is short compared to the time
dlopen and pthread_create takes, however if i add a usleep(1000) between
the two operations in T1, it is triggered all the time.

the slotinfo and tls generation update lack any sort of synchronization or atomics in _dl_allocate_tls_init (dlopen holds GL(dl_load_lock)).

on x86_64 with added usleep:

(gdb) p _rtld_local._dl_tls_dtv_slotinfo_list->slotinfo[0]@64
$11 = {{gen = 0, map = 0x7ffff7ff94e8}, {gen = 1, map = 0x7ffff7ff94e8}, {gen = 2, map = 0x7ffff0000910}, {gen = 0, map = 0x0} <repeats 61 times>}
(gdb) p _rtld_local._dl_tls_generation
$12 = 1

T1:
#0  0x00007ffff7df2097 in nanosleep () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007ffff7df1f74 in usleep (useconds=<optimised out>) at ../sysdeps/posix/usleep.c:32
#2  0x00007ffff7decc6b in dl_open_worker (a=a@entry=0x7ffff7611c80) at dl-open.c:527
#3  0x00007ffff7de8314 in _dl_catch_error (objname=objname@entry=0x7ffff7611c70, errstring=errstring@entry=0x7ffff7611c78, mallocedp=mallocedp@entry=0x7ffff7611c6f, 
    operate=operate@entry=0x7ffff7dec720 <dl_open_worker>, args=args@entry=0x7ffff7611c80) at dl-error.c:187
#4  0x00007ffff7dec2a9 in _dl_open (file=0x7ffff7611ee0 "mod-0.so", mode=-2147483646, caller_dlopen=0x4007e2 <start+34>, nsid=-2, argc=<optimised out>, 
    argv=<optimised out>, env=0x7fffffffe378) at dl-open.c:652
#5  0x00007ffff7bd5ee9 in dlopen_doit (a=a@entry=0x7ffff7611eb0) at dlopen.c:66
#6  0x00007ffff7de8314 in _dl_catch_error (objname=0x7ffff00008d0, errstring=0x7ffff00008d8, mallocedp=0x7ffff00008c8, operate=0x7ffff7bd5e90 <dlopen_doit>, 
    args=0x7ffff7611eb0) at dl-error.c:187
#7  0x00007ffff7bd6521 in _dlerror_run (operate=operate@entry=0x7ffff7bd5e90 <dlopen_doit>, args=args@entry=0x7ffff7611eb0) at dlerror.c:163
#8  0x00007ffff7bd5f82 in __dlopen (file=file@entry=0x7ffff7611ee0 "mod-0.so", mode=mode@entry=2) at dlopen.c:87
#9  0x00000000004007e2 in start (a=<optimised out>) at a.c:19
#10 0x00007ffff79bf3d4 in start_thread (arg=0x7ffff7612700) at pthread_create.c:333
#11 0x00007ffff76feedd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

T2:
#0  __GI___assert_fail (assertion=0x7ffff7df8840 "listp->slotinfo[cnt].gen <= GL(dl_tls_generation)", file=0x7ffff7df68e6 "dl-tls.c", line=493, 
    function=0x7ffff7df9020 <__PRETTY_FUNCTION__.9528> "_dl_allocate_tls_init") at dl-minimal.c:220
#1  0x00007ffff7deb492 in __GI__dl_allocate_tls_init (result=0x7fffb7fff700) at dl-tls.c:493
#2  0x00007ffff79bff67 in allocate_stack (stack=<synthetic pointer>, pdp=<synthetic pointer>, attr=0x7fffffffdf90) at allocatestack.c:579
#3  __pthread_create_2_1 (newthread=newthread@entry=0x7fffffffe078, attr=attr@entry=0x0, start_routine=start_routine@entry=0x4007c0 <start>, arg=arg@entry=0xd)
    at pthread_create.c:526
#4  0x000000000040062a in main () at a.c:34


i think
  GL(dl_tls_generation)
  GL(dl_tls_dtv_slotinfo_list)
  listp->slotinfo[i].map
  listp->slotinfo[i].gen
  listp->next
  
may all be accessed concurrently by pthread_create and dlopen without
any synchronization.

this can also cause wrong maxgen computation into dtv[0].counter
Comment 1 Ilya Palachev 2015-12-29 10:51:48 UTC
Hi, I've suggested a patch for this bug:
https://sourceware.org/ml/libc-alpha/2015-12/msg00570.html
Comment 2 Szabolcs Nagy 2016-01-08 18:19:09 UTC
Created attachment 8893 [details]
test case (main module)
Comment 3 Szabolcs Nagy 2016-01-08 18:20:10 UTC
Created attachment 8894 [details]
test case (build script)
Comment 4 Szabolcs Nagy 2016-02-02 11:14:58 UTC
assigned this to myself, will work on it for 2.24, the current latest patch is
https://sourceware.org/ml/libc-alpha/2016-01/msg00480.html
Comment 5 Matt Avant 2016-08-05 20:40:10 UTC
Is this patch still being reviewed? The last update I see is https://sourceware.org/ml/libc-alpha/2016-01/msg00620.html, but I'm not familiar with how issue tracking works for this project so I could easily have missed something...
Comment 6 Markus Trippelsdorf 2017-06-15 05:45:43 UTC
I sometimes see the same failure during make check:

env GCONV_PATH=/var/tmp/glibc-build/iconvdata LOCPATH=/var/tmp/glibc-build/localedata LC_ALL=C   /var/tmp/glibc-build/elf/ld-linux-x86-64.so.2 --library-path /var/tmp/glibc-build:/var/tmp/glibc-build/math:/var/tmp/glibc-build/elf:/var/tmp/glibc-build/dlfcn:/var/tmp/glibc-build/nss:/var/tmp/glibc-build/nis:/var/tmp/glibc-build/rt:/var/tmp/glibc-build/resolv:/var/tmp/glibc-build/crypt:/var/tmp/glibc-build/mathvec:/var/tmp/glibc-build/support:/var/tmp/glibc-build/nptl /var/tmp/glibc-build/nptl/tst-stack4  > /var/tmp/glibc-build/nptl/tst-stack4.out; \                                                                                                                                                                   
../scripts/evaluate-test.sh nptl/tst-stack4 $? false false > /var/tmp/glibc-build/nptl/tst-stack4.test-result                                                                      
Inconsistency detected by ld.so: dl-tls.c: 488: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed! 

This is unfortunately not reproducible.
Comment 7 Pádraig Brady 2017-08-28 21:24:52 UTC
We were often hitting this issue with some multithreaded binaries with many shared libs. These patches referenced here, address the issue. Specifically:
  https://patches.linaro.org/patch/85007/
  https://patches.linaro.org/patch/85008/

These have been _extensively_ tested here with glibc-2.23 with many binaries
Comment 8 Carlos O'Donell 2017-08-29 20:30:16 UTC
(In reply to Pádraig Brady from comment #7)
> We were often hitting this issue with some multithreaded binaries with many
> shared libs. These patches referenced here, address the issue. Specifically:
>   https://patches.linaro.org/patch/85007/
>   https://patches.linaro.org/patch/85008/
> 
> These have been _extensively_ tested here with glibc-2.23 with many binaries

Please repost those to libc-alpha so we can review.
Comment 9 Pádraig Brady 2017-09-29 23:54:54 UTC
We found an off by one issue with this (with ASAN + certain number of shared libs). When the last vector in the _dl_allocate_tls_init list of vectors was of size one it would have been skipped. The fix is:

diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 073321c..2c9ad2a 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -571,7 +571,7 @@ _dl_allocate_tls_init (void *result)
        }

       total += cnt;
-      if (total >= dtv_slots)
+      if (total > dtv_slots)
        break;

       /* Synchronize with dl_add_to_slotinfo.  */
Comment 10 Mike Mezeul 2018-01-16 18:20:03 UTC
Has there been any activity on this one lately? Does anyone know if a fix will be coming anytime soon?
Comment 11 Pádraig Brady 2018-01-17 15:41:50 UTC
This has been _very_ well tested at facebook
Note the additional fix in comment #9
It would be great to merge this. thanks!
Comment 12 Szabolcs Nagy 2018-01-17 16:40:58 UTC
(In reply to Pádraig Brady from comment #11)
> This has been _very_ well tested at facebook
> Note the additional fix in comment #9
> It would be great to merge this. thanks!

sorry i didnt have time to work on this in this release cycle, i'll try to look at it in the next one if others don't beat me to it (the comments can be improved, dtv_slots should be fixed so it has consistent meaning and one should reason about the consequences of removing the asserts, they might catch valid corruption that is still present via dlclose races that are not fixed).
Comment 13 Lukasz Koniecki 2019-02-27 18:43:08 UTC
(In reply to Szabolcs Nagy from comment #12)

> sorry i didnt have time to work on this in this release cycle, i'll try to
> look at it in the next one if others don't beat me to it (the comments can
> be improved, dtv_slots should be fixed so it has consistent meaning and one
> should reason about the consequences of removing the asserts, they might
> catch valid corruption that is still present via dlclose races that are not
> fixed).

Any update on this? It has been over a year since the last comment.
Comment 14 Mike Gulick 2019-05-24 19:14:35 UTC
Just want to add that the two patches posted here (and the off-by-one fix in the comments) have been running by my employer (MathWorks) on at least 1000 Debian 9 systems for the past 6 months without issue.  It would be great if these patches could be accepted into glibc itself.
Comment 15 Carlos O'Donell 2019-05-24 19:46:49 UTC
(In reply to Mike Gulick from comment #14)
> Just want to add that the two patches posted here (and the off-by-one fix in
> the comments) have been running by my employer (MathWorks) on at least 1000
> Debian 9 systems for the past 6 months without issue.  It would be great if
> these patches could be accepted into glibc itself.

None of these changes are easy to integrate because they fail to explain in detailed notes why they are correct. We take the conservative approach not to apply complete solutions. Someone seeing this problem has to take the position to champion the broader solution as positioned by Szabolcs from Arm. Alternatively someone needs to explain why the partial solution is better than no solution and champion that on libc-alpha.
Comment 16 Robert O'Callahan 2019-06-26 01:04:45 UTC
This testcase seems to reproduce the bug pretty reliably.

https://github.com/jrmuizel/dl-open-race
Comment 17 Robert O'Callahan 2019-06-26 03:03:44 UTC
Sorry, it actually does not. And I see there was already a testcase posted here. https://sourceware.org/ml/libc-alpha/2016-11/msg00917.html
Comment 18 Sergei Trofimovich 2020-05-03 10:56:07 UTC
In https://bugs.gentoo.org/719674#c12 gentoo sees nptl/tst-stack4 crashes somewhat reliably on arm64:

# while :; do date; env GCONV_PATH=/var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/iconvdata LOCPATH=/var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/localedata LC_ALL=C   /var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/elf/ld-linux-aarch64.so.1 --library-path /var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl:/var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/math:/var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/elf:/var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/dlfcn:/var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/nss:/var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/nis:/var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/rt:/var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/resolv:/var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/mathvec:/var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/support:/var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/crypt:/var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/nptl::/var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl//dlfcn /var/tmp/portage/sys-libs/glibc-2.30-r8/work/build-arm64-aarch64-unknown-linux-gnu-nptl/nptl/tst-stack4; done

Sun 03 May 2020 10:42:08 AM UTC
Sun 03 May 2020 10:42:21 AM UTC
Sun 03 May 2020 10:42:34 AM UTC
Didn't expect signal from child: got `Segmentation fault'
...
Sun 03 May 2020 10:42:56 AM UTC
malloc(): invalid size (unsorted)
Didn't expect signal from child: got `Aborted'
..
Sun 03 May 2020 10:46:21 AM UTC
free(): corrupted unsorted chunks
Didn't expect signal from child: got `Aborted'
...
Sun 03 May 2020 10:46:55 AM UTC
Didn't expect signal from child: got `Segmentation fault'
Sun 03 May 2020 10:47:04 AM UTC
double free or corruption (!prev)
Didn't expect signal from child: got `Aborted'
...
Sun 03 May 2020 10:50:54 AM UTC
free(): invalid pointer
Didn't expect signal from child: got `Aborted'
...
Sun 03 May 2020 10:52:12 AM UTC
tst-stack4: malloc.c:2379: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.
Didn't expect signal from child: got `Aborted'

Does it look like the same issue described here?

# lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  1
Core(s) per socket:  48
Socket(s):           2
Vendor ID:           Cavium
Model:               1
Model name:          ThunderX 88XX
Stepping:            0x1
BogoMIPS:            200.00
L1d cache:           32K
L1i cache:           78K
L2 cache:            16384K
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32

# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/aarch64-unknown-linux-gnu/9.3.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-9.3.0/work/gcc-9.3.0/configure --host=aarch64-unknown-linux-gnu --build=aarch64-unknown-linux-gnu --prefix=/usr --bindir=/usr/aarch64-unknown-linux-gnu/gcc-bin/9.3.0 --includedir=/usr/lib/gcc/aarch64-unknown-linux-gnu/9.3.0/include --datadir=/usr/share/gcc-data/aarch64-unknown-linux-gnu/9.3.0 --mandir=/usr/share/gcc-data/aarch64-unknown-linux-gnu/9.3.0/man --infodir=/usr/share/gcc-data/aarch64-unknown-linux-gnu/9.3.0/info --with-gxx-include-dir=/usr/lib/gcc/aarch64-unknown-linux-gnu/9.3.0/include/g++-v9 --with-python-dir=/share/gcc-data/aarch64-unknown-linux-gnu/9.3.0/python --enable-languages=c,c++,fortran --enable-obsolete --enable-secureplt --disable-werror --with-system-zlib --enable-nls --without-included-gettext --enable-checking=release --with-bugurl=https://bugs.gentoo.org/ --with-pkgversion='Gentoo 9.3.0 p2' --disable-esp --enable-libstdcxx-time --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --disable-multilib --disable-altivec --disable-fixed-point --enable-libgomp --disable-libmudflap --disable-libssp --disable-libada --disable-systemtap --enable-vtable-verify --enable-lto --without-isl --enable-default-pie --enable-default-ssp
Thread model: posix
gcc version 9.3.0 (Gentoo 9.3.0 p2)

# uname -r
4.9.0-4-arm64
Comment 19 Szabolcs Nagy 2020-05-04 13:14:28 UTC
(In reply to Sergei Trofimovich from comment #18)
> ...
> Sun 03 May 2020 10:46:55 AM UTC
> Didn't expect signal from child: got `Segmentation fault'
> Sun 03 May 2020 10:47:04 AM UTC
> double free or corruption (!prev)
> Didn't expect signal from child: got `Aborted'
> ...
> Sun 03 May 2020 10:50:54 AM UTC
> free(): invalid pointer
> Didn't expect signal from child: got `Aborted'
> ...
> Sun 03 May 2020 10:52:12 AM UTC
> tst-stack4: malloc.c:2379: sysmalloc: Assertion `(old_top == initial_top
> (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE &&
> prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)'
> failed.
> Didn't expect signal from child: got `Aborted'
> 
> Does it look like the same issue described here?

it can be related, hard to tell.
(your failures are consistently heap corruptions
detected in malloc/free, instead of dynamic tls
related state corruption)

if you can rebuild glibc try the patches from
comment 7 if they don't help then your issue
is different. (if the issue disappears we don't
know if the new barriers just masked your issue
or fixed them).
Comment 20 Sergei Trofimovich 2020-05-04 18:25:15 UTC
(In reply to Szabolcs Nagy from comment #19)
> (In reply to Sergei Trofimovich from comment #18)
> > ...
> > Sun 03 May 2020 10:46:55 AM UTC
> > Didn't expect signal from child: got `Segmentation fault'
> > Sun 03 May 2020 10:47:04 AM UTC
> > double free or corruption (!prev)
> > Didn't expect signal from child: got `Aborted'
> > ...
> > Sun 03 May 2020 10:50:54 AM UTC
> > free(): invalid pointer
> > Didn't expect signal from child: got `Aborted'
> > ...
> > Sun 03 May 2020 10:52:12 AM UTC
> > tst-stack4: malloc.c:2379: sysmalloc: Assertion `(old_top == initial_top
> > (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE &&
> > prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)'
> > failed.
> > Didn't expect signal from child: got `Aborted'
> > 
> > Does it look like the same issue described here?
> 
> it can be related, hard to tell.
> (your failures are consistently heap corruptions
> detected in malloc/free, instead of dynamic tls
> related state corruption)
> 
> if you can rebuild glibc try the patches from
> comment 7 if they don't help then your issue
> is different. (if the issue disappears we don't
> know if the new barriers just masked your issue
> or fixed them).

Tried patches from #comment7 on glibc-2.30. No failures after 100 test runs. Usually fails after 3-4 runs.
Comment 21 Jonny Grant 2020-09-09 22:12:15 UTC
running latest Ubuntu 20.04.1 LTS
$ ldd --version
ldd (Ubuntu GLIBC 2.31-0ubuntu9) 2.31

Just got this when launching google-chrome from the command line


Inconsistency detected by ld.so: ../elf/dl-tls.c: 481: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed!
Command exited with non-zero status 127


Could that assert be updated to give more information if you have another assert_str() macro that could be used.
Comment 22 Jonny Grant 2020-09-28 15:27:36 UTC
It's a bit surprising there is an assert() in a release build in Ubuntu. Usually assert() would be for a debug build.
Comment 23 Jonny Grant 2020-10-09 10:51:09 UTC
On my computer any C program compiled with assert(0) dumps a core file, but this glibc issue assert does not dump a core file. Is there an issue with this assert macro in glibc? The message output on the terminal is different from the standard macro



$ ./a
a: a.c:6: main: Assertion `0' failed.
Aborted (core dumped)


$ cat a.c
// gcc -Wall -o a a.c
#include <assert.h>

int main()
{
    assert(0);
}
Comment 24 Carlos O'Donell 2020-10-09 20:21:37 UTC
(In reply to Jonny Grant from comment #23)
> On my computer any C program compiled with assert(0) dumps a core file, but
> this glibc issue assert does not dump a core file. Is there an issue with
> this assert macro in glibc? The message output on the terminal is different
> from the standard macro

Please ask these questions on libc-help@sourceware.org where developers can help you with any issues you have trying to put together a reproducer.
Comment 25 lvying 2020-10-29 03:12:39 UTC
Hi, when I use Szabolcs Nagy's comment 2 comment 3, I got Assertion:
Inconsistency detected by ld.so: dl-tls.c: 517: _dl_allocate_tls_init: Assertion `listp != NULL' failed!
Also, I try this testcase pacth: https://patchwork.ozlabs.org/project/glibc/patch/5836CC80.9070101@arm.com/
After I put patch the patch into the glibc code, and run nptl testcase, I got:
../scripts/evaluate-test.sh nptl/tst-tls7 $? false false > /root/build/build/glibc/nptl/tst-tls7.test-result
Inconsistency detected by ld.so: dl-tls.c: 517: _dl_allocate_tls_init: Assertion `listp != NULL' failed!

Both of the testcases got different assertion, not the assertion:
Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= _rtld_local._dl_tls_generation' failed!

So, how to reproduce this problem to get the same failure.
And is there a plan to solve this problem? This problem has been going on for a long time.

Thanks!
Comment 26 lvying 2020-12-01 02:15:17 UTC
Update test result:
I use glibc2.28 source code to reproduce this problem:
1. testcase: Szabolcs Nagy's comment 2 comment 3
   result: Inconsistency detected by ld.so: dl-tls.c: 517: _dl_allocate_tls_init: Assertion `listp != NULL' failed!
   Not the same assertion
2. testcase: add Szabolcs Nagy's testcase v3 into nptl testcase:
https://patchwork.ozlabs.org/project/glibc/patch/5836CC80.9070101@arm.com/
    tst-tls7 result: same as No.1
3. testcase same as NO.2, However I add usleep(1000) between _dl_add_to_slotinfo and ++GL(dl_tls_generation) at file elf/dl-open.c.
   tst-tls7 result: same as No.1
   tst-stack4 can reliably reproduce this problem
Comment 27 Szabolcs Nagy 2020-12-24 16:59:46 UTC
i wrote down some more background before i resubmit my patches:
https://sourceware.org/pipermail/libc-alpha/2020-December/121090.html
Comment 28 Carlos O'Donell 2021-02-07 23:04:20 UTC
I still see this issue in 2.33 testing. I saw it recently for ppc64 BE testing.
Comment 29 Szabolcs Nagy 2021-02-17 16:00:15 UTC
i have a new patch set that includes a different fix for this bug:
https://sourceware.org/pipermail/libc-alpha/2021-February/122626.html

the new fix takes the dlopen lock at thread creation time instead
of just using atomics (which cannot work for fixing the same race
with dlclose: bug 27111).

using atomics is still necessary for tls access.

it will likely take a few review iterations to get this in glibc.
Comment 30 Sdrkun 2021-03-15 11:45:54 UTC
(In reply to Szabolcs Nagy from comment #29)
> i have a new patch set that includes a different fix for this bug:
> https://sourceware.org/pipermail/libc-alpha/2021-February/122626.html
> 
> the new fix takes the dlopen lock at thread creation time instead
> of just using atomics (which cannot work for fixing the same race
> with dlclose: bug 27111).
> 
> using atomics is still necessary for tls access.
> 
> it will likely take a few review iterations to get this in glibc.

Hi,Szabolcs, 

Do you know when will these patches be reviewed?
Their Delegate is still Nobody, https://patchwork.sourceware.org/project/glibc/list/?series=1673.
Comment 31 cvs-commit@gcc.gnu.org 2021-05-11 16:17:10 UTC
The master branch has been updated by Szabolcs Nagy <nsz@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1387ad6225c2222f027790e3f460e31aa5dd2c54

commit 1387ad6225c2222f027790e3f460e31aa5dd2c54
Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
Date:   Wed Dec 30 19:19:37 2020 +0000

    elf: Fix data races in pthread_create and TLS access [BZ #19329]
    
    DTV setup at thread creation (_dl_allocate_tls_init) is changed
    to take the dlopen lock, GL(dl_load_lock).  Avoiding data races
    here without locks would require design changes: the map that is
    accessed for static TLS initialization here may be concurrently
    freed by dlclose.  That use after free may be solved by only
    locking around static TLS setup or by ensuring dlclose does not
    free modules with static TLS, however currently every link map
    with TLS has to be accessed at least to see if it needs static
    TLS.  And even if that's solved, still a lot of atomics would be
    needed to synchronize DTV related globals without a lock. So fix
    both bug 19329 and bug 27111 with a lock that prevents DTV setup
    running concurrently with dlopen or dlclose.
    
    _dl_update_slotinfo at TLS access still does not use any locks
    so CONCURRENCY NOTES are added to explain the synchronization.
    The early exit from the slotinfo walk when max_modid is reached
    is not strictly necessary, but does not hurt either.
    
    An incorrect acquire load was removed from _dl_resize_dtv: it
    did not synchronize with any release store or fence and
    synchronization is now handled separately at thread creation
    and TLS access time.
    
    There are still a number of racy read accesses to globals that
    will be changed to relaxed MO atomics in a followup patch. This
    should not introduce regressions compared to existing behaviour
    and avoid cluttering the main part of the fix.
    
    Not all TLS access related data races got fixed here: there are
    additional races at lazy tlsdesc relocations see bug 27137.
    
    Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Comment 32 cvs-commit@gcc.gnu.org 2021-05-11 16:17:15 UTC
The master branch has been updated by Szabolcs Nagy <nsz@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=f4f8f4d4e0f92488431b268c8cd9555730b9afe9

commit f4f8f4d4e0f92488431b268c8cd9555730b9afe9
Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
Date:   Wed Dec 30 19:19:37 2020 +0000

    elf: Use relaxed atomics for racy accesses [BZ #19329]
    
    This is a follow up patch to the fix for bug 19329.  This adds relaxed
    MO atomics to accesses that were previously data races but are now
    race conditions, and where relaxed MO is sufficient.
    
    The race conditions all follow the pattern that the write is behind the
    dlopen lock, but a read can happen concurrently (e.g. during tls access)
    without holding the lock.  For slotinfo entries the read value only
    matters if it reads from a synchronized write in dlopen or dlclose,
    otherwise the related dtv entry is not valid to access so it is fine
    to leave it in an inconsistent state.  The same applies for
    GL(dl_tls_max_dtv_idx) and GL(dl_tls_generation), but there the
    algorithm relies on the fact that the read of the last synchronized
    write is an increasing value.
    
    Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Comment 33 cvs-commit@gcc.gnu.org 2021-05-11 16:17:25 UTC
The master branch has been updated by Szabolcs Nagy <nsz@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=572bd547d57a39b6cf0ea072545dc4048921f4c3

commit 572bd547d57a39b6cf0ea072545dc4048921f4c3
Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
Date:   Thu Dec 31 13:59:38 2020 +0000

    elf: Fix DTV gap reuse logic [BZ #27135]
    
    For some reason only dlopen failure caused dtv gaps to be reused.
    
    It is possible that the intent was to never reuse modids for a
    different module, but after dlopen failure all gaps are reused
    not just the ones caused by the unfinished dlopened.
    
    So the code has to handle reused modids already which seems to
    work, however the data races at thread creation and tls access
    (see bug 19329 and bug 27111) may be more severe if slots are
    reused so this is scheduled after those fixes. I think fixing
    the races are not simpler if reuse is disallowed and reuse has
    other benefits, so set GL(dl_tls_dtv_gaps) whenever entries are
    removed from the middle of the slotinfo list. The value does
    not have to be correct: incorrect true value causes the next
    modid query to do a slotinfo walk, incorrect false will leave
    gaps and new entries are added at the end.
    
    Fixes bug 27135.
    
    Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Comment 34 Szabolcs Nagy 2021-05-11 16:24:10 UTC
fixed for 2.34
Comment 35 xujing 2021-09-11 04:54:02 UTC
(In reply to cvs-commit@gcc.gnu.org from comment #31)
> The master branch has been updated by Szabolcs Nagy <nsz@sourceware.org>:
> 
> https://sourceware.org/git/gitweb.cgi?p=glibc.git;
> h=1387ad6225c2222f027790e3f460e31aa5dd2c54
> 
> commit 1387ad6225c2222f027790e3f460e31aa5dd2c54
> Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
> Date:   Wed Dec 30 19:19:37 2020 +0000
> 
>     elf: Fix data races in pthread_create and TLS access [BZ #19329]
>     
>     DTV setup at thread creation (_dl_allocate_tls_init) is changed
>     to take the dlopen lock, GL(dl_load_lock).  Avoiding data races
>     here without locks would require design changes: the map that is
>     accessed for static TLS initialization here may be concurrently
>     freed by dlclose.  That use after free may be solved by only
>     locking around static TLS setup or by ensuring dlclose does not
>     free modules with static TLS, however currently every link map
>     with TLS has to be accessed at least to see if it needs static
>     TLS.  And even if that's solved, still a lot of atomics would be
>     needed to synchronize DTV related globals without a lock. So fix
>     both bug 19329 and bug 27111 with a lock that prevents DTV setup
>     running concurrently with dlopen or dlclose.
>     
>     _dl_update_slotinfo at TLS access still does not use any locks
>     so CONCURRENCY NOTES are added to explain the synchronization.
>     The early exit from the slotinfo walk when max_modid is reached
>     is not strictly necessary, but does not hurt either.
>     
>     An incorrect acquire load was removed from _dl_resize_dtv: it
>     did not synchronize with any release store or fence and
>     synchronization is now handled separately at thread creation
>     and TLS access time.
>     
>     There are still a number of racy read accesses to globals that
>     will be changed to relaxed MO atomics in a followup patch. This
>     should not introduce regressions compared to existing behaviour
>     and avoid cluttering the main part of the fix.
>     
>     Not all TLS access related data races got fixed here: there are
>     additional races at lazy tlsdesc relocations see bug 27137.
>     
>     Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

this patch use dl_load_lock in _dl_allocate_tls_init, is there a problem when dlopen a dynamic library which will call pthread_create? I think it will cause dl_load_lock and dl_load_lock dead lock.
Comment 36 xujing 2021-09-11 05:06:18 UTC
(In reply to xujing from comment #35)
> (In reply to cvs-commit@gcc.gnu.org from comment #31)
> > The master branch has been updated by Szabolcs Nagy <nsz@sourceware.org>:
> > 
> > https://sourceware.org/git/gitweb.cgi?p=glibc.git;
> > h=1387ad6225c2222f027790e3f460e31aa5dd2c54
> > 
> > commit 1387ad6225c2222f027790e3f460e31aa5dd2c54
> > Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
> > Date:   Wed Dec 30 19:19:37 2020 +0000
> > 
> >     elf: Fix data races in pthread_create and TLS access [BZ #19329]
> >     
> >     DTV setup at thread creation (_dl_allocate_tls_init) is changed
> >     to take the dlopen lock, GL(dl_load_lock).  Avoiding data races
> >     here without locks would require design changes: the map that is
> >     accessed for static TLS initialization here may be concurrently
> >     freed by dlclose.  That use after free may be solved by only
> >     locking around static TLS setup or by ensuring dlclose does not
> >     free modules with static TLS, however currently every link map
> >     with TLS has to be accessed at least to see if it needs static
> >     TLS.  And even if that's solved, still a lot of atomics would be
> >     needed to synchronize DTV related globals without a lock. So fix
> >     both bug 19329 and bug 27111 with a lock that prevents DTV setup
> >     running concurrently with dlopen or dlclose.
> >     
> >     _dl_update_slotinfo at TLS access still does not use any locks
> >     so CONCURRENCY NOTES are added to explain the synchronization.
> >     The early exit from the slotinfo walk when max_modid is reached
> >     is not strictly necessary, but does not hurt either.
> >     
> >     An incorrect acquire load was removed from _dl_resize_dtv: it
> >     did not synchronize with any release store or fence and
> >     synchronization is now handled separately at thread creation
> >     and TLS access time.
> >     
> >     There are still a number of racy read accesses to globals that
> >     will be changed to relaxed MO atomics in a followup patch. This
> >     should not introduce regressions compared to existing behaviour
> >     and avoid cluttering the main part of the fix.
> >     
> >     Not all TLS access related data races got fixed here: there are
> >     additional races at lazy tlsdesc relocations see bug 27137.
> >     
> >     Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
> 
> this patch use dl_load_lock in _dl_allocate_tls_init, is there a problem
> when dlopen a dynamic library which will call pthread_create? I think it
> will cause dl_load_lock and dl_load_lock dead lock.

dlopen will hold on dl_load_lock, and pthread_create will call _dl_allocate_tls_init and then will hold on dl_load_lock. Finally, it will cause dead lock.
Comment 37 xujing 2021-09-11 10:44:37 UTC
(In reply to cvs-commit@gcc.gnu.org from comment #31)
> The master branch has been updated by Szabolcs Nagy <nsz@sourceware.org>:
> 
> https://sourceware.org/git/gitweb.cgi?p=glibc.git;
> h=1387ad6225c2222f027790e3f460e31aa5dd2c54
> 
> commit 1387ad6225c2222f027790e3f460e31aa5dd2c54
> Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
> Date:   Wed Dec 30 19:19:37 2020 +0000
> 
>     elf: Fix data races in pthread_create and TLS access [BZ #19329]
>     
>     DTV setup at thread creation (_dl_allocate_tls_init) is changed
>     to take the dlopen lock, GL(dl_load_lock).  Avoiding data races
>     here without locks would require design changes: the map that is
>     accessed for static TLS initialization here may be concurrently
>     freed by dlclose.  That use after free may be solved by only
>     locking around static TLS setup or by ensuring dlclose does not
>     free modules with static TLS, however currently every link map
>     with TLS has to be accessed at least to see if it needs static
>     TLS.  And even if that's solved, still a lot of atomics would be
>     needed to synchronize DTV related globals without a lock. So fix
>     both bug 19329 and bug 27111 with a lock that prevents DTV setup
>     running concurrently with dlopen or dlclose.
>     
>     _dl_update_slotinfo at TLS access still does not use any locks
>     so CONCURRENCY NOTES are added to explain the synchronization.
>     The early exit from the slotinfo walk when max_modid is reached
>     is not strictly necessary, but does not hurt either.
>     
>     An incorrect acquire load was removed from _dl_resize_dtv: it
>     did not synchronize with any release store or fence and
>     synchronization is now handled separately at thread creation
>     and TLS access time.
>     
>     There are still a number of racy read accesses to globals that
>     will be changed to relaxed MO atomics in a followup patch. This
>     should not introduce regressions compared to existing behaviour
>     and avoid cluttering the main part of the fix.
>     
>     Not all TLS access related data races got fixed here: there are
>     additional races at lazy tlsdesc relocations see bug 27137.
>     
>     Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

Hi! I think this commit may cause an ABBA deadlock problem which i mentioned in https://sourceware.org/bugzilla/show_bug.cgi?id=28331.
Comment 38 Szabolcs Nagy 2021-09-12 01:23:47 UTC
(In reply to xujing from comment #35)
> (In reply to cvs-commit@gcc.gnu.org from comment #31)
> > commit 1387ad6225c2222f027790e3f460e31aa5dd2c54
> > Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
> > Date:   Wed Dec 30 19:19:37 2020 +0000
> > 
> >     elf: Fix data races in pthread_create and TLS access [BZ #19329]
> >     
> this patch use dl_load_lock in _dl_allocate_tls_init, is there a problem
> when dlopen a dynamic library which will call pthread_create? I think it
> will cause dl_load_lock and dl_load_lock dead lock.

the real bug is that ctors are run with the dlopen lock held.
that can causes deadlocks anyway (a ctor can create threads
and that thread can call dlopen). this is bug 15686 which is not
easy to fix, but that's the right solution. (in general, running
user callbacks while libc internal locks are held is wrong.)

that bug is now more exposed because the lock is also taken
at _dl_allocate_tls_init during thread creation. however i
expect that to be called in the parent thread only, so there
should be no deadlock when ctor calls pthread_create, only
when the child thread calls it again (which i considered rare).

if you have example code that you think should work but now
deadlocks, then please report it.
Comment 39 xujing 2021-09-13 02:50:09 UTC
(In reply to Szabolcs Nagy from comment #38)
> (In reply to xujing from comment #35)
> > (In reply to cvs-commit@gcc.gnu.org from comment #31)
> > > commit 1387ad6225c2222f027790e3f460e31aa5dd2c54
> > > Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
> > > Date:   Wed Dec 30 19:19:37 2020 +0000
> > > 
> > >     elf: Fix data races in pthread_create and TLS access [BZ #19329]
> > >     
> > this patch use dl_load_lock in _dl_allocate_tls_init, is there a problem
> > when dlopen a dynamic library which will call pthread_create? I think it
> > will cause dl_load_lock and dl_load_lock dead lock.
> 
> the real bug is that ctors are run with the dlopen lock held.
> that can causes deadlocks anyway (a ctor can create threads
> and that thread can call dlopen). this is bug 15686 which is not
> easy to fix, but that's the right solution. (in general, running
> user callbacks while libc internal locks are held is wrong.)
> 
> that bug is now more exposed because the lock is also taken
> at _dl_allocate_tls_init during thread creation. however i
> expect that to be called in the parent thread only, so there
> should be no deadlock when ctor calls pthread_create, only
> when the child thread calls it again (which i considered rare).
> 
> if you have example code that you think should work but now
> deadlocks, then please report it.

I'm sorry, I misled you. I think there is an ABBA deadlock issue in some scenarios.

If I have a c++ dynamic library(named libA.so) that contains a global object, the global object will call the post-constructor at initialization and hold it's own lock(named A_lock) when dlopen loads libA.so. Assume that two threads execute the following process:
    Thread1:dlopen(libA.so) => hold dl_load_lock => load libA.so => init global 
            object from libA.so => wait for hold A_lock
    Thread2:my own code hold A_lock => pthread_create => _dl_allocate_tls_init 
            => wait for hold dl_load_lock
In this case, an ABBA deadlock occurs. Is this a bug?

My stack looks like this:
Thread 1 (LWP 136013):
#0  0x00007f57a108510d in ?? () from /usr/lib64/libpthread.so.0
#1  0x00007f57a107e4d1 in pthread_mutex_lock () from /usr/lib64/libpthread.so.0
#1  stack waiting for holding A_lock
...
#6  0x00007f5781c1bb8b in LogProcess::Init (strProcName=..., nProcHandle=nProcHandle@entry=0) at ./service/biz_frame/code/server/src/logging/logprocess.cpp:107
...
#20 0x00007f57a0fef21f in _dl_catch_exception () from /usr/lib64/libc.so.6
#21 0x00007f57a786442b in ?? () from /lib64/ld-linux-x86-64.so.2
#22 0x00007f57a3de2296 in ?? () from /usr/lib64/libdl.so.2
#23 0x00007f57a0fef21f in _dl_catch_exception () from /usr/lib64/libc.so.6
#24 0x00007f57a0fef2af in _dl_catch_error () from /usr/lib64/libc.so.6
#25 0x00007f57a3de2985 in ?? () from /usr/lib64/libdl.so.2
#26 0x00007f57a3de2351 in dlopen () from /usr/lib64/libdl.so.2
...
...
#38 0x00007f57a0fb3520 in clone () from /usr/lib64/libc.so.6

Thread 2 (LWP 134627):
#0  0x00007f57a108510d in ?? () from /usr/lib64/libpthread.so.0
#1  0x00007f57a107e580 in pthread_mutex_lock () from /usr/lib64/libpthread.so.0
#2  0x00007f57a7863835 in _dl_allocate_tls_init () from /lib64/ld-linux-x86-64.so.2
#3  0x00007f57a107cb7c in pthread_create () from /usr/lib64/libpthread.so.0
...
#10 Stack holding A_lock
...
#14 0x0000561689e0d579 in main ()
Comment 40 Szabolcs Nagy 2021-09-13 08:25:14 UTC
(In reply to xujing from comment #39)
> I'm sorry, I misled you. I think there is an ABBA deadlock issue in some
> scenarios.
> 
> If I have a c++ dynamic library(named libA.so) that contains a global
> object, the global object will call the post-constructor at initialization
> and hold it's own lock(named A_lock) when dlopen loads libA.so. Assume that
> two threads execute the following process:
>     Thread1:dlopen(libA.so) => hold dl_load_lock => load libA.so => init
> global 
>             object from libA.so => wait for hold A_lock
>     Thread2:my own code hold A_lock => pthread_create =>
> _dl_allocate_tls_init 
>             => wait for hold dl_load_lock
> In this case, an ABBA deadlock occurs. Is this a bug?

yes i think this should work (it is a lock ordering
issue between a user and libc internal lock, which
is only possible if user code is run while a libc
lock is held)

note that if you replace pthread_create with dlopen
that deadlocks too. so it's still bug 15686. but it
may be more common than i expected. i think we need
to look at fixing that bug. (fixing the dynamic tls
race of this bug without locks in pthread_create is
very hard, so i don't think we can revert the quoted
patch)
Comment 41 Szabolcs Nagy 2021-09-16 14:42:37 UTC
i'm trying to fix the ABBA deadlock by introducing a new
lock into dlopen that protects tls state and gets released
before init functions are run. using that lock at thread
creation would fix the issue.

this only requires a small amount of changes, but it seems
to be difficult to ensure that the new lock is released on
all failure paths within _dl_open_worker (which sometimes
uses longjmp for error handling).
Comment 42 Szabolcs Nagy 2021-09-21 13:20:25 UTC
i opened bug 28357 for the ABBA deadlock in pthread_create.
Comment 43 cvs-commit@gcc.gnu.org 2021-10-04 14:12:03 UTC
The master branch has been updated by Szabolcs Nagy <nsz@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=83b5323261bb72313bffcf37476c1b8f0847c736

commit 83b5323261bb72313bffcf37476c1b8f0847c736
Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
Date:   Wed Sep 15 15:16:19 2021 +0100

    elf: Avoid deadlock between pthread_create and ctors [BZ #28357]
    
    The fix for bug 19329 caused a regression such that pthread_create can
    deadlock when concurrent ctors from dlopen are waiting for it to finish.
    Use a new GL(dl_load_tls_lock) in pthread_create that is not taken
    around ctors in dlopen.
    
    The new lock is also used in __tls_get_addr instead of GL(dl_load_lock).
    
    The new lock is held in _dl_open_worker and _dl_close_worker around
    most of the logic before/after the init/fini routines.  When init/fini
    routines are running then TLS is in a consistent, usable state.
    In _dl_open_worker the new lock requires catching and reraising dlopen
    failures that happen in the critical section.
    
    The new lock is reinitialized in a fork child, to keep the existing
    behaviour and it is kept recursive in case malloc interposition or TLS
    access from signal handlers can retake it.  It is not obvious if this
    is necessary or helps, but avoids changing the preexisting behaviour.
    
    The new lock may be more appropriate for dl_iterate_phdr too than
    GL(dl_load_write_lock), since TLS state of an incompletely loaded
    module may be accessed.  If the new lock can replace the old one,
    that can be a separate change.
    
    Fixes bug 28357.
    
    Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Comment 44 cvs-commit@gcc.gnu.org 2021-10-19 12:23:00 UTC
The release/2.34/master branch has been updated by Florian Weimer <fw@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=024a7640ab9ecea80e527f4e4d7f7a1868e952c5

commit 024a7640ab9ecea80e527f4e4d7f7a1868e952c5
Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
Date:   Wed Sep 15 15:16:19 2021 +0100

    elf: Avoid deadlock between pthread_create and ctors [BZ #28357]
    
    The fix for bug 19329 caused a regression such that pthread_create can
    deadlock when concurrent ctors from dlopen are waiting for it to finish.
    Use a new GL(dl_load_tls_lock) in pthread_create that is not taken
    around ctors in dlopen.
    
    The new lock is also used in __tls_get_addr instead of GL(dl_load_lock).
    
    The new lock is held in _dl_open_worker and _dl_close_worker around
    most of the logic before/after the init/fini routines.  When init/fini
    routines are running then TLS is in a consistent, usable state.
    In _dl_open_worker the new lock requires catching and reraising dlopen
    failures that happen in the critical section.
    
    The new lock is reinitialized in a fork child, to keep the existing
    behaviour and it is kept recursive in case malloc interposition or TLS
    access from signal handlers can retake it.  It is not obvious if this
    is necessary or helps, but avoids changing the preexisting behaviour.
    
    The new lock may be more appropriate for dl_iterate_phdr too than
    GL(dl_load_write_lock), since TLS state of an incompletely loaded
    module may be accessed.  If the new lock can replace the old one,
    that can be a separate change.
    
    Fixes bug 28357.
    
    Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
    (cherry picked from commit 83b5323261bb72313bffcf37476c1b8f0847c736)