This is the mail archive of the
binutils@sourceware.org
mailing list for the binutils project.
Invalid program counters and unwinding
- From: Florian Weimer <fweimer at redhat dot com>
- To: GCC <gcc at gcc dot gnu dot org>, GNU C Library <libc-alpha at sourceware dot org>, Binutils <binutils at sourceware dot org>, gnu-gabi at sourceware dot org
- Date: Tue, 26 Jun 2018 11:26:16 +0200
- Subject: Invalid program counters and unwinding
I'm looking at ways to speed up _Unwind_Find_FDE when libgcc is running
on top of glibc. I have something (at the design level, with some of
the code written) which allows me to get a pointer to the
PT_GNU_EH_FRAME segment in memory in a lock-free fashion (so it would
also be async-signal safe).
This part also works when the program counter used in the search is
invalid and does not point to within a loaded object, even in the case
of concurrent dlopen/dlclose.
However, it's still necessary to read the PT_GNU_EH_FRAME data itself,
and if the PC passed to _Unwind_Find_FDE is not a valid program counter
found on the stack (within a caller, where unmapping it with dlclose
would be invalid), it could happen that it is a random address in
*another*, unrelated object, which then gets dlclose'd (which is valid).
The current glibc-based implementation in libgcc calls dl_iterate_phdr,
which acquires a lock blocking dlclose for the entire duration of the
iteration. But I think this still doesn't support arbitrary, random PC
values because in the worst case, the PC value looks valid, we find some
unrelated FDE data with an associated personality routine, and end up
calling that, with disastrous consequences.
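For illustration, a dl_iterate_phdr-based lookup of the kind described
above could be sketched roughly as follows. This is a minimal sketch and
not the actual libgcc code; struct eh_frame_lookup, find_eh_frame_cb and
lookup_eh_frame_hdr are hypothetical names made up for this example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for dl_iterate_phdr in <link.h> */
#endif
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical query/result type for this sketch. */
struct eh_frame_lookup {
    uintptr_t pc;               /* in: program counter to look up */
    const void *eh_frame_hdr;   /* out: start of PT_GNU_EH_FRAME, or NULL */
    size_t eh_frame_hdr_size;   /* out: size of that segment in memory */
};

/* Called once per loaded object while the loader lock is held. */
static int find_eh_frame_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    struct eh_frame_lookup *q = data;
    const ElfW(Phdr) *eh = NULL;
    int contains_pc = 0;

    (void) size;
    assert(q != NULL);

    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type == PT_LOAD) {
            uintptr_t start = info->dlpi_addr + ph->p_vaddr;
            if (q->pc >= start && q->pc - start < ph->p_memsz)
                contains_pc = 1;
        } else if (ph->p_type == PT_GNU_EH_FRAME) {
            eh = ph;
        }
    }

    if (!contains_pc)
        return 0;               /* not this object, keep iterating */

    if (eh != NULL) {
        q->eh_frame_hdr = (const void *) (info->dlpi_addr + eh->p_vaddr);
        q->eh_frame_hdr_size = eh->p_memsz;
    }
    return 1;                   /* object found, stop iterating */
}

/* Returns the PT_GNU_EH_FRAME segment of the object containing pc,
   or NULL if pc is not inside any loaded object (or the object has
   no such segment). */
static const void *lookup_eh_frame_hdr(const void *pc, size_t *size_out)
{
    struct eh_frame_lookup q = { (uintptr_t) pc, NULL, 0 };

    dl_iterate_phdr(find_eh_frame_cb, &q);
    if (size_out != NULL)
        *size_out = q.eh_frame_hdr_size;
    return q.eh_frame_hdr;
}
```

Note that this sketch only checks that the PC falls inside some loaded
object. It cannot tell whether the PC is a genuine return address on the
call stack. The returned pointer is also used after dl_iterate_phdr has
returned and released its lock, so it is only safe to dereference if the
object cannot be dlclose'd in the meantime.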
So it looks to me that the caller of _Unwind_Find_FDE needs to ensure
that the PC is a valid element of the call stack. Is this a correct
assumption?
I have some ideas how to make reading the PT_GNU_EH_FRAME data safe, but
the question is whether we actually need that.
Previous discussions:
https://gcc.gnu.org/ml/gcc/2013-05/msg00253.html
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71744
https://sourceware.org/ml/libc-alpha/2016-07/msg00613.html
(patch with a spread lock, still not async-signal-safe)
Thanks,
Florian