This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Removing longjmp error handling from the dynamic loader


* Carlos O'Donell:

>> In the current scheme, more localized error handling is problematic
>> because it has high syntactic overhead: You need to define a struct for
>> argument data and a separate function that receives the data, and pass
>> both to _dl_catch_error.  There also could be a performance overhead if
>> individual malloc calls were protected in this way because each call to
>> _dl_catch_error incurs a call to setjmp.
>
> Yes, Method 1 requires passing down all information encapsulated in a
> structure.
>
> Why would malloc calls be protected in this way?

You need a local handler if the pointer is not rooted in something that
is thrown away by the top-most handler, and you call a
potentially-throwing function between the allocation and the
deallocation.
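
A minimal, self-contained sketch of that situation follows.  It is not
glibc code: catch_error and signal_error are hypothetical stand-ins for
_dl_catch_error and the throwing functions, and struct load_args and
load_step are invented for illustration.  It shows the argument struct,
the separate callback, and the setjmp incurred by each local handler.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical handler state; the real loader keeps similar data elsewhere. */
static jmp_buf *current_handler;
static const char *pending_error;

/* Stand-in for a throwing function: jump to the innermost handler. */
static void signal_error(const char *msg)
{
    pending_error = msg;
    longjmp(*current_handler, 1);
}

/* Stand-in for _dl_catch_error: run operate(args), return the error
   message if it threw, or NULL on success.  Each call pays for a setjmp. */
static const char *catch_error(void (*operate)(void *), void *args)
{
    jmp_buf env;
    jmp_buf *saved = current_handler;   /* keep the outer handler for nesting */

    if (setjmp(env) == 0) {
        current_handler = &env;
        operate(args);
        current_handler = saved;
        return NULL;
    }
    current_handler = saved;
    return pending_error;
}

/* The syntactic overhead: a struct for the argument data ... */
struct load_args {
    int should_fail;   /* input */
    int result;        /* output */
};

/* ... and a separate function that receives it. */
static void load_step(void *closure)
{
    struct load_args *args = closure;
    if (args->should_fail)
        signal_error("cannot load");
    args->result = 42;
}

/* The scratch buffer is not rooted anywhere the top-most handler would
   clean up, so a local handler is needed to free it on the error path. */
static const char *load_with_scratch(int should_fail, int *result)
{
    char *scratch = malloc(64);
    if (scratch == NULL)
        return "out of memory";

    struct load_args args = { .should_fail = should_fail, .result = 0 };
    const char *err = catch_error(load_step, &args);
    free(scratch);                      /* reached on both paths */
    if (err == NULL)
        *result = args.result;
    return err;                         /* a real handler would re-raise */
}
```

Without the catch_error call, the longjmp from load_step would skip the
free and leak the buffer.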

>> I personally do not have a problem with exceptions and stack unwinding,
>> but if this is what we want, we should use a DWARF-based unwinder and
>> GCC's exception handling features (the limited support in the C front
>> end is probably sufficient).
>
> I agree.
>
> Let's call this "Method 2: C exceptions"
>
> For dlopen-et-al I think a DWARF-based unwinder would be great.

It requires moving the unwinder implementation from libgcc_s to libc,
though.  The last time we discussed this (related to unwinder
performance issues and the dl_iterate_phdr interface), this idea was not
well-received.

If we fix the init/fini and IFUNC resolver bugs the way I intend, we
will never have to unwind through user code before hitting the final
handler, so there is no problem with the availability of unwinding data.
So at least that part is entirely solvable.

> I assume you envision that exception handling in C would help cleanup
> the code and allow more locally visible cleanups to happen (unwinding
> state).

It requires a vastly different coding style.  I have not thought much
about it.  I don't think we should rely on a relatively obscure GNU
extension so prominently.  Or put differently, if we want to use RAII as
an implementation technique, I don't think we should use C.
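
For reference, here is a small sketch of what such code might look like,
assuming the extension in question is GCC's cleanup variable attribute
(that reading is my assumption).  free_charp, use_scratch and the
frees_seen counter are hypothetical names used only for this example.

```c
#include <stdlib.h>

static int frees_seen;   /* counts cleanup runs, for demonstration only */

/* Cleanup handler: receives a pointer to the variable going out of scope. */
static void free_charp(char **p)
{
    free(*p);
    ++frees_seen;
}

static int use_scratch(int fail_early)
{
    /* GNU extension: free_charp (&scratch) runs whenever scratch leaves scope. */
    __attribute__((cleanup(free_charp))) char *scratch = malloc(64);

    if (scratch == NULL || fail_early)
        return -1;       /* cleanup runs on this path */
    scratch[0] = '\0';
    return 0;            /* and on this one */
}
```

According to the GCC documentation, the cleanup function also runs during
stack unwinding when the code is compiled with -fexceptions, which is
what would make it usable together with a DWARF-based unwinder.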

>> The alternative to unwinding is an explicit struct dl_exception *
>> argument for functions which can fail, and use the return value to
>> indicate whether there was a fatal error.  This sometimes causes issues
>> where the return value is already used to signal both success and
>> non-fatal error (e.g., -1 for failure from open_verify, or NULL for
>> RTLD_NOLOAD failure from _dl_map_object_from_fd).
>
> No, if we're going to change it should be *towards* something where the
> compiler can help us get it right.
>
> I don't want to see an explicit "this" passed into every function by hand.
>
> Let's call this "Method 3: Explicit this"

The explicit argument really helps to spot places where unwinding
happens.  For example, it makes it pretty clear that you have a memory
leak if you call a function with such an argument and an allocation is
not rooted in a global data structure.
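
A rough sketch of the style, under stated assumptions: struct
dl_exception_sketch is a simplified, hypothetical stand-in for struct
dl_exception, and check_object and map_object are invented names rather
than existing loader functions.  The return value only reports whether a
fatal error occurred; the actual result goes through an out parameter.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical layout; the real struct dl_exception may differ. */
struct dl_exception_sketch {
    const char *errstring;
};

/* Returns false on a fatal error and fills in *exc. */
static bool check_object(struct dl_exception_sketch *exc, int bad)
{
    if (bad) {
        exc->errstring = "invalid ELF header";
        return false;
    }
    return true;
}

static bool map_object(struct dl_exception_sketch *exc, int bad, int *fd_out)
{
    char *scratch = malloc(64);
    if (scratch == NULL) {
        exc->errstring = "out of memory";
        return false;
    }

    /* The exc argument marks this call as a place where "unwinding" can
       happen, so the pending allocation above is easy to spot. */
    if (!check_object(exc, bad)) {
        free(scratch);
        return false;
    }

    *fd_out = 3;         /* placeholder result */
    free(scratch);
    return true;
}
```

Dropping the free on the error path would be a leak that a reviewer can
see from the call alone, which is the point of the explicit argument.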

>> There is some impact on <dl-machine.h> because the relocation processing
>> needs to change.  We can convert the relocation processing first to the
>> new scheme and continue to signal any errors using longjmp in the
>> generic code.  But supporting twice the number of relocation APIs for
>> incremental conversion of targets will still be difficult.  I think we
>> are still looking at one fairly large patch, given the number of
>> architectures we support, although the changes should just be a few
>> dozen lines per architecture.
>
> Please expand on this a bit more.

Let's look at sysdeps/powerpc/powerpc64/dl-machine.h.  elf_machine_rela
calls _dl_reloc_overflow and _dl_reloc_bad_type, among other things, and
those throw.  This means that elf_machine_rela needs to be adjusted with
the explicit argument.  RESOLVE_MAP calls _dl_lookup_symbol_x behind the
scenes as well, so that can throw, too.

Is there any state we need to roll back locally in elf_machine_rela?
Probably not.  So this is one area where we do not benefit from the
explicit argument.

>> A third option is not to use an explicit struct dl_exception * argument,
>> but a per-thread variable.  This will require changes to support TLS
>> (presumably the initial-exec variant) in the dynamic linker itself,
>> which is currently missing.  Since the exception pointer is only needed
>> in case of an error, using a TLS variable for it will avoid the overhead
>> of maintaining the explicit exception pointer argument and passing it
>> around.  Adding TLS support to the dynamic linker could be implemented
>> incrementally across architectures, but the conversion itself faces the
>> same flag day challenge as the explicit argument solution.  The explicit
>> argument also ensures that places stick out where encoding fatal errors
>> in the return argument is difficult.  (A fourth option would compile the
>> dynamic linker twice and use the TLS-less version for the initial
>> loading of the executables and its dependencies.)
>
> I don't like either of these options.
>
> Let's call this "Method 3: TP explicit this"
>
> I won't give the 4th option a name :-}

Note that the current implementation already stores the dl_exception *
and the jump buffer for the error handler in thread-local storage.
That's why we need to compile the exception handling machinery twice.
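
To illustrate the per-thread variant without an explicit argument, here
is a sketch that assumes C11 _Thread_local is available to the code in
question (which is exactly what is missing in the dynamic linker today).
All names are hypothetical and the struct is a simplified stand-in for
struct dl_exception.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct dl_exception. */
struct dl_exception_sketch {
    const char *errstring;
};

/* Per-thread slot: only touched when an error actually occurs. */
static _Thread_local struct dl_exception_sketch exception_storage;
static _Thread_local struct dl_exception_sketch *pending_exception;

/* No exception argument; failure is reported via the return value and
   the details are left in the per-thread slot. */
static int reloc_check(int type)
{
    if (type > 100) {
        exception_storage.errstring = "unexpected reloc type";
        pending_exception = &exception_storage;
        return -1;
    }
    return 0;
}

/* Fetch and clear the pending error for the current thread. */
static const char *take_pending_error(void)
{
    struct dl_exception_sketch *exc = pending_exception;
    pending_exception = NULL;
    return exc != NULL ? exc->errstring : NULL;
}
```

The call sites look cleaner, but nothing in a function's signature shows
that it can fail fatally, which is the trade-off against the explicit
argument.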

Thanks,
Florian

