Bug 16585 - dlsym() shouldn't be declared as leaf
Summary: dlsym() shouldn't be declared as leaf
Status: RESOLVED WONTFIX
Alias: None
Product: glibc
Classification: Unclassified
Component: dynamic-link (show other bugs)
Version: 2.18
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-02-14 14:40 UTC by Stefan Seefeld
Modified: 2014-06-13 08:15 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Stefan Seefeld 2014-02-14 14:40:33 UTC
We are using a preload library with approximately this code:

  static struct mystruct *state;
  ...
  void foo()
  {
    // access 'state' here
  }

  void bar()
  {
    state = ...; // (1)
    dlsym(...);  // (2)
    state = ...; // (3)
  }

The compiler appears to optimize away statement (1), thinking that it is redundant. However, in reality dlsym(...) implicitly calls 'foo()' above (we use our own malloc wrappers...), which reads 'state', and consequently crashes the application since the attempted initialization of the variable has disappeared.
The reason is that dlsym() is declared "leaf" (by means of __THROW).

Changing the dlsym() declaration from __THROW to __THROWNL should fix this.
Comment 1 Carlos O'Donell 2014-02-14 15:50:55 UTC
(In reply to Stefan Seefeld from comment #0)
> We are using a preload library with approximately this code:
> 
>   static struct mystruct *state;
>   ...
>   void foo()
>   {
>     // access 'state' here
>   }
> 
>   void bar()
>   {
>     state = ...; // (1)
>     dlsym(...);  // (2)
>     state = ...; // (3)
>   }
> 
> The compiler appears to optimize away statement (1), thinking that it is
> redundant. However, in reality dlsym(...) implicitly calls 'foo()' above (we
> use our own malloc wrappers...), which reads 'state', and consequently
> crashes the application since the attempted initialization of the variable
> has disappeared.
> The reason is that dlsym() is declared "leaf" (by means of __THROW).
> 
> Changing the dlsym() declaration from __THROW to __THROWNL should fix this.

Would this not generate worse code for other applications calling dlsym(...) and not modifying state in the current compilation unit?

Why isn't the solution to mark state as volatile?
Comment 2 Stefan Seefeld 2014-02-14 16:07:28 UTC
On 02/14/2014 10:50 AM, carlos at redhat dot com wrote:

>> Changing the dlsym() declaration from __THROW to __THROWNL should fix this.
> 
> Would this not generate worse code for other applications calling dlsym(...)
> and not modifying state in the current compilation unit?
> 
> Why isn't the solution to mark state as volatile?

I understand that this will pessimise performance, and I also understand
that situations where dlsym() (and co.) will actually cause side-effects
are rare. Still, declaring it "leaf" when it really isn't seems wrong.
Marking the state variable as volatile indeed works around the issue,
but doesn't protect the bug from coming up again when the next person
runs into a similar situation.

The solution we have come up actually consists of using our own
(forward-)declaration of dlsym() that doesn't use the leaf attribute,
thus preventing the compiler from optimizing quite so aggressively.

This obviously only makes sense if the caller of dlsym() is in control
over the functions dlsym() calls internally, or otherwise he couldn't
make any statements about its leafness.

So, I would suggest a viable approach that wouldn't pessimize the
typical use of dlsym() is to document

a) the functions dlsym() (as well as dlopen(), dlerror(), etc.) might call

b) provide a formal mechanism by which users can turn off its leaf-ness



	Stefan
Comment 3 Alexander Monakov 2014-02-14 16:27:19 UTC
I don't think dlsym() is a function you'd see on a hot path (thus requiring "leaf" for performance [1]), or called in very many different places (thus requiring "leaf" for code size reduction).  Is having that attribute there really worth it?

[1] and beside that, gains from "leaf" optimization are quite small compared to the amount of work dlsym does
Comment 4 Jakub Jelinek 2014-02-14 16:34:08 UTC
How would that help?  For malloc wrappers like this, you would have to not use leaf attribute for any functin that might call malloc, that is obviously undesirable.  Just use volatile or put your malloc wrapper with all the related variables into another translation unit and don't inspect it directly, just through calls or similar.  Of course for LTO then you really have to just mark it volatile.
Comment 5 Jakub Jelinek 2014-02-14 16:37:40 UTC
Note, even malloc itself is leaf, both as gcc builtin and in glibc headers, and for generating good code it is very much desirable to keep it as is.

Sure, it makes it harder to malloc abuses like this, but only very few packages actually do that, and simply have to be prepared to do extra work so that the common case can generate better code.
Comment 6 Alexander Monakov 2014-02-14 16:54:36 UTC
It would be nice if this at least would be properly documented somewhere, LTTng may not be the last community running into this problem.

Let me also add that overriding malloc is not the only way the application can observe non-leafness of dlsym.  Another possibility is via IFUNC: dlsym will trigger the resolver function.
Comment 7 Rich Felker 2014-02-14 18:26:36 UTC
It seems like a bug to me that dlsym is calling malloc. Whatever resources it might need should already have been allocated as part of dlopen. The IFUNC issue potentially remains, but really there should be formal restrictions on what functions IFUNC resolvers are allowed to call; without that, I don't think any code can be safe in the presence of IFUNCs unless the dynamic linker resolves them all at load time rather than lazily (BTW, this would probably be a good idea.)

As for working around the issue reported here, an alternate approach is calling dlsym via a volatile function pointer with non-leaf type. This will ensure that the compiler cannot optimize based on the leaf property.

Still, I agree dlsym is not a hot path, so the leaf property should probably just be removed. See issue #14989 for details on why dlsym is _required_ to be slow and how glibc is non-conforming in this regard.
Comment 8 Carlos O'Donell 2014-02-14 22:09:06 UTC
(In reply to Rich Felker from comment #7)
> It seems like a bug to me that dlsym is calling malloc. Whatever resources
> it might need should already have been allocated as part of dlopen.

The error string is allocated dynamically (we might convert _dl_signal_error and build the error list at dlopen time, but that seems like a waste of time and resources if you never fail). The debug printing might also call malloc with LD_DEBUG set. There is one more case that might call calloc related to prelinking. It is non-trivial to clean all of this up and I don't see any immediate benefit.

> The
> IFUNC issue potentially remains, but really there should be formal
> restrictions on what functions IFUNC resolvers are allowed to call; without
> that, I don't think any code can be safe in the presence of IFUNCs unless
> the dynamic linker resolves them all at load time rather than lazily (BTW,
> this would probably be a good idea.)
> As for working around the issue reported here, an alternate approach is
> calling dlsym via a volatile function pointer with non-leaf type. This will
> ensure that the compiler cannot optimize based on the leaf property.

IFUNCs are AFAIK never processed lazily, the IRELATIVE relocs in the ports I've looked at are all handled at startup.
 
> Still, I agree dlsym is not a hot path, so the leaf property should probably
> just be removed. 

I disagree. Unless there is a strong use case I suggest leaving dlsym marked as leaf. Worse is that if Stefan changes his code tomorrow he might need more functions with the leaf markers removed, and thus it isn't a complete solution.

Therefore I can only recommend to mark the data as volatile or move it out of the translation unit.

> See issue #14989 for details on why dlsym is _required_ to
> be slow and how glibc is non-conforming in this regard.

Bug 14989 is a distinct issue and we might be able to change the standard, and if we can't we might actually decide to mark dlsym as "!posix" in the new safety notes and explain why. I don't see that the POSIX requirement is all that useful. If I've learned anything we should crash immediately on the second dlclose() so users can debug the issue.

(In reply to Alexander Monakov from comment #6)
> It would be nice if this at least would be properly documented somewhere,
> LTTng may not be the last community running into this problem.
> 
> Let me also add that overriding malloc is not the only way the application
> can observe non-leafness of dlsym.  Another possibility is via IFUNC: dlsym
> will trigger the resolver function.

I'm happy to review manual/* paches to add this information. I don't know where we would put this information. Please look through the manual and suggest something. Perhaps in the introduction of the chapter on memory allocation routines (manual/memory.texi). Send the patch to libc-alpha@sourceware.org.

In the meantime I'm marking this as RESOLVED/WONTFIX until we have more information about the use cases. Feel free to reopen.
Comment 9 Alexander Monakov 2014-02-15 08:32:14 UTC
Carlos, if you agree that the documentation can be adjusted to explicitely mention this, can the summary be changed accordingly (suggest: "document that leaf attribute assumes no interposition", or more generally "document constraints for LD_PRELOAD interposers") and the bug reopened?

This:

> IFUNCs are AFAIK never processed lazily, the IRELATIVE relocs in the ports I've looked at are all handled at startup.

is confusing, IFUNC and IRELATIVE are not the same.  IFUNCs are resolved lazily, and moreover, repeated calls to dlsym trigger the resolver repeatedly.  Here's a testcase:

$ cat test.c 
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

static int counter;

void bar(void) __attribute__((ifunc("bar_resolver")));

static void *bar_resolver(void)
{
  printf("bar_resolver(): counter == %d\n", counter);
  return NULL;
}

int main(void)
{
  counter++;
  dlsym(RTLD_DEFAULT, "bar");
  counter++;
  dlsym(RTLD_DEFAULT, "bar");
  counter++;
  return counter;
}
$ gcc -O test.c -ldl -rdynamic 
$ ./a.out ; echo $?
bar_resolver(): counter == 0
bar_resolver(): counter == 0
3
Comment 10 Carlos O'Donell 2014-02-21 18:24:58 UTC
(In reply to Alexander Monakov from comment #9)
> Carlos, if you agree that the documentation can be adjusted to explicitely
> mention this, can the summary be changed accordingly (suggest: "document
> that leaf attribute assumes no interposition", or more generally "document
> constraints for LD_PRELOAD interposers") and the bug reopened?

Sounds good to me.
 
> This:
> 
> > IFUNCs are AFAIK never processed lazily, the IRELATIVE relocs in the ports I've looked at are all handled at startup.
> 
> is confusing, IFUNC and IRELATIVE are not the same.  IFUNCs are resolved
> lazily, and moreover, repeated calls to dlsym trigger the resolver
> repeatedly.  Here's a testcase:

Sorry, that's implementor speak.

STT_GNU_IFUNC is the symbol type and the relocation used is R_*_IRELATIVE.

The IRELATIVE relocations are, AFAIK, processed upfront.

> $ cat test.c 
> #define _GNU_SOURCE
> #include <dlfcn.h>
> #include <stdio.h>
> 
> static int counter;
> 
> void bar(void) __attribute__((ifunc("bar_resolver")));
> 
> static void *bar_resolver(void)
> {
>   printf("bar_resolver(): counter == %d\n", counter);
>   return NULL;
> }
> 
> int main(void)
> {
>   counter++;
>   dlsym(RTLD_DEFAULT, "bar");
>   counter++;
>   dlsym(RTLD_DEFAULT, "bar");
>   counter++;
>   return counter;
> }
> $ gcc -O test.c -ldl -rdynamic 
> $ ./a.out ; echo $?
> bar_resolver(): counter == 0
> bar_resolver(): counter == 0
> 3

This is not an indication of lazy resolution in the sense of relocation processing.

The dlsym code does re-run the resolver function just to ensure that it is using the most up to date version of the function that should be called.

If you change the code slightly to this:
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

static int counter;

void bar(void) __attribute__((ifunc("bar_resolver")));

void foo(void)
{
  printf("foo!\n");
}

static void *bar_resolver(void)
{
  printf("bar_resolver(): counter == %d\n", counter);
  return foo;
}

int main(void)
{
  printf ("In main...\n");
  bar();
  counter++;
  dlsym(RTLD_DEFAULT, "bar");
  counter++;
  dlsym(RTLD_DEFAULT, "bar");
  counter++;
  return counter;
}

bar_resolver(): counter == 0
In main...
foo!
bar_resolver(): counter == 1
bar_resolver(): counter == 2

You see the IRELATIVE reloc is resolved non-lazily at startup (running the resolver function to resolve the reocation) before passing control to the application. Later the use of dlsym forces a re-running of the resolver to ensure it is correct before potentially being called through a function pointer.

WARING: The IFUNC resolver restrictions are pretty tight and while the loader tries hard to ensure you can do as much as possible in the resolver, the community hasn't yet worked out exactly what is and is not allowed. Since it happens non-lazily at startup you might get in trouble referencing other objects not yet relocated.

Cheers,
Carlos.