Bug 13579 - do_lookup_x may access dangling memory
Summary: do_lookup_x may access dangling memory
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: dynamic-link (show other bugs)
Version: 2.15
: P2 normal
Target Milestone: 2.16
Assignee: Carlos O'Donell
URL: http://sourceware.org/ml/libc-alpha/2...
Keywords:
: 12871 13906 14035 (view as bug list)
Depends on:
Blocks:
 
Reported: 2012-01-10 07:41 UTC by Paul Pluzhnikov
Modified: 2016-05-16 16:38 UTC (History)
14 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed: 2012-04-22 00:00:00
fweimer: security-


Attachments
Patch that Fedora uses (2.36 KB, patch)
2012-04-05 18:06 UTC, Andreas Jaeger
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Paul Pluzhnikov 2012-01-10 07:41:43 UTC
This shows up as a crash in gnucash with glibc-2.15 on Precise Pangolin.
https://bugs.launchpad.net/ubuntu/+source/eglibc/+bug/893605

Confirmed present in current glibc git trunk.

Test:

/// --- cut --- foo.c ---
int foo () { return bar (); }

/// --- cut --- bar.c ---
int bar () { return 42; }

/// --- cut --- t.c ---
#include <stdio.h>
#include <dlfcn.h>

int main ()
{
  void *h = dlopen ("./foo.so", RTLD_LAZY|RTLD_GLOBAL);
  void *p = dlsym (h, "bar");

  printf ("h = %p, p = %p\n", h, p);

  dlclose (h);

  h = dlopen ("./foo.so", RTLD_LAZY|RTLD_GLOBAL);
  p = dlsym (h, "bar");
  printf ("h = %p, p = %p\n", h, p);

  return 0;
}


gcc -fPIC -shared -o bar.so bar.c &&
gcc -fPIC -shared -o foo.so foo.c ./bar.so &&
gcc t.c ./foo.so ./bar.so -ldl

valgrind ./a.out        # no errors with glibc-2.11

==16605== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==16605== Using Valgrind-3.8.0.SVN and LibVEX; rerun with -h for copyright info
==16605== Command: ./a.out
==16605== 
h = 0x4023b78, p = 0x503759c
==16605== Invalid read of size 8
==16605==    at 0x40093F6: do_lookup_x (/tmp/glibc-git/elf/dl-lookup.c:98)
==16605==    by 0x4009E4A: _dl_lookup_symbol_x (/tmp/glibc-git/elf/dl-lookup.c:739)
==16605==    by 0x5551305: do_sym (/tmp/glibc-git/elf/dl-sym.c:178)
==16605==    by 0x523A043: dlsym_doit (/tmp/glibc-git/dlfcn/dlsym.c:51)
==16605==    by 0x400E685: _dl_catch_error (/tmp/glibc-git/elf/dl-error.c:178)
==16605==    by 0x523A4DB: _dlerror_run (/tmp/glibc-git/dlfcn/dlerror.c:164)
==16605==    by 0x523A099: dlsym (/tmp/glibc-git/dlfcn/dlsym.c:71)
==16605==    by 0x400806: main (in /tmp/bug/a.out)
==16605==  Address 0x57e6098 is 40 bytes inside a block of size 72 free'd
==16605==    at 0x4C2C0EB: free (/valgrind-test/coregrind/m_replacemalloc/vg_replace_malloc.c:426)
==16605==    by 0x4011D21: _dl_scope_free (/tmp/glibc-git/elf/dl-scope.c:32)
==16605==    by 0x4013446: _dl_close_worker (/tmp/glibc-git/elf/dl-close.c:130)
==16605==    by 0x401407B: _dl_close (/tmp/glibc-git/elf/dl-close.c:779)
==16605==    by 0x400E685: _dl_catch_error (/tmp/glibc-git/elf/dl-error.c:178)
==16605==    by 0x523A4DB: _dlerror_run (/tmp/glibc-git/dlfcn/dlerror.c:164)
==16605==    by 0x523A00E: dlclose (/tmp/glibc-git/dlfcn/dlclose.c:48)
==16605==    by 0x4007DF: main (in /tmp/bug/a.out)
==16605== 
h = 0x4023b78, p = 0x503759c
==16605== 
==16605== HEAP SUMMARY:
==16605==     in use at exit: 0 bytes in 0 blocks
==16605==   total heap usage: 2 allocs, 2 frees, 200 bytes allocated
==16605== 
==16605== All heap blocks were freed -- no leaks are possible
==16605== 
==16605== For counts of detected and suppressed errors, rerun with: -v
==16605== ERROR SUMMARY: 2 errors from 1 contexts (suppressed: 2 from 2)



The bug may have been introduced here:

commit 4bff6e0175ed195871f4e01cc4c4c33274b8f6e3
Author: Andreas Schwab <schwab@redhat.com>
Date:   Fri Feb 25 20:49:48 2011 -0500

    Fix memory leak in dlopen with RTLD_NOLOAD.
Comment 1 Paul Pluzhnikov 2012-03-28 14:12:39 UTC
*** Bug 13906 has been marked as a duplicate of this bug. ***
Comment 2 Andreas Jaeger 2012-04-05 18:05:32 UTC
The same bug has been reported for openSUSE with glibc 2.11.3 which contains a backport of the commit.
I'm appending the patch that Fedora seems to be using
Comment 3 Andreas Jaeger 2012-04-05 18:06:22 UTC
Created attachment 6317 [details]
Patch that Fedora uses
Comment 4 law 2012-04-05 20:08:37 UTC
Andreas,

Thanks for pulling those bits out of larger glibc-fedora.patch.  One of my ongoing TODOs is to identify independent patches from glibc-fedora.patch and create distinct patch files for them (easier to track what needs to be pushed upstream).
Comment 5 Carlos O'Donell 2012-04-06 03:17:35 UTC
Is there a clear description of exactly the problem solved by the patch? This looks like a serious problem that we need to fix, but I have not seen a clear "What problem are your solving?" kind of description of the problem and how the fix relates.

It appears that the patch does:

* In _dl_close_worker we don't swap back the old l_orig_initfini maps (is this what was causing the access to dangling memory?) we made at startup, and we don't free the current l_initfini maps we were using (they will get freed elsewhere).

* In _dl_map_object_deps we immediately free the old l_initfini maps instead of saving them to l_orig_initfini. We also use a new flag l_free_initfini to mark that the maps were dynamically allocated and need to be free'd.

* Given that we free the old l_initfini immediately, we only have the current l_initfini to free when l_free_initfini is 1, and that is done in libc_freeres_fn.

This seems logical and the change looks correct.

However, what was *actually* wrong with the original implementation?

Was it the swapping back of the l_orig_initfini?

What is wrong with the old l_orig_initfini?
Comment 6 Andreas Jaeger 2012-04-06 06:59:07 UTC
Btw. to just fix the accessing of dangling memory, here's a simple (but broken) patch with a comment to explain the problem that the current implementation has:

===================================================================
--- glibc-2.11.3.orig/elf/dl-close.c    2011-05-27 15:08:23.000000000 +0200
+++ glibc-2.11.3/elf/dl-close.c 2011-07-13 19:28:52.000000000 +0200
@@ -127,7 +127,13 @@ _dl_close_worker (struct link_map *map)
            {
              struct link_map **oldp = map->l_initfini;
              map->l_initfini = map->l_orig_initfini;
-             _dl_scope_free (oldp);
+             /* We can't remove the l_initfini memory because
+                it's shared with l_searchlist.r_list.  We don't clear
+                the latter so when we dlopen this object again that
+                entry would point to stale memory.  And we don't want
+                to recompute it as it would involve a new call to
+                map_object_deps.
+             _dl_scope_free (oldp); */
            }
        }
 
This patch is broken since now oldp never gets freed and thus some tests fail.

The Fedora patch is AFAIK applying Andreas Schwab's initial patch that Ulrich Drepper changed ontop of Ulrich's change (thus adding Andreas' initial version)

Here's a link to the initial patch
http://sourceware.org/ml/libc-hacker/2011-02/msg00004.html
Comment 7 Andreas Jaeger 2012-04-06 07:02:34 UTC
Carlos, the Fedora patch reverts Ulrich's patch and adds Andreas' initial patch.

So, instead of looking what the Fedora patch does, I think it's better to look at Ulrich's patch and Andreas commit individually since they are two different solutions to the same problem. I don't know why Ulrich changed Andreas' patch.

Looking at the diff between Ulrich's and Andreas' patch is IMO not the best way to review this - but that's what the Fedora patch does (instead of reverting a patch and adding another).
Comment 8 Carlos O'Donell 2012-04-06 08:48:59 UTC
(In reply to comment #6)
> +             /* We can't remove the l_initfini memory because
> +                it's shared with l_searchlist.r_list.  We don't clear
> +                the latter so when we dlopen this object again that
> +                entry would point to stale memory.  And we don't want
> +                to recompute it as it would involve a new call to
> +                map_object_deps.
> +             _dl_scope_free (oldp); */

Thanks this is starting to explain the problem.

Why is l_searchlist.r_list sharing the memory of l_initfini, they each have nothing to do with eachother, other than the fact that they are both of type `struct link_map **`.

In dl-close.c I see this:
~~~
318           if (imap->l_searchlist.r_list == NULL && imap->l_initfini != NULL)
319             {
320               /* The object is still used.  But one of the objects we are
321                  unloading right now is responsible for loading it.  If
322                  the current object does not have it's own scope yet we
323                  have to create one.  This has to be done before running
324                  the finalizers.
325
326                  To do this count the number of dependencies.  */
327               unsigned int cnt;
328               for (cnt = 1; imap->l_initfini[cnt] != NULL; ++cnt)
329                 ;
330
331               /* We simply reuse the l_initfini list.  */
332               imap->l_searchlist.r_list = &imap->l_initfini[cnt + 1];
333               imap->l_searchlist.r_nlist = cnt;
334
335               new_list = &imap->l_searchlist;
336             }
~~~
Which doesn't make any sense to me. Why are we resusing l_initfini for what appears to be a completely orthogonal purpose?

However, in dl-deps.c(_dl_map_object_deps) we clearly get:
~~~
513   /* Store the search list we built in the object.  It will be used for
514      searches in the scope of this object.  */
515   struct link_map **l_initfini =
516     (struct link_map **) malloc ((2 * nlist + 1)
517                                  * sizeof (struct link_map *));
518   if (l_initfini == NULL)
519     _dl_signal_error (ENOMEM, map->l_name, NULL,
520                       N_("cannot allocate symbol search list"));
521
522
523   map->l_searchlist.r_list = &l_initfini[nlist + 1];
524   map->l_searchlist.r_nlist = nlist;
~~~
Which allocates a list of 2*nlist+1 size, and points l_searchlist.r_list into nlist+1 which is 2 beyond nlist pointers, and we see why it does this in a second...

Then we point l_initfini at the start of the memory block and terminate it:
~~~
687   /* Terminate the list of dependencies.  */
688   l_initfini[nlist] = NULL;
689   atomic_write_barrier ();
690   map->l_initfini = l_initfini;
~~~

So we have a contiguous allocation of pointers split in two with a NULL entry:

|A---------------|NULL|B-----------------|

Where map->l_initfini points to A [0,nlist], including NULL, it is nlist+1 pointers.

Where map->l_searchlist.r_list points to B [nlist+1,2*nlist+1] which is only nlist pointers.

Obviously freeing l_initfini is impossible unless you also clear l_searchlist.r_list, and then you'd have to recompute the r_list.

I need to think about this some more.
Comment 9 Andreas Jaeger 2012-04-06 11:27:15 UTC
Michael, could you comment a bit on this? You debugged this some time ago and did the patch in comment #6.
Comment 10 Carlos O'Donell 2012-04-21 20:36:58 UTC
*** Bug 12871 has been marked as a duplicate of this bug. ***
Comment 11 Carlos O'Donell 2012-04-22 19:26:54 UTC
Reproduced with 2.15 head and trunk.
Comment 12 Andreas Jaeger 2012-05-02 10:53:29 UTC
*** Bug 14035 has been marked as a duplicate of this bug. ***
Comment 13 Markus Trippelsdorf 2012-06-21 09:43:59 UTC
Any update on this issue? 
What is holding the obvious patch from Andreas back?
Comment 14 Carlos O'Donell 2012-06-21 13:31:40 UTC
(In reply to comment #13)
> Any update on this issue? 
> What is holding the obvious patch from Andreas back?

Sorry, I've been holding up the obvious patch with a lengthy review.

I've filed BZ#14274 for the actual cleanup.

At this point in the code freeze it is too risky to make the larger change.

I'm running some final tests on the smaller patch and I'll post to libc-alpha for final review, then we'll check it in for 2.16.
Comment 15 Andreas Jaeger 2012-06-21 13:59:29 UTC
The proposed patch is used by several distributions (including Fedora and openSUSE) without known issues for some time, so it has received good testing.
Comment 16 Carlos O'Donell 2012-06-22 20:55:42 UTC
This is now fixed for 2.16 by my commit 0479b305c5b7c8e3fa8e3002982cf8cac02b842e
~~~~
Fix invalid memory access in do_lookup_x.

[BZ #13579] Do not free l_initfini and allow it to be reused
on subsequent dl_open calls for the same library. This fixes
the invalid memory access in do_lookup_x when the previously
free'd l_initfini was accessed through l_searchlist when a
library had been opened for the second time.
~~~

http://sourceware.org/git/?p=glibc.git;a=commit;h=0479b305c5b7c8e3fa8e3002982cf8cac02b842e

Filed BZ #14284 for the backport request.
Comment 17 Yogesh Gaur 2013-02-12 01:36:08 UTC
I know this issue is closed, but for someone who needs to see the exact test case using which how this bug is reproduced by simple C test code, please find below simple C test case:
***************************** Source Code ***********************************
yogesh$ cat lib1.c
#include <stdio.h>

int lib1_func()
{
        return lib2_func();
}
----------------------------------------------
yogesh$ cat lib2.c
#include <stdio.h>

int lib2_func()
{
        return 10;
}
----------------------------------------------
yogesh$ cat main.c
#include <stdio.h>
#include <dlfcn.h>
#include <pthread.h>

void *handle;

static void *thread_abc()
{
        handle = dlopen ("./lib1.so", RTLD_LAZY | RTLD_GLOBAL);
        void *func = dlsym (handle, "lib2_func");
        printf ("<thread_abc> Handle:%p, func:%p \n", handle, func);
        dlclose (handle);
        return NULL;
}

static void *thread_xyz()
{
        handle = dlopen ("./lib1.so", RTLD_LAZY | RTLD_GLOBAL);
        void *func = dlsym (handle, "lib2_func");
        printf ("<thread_xyz> Handle:%p, func:%p \n", handle, func);
        dlclose (handle);
        return NULL;
}

int main()
{
        pthread_t abc_arr[1000], xyz_arr[1000];
        int i=0;
        handle = dlopen ("./lib1.so", RTLD_LAZY | RTLD_GLOBAL);
        void *func = dlsym (handle, "lib2_func");
        printf ("<main> Handle:%p, func:%p \n", handle, func);
        for (i=0;i<10;i++)
        {
                pthread_create(&abc_arr[i], NULL, thread_abc, NULL);
                pthread_create(&xyz_arr[i], NULL, thread_xyz, NULL);
        }

        printf ("<main> Handle:%p, func:%p \n", handle, func);
        dlclose (handle);

        for (i=0;i<1000;i++)
        {
                pthread_create(&abc_arr[i], NULL, thread_abc, NULL);
                pthread_create(&xyz_arr[i], NULL, thread_xyz, NULL);
        }
        for (i=0;i<10;i++)
        {
                pthread_join(abc_arr[i], NULL);
                pthread_join(xyz_arr[i], NULL);
        }
        printf ("Returning from main\n");
        return 0;
}
************************** Compilation steps *********************
gcc -g -fPIC -shared -o lib2.so lib2.c &&				
gcc -g -fPIC -shared -o lib1.so lib1.c ./lib2.so &&	
gcc -g main.c ./lib1.so ./lib2.so -ldl -lpthread	
*******************************************************************

With the above test case this issue is 100% reproducible.
Comment 18 Bharath H S 2013-02-21 11:45:49 UTC
(In reply to comment #17)
> I know this issue is closed, but for someone who needs to see the exact test
> case using which how this bug is reproduced by simple C test code, please find
> below simple C test case:
> ***************************** Source Code ***********************************
> yogesh$ cat lib1.c
> #include <stdio.h>
> 
> int lib1_func()
> {
>         return lib2_func();
> }
> ----------------------------------------------
> yogesh$ cat lib2.c
> #include <stdio.h>
> 
> int lib2_func()
> {
>         return 10;
> }
> ----------------------------------------------
> yogesh$ cat main.c
> #include <stdio.h>
> #include <dlfcn.h>
> #include <pthread.h>
> 
> void *handle;
> 
> static void *thread_abc()
> {
>         handle = dlopen ("./lib1.so", RTLD_LAZY | RTLD_GLOBAL);
>         void *func = dlsym (handle, "lib2_func");
>         printf ("<thread_abc> Handle:%p, func:%p \n", handle, func);
>         dlclose (handle);
>         return NULL;
> }
> 
> static void *thread_xyz()
> {
>         handle = dlopen ("./lib1.so", RTLD_LAZY | RTLD_GLOBAL);
>         void *func = dlsym (handle, "lib2_func");
>         printf ("<thread_xyz> Handle:%p, func:%p \n", handle, func);
>         dlclose (handle);
>         return NULL;
> }
> 
> int main()
> {
>         pthread_t abc_arr[1000], xyz_arr[1000];
>         int i=0;
>         handle = dlopen ("./lib1.so", RTLD_LAZY | RTLD_GLOBAL);
>         void *func = dlsym (handle, "lib2_func");
>         printf ("<main> Handle:%p, func:%p \n", handle, func);
>         for (i=0;i<10;i++)
>         {
>                 pthread_create(&abc_arr[i], NULL, thread_abc, NULL);
>                 pthread_create(&xyz_arr[i], NULL, thread_xyz, NULL);
>         }
> 
>         printf ("<main> Handle:%p, func:%p \n", handle, func);
>         dlclose (handle);
> 
>         for (i=0;i<1000;i++)
>         {
>                 pthread_create(&abc_arr[i], NULL, thread_abc, NULL);
>                 pthread_create(&xyz_arr[i], NULL, thread_xyz, NULL);
>         }
>         for (i=0;i<10;i++)
>         {
>                 pthread_join(abc_arr[i], NULL);
>                 pthread_join(xyz_arr[i], NULL);
>         }
>         printf ("Returning from main\n");
>         return 0;
> }
> ************************** Compilation steps *********************
> gcc -g -fPIC -shared -o lib2.so lib2.c &&                
> gcc -g -fPIC -shared -o lib1.so lib1.c ./lib2.so &&    
> gcc -g main.c ./lib1.so ./lib2.so -ldl -lpthread    
> *******************************************************************
> 
> With the above test case this issue is 100% reproducible.

Hi yogesh,
With this application in your comment, the issue reported in this bug does not repro in stock eglibc-2.15 from svn 
svn co svn://svn.eglibc.org/branches/eglibc-2_15 eglibc-2.15
Comment 19 cvs-commit@gcc.gnu.org 2014-01-30 18:07:30 UTC
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, release/2.15/master has been updated
       via  1ba48eb07a72690406c0ffda642a963c88639752 (commit)
      from  e8b5394afb420449dde0b4cbefd4032936d96a25 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1ba48eb07a72690406c0ffda642a963c88639752

commit 1ba48eb07a72690406c0ffda642a963c88639752
Author: Andreas Schwab <schwab@redhat.com>
Date:   Fri Jun 22 11:10:31 2012 -0700

    Fix invalid memory access in do_lookup_x.
    
    [BZ #13579] Do not free l_initfini and allow it to be reused
    on subsequent dl_open calls for the same library. This fixes
    the invalid memory access in do_lookup_x when the previously
    free'd l_initfini was accessed through l_searchlist when a
    library had been opened for the second time.
    
    (cherry picked from commit 0479b305c5b7c8e3fa8e3002982cf8cac02b842e)

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog      |   11 +++++++++++
 NEWS           |    3 ++-
 elf/dl-close.c |   15 +++------------
 elf/dl-deps.c  |    7 ++++---
 elf/dl-libc.c  |    9 ++++++---
 elf/rtld.c     |    2 ++
 include/link.h |    8 ++++----
 7 files changed, 32 insertions(+), 23 deletions(-)
Comment 20 Jackie Rosen 2014-02-16 19:20:32 UTC Comment hidden (spam)