Bug 25993 - Bug in bfd causes crashes with DXVK
Summary: Bug in bfd causes crashes with DXVK
Status: RESOLVED FIXED
Alias: None
Product: binutils
Classification: Unclassified
Component: ld (show other bugs)
Version: 2.34
: P2 normal
Target Milestone: 2.35
Assignee: Alan Modra
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-05-15 05:32 UTC by William Pierce
Modified: 2020-06-04 03:57 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Last reconfirmed: 2020-05-17 00:00:00


Attachments
Logs from running valgrind on ld during DXVK compilation (13.57 KB, application/gzip)
2020-05-15 05:32 UTC, William Pierce
Details
Proposed patch (612 bytes, patch)
2020-05-15 10:45 UTC, Nick Clifton
Details | Diff
Proposed patch (531 bytes, patch)
2020-05-15 13:01 UTC, Nick Clifton
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description William Pierce 2020-05-15 05:32:32 UTC
Created attachment 12539 [details]
Logs from running valgrind on ld during DXVK compilation

I bisected a bug in DXVK to be created and fixed by two commits to bfd.

DXVK transforms DX9-11 into Vulkan calls and is compiled as a .dll to be
used with WINE. This is DXVK's homepage: https://github.com/doitsujin/dxvk

DXVK uses mingw-w64-binutils to compile into a .dll

The mingw-w64-binutils I tested with originates from here:
https://aur.archlinux.org/packages/mingw-w64-binutils
That gets built using a PKGBUILD script from the ArchLinux AUR:
https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=mingw-w64-binutils

The build process from that boils down to

_targets="i686-w64-mingw32 x86_64-w64-mingw32"
  for _target in $_targets; do
    msg "Building ${_target} cross binutils"
    mkdir -p "$srcdir"/binutils-${_target} && cd "${srcdir}/binutils-${_target}"
    "$srcdir"/binutils-gdb/configure --prefix=/usr \
        --target=${_target} \
        --infodir=/usr/share/info/${_target} \
        --enable-lto --enable-plugins \
        --enable-deterministic-archives \
        --disable-multilib --disable-nls \
        --disable-werror
     make


I bisected the bug with binutils in the DXVK issue here:
https://github.com/doitsujin/dxvk/issues/1625
The bug manifests as WINE encountering a page fault at address 0x0, or
executing illegal instructions, or a few other manifestations with
different symptoms when DXVK is compiled with mingw-w64-binutils 2.34.
mingw-w64-binutils 2.33 does not have this bug.
We think the issue is that the linker makes DXVK jump to a bogus
address and start executing garbage.

I bisected the bug to be introduced by binutils commit
ea933f17c3c6b9fa1daf8d03baa34f7bec855d6c: "Release bfd_alloc memory in
bfd_check_format_matches" between the branch points for 2.33 and 2.34

I bisected the bug to be fixed in at least one case by binutils commit
b03202e32c8235997b3485b0b4655926ad97a1cc: "bfd_get_size cache"
between 2.34 branch point and master.
Note that also in this bisection between 2.34 and master, there was a
point where the reproduction hung with this in the WINE log instead of
crashing outright:
49801.150:00c0:00c4:warn:seh:setup_exception_record exception outside of stack limits in thread 00c4 eip 00314be2 esp c79140f7 stack 0x222000-0x320000
49801.150:00c0:00c4:err:seh:setup_exception_record nested exception on signal stack in thread 00c4 eip 7bce9d9d esp 7ffdb930 stack 0x222000-0x320000

While these commits do both touch bfd, they don't touch the same file.
Maybe the bug is only perturbed by the newer commit (or maybe my bisect
is slightly off). Also, the newer commit isn't trying to address the
bug caused by the original commit, so maybe the bug could be
reintroduced anytime.

I was asked to run valgrind on ld to see if there are any memory errors.
I did this on both commit ea933f17c3c6b9fa1daf8d03baa34f7bec855d6c that
introduced the problem and the commit right before it to see if there's
any difference. I ran it while compiling dxvk by replacing the ld binary
with a shell script that does this:

exec valgrind -s --log-file=/path/to/logs/ld-%p.log /usr/bin/i686-w64-mingw32-ld.bfd.bak "$@"

where /usr/bin/i686-w64-mingw32-ld.bfd.bak is the real ld that would be used.
Both seem to report I believe the same number of use-after-free errors,
so I think the bug got perturbed in different ways by these two commits
to effectively introduce and get rid of the error.

I've attached these logs for the build for the two commits.
In particular, I observed the repro happen in the 32 bit d3d9.dll build,
but both valgrinds outputs are the same. In the attached logs, they are
logs-f24bdec48621f419fdc9dcd58f46891f062b7bc0/ld-102345.log
logs-ea933f17c3c6b9fa1daf8d03baa34f7bec855d6c/ld-183630.log
Comment 1 William Pierce 2020-05-15 08:29:10 UTC
The files in my previous comment ran the build commands to build binutils using no extra compiler or linker flags. In this case, I actually found that I couldn't  reproduce the problem! So the logs I provided in the first comment 
may not be useful.

The default flags to use when building in the distribution I use are

CPPFLAGS="-D_FORTIFY_SOURCE=2"
CFLAGS="-march=x86-64 -mtune=generic -O2 -pipe -fno-plt"
CXXFLAGS="-march=x86-64 -mtune=generic -O2 -pipe -fno-plt"
LDFLAGS="-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now"

I found if I disable all the CPPFLAGS/CFLAGS/CXXFLAGS and leave only the LDFLAGS, then I can reproduce the issue. (I did not try out disabling other
combinations of flags.)

I tried with a few subsets of the LDFLAGS flags like
LDFLAGS="-Wl,-O1"
or
LDFLAGS="-Wl,--sort-common,--as-needed"

and could reproduce it with either of those as well (but maybe not some other
combinations).

Any advice on how to debug further would be appreciated. I could pare down the
flags more or diff the resulting binaries in the actual good/bad case.
Comment 2 Nick Clifton 2020-05-15 10:45:27 UTC
Created attachment 12542 [details]
Proposed patch

Hi William,

  This definitely looks like a memory corruption bug.

  Based upon the valgrind logs I have a possible patch - please could you try it it out and let me know if it works ?

Cheers
  Nick
Comment 3 Alan Modra 2020-05-15 12:16:33 UTC
Hi Nick, if it really is the case that is->filename == is->the_bfd->filename then it looks to me that your patch isn't sufficient.  You've avoided accessing is->filename for the strlen, but you'll do so a little later in sprintf..

BTW, the latest sources seem to strdup anything stored in bfd->filename, at least for code that goes by any of the functions in bfd/opncls.c
Comment 4 Nick Clifton 2020-05-15 13:01:01 UTC
Created attachment 12545 [details]
Proposed patch

Alan is of course totally correct.  So here is an alternative patch.  I suspect however that it will not be enough and that more analysis will be needed.

Is there a simple, standalone way to reproduce the problem ?
Comment 5 William Pierce 2020-05-16 07:52:36 UTC
I think the repro crash in DXVK could be perturbed in a few different ways, and my previous comment was running into a few different ways it could be perturbed. The crash also doesn't repro if I run ld under valgrind while building. Though for a memory corruption, these could make sense.

Nick, your patch completely fixes the problem for me.

I tested the patch being very careful to not perturb the setup in wrong way as I had seen. I very clearly saw the crash repro without your patch applied and not repro with it applied.

I also tried out the build running with ld under valgrind with and without your patch. Without your patch, there are many errors (but, somehow I don't observe the crash when running the built dll as I mentioned). With your patch, there are no errors at all (and the built dll runs fine).

Building DXVK with mingw-w64-binutils as mentioned before is sufficient to see the memory corruption errors with ld during build time. I don't yet have an easy example app to demonstrate the crash when using the built dll. Bayonetta on Steam was being used for testing in the DXVK bug 
https://github.com/doitsujin/dxvk/issues/1625, but the crash happens for a variety of D3D applications.

Thanks a ton for the fast response!
Comment 6 Alan Modra 2020-05-17 00:01:19 UTC
ldmain.c:add_archive_element is where we duplicate the file name pointers:
  input->filename = abfd->filename;
  input->local_sym_name = abfd->filename;
Comment 7 Alan Modra 2020-05-17 22:40:38 UTC
We should do something about the above local_sym_name pointer into what is freed memory after pe.em/pep.em have twiddled the bfd filename.  The simplest solution is to make another copy of the name in add_archive_element, which I think would mean the pe.em/pep.em changes are not needed.

An alternative approach is to make bfd_set_filename allocate a copy of the name using bfd_alloc and not free any old name.  I think that's the best solution, and I have a patch to do that but it touches a lot of files and probably should be held off until after the gdb-9.2 release next weekend.
Comment 8 Sourceware Commits 2020-05-18 09:29:56 UTC
The master branch has been updated by Nick Clifton <nickc@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=5e365e474b7561318ddb1a107f05cf0c002e8284

commit 5e365e474b7561318ddb1a107f05cf0c002e8284
Author: Nick Clifton <nickc@redhat.com>
Date:   Mon May 18 10:28:52 2020 +0100

    Prevent a potential use-after-fee memory corruption bug in the linker (for PE format files).
    
            PR 25993
            * emultempl/pe.em (_after_open): Check for duplicate filename
            pointers before renaming the dll.
            * emultempl/pep.em (_after_open): Likewise.
Comment 9 Nick Clifton 2020-05-18 09:32:52 UTC
Hi Alan,

> An alternative approach is to make bfd_set_filename allocate a copy of the
> name using bfd_alloc and not free any old name.  I think that's the best
> solution, and I have a patch to do that but it touches a lot of files and
> probably should be held off until after the gdb-9.2 release next weekend.

OK, that is fine by me.  I have gone ahead and applied my patch, since it 
cannot hurt and will fix the code until your patch is ready.

Cheers
  Nick
Comment 10 Sourceware Commits 2020-05-20 02:17:29 UTC
The master branch has been updated by Alan Modra <amodra@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=7b958a48e1322880f23cdb0a1c35643dd27d3ddb

commit 7b958a48e1322880f23cdb0a1c35643dd27d3ddb
Author: Alan Modra <amodra@gmail.com>
Date:   Tue May 19 12:58:59 2020 +0930

    PR25993, read of freed memory
    
    ldmain.c:add_archive_element copies file name pointers from the bfd to
    a lang_input_statement_type.
      input->filename = abfd->filename;
      input->local_sym_name = abfd->filename;
    This results in stale pointers when twiddling the bfd filename in
    places like the pe ld after_open.  So don't free the bfd filename,
    and make copies using bfd_alloc memory that won't result in small
    memory leaks that annoy memory checkers.
    
            PR 25993
    bfd/
            * archive.c (_bfd_get_elt_at_filepos): Don't strdup filename,
            use bfd_set_filename.
            * elfcode.h (_bfd_elf_bfd_from_remote_memory): Likewise.
            * mach-o.c (bfd_mach_o_fat_member_init): Likewise.
            * opncls.c (bfd_fopen, bfd_openstreamr, bfd_openr_iovec, bfd_openw),
            (bfd_create): Likewise.
            (_bfd_delete_bfd): Don't free filename.
            (bfd_set_filename): Copy filename param to bfd_alloc'd memory,
            return pointer to the copy or NULL on alloc fail.
            * vms-lib.c (_bfd_vms_lib_get_module): Free newname and test
            result of bfd_set_filename.
            * bfd-in2.h: Regenerate.
    gdb/
            * solib-darwin.c (darwin_bfd_open): Don't strdup pathname for
            bfd_set_filename.
            * solib-aix.c (solib_aix_bfd_open): Use std::string for name
            passed to bfd_set_filename.
            * symfile-mem.c (add_vsyscall_page): Likewise for string
            passed to symbol_file_add_from_memory.
            (symbol_file_add_from_memory): Make name param a const char* and
            don't strdup.
    ld/
            * emultempl/pe.em (gld_${EMULATION_NAME}_after_open): Don't copy
            other_bfd_filename for bfd_set_filename, and test result of
            bfd_set_filename call.  Don't create a new is->filename, simply
            copy from bfd filename.  Free new_name after bfd_set_filename.
            * emultempl/pep.em (gld_${EMULATION_NAME}_after_open): Likewise.
Comment 11 Alan Modra 2020-05-20 02:29:11 UTC
bfd_set_filename changes applied.  Nick, your patch is a lot more suitable for backports than mine, if you so desire.
Comment 12 William Pierce 2020-05-20 06:05:03 UTC
I just tested Alan's patch on master. Valgrind still comes back with no errors, and the game still launches just fine.

I didn't look to see just how far back the memory corruption goes, but I think the DXVK community would appreciate a fix on at least branch 2.34 since games only started crashing with that release. (It's possible these crashes could have been happening before, but we don't have substantial evidence that it's this ld bug.)

Again, thank you for the fast response.
Comment 13 Sourceware Commits 2020-05-21 14:11:50 UTC
The master branch has been updated by Alan Modra <amodra@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=0490dd41ae89e66efd8b3cee122c189a481269de

commit 0490dd41ae89e66efd8b3cee122c189a481269de
Author: Alan Modra <amodra@gmail.com>
Date:   Thu May 21 23:34:58 2020 +0930

    Re: PR25993, read of freed memory
    
    git commit 7b958a48e132 put the bfd filename in the bfd objalloc
    memory.  That means the filename is freed by _bfd_free_cached_info.
    Which is called by _bfd_compute_and_write_armap to tidy up symbol
    tables after they are done with.
    
    Unfortunately, _bfd_write_archive_contents wants to seek and read from
    archive elements after that point, and if the number of elements
    exceeds max_open_files in cache.c then some of those elements will
    have their files closed.  To reopen, you need the filename.
    
            PR 25993
            * opncls.c (_bfd_free_cached_info): Keep a copy of the bfd
            filename.
            (_bfd_delete_bfd): Free the copy.
            (_bfd_new_bfd): Free nbfd->memory on error.
Comment 14 William Pierce 2020-05-29 02:54:49 UTC
Is it possible to back port Nick's patch to 2.34?

Although, if Nick's patch were backported to 2.34, would the downloads at https://ftp.gnu.org/gnu/binutils/ be updated? Most folks building release versions will use those downloads rather than the git repo. If not, DXVK folks experiencing the original problem would just have to use a patched build until then.

Thanks
Comment 15 Sourceware Commits 2020-06-03 14:17:11 UTC
The binutils-2_34-branch branch has been updated by Nick Clifton <nickc@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=463ec189fe9eca199edf87cda2c31efbe850390d

commit 463ec189fe9eca199edf87cda2c31efbe850390d
Author: Nick Clifton <nickc@redhat.com>
Date:   Wed Jun 3 15:16:48 2020 +0100

    Prevent a potential use-after-fee memory corruption bug in the linker (for PE format files).
    
            PR 25993
            * emultempl/pe.em (_after_open): Check for duplicate filename
            pointers before renaming the dll.
            * emultempl/pep.em (_after_open): Likewise.
Comment 16 Nick Clifton 2020-06-03 14:19:45 UTC
Hi William,

> Is it possible to back port Nick's patch to 2.34?

Yes - done.

> Although, if Nick's patch were backported to 2.34, would the downloads at
> https://ftp.gnu.org/gnu/binutils/ be updated?

No.  In order to do that we would need to create a minor update release.
This could be done, but I am reluctant to do as I am planning on making
the next major release (2.35) in a month's time...

Cheers
  Nick
Comment 17 William Pierce 2020-06-04 03:57:19 UTC
Thank you!