Created attachment 12539 [details] Logs from running valgrind on ld during DXVK compilation I bisected a bug in DXVK to be created and fixed by two commits to bfd. DXVK transforms DX9-11 into Vulkan calls and is compiled as a .dll to be used with WINE. This is DXVK's homepage: https://github.com/doitsujin/dxvk DXVK uses mingw-w64-binutils to compile into a .dll The mingw-w64-binutils I tested with originates from here: https://aur.archlinux.org/packages/mingw-w64-binutils That gets built using a PKGBUILD script from the ArchLinux AUR: https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=mingw-w64-binutils The build process from that boils down to _targets="i686-w64-mingw32 x86_64-w64-mingw32" for _target in $_targets; do msg "Building ${_target} cross binutils" mkdir -p "$srcdir"/binutils-${_target} && cd "${srcdir}/binutils-${_target}" "$srcdir"/binutils-gdb/configure --prefix=/usr \ --target=${_target} \ --infodir=/usr/share/info/${_target} \ --enable-lto --enable-plugins \ --enable-deterministic-archives \ --disable-multilib --disable-nls \ --disable-werror make I bisected the bug with binutils in the DXVK issue here: https://github.com/doitsujin/dxvk/issues/1625 The bug manifests as WINE encountering a page fault at address 0x0, or executing illegal instructions, or a few other manifestations with different symptoms when DXVK is compiled with mingw-w64-binutils 2.34. mingw-w64-binutils 2.33 does not have this bug. We think the issue is that the linker makes DXVK jump to a bogus address and start executing garbage. I bisected the bug to be introduced by binutils commit ea933f17c3c6b9fa1daf8d03baa34f7bec855d6c: "Release bfd_alloc memory in bfd_check_format_matches" between the branch points for 2.33 and 2.34 I bisected the bug to be fixed in at least one case by binutils commit b03202e32c8235997b3485b0b4655926ad97a1cc: "bfd_get_size cache" between 2.34 branch point and master. Note that also in this bisection between 2.34 and master, there was a point where the reproduction hung with this in the WINE log instead of crashing outright: 49801.150:00c0:00c4:warn:seh:setup_exception_record exception outside of stack limits in thread 00c4 eip 00314be2 esp c79140f7 stack 0x222000-0x320000 49801.150:00c0:00c4:err:seh:setup_exception_record nested exception on signal stack in thread 00c4 eip 7bce9d9d esp 7ffdb930 stack 0x222000-0x320000 While these commits do both touch bfd, they don't touch the same file. Maybe the bug is only perturbed by the newer commit (or maybe my bisect is slightly off). Also, the newer commit isn't trying to address the bug caused by the original commit, so maybe the bug could be reintroduced anytime. I was asked to run valgrind on ld to see if there are any memory errors. I did this on both commit ea933f17c3c6b9fa1daf8d03baa34f7bec855d6c that introduced the problem and the commit right before it to see if there's any difference. I ran it while compiling dxvk by replacing the ld binary with a shell script that does this: exec valgrind -s --log-file=/path/to/logs/ld-%p.log /usr/bin/i686-w64-mingw32-ld.bfd.bak "$@" where /usr/bin/i686-w64-mingw32-ld.bfd.bak is the real ld that would be used. Both seem to report I believe the same number of use-after-free errors, so I think the bug got perturbed in different ways by these two commits to effectively introduce and get rid of the error. I've attached these logs for the build for the two commits. In particular, I observed the repro happen in the 32 bit d3d9.dll build, but both valgrinds outputs are the same. In the attached logs, they are logs-f24bdec48621f419fdc9dcd58f46891f062b7bc0/ld-102345.log logs-ea933f17c3c6b9fa1daf8d03baa34f7bec855d6c/ld-183630.log
The files in my previous comment ran the build commands to build binutils using no extra compiler or linker flags. In this case, I actually found that I couldn't reproduce the problem! So the logs I provided in the first comment may not be useful. The default flags to use when building in the distribution I use are CPPFLAGS="-D_FORTIFY_SOURCE=2" CFLAGS="-march=x86-64 -mtune=generic -O2 -pipe -fno-plt" CXXFLAGS="-march=x86-64 -mtune=generic -O2 -pipe -fno-plt" LDFLAGS="-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now" I found if I disable all the CPPFLAGS/CFLAGS/CXXFLAGS and leave only the LDFLAGS, then I can reproduce the issue. (I did not try out disabling other combinations of flags.) I tried with a few subsets of the LDFLAGS flags like LDFLAGS="-Wl,-O1" or LDFLAGS="-Wl,--sort-common,--as-needed" and could reproduce it with either of those as well (but maybe not some other combinations). Any advice on how to debug further would be appreciated. I could pare down the flags more or diff the resulting binaries in the actual good/bad case.
Created attachment 12542 [details] Proposed patch Hi William, This definitely looks like a memory corruption bug. Based upon the valgrind logs I have a possible patch - please could you try it it out and let me know if it works ? Cheers Nick
Hi Nick, if it really is the case that is->filename == is->the_bfd->filename then it looks to me that your patch isn't sufficient. You've avoided accessing is->filename for the strlen, but you'll do so a little later in sprintf.. BTW, the latest sources seem to strdup anything stored in bfd->filename, at least for code that goes by any of the functions in bfd/opncls.c
Created attachment 12545 [details] Proposed patch Alan is of course totally correct. So here is an alternative patch. I suspect however that it will not be enough and that more analysis will be needed. Is there a simple, standalone way to reproduce the problem ?
I think the repro crash in DXVK could be perturbed in a few different ways, and my previous comment was running into a few different ways it could be perturbed. The crash also doesn't repro if I run ld under valgrind while building. Though for a memory corruption, these could make sense. Nick, your patch completely fixes the problem for me. I tested the patch being very careful to not perturb the setup in wrong way as I had seen. I very clearly saw the crash repro without your patch applied and not repro with it applied. I also tried out the build running with ld under valgrind with and without your patch. Without your patch, there are many errors (but, somehow I don't observe the crash when running the built dll as I mentioned). With your patch, there are no errors at all (and the built dll runs fine). Building DXVK with mingw-w64-binutils as mentioned before is sufficient to see the memory corruption errors with ld during build time. I don't yet have an easy example app to demonstrate the crash when using the built dll. Bayonetta on Steam was being used for testing in the DXVK bug https://github.com/doitsujin/dxvk/issues/1625, but the crash happens for a variety of D3D applications. Thanks a ton for the fast response!
ldmain.c:add_archive_element is where we duplicate the file name pointers: input->filename = abfd->filename; input->local_sym_name = abfd->filename;
We should do something about the above local_sym_name pointer into what is freed memory after pe.em/pep.em have twiddled the bfd filename. The simplest solution is to make another copy of the name in add_archive_element, which I think would mean the pe.em/pep.em changes are not needed. An alternative approach is to make bfd_set_filename allocate a copy of the name using bfd_alloc and not free any old name. I think that's the best solution, and I have a patch to do that but it touches a lot of files and probably should be held off until after the gdb-9.2 release next weekend.
The master branch has been updated by Nick Clifton <nickc@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=5e365e474b7561318ddb1a107f05cf0c002e8284 commit 5e365e474b7561318ddb1a107f05cf0c002e8284 Author: Nick Clifton <nickc@redhat.com> Date: Mon May 18 10:28:52 2020 +0100 Prevent a potential use-after-fee memory corruption bug in the linker (for PE format files). PR 25993 * emultempl/pe.em (_after_open): Check for duplicate filename pointers before renaming the dll. * emultempl/pep.em (_after_open): Likewise.
Hi Alan, > An alternative approach is to make bfd_set_filename allocate a copy of the > name using bfd_alloc and not free any old name. I think that's the best > solution, and I have a patch to do that but it touches a lot of files and > probably should be held off until after the gdb-9.2 release next weekend. OK, that is fine by me. I have gone ahead and applied my patch, since it cannot hurt and will fix the code until your patch is ready. Cheers Nick
The master branch has been updated by Alan Modra <amodra@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=7b958a48e1322880f23cdb0a1c35643dd27d3ddb commit 7b958a48e1322880f23cdb0a1c35643dd27d3ddb Author: Alan Modra <amodra@gmail.com> Date: Tue May 19 12:58:59 2020 +0930 PR25993, read of freed memory ldmain.c:add_archive_element copies file name pointers from the bfd to a lang_input_statement_type. input->filename = abfd->filename; input->local_sym_name = abfd->filename; This results in stale pointers when twiddling the bfd filename in places like the pe ld after_open. So don't free the bfd filename, and make copies using bfd_alloc memory that won't result in small memory leaks that annoy memory checkers. PR 25993 bfd/ * archive.c (_bfd_get_elt_at_filepos): Don't strdup filename, use bfd_set_filename. * elfcode.h (_bfd_elf_bfd_from_remote_memory): Likewise. * mach-o.c (bfd_mach_o_fat_member_init): Likewise. * opncls.c (bfd_fopen, bfd_openstreamr, bfd_openr_iovec, bfd_openw), (bfd_create): Likewise. (_bfd_delete_bfd): Don't free filename. (bfd_set_filename): Copy filename param to bfd_alloc'd memory, return pointer to the copy or NULL on alloc fail. * vms-lib.c (_bfd_vms_lib_get_module): Free newname and test result of bfd_set_filename. * bfd-in2.h: Regenerate. gdb/ * solib-darwin.c (darwin_bfd_open): Don't strdup pathname for bfd_set_filename. * solib-aix.c (solib_aix_bfd_open): Use std::string for name passed to bfd_set_filename. * symfile-mem.c (add_vsyscall_page): Likewise for string passed to symbol_file_add_from_memory. (symbol_file_add_from_memory): Make name param a const char* and don't strdup. ld/ * emultempl/pe.em (gld_${EMULATION_NAME}_after_open): Don't copy other_bfd_filename for bfd_set_filename, and test result of bfd_set_filename call. Don't create a new is->filename, simply copy from bfd filename. Free new_name after bfd_set_filename. * emultempl/pep.em (gld_${EMULATION_NAME}_after_open): Likewise.
bfd_set_filename changes applied. Nick, your patch is a lot more suitable for backports than mine, if you so desire.
I just tested Alan's patch on master. Valgrind still comes back with no errors, and the game still launches just fine. I didn't look to see just how far back the memory corruption goes, but I think the DXVK community would appreciate a fix on at least branch 2.34 since games only started crashing with that release. (It's possible these crashes could have been happening before, but we don't have substantial evidence that it's this ld bug.) Again, thank you for the fast response.
The master branch has been updated by Alan Modra <amodra@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=0490dd41ae89e66efd8b3cee122c189a481269de commit 0490dd41ae89e66efd8b3cee122c189a481269de Author: Alan Modra <amodra@gmail.com> Date: Thu May 21 23:34:58 2020 +0930 Re: PR25993, read of freed memory git commit 7b958a48e132 put the bfd filename in the bfd objalloc memory. That means the filename is freed by _bfd_free_cached_info. Which is called by _bfd_compute_and_write_armap to tidy up symbol tables after they are done with. Unfortunately, _bfd_write_archive_contents wants to seek and read from archive elements after that point, and if the number of elements exceeds max_open_files in cache.c then some of those elements will have their files closed. To reopen, you need the filename. PR 25993 * opncls.c (_bfd_free_cached_info): Keep a copy of the bfd filename. (_bfd_delete_bfd): Free the copy. (_bfd_new_bfd): Free nbfd->memory on error.
Is it possible to back port Nick's patch to 2.34? Although, if Nick's patch were backported to 2.34, would the downloads at https://ftp.gnu.org/gnu/binutils/ be updated? Most folks building release versions will use those downloads rather than the git repo. If not, DXVK folks experiencing the original problem would just have to use a patched build until then. Thanks
The binutils-2_34-branch branch has been updated by Nick Clifton <nickc@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=463ec189fe9eca199edf87cda2c31efbe850390d commit 463ec189fe9eca199edf87cda2c31efbe850390d Author: Nick Clifton <nickc@redhat.com> Date: Wed Jun 3 15:16:48 2020 +0100 Prevent a potential use-after-fee memory corruption bug in the linker (for PE format files). PR 25993 * emultempl/pe.em (_after_open): Check for duplicate filename pointers before renaming the dll. * emultempl/pep.em (_after_open): Likewise.
Hi William, > Is it possible to back port Nick's patch to 2.34? Yes - done. > Although, if Nick's patch were backported to 2.34, would the downloads at > https://ftp.gnu.org/gnu/binutils/ be updated? No. In order to do that we would need to create a minor update release. This could be done, but I am reluctant to do as I am planning on making the next major release (2.35) in a month's time... Cheers Nick
Thank you!