Bug 30448 - ld fails to make a valid DLL when used with gnatdll
Summary: ld fails to make a valid DLL when used with gnatdll
Status: NEW
Alias: None
Product: binutils
Classification: Unclassified
Component: binutils
Version: 2.34
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-05-15 11:26 UTC by Tom Kacvinsky
Modified: 2023-05-27 13:28 UTC
CC List: 2 users

See Also:
Host:
Target:
Build:
Last reconfirmed: 2023-05-16 00:00:00


Attachments
pep.em patch (756 bytes, patch)
2023-05-15 15:01 UTC, Tom Kacvinsky
Dump of files as generated by binutils 2.35 and binutils 2.36 (7.91 MB, application/x-bzip2)
2023-05-18 12:29 UTC, Tom Kacvinsky

Description Tom Kacvinsky 2023-05-15 11:26:10 UTC
I found an issue when generating a DLL via gnatdll.

The issue was introduced between 2.33.1 and 2.34, and I believe it was
specifically commit dc9bd8c92af6, which introduced --enable-reloc-section in
pe.em and pep.em.  So 2.33.1, which didn't have this option, worked,
and 2.34, which did have this option, did not.  It was not until 2.35, when the
corresponding --disable-reloc-section was introduced, that the problem was
fixed.  All I had to do was use --disable-reloc-section in my linker
options, and the problem with the DLL generated by gnatdll went away.

This option works all the way through 2.40.
Comment 1 Nick Clifton 2023-05-15 11:33:54 UTC
Hi Tom,

  So using --disable-reloc-section is a viable workaround, yes ?

  I am not familiar with gnatdll, but is it possible to have that tool
  automatically use --disable-reloc-section somehow ?

  Do you know if the problem is caused by the mere presence of
  the .reloc section, or if the problem is that one or more of
  the relocs inside the section are wrong ?

  Is there a non-Windows specific way to reproduce the problem ?
  (I do not have a Windows development environment available to me).

Cheers
  Nick
Comment 2 Tom Kacvinsky 2023-05-15 12:09:23 UTC
Yes, --disable-reloc-section is a viable workaround.  And it can be added to the gnatdll option -largs (which is the option that allows you to specify options to the linker).

I would have to check whether an MSVC DLL has a .reloc section.  If it doesn't, it could just be the presence of the .reloc section that causes the problem, or how it is set up to be referenced in the DLL.  If an MSVC DLL has a .reloc section (I think they call it an .rdata section, but I am not sure), then it might be that a relocation in the .reloc section is wrong.

I have no non-Windows-specific way of reproducing the problem.  This is a Windows-only issue, as it involves the Windows PE code in binutils.

I'll look into the .reloc section stuff today.
Comment 3 Tom Kacvinsky 2023-05-15 14:48:22 UTC
I have a different take on this issue now.  I was able to get 2.34 working, but I had to modify pep.em to get it to build on MSYS2 + MinGW-w64.

The commit that was partially reverted was 1ff6de031241.  In that commit there
was this section of code

${LDEMUL_EMIT_CTF_EARLY-NULL}

which the genscripts.sh script did not like (it was one of the few things that caused a hang in that script).  Replacing that with just NULL, as well as doing the same for

${LDEMUL_EXAMINE_STRTAB_FOR_CTF-NULL}

and a few other constructs like that helped with the problem.
Comment 4 Tom Kacvinsky 2023-05-15 15:01:35 UTC
Created attachment 14882 [details]
pep.em patch

Attached is the patch for pep.em that fixed the problem in binutils 2.34.
Comment 5 Tom Kacvinsky 2023-05-16 01:47:27 UTC
I've conflated things.  The issue of building binutils 2.34 and 2.35 on MinGW-w64 is a separate one (perhaps one the MinGW-w64 folks know about and could help me solve without hacks).

I now have 2.34 and 2.35 building.

As a starting point, I built our product with binutils 2.30 and GCC 12.1.0.  This build was successful in the sense that the application did not crash on start.

Then, I bisected between 2.31 and 2.37 and found the break happened at 2.36.  I scripted the steps gnatdll would use to make a DLL, then just changed which version of binutils I used in that script.  At 2.36, I saw a break; a way around it is to use --disable-reloc-section.

I just wanted to clarify that the breakage happens at 2.36 and that the other issues were not germane.

An interesting observation: the --base-file outputs from 2.35 and from 2.36 with --disable-reloc-section do not differ when compared with cmp -s.  Disassembling the .exp files via objdump -Da (it's a PE image) shows they differ in only one two-byte sequence.  objdump -p (relocation information) on the DLLs shows minimal difference (just timestamp, checksum, etc.), and most importantly, the relocations were exactly the same.  So that's a +1.

However, when using 2.36 without --disable-reloc-section, the base files fail a binary comparison against 2.35 (they should be the same, as they are meant to carry the relocation offsets written out by cofflink.c).  The .exp files also disassemble with a multitude of differences (again, comparing objdump -Da output with diff -u).  The dump of the DLLs' relocation tables obtained by objdump -p also shows a multitude of differences, mostly in the relocations.

So yeah, definite difference between 2.35 and 2.36 with relocation handling.
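As an illustration of the comparisons described above (not part of the original report), here is a minimal Python sketch of a cmp-style check that also reports where, and in how many bytes, two files such as the --base-file outputs differ. It follows cmp's exit-status convention (0 identical, 1 different, 2 usage error); the script and its output format are assumptions of this sketch, not a binutils tool.

```python
import sys
from pathlib import Path


def compare_bytes(a: bytes, b: bytes):
    """Return (first_diff_offset, differing_byte_count, size_a, size_b).

    first_diff_offset is None when the inputs are identical.  Bytes past the
    end of the shorter input are counted as differences.
    """
    common = min(len(a), len(b))
    first = None
    count = 0
    for i in range(common):
        if a[i] != b[i]:
            count += 1
            if first is None:
                first = i
    if len(a) != len(b):
        count += abs(len(a) - len(b))
        if first is None:
            first = common  # one file is a prefix of the other
    return first, count, len(a), len(b)


def main(argv):
    if len(argv) != 3:
        print(f"usage: {argv[0]} FILE1 FILE2", file=sys.stderr)
        return 2
    a = Path(argv[1]).read_bytes()
    b = Path(argv[2]).read_bytes()
    first, count, size_a, size_b = compare_bytes(a, b)
    if first is None:
        print("identical")
        return 0
    print(f"first difference at offset {first:#x}; {count} differing bytes "
          f"(sizes {size_a} and {size_b})")
    return 1


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

For example, running it on two base files with hypothetical names such as dll-2.35.base and dll-2.36.base prints the first differing offset.  The relocation tables themselves are still easier to compare by running diff -u on objdump -p output, as done in the report.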
Comment 6 Nick Clifton 2023-05-16 10:43:03 UTC
(In reply to Tom Kacvinsky from comment #5)
Hi Tom,

> So yeah, definite difference between 2.35 and 2.36 with relocation handling.

I suspect that this is all because of this PR:

  https://sourceware.org/bugzilla/show_bug.cgi?id=19011

Which changed some of the defaults for building PE binaries so that they were more secure.

I guess that the real question is "does anything actually need to be fixed in the linker ?"  I am assuming that the new, more secure, default settings are correct for most situations, and that now that you know about using --disable-reloc-section your builds with gnatdll will start working again.

Possibly some more documentation is warranted ?  Maybe the description of the --disable-reloc-section option in the linker manual should include a paragraph about using it if building DLLs no longer works ?

Cheers
  Nick
Comment 7 Tom Kacvinsky 2023-05-16 13:28:07 UTC
I don't understand _why_ those patches introduced the issue.

Looking at them, one was from 2018, and the problem started sometime after 2019; the other patches change ld options for DLL characteristics, but I saw nothing in them that would change how relocations are generated.

You'd think that for a DLL you would always want a .reloc section, so disabling it should cause an issue (not the other way around), and my understanding is that the DLL characteristics are just a guide to the run-time loader anyway.  I can play around with the options those patches introduced to see if enabling/disabling them makes a difference.

Anyway, it'd be great if that question could be answered.  If not, I'll just carry on with my workaround without understanding why.  It does bother my "Mr. Definitive" take on issues like this, but hey, sometimes you can't let perfect be the enemy of the done.
Comment 8 Tom Kacvinsky 2023-05-16 14:30:54 UTC
I think it was actually this commit

dc9bd8c92af67947db44b3cb428c050259b15cd0

That set pep_dll_enable_reloc_section = 1 only if --enable-dynamicbase was specified (which we hadn't been doing).  Later on, in commit 514b4e191d5f, it was enabled by default, irrespective of whether --enable-dynamicbase or --enable-reloc-section was specified.  So I think the crucial commit, even though it didn't take full effect by default until later, is dc9bd8c92af67947db44b3cb428c050259b15cd0.
Comment 9 Eric Botcazou 2023-05-16 17:28:48 UTC
You don't clearly say what the problem is, though.  Yes, DLLs (and executables) need a .reloc section to be relocated at load time; that's the way position-independent code works in the Microsoft world.
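For readers less familiar with PE base relocations: per the PE/COFF format, the .reloc section is a sequence of blocks, each an IMAGE_BASE_RELOCATION header (32-bit page RVA, 32-bit block size) followed by 16-bit entries whose top 4 bits give the relocation type and whose low 12 bits give the offset within that page.  When the loader maps a DLL somewhere other than its preferred ImageBase, it adds the difference (delta) to every location listed.  The Python sketch below parses such a section and applies it to an image; it is a simplified model, not binutils or Windows loader code.  It assumes the image is a flat bytearray indexed by RVA and handles only the ABSOLUTE (padding), HIGHLOW (32-bit) and DIR64 (64-bit) types.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0   # padding entry, no fixup
IMAGE_REL_BASED_HIGHLOW = 3    # 32-bit fixup (PE32)
IMAGE_REL_BASED_DIR64 = 10     # 64-bit fixup (PE32+)


def parse_base_relocs(reloc: bytes):
    """Yield (type, rva) for every entry in a raw .reloc section."""
    pos = 0
    while pos + 8 <= len(reloc):
        page_rva, block_size = struct.unpack_from("<II", reloc, pos)
        if block_size < 8:
            break  # malformed or zero-filled tail: stop parsing
        end = pos + block_size
        for off in range(pos + 8, end, 2):
            (entry,) = struct.unpack_from("<H", reloc, off)
            yield entry >> 12, page_rva + (entry & 0x0FFF)
        pos = end


def apply_base_relocs(image: bytearray, reloc: bytes, delta: int) -> None:
    """Add delta to each location listed in reloc (image indexed by RVA)."""
    for rtype, rva in parse_base_relocs(reloc):
        if rtype == IMAGE_REL_BASED_ABSOLUTE:
            continue
        if rtype == IMAGE_REL_BASED_HIGHLOW:
            (value,) = struct.unpack_from("<I", image, rva)
            struct.pack_into("<I", image, rva, (value + delta) & 0xFFFFFFFF)
        elif rtype == IMAGE_REL_BASED_DIR64:
            (value,) = struct.unpack_from("<Q", image, rva)
            struct.pack_into("<Q", image, rva,
                             (value + delta) & 0xFFFFFFFFFFFFFFFF)
        else:
            raise ValueError(f"unsupported base relocation type {rtype}")


# Usage: one block for page RVA 0 with a DIR64 fixup at RVA 0x10 plus padding.
reloc = struct.pack("<IIHH", 0x0, 12, (IMAGE_REL_BASED_DIR64 << 12) | 0x10, 0)
image = bytearray(0x20)
struct.pack_into("<Q", image, 0x10, 0x180001000)  # pointer as linked
apply_base_relocs(image, reloc, 0x10000)          # DLL loaded 0x10000 higher
```

A wrong or missing entry in this table leaves a stale absolute pointer in the loaded image, which is one plausible way a bad .reloc section could produce an access violation inside the loader or shortly after.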
Comment 10 Tom Kacvinsky 2023-05-16 17:35:58 UTC
The problem is an access violation at startup, deep in the guts of the DLL loader.  Doing a debug session with Visual Studio and looking at registers and memory locations, it was determined that loading the DLL in question is where things went south.

And, for what it's worth, perhaps --disable-reloc-section is not a good name for an option - the DLL I produced with that option does have a .reloc section.  So what exactly does --disable-reloc-section mean if specifying that option still results in a DLL with a .reloc section?
Comment 11 Eric Botcazou 2023-05-16 18:03:38 UTC
> The problem is an access violation at startup, deep in the guts of the DLL
> loader.  Doing a debug session with Visual Studio and looking at registers
> and memory locations, it was determined that loading the DLL in question is
> where things went south.

For a DLL generated by GNAT?  Which version of GNAT is that?

> And, for what it's worth, perhaps --disable-reloc-section is not a good name
> for an option - the DLL I produced with that option does have a .reloc
> section.  So what exactly does --disable-reloc-section mean if specifying
> that option still results in a DLL with a .reloc section?

So it generates a different .reloc section?
Comment 12 Tom Kacvinsky 2023-05-16 18:13:42 UTC
(In reply to Eric Botcazou from comment #11)
> > The problem is an access violation at startup, deep in the guts of the DLL
> > loader.  Doing a debug session with Visual Studio and looking at registers
> > and memory locations, it was determined that loading the DLL in question is
> > where things went south.
> 
> For a DLL generated by GNAT?  Which version of GNAT is that?

GNAT that comes with GCC 12.1.0.  I am using MSYS2 + MinGW-w64 for my base setup, then compile a custom GCC 12.1.0 with mingw-w64-crt v10.0.0 and binutils 2.38.  This includes MinGW-w64's support for the UCRT, as we need that for the version of Visual Studio we also use to build our product.

> 
> > And, for what it's worth, perhaps --disable-reloc-section is not a good name
> > for an option - the DLL I produced with that option does have a .reloc
> > section.  So what exactly does --disable-reloc-section mean if specifying
> > that option still results in a DLL with a .reloc section?
> 
> So it generates a different .reloc section?

Yes, binutils 2.35 generates a different .reloc section than binutils 2.36 (with default options) unless one uses binutils 2.36 with --disable-reloc-section.

I shall gather what files I can attach that don't contain the proprietary binary code, just the relocation information.
Comment 13 Eric Botcazou 2023-05-16 18:47:36 UTC
> GNAT that comes from GCC 12.1.0.  I am using MSYS2 + MinGw-w64 for my base
> setup, then compile a custom GCC 12.1.0 with mingw-w64-crt v10.0.0 and
> binutils 2.38.  This includes MinGW-w64's support for the UCRT, as we need
> that for the version of Visual Studio we also use to build out product.

There was a known issue with DLL relocations at load time in earlier GCC versions, but it has been fixed by:

2021-07-02  Eric Botcazou  <ebotcazou@adacore.com>

	* config/i386/i386.c (asm_preferred_eh_data_format): Always use the
	PIC encodings for PE-COFF targets.

in GCC 11.2.0 and later.
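A rough model of why the PIC encodings matter here (our illustration, not GCC code): with the absolute encoding (DW_EH_PE_absptr), exception-handling data stores an address that is only correct at the preferred image base and therefore depends on a base relocation being applied; with a PC-relative encoding (DW_EH_PE_pcrel, typically with a signed 4-byte offset), it stores the distance from the field itself to the target, which stays valid wherever the DLL is loaded.  The Python sketch below encodes a pointer at link time, moves the image without applying any fixup, and decodes it again.  The RVAs and base addresses are hypothetical.

```python
def encode_absptr(target: int) -> int:
    """Absolute encoding: store the target address itself."""
    return target


def decode_absptr(stored: int, field_addr: int) -> int:
    # field_addr is unused; kept for symmetry with decode_pcrel.
    return stored


def encode_pcrel(target: int, field_addr: int) -> int:
    """PC-relative encoding: store target minus the field's own address."""
    return target - field_addr


def decode_pcrel(stored: int, field_addr: int) -> int:
    return field_addr + stored


def rebase_demo(preferred_base: int, actual_base: int):
    """Encode at link time, decode after loading at actual_base.

    No base relocation is applied, mimicking a missing or wrong fixup.
    Returns (expected, absptr_result, pcrel_result).
    """
    field_rva, target_rva = 0x2000, 0x1000  # hypothetical RVAs
    link_field = preferred_base + field_rva
    link_target = preferred_base + target_rva
    stored_abs = encode_absptr(link_target)
    stored_rel = encode_pcrel(link_target, link_field)

    load_field = actual_base + field_rva
    expected = actual_base + target_rva
    return (expected,
            decode_absptr(stored_abs, load_field),
            decode_pcrel(stored_rel, load_field))


expected, via_absptr, via_pcrel = rebase_demo(0x180000000, 0x7FF000000000)
```

Here via_pcrel matches expected, while via_absptr still points into the preferred image range, so the absolute form is only correct if the .reloc section covers it properly.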

> Yes, binutils 2.35 generates a different .reloc section than binutils 2.36
> (with default options) unless one uses binutils 2.36 with
> --disable-reloc-section.

Weird.  I wonder what its contents were with 2.35 then.
Comment 14 Tom Kacvinsky 2023-05-18 12:29:29 UTC
Created attachment 14887 [details]
Dump of files as generated by binutils 2.35 and binutils 2.36

The files in this archive contain dumps of files generated by binutils 2.36 and 2.36.  The .p files were generated by "objdump -p" and the .s files were generated by "objdump -Da".

You will note that the .base file is different, and that the relocation information in the dump of the DLL is different.
Comment 15 Tom Kacvinsky 2023-05-27 10:43:12 UTC
I am on vacation.  When I get back, I will also upload the .exp file as generated by the Visual Studio toolchain, because the final link done via gnatdll works with that .exp file, but not with the one generated by dlltool.
Comment 16 Tom Kacvinsky 2023-05-27 13:28:22 UTC
(In reply to Tom Kacvinsky from comment #14)
 
> The files in this archive contain dumps of files generated by binutils 2.36
> and 2.36.  The .p files were generated by "objdump -p" and the .s files were
> generated by "objdump -Da".

That should be "dumps of files generated by binutils 2.35 and 2.36."