Bug 28719 - DWARF-5 section names in PE/PEP and weak symbols in Cygwin
Summary: DWARF-5 section names in PE/PEP and weak symbols in Cygwin
Status: UNCONFIRMED
Alias: None
Product: binutils
Classification: Unclassified
Component: binutils (show other bugs)
Version: 2.37
: P2 critical
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-12-20 19:39 UTC by Achim
Modified: 2022-08-21 07:56 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
Linker map new (ld-2.37) (5.13 KB, text/plain)
2022-01-29 07:24 UTC, Achim
Details
Linker map old (ld-2.36) (5.13 KB, text/plain)
2022-01-29 07:24 UTC, Achim
Details
Linker map new (ld-2.38 w/patches) (5.23 KB, text/plain)
2022-02-13 17:25 UTC, Achim
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Achim 2021-12-20 19:39:28 UTC
As detailed in

https://sourceware.org/pipermail/binutils/2021-December/118844.html

commit ba6eb62ff0ea9843a018cfd7cd06777bd66ae0a0 broke the resolution of weak symbols in Cygwin (which should always resolve to 0x0 since they don't work due to limitations of the OS loader).

Removing just the .debug_loclists section from that change reverts to the previous behaviour, but might introduce other bugs elsewhere.
Comment 1 Nick Clifton 2022-01-24 14:44:55 UTC
Hi Achim,

  I am very concerned about how adding a DWARF debug section to the linker script can cause this effect, and I think that we need to know more about the underlying cause.  Otherwise it is possible that this problem will happen again in the future when more debugging sections are added.

  Are you able to compare working and non-working(*) binaries and check a few things ?  For example if you compile the test code to an object file and examine the relocations in it, are there any relocations that reference the fputs symbol for the .debug_loclists section ?

  Also - when you perform the link where fputs becomes non-weak, are you able to create a linker map (-Wl,-Map,foo.map) and then look inside it to find out what has happened to the fputs symbol.  Similarly, what kind of output do you get if you add the -Wl,--trace-symbol=fputs ?

Cheers
  Nick

(*) working = does not run and non-working = runs because there are no weak symbols.
Comment 2 Achim 2022-01-29 07:24:06 UTC
Created attachment 13941 [details]
Linker map new (ld-2.37)
Comment 3 Achim 2022-01-29 07:24:52 UTC
Created attachment 13942 [details]
Linker map old (ld-2.36)
Comment 4 Achim 2022-01-29 07:27:47 UTC
...packages/binutils-gdb/build (2056)# objdump -r weak.o

weak.o:     file format pe-x86-64

RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE 
0000000000000009 IMAGE_REL_AMD64_REL32  __main
0000000000000015 IMAGE_REL_AMD64_REL32  .refptr.fputs


RELOCATION RECORDS FOR [.pdata]:
OFFSET           TYPE              VALUE 
0000000000000000 IMAGE_REL_AMD64_ADDR32NB  .text
0000000000000004 IMAGE_REL_AMD64_ADDR32NB  .text
0000000000000008 IMAGE_REL_AMD64_ADDR32NB  .xdata


RELOCATION RECORDS FOR [.rdata$.refptr.fputs]:
OFFSET           TYPE              VALUE 
0000000000000000 IMAGE_REL_AMD64_ADDR64  fputs


...packages/binutils-gdb/build (2057)# ./weak_new ; echo weak_new exit status: $? ; ./weak_old ; echo weak_old exit status: $? ;
weak_new exit status: 0
./weak_old: ./weak_old: cannot execute binary file
weak_old exit status: 126
...packages/binutils-gdb/build (2058)# objdump -r weak.o

weak.o:     file format pe-x86-64

RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE 
0000000000000009 IMAGE_REL_AMD64_REL32  __main
0000000000000015 IMAGE_REL_AMD64_REL32  .refptr.fputs


RELOCATION RECORDS FOR [.pdata]:
OFFSET           TYPE              VALUE 
0000000000000000 IMAGE_REL_AMD64_ADDR32NB  .text
0000000000000004 IMAGE_REL_AMD64_ADDR32NB  .text
0000000000000008 IMAGE_REL_AMD64_ADDR32NB  .xdata


RELOCATION RECORDS FOR [.rdata$.refptr.fputs]:
OFFSET           TYPE              VALUE 
0000000000000000 IMAGE_REL_AMD64_ADDR64  fputs


...packages/binutils-gdb/build (2059)# gcc -B/mnt/share/packages/binutils-gdb/build/ld/tmpdir/ld/ -o weak_old weak.o
/mnt/share/packages/binutils-gdb/build/ld/tmpdir/ld/collect-ld: weak_old.exe:/4: section below image base
...packages/binutils-gdb/build (2060)# gcc -o weak_new weak.o
...packages/binutils-gdb/build (2061)# ./weak_new ; echo weak_new exit status: $? ; ./weak_old ; echo weak_old exit status: $? ;
weak_new exit status: 0
./weak_old: ./weak_old: cannot execute binary file
weak_old exit status: 126

...packages/binutils-gdb/build (2067)# gcc -B/mnt/share/packages/binutils-gdb/build/ld/tmpdir/ld/ -Wl,--trace-symbol=fputs -o weak_old weak.o
/mnt/share/packages/binutils-gdb/build/ld/tmpdir/ld/collect-ld: weak.o: reference to fputs
/mnt/share/packages/binutils-gdb/build/ld/tmpdir/ld/collect-ld: /usr/lib/gcc/x86_64-pc-cygwin/11/../../../../lib/libcygwin.a(t-d000578.o): definition of fputs
/mnt/share/packages/binutils-gdb/build/ld/tmpdir/ld/collect-ld: weak_old.exe:/4: section below image base

...packages/binutils-gdb/build (2068)# gcc -Wl,--trace-symbol=fputs -o weak_new weak.o
/usr/lib/gcc/x86_64-pc-cygwin/11/../../../../x86_64-pc-cygwin/bin/ld: weak.o: reference to fputs
/usr/lib/gcc/x86_64-pc-cygwin/11/../../../../x86_64-pc-cygwin/bin/ld: /usr/lib/gcc/x86_64-pc-cygwin/11/../../../../lib/libcygwin.a(t-d000578.o): definition of fputs
Comment 5 Achim 2022-02-13 17:25:39 UTC
Created attachment 13976 [details]
Linker map new (ld-2.38 w/patches)

Patch to discard the two DWARF5 sections identified during bisecting applied (see patch on mailing list).
Comment 6 Achim 2022-02-13 17:26:58 UTC
So it turns out that removing the section at that commit (as I had done during bisecting) would actually restore the previuos behaviour (the executable would not run).  Applying the same fix on top of both the 2.37 and 2.38 releases doesn't, however.  So it seems that one of the later commits would be responsible for the actual resolution of the weak symbol.
Comment 7 Nick Clifton 2022-02-14 15:53:07 UTC
(In reply to Achim from comment #6)
> So it turns out that removing the section at that commit (as I had done
> during bisecting) would actually restore the previous behaviour (the
> executable would not run).  Applying the same fix on top of both the 2.37
> and 2.38 releases doesn't, however.  So it seems that one of the later
> commits would be responsible for the actual resolution of the weak symbol.

Hmm, I suspect that this is going to be hard to track down.

One thing I am not sure about however is the involvement of the cygwin.a library.
The trace you reported in comment #4 shows that it is being used to resolve the weak reference to fputs.  Presumably this should not be happening, but why not ?  Should cygwin.a not be included in the link ?  Or should it be included but not used to resolve weak references ?  Or ... included, used to resolve non-weak references, but if a member of the archive is used to resolve a non-weak reference then that member can then also be used to resolve weak references.  In which case, the problem is why is the t-d000578.o member of cygwin.a being pulled in in the first place ?

(I am also wondering if this problem has anything to do with the changes made to the xcoff_link_check_ar_symbols() function in bfd/xcofflink.c).
Comment 8 gee 2022-08-20 22:22:59 UTC
Okay, Can I ask why this patch applied in binutils toolchain which targets *-mingw32?
Also, Cygwin/MINGW64(which is alias of cygwin to their unix application) employs their own linker script for their own cygwin1.dll which complicates the problem furthermore. 

So cygwin maintainers should add dwarf5 debug section to that file accordingly.

It's very frustrating that former cygwin project lead had retired long ago...
Comment 9 gee 2022-08-21 02:18:16 UTC
For someone who uses x86_64-w64-mingw32-binutils from cygwin, use below script to patch the affected ld executable file. better than recompiling every stuff to get your own.

###

f=/usr/bin/x86_64-w64-mingw32-ld.exe;for i in $(grep --text -b "===" $f| tail +2|sed -e "s/:.*//"); do echo $i;sp="      "; echo "$sp"|dd if=/proc/self/fd/0 of=$f iflag=count_bytes,skip_bytes oflag=seek_bytes conv=notrunc seek=$i count=${#sp};done
lines=$(grep --text -bP '(?:/\* Copyright|    \*\(\.(?:z?debug_loclists|drectve)\))' $f|grep -E "Copyright" -A2|grep -E "loclists"|sort -k2)
for i in $(echo "$lines"|head -6|sed -e "s/:.*//");do sp="                      "; echo "$sp"|dd if=/proc/self/fd/0 of=$f iflag=count_bytes,skip_bytes oflag=seek_bytes conv=notrunc seek=$i count=${#sp};done
for i in $(echo "$lines"|tail -6|sed -e "s/:.*//");do sp="                       "; echo "$sp"|dd if=/proc/self/fd/0 of=$f iflag=count_bytes,skip_bytes oflag=seek_bytes conv=notrunc seek=$i count=${#sp};done

###
Comment 10 Achim 2022-08-21 07:56:20 UTC
(In reply to gee from comment #8)
> Okay, Can I ask why this patch applied in binutils toolchain which targets
> *-mingw32?

That was a mistake that will be fixed in the next release.

> Also, Cygwin/MINGW64(which is alias of cygwin to their unix application)
> employs their own linker script for their own cygwin1.dll which complicates
> the problem furthermore. 
> 
> So cygwin maintainers should add dwarf5 debug section to that file
> accordingly.

If anything that's a separate issue and if so I'd welcome if you would help fix it.  Presumably that'd involve a report here and/or on the Cygwin mailing list.

> It's very frustrating that former cygwin project lead had retired long ago...

Your point is what exactly?