Bug 28903

Summary: LD producing SegFault executables with FreePascal 2.6.4, in Binutils-2.36.1 and later
Product: binutils
Reporter: John B Thiel <jbthiel>
Component: ld
Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED INVALID
Severity: normal
CC: amodra, hjl.tools, nickc
Priority: P2
Version: 2.36
Target Milestone: ---
Host:
Target:
Build:
Last reconfirmed: 2022-02-17 00:00:00
Bug Depends on: 27100
Bug Blocks:
Attachments:
  Complete test case demonstration package
  The working link.res
  The fixed link.res
  A simpler linker script

Description John B Thiel 2022-02-17 16:48:34 UTC
It appears that LD in binutils versions after 2.35 is producing segfault executables with Free Pascal 2.6.4.  You cannot build a working HelloWorld.

I have written up a full report, please see
  https://gitlab.com/freepascal.org/fpc/source/-/issues/39324

The Free Pascal developers believe this is a regression of a similar LD bug that previously occurred and was fixed in LD.  They put a workaround/redesign in later versions of FreePascal, the 3.x series, but are unable to backport their fix to FPC 2.6.4.
So it needs to be fixed again in binutils LD.

It is very easy to reproduce.
The simplest HelloWorld will yield a segfault executable.  Demonstration:

PROGRAM, put this code in myprog.pas:
program myprog;
begin
writeln('hello');
end.

COMPILE via:
$ /usr/local/fpc-2.6.4/bin/fpc -Xm -va myprog.pas
    -Xm  produces linker map
    -va  gives verbose output

RUN:
$ ./myprog
Segmentation fault


On Gentoo Linux, it works with binutils-2.35, and segfaults with binutils-2.36 and binutils-2.37, both tested.

If you need any other details or test files, please advise and I will try to supply.

This is an extremely critical bug. It completely kills the 2.6.x series of FreePascal, which I and likely many others require to keep running.

Gentoo has already switched to binutils-2.37, and started to mask out binutils-2.35 by default.  A fix is needed soon, before the older versions are dropped altogether.

Thanks for your work on binutils.

-- JBThiel
Comment 1 John B Thiel 2022-02-17 16:57:13 UTC
Here are some notes from my FPC bug report, pulled forward for easy reference.  I also put 2 linker maps in the other bug report, good-2.35.2 and bad-2.36.1; let me know if you need them attached again here.  I can also supply other object files, maps, traces, etc. if anything else is needed.

===

Minimal helloworld.pas generates faulty executable that immediately seg faults.

GDB shows it breaks before main:
(gdb) break main
Breakpoint 1 at 0x401064
(gdb) run
During startup program terminated with signal SIGSEGV, Segmentation fault.

I have debugged some and it seems related to binutils-2.36.1  or binutils-libs. The problem also occurs with binutils-2.37.  Reverting back to binutils-2.35.2 restores correct function, no segfault.

===

The linker maps I attached above are from default linker ld, which is ld.bfd.
Now I tested 2 other linkers, by giving fpc -sh, then editing the ppas.sh to call a different linker, namely:

  ld.gold  (also from binutils, same version as ld.bfd the default ld)
  ld.lld-12.0.1

Both these linkers give Segmentation Fault in the exe, on BOTH binutils-2.35.2 and 2.36.1.

So only LD.BFD linker was working at all, and as of 2.36.1 it stopped working too.

Note that LD.LLD is not in the binutils package, rather in sys-devel/lld.
So the link problem seems to be at a deeper layer, not only binutils.

The link.res shows there are only 3 input files:

INPUT(
/usr/local/fpc-2.6.4//lib/fpc/2.6.4/units/x86_64-linux/rtl/prt0.o
myprog.o
/usr/local/fpc-2.6.4//lib/fpc/2.6.4/units/x86_64-linux/rtl/system.o
)

The prt0.o, system.o, are the stock ones from the binary Linux FPC264 distribution:

  1712 Mar  3  2014 /usr/local/fpc-2.6.4//lib/fpc/2.6.4/units/x86_64-linux/rtl/prt0.o
  361692 Mar  3  2014 /usr/local/fpc-2.6.4//lib/fpc/2.6.4/units/x86_64-linux/rtl/system.o

Also, I briefly started examining objdump of the executables. It seems like (not sure yet), the correct startup code and entry point are present in the segfaulting exe.  This suggests a problem with the startup trap used to jump there, or maybe some attributes of the code segment are not being set right.  I have read elsewhere of evolving security initiatives with kernel changes, that are tightening requirements on the structure/permissions of executable code.
Comment 2 H.J. Lu 2022-02-17 17:15:05 UTC
Please make ALL linker input files available to reproduce it on
different machines.
Comment 3 John B Thiel 2022-02-17 18:16:10 UTC
Created attachment 13986 [details]
Complete test case demonstration package

Here is a complete test case demonstration package, including 
  Makefile 
  myprog.pas .o .map 
  myprog  (executable that segfaults, as built with binutils 2.37_p1-r1 in Gentoo)
  link.res 
  ppas.sh (link script)
  and the 2 system .o files (prt0.o  system.o)

$ make clean
rm myprog.o myprog.map myprog link.res ppas.sh testcase-28903.zip

$ make dist
/usr/local/fpc-2.6.4/bin/fpc -Xm -sh myprog.pas
Free Pascal Compiler version 2.6.4 [2014/03/03] for x86_64
Copyright (c) 1993-2014 by Florian Klaempfl and others
Target OS: Linux for x86-64
Compiling myprog.pas
Closing script ppas.sh
4 lines compiled, 0.0 sec 
sh ppas.sh
Linking myprog
/usr/bin/ld: warning: link.res contains output sections; did you forget -T?
zip testcase-28903.zip Makefile myprog.pas myprog.o myprog.map myprog link.res ppas.sh /usr/local/fpc-2.6.4/lib/fpc/2.6.4/units/x86_64-linux/rtl/prt0.o /usr/local/fpc-2.6.4/lib/fpc/2.6.4/units/x86_64-linux/rtl/system.o 
  adding: Makefile (deflated 46%)
  adding: myprog.pas (stored 0%)
  adding: myprog.o (deflated 68%)
  adding: myprog.map (deflated 87%)
  adding: myprog (deflated 68%)
  adding: link.res (deflated 85%)
  adding: ppas.sh (deflated 36%)
  adding: usr/local/fpc-2.6.4/lib/fpc/2.6.4/units/x86_64-linux/rtl/prt0.o (deflated 69%)
  adding: usr/local/fpc-2.6.4/lib/fpc/2.6.4/units/x86_64-linux/rtl/system.o (deflated 76%)

$ ./myprog 
Segmentation fault

$ cat ppas.sh 
#!/bin/sh
DoExitAsm ()
{ echo "An error occurred while assembling $1"; exit 1; }
DoExitLink ()
{ echo "An error occurred while linking $1"; exit 1; }
echo Linking myprog
OFS=$IFS
IFS="
"
/usr/bin/ld -b elf64-x86-64 -m elf_x86_64     -Map myprog.map -L. -o myprog link.res
if [ $? != 0 ]; then DoExitLink myprog; fi
IFS=$OFS
Comment 4 H.J. Lu 2022-02-17 18:22:31 UTC
(In reply to John B Thiel from comment #3)
> Created attachment 13986 [details]
> Complete test case demonstration package
> 
> Here is a complete test case demonstration package, including 
>   Makefile 
>   myprog.pas .o .map 
>   myprog  (executable that segfaults, as built with binutils 2.37_p1-r1 in
> Gentoo)
>   link.res 
>   ppas.sh (link script)
>   and the 2 system .o files (prt0.o  system.o)
> 
> $ make clean
> rm myprog.o myprog.map myprog link.res ppas.sh testcase-28903.zip
> 
> $ make dist
> /usr/local/fpc-2.6.4/bin/fpc -Xm -sh myprog.pas

I don't have /usr/local/fpc-2.6.4/bin/fpc and I don't need it.  I need the
FULL linker command line with ALL linker inputs to create the broken executable.
Comment 5 John B Thiel 2022-02-17 18:37:19 UTC
(In reply to H.J. Lu from comment #4)
> (In reply to John B Thiel from comment #3)
> > Created attachment 13986 [details]
> > Complete test case demonstration package
> > 
> > Here is a complete test case demonstration package, including 
> >   Makefile 
> >   myprog.pas .o .map 
> >   myprog  (executable that segfaults, as built with binutils 2.37_p1-r1 in
> > Gentoo)
> >   link.res 
> >   ppas.sh (link script)
> >   and the 2 system .o files (prt0.o  system.o)
> > 
> > $ make clean
> > rm myprog.o myprog.map myprog link.res ppas.sh testcase-28903.zip
> > 
> > $ make dist
> > /usr/local/fpc-2.6.4/bin/fpc -Xm -sh myprog.pas
> 
> I don't have /usr/local/fpc-2.6.4/bin/fpc and I don't need it.  I need the
> FULL linker command line with ALL linker inputs to create the broken
> executable.


It is there.
I just showed the whole sequence to fully document, so you can see what's going on.
The /usr/local/fpc-2.6.4/bin/fpc  with -sh flag generates a link script (ppas.sh) and link.res and myprog.o

The ppas.sh you see there (and in the zipfile) invokes the linker as:
/usr/bin/ld -b elf64-x86-64 -m elf_x86_64     -Map myprog.map -L. -o myprog link.res

The link.res has 3 files in the INPUT section:
  myprog.o
  and the 2 system files (prt0.o system.o), which I gave with the complete subpath, and you can see in the zipfile.

All these files are in the attached zipfile. 
This should be the complete reproducible package needed, from what I can tell.  
I can't see any other files referenced.  If there are, let me know.

Just don't run 'make clean' or 'make dist' again.  
Start with ppas.sh
Comment 6 H.J. Lu 2022-02-17 18:52:00 UTC
What is the last known working binutils?
Comment 7 H.J. Lu 2022-02-17 18:54:59 UTC
If there is no known working binutils, Free Pascal 2.6.4 never worked
with any binutils.
Comment 8 John B Thiel 2022-02-17 19:10:26 UTC
(In reply to H.J. Lu from comment #6)
> What is the last known working binutils?

The last known good is binutils-2.35.2
On Gentoo, if I switch to it, this exact test case works fine, the program prints 'hello'.

I just reproduced it again on my dev system.
binutils-2.35.2     works
binutils-2.37_p1-r1 segfault

I had also previously tested 
  binutils-2.36.1     segfault


On Gentoo, one can switch between them via eselect:

AS ROOT
# eselect binutils list
 [1] x86_64-pc-linux-gnu-2.35.2
 [2] x86_64-pc-linux-gnu-2.37_p1 *

# eselect binutils set 1
 * Switching to x86_64-pc-linux-gnu-2.35.2 ...  [ ok ]
 * Please remember to run:
 *   # . /etc/profile

NOW AS USER:
$ . /etc/profile
$ sh ppas.sh 
Linking myprog
/usr/bin/ld: warning: link.res contains output sections; did you forget -T?
$ ./myprog 
hello
Comment 9 H.J. Lu 2022-02-17 19:19:11 UTC
(In reply to John B Thiel from comment #8)
> (In reply to H.J. Lu from comment #6)
> > What is the last known working binutils?
> 
> The last known good is binutils-2.35.2

binutils-2.35.2 doesn't work for me:

[hjl@gnu-tgl-3 pr28903]$ ./ld -V
GNU ld (GNU Binutils) 2.35.2
  Supported emulations:
   elf_x86_64
   elf32_x86_64
   elf_i386
   elf_iamcu
   elf_l1om
   elf_k1om
[hjl@gnu-tgl-3 pr28903]$ make
./ld -b elf64-x86-64 -m elf_x86_64 -Map myprog.map -L. -T link.t -o x
./x
make: *** [Makefile:5: all] Segmentation fault
[hjl@gnu-tgl-3 pr28903]$ 

It crashed during start up.
Comment 10 John B Thiel 2022-02-17 19:36:35 UTC
(In reply to H.J. Lu from comment #9)
> (In reply to John B Thiel from comment #8)
> > (In reply to H.J. Lu from comment #6)
> > > What is the last known working binutils?
> > 
> > The last known good is binutils-2.35.2
> 
> binutils-2.35.2 doesn't work for me:
> 
> [hjl@gnu-tgl-3 pr28903]$ ./ld -V
> GNU ld (GNU Binutils) 2.35.2
>   Supported emulations:
>    elf_x86_64
>    elf32_x86_64
>    elf_i386
>    elf_iamcu
>    elf_l1om
>    elf_k1om
> [hjl@gnu-tgl-3 pr28903]$ make
> ./ld -b elf64-x86-64 -m elf_x86_64 -Map myprog.map -L. -T link.t -o x
> ./x
> make: *** [Makefile:5: all] Segmentation fault
> [hjl@gnu-tgl-3 pr28903]$ 
> 
> It crashed during start up.



Well, now we are into your territory, I hope you can figure it out.  What I can tell you is binutils-2.35.2 and earlier have been working reliably for a long time, years.

I noticed you are using the -T flag.  The FPC specifically doesn't use that, something to do with the builtin system script that they override with their own, or the other way round.

For a possible clue, here is what the FPC devs guessed it relates to, copied from the FPC bug report I put at
  https://gitlab.com/freepascal.org/fpc/source/-/issues/39324



https://wiki.freepascal.org/User_Changes_3.2.0#GNU_Binutils_2.19.1_or_later_are_required_by_default
GNU Binutils 2.19.1 or later are required by default
    Old behaviour: The compiler invocation of the linker always resulted in a warning stating "did you forget -T?"
    New behaviour: The compiler now uses a different way to invoke the linker, which prevents this warning, but this requires functionality that is only available in GNU Binutils 2.19 and later.
    Reason: Get rid of the linker warning, which was caused by the fact that we used the linker in an unsupported way (and which hence occasionally caused issues).
    Remedy: If you have a system with an older version of GNU Binutils, you can use the new -X9 command line parameter to make the compiler revert to the old behaviour. You will not be able to (easily) bootstrap the new version of FPC on such a system though, so use another system with a more up-to-date version of GNU Binutils for that.
---
https://gitlab.com/freepascal.org/fpc/source/-/commit/4564bffb85e5947cf7bdfa3e2c67bc032775d0c5
 * use binutils 2.19+ linker script "augmentation" functionality to specify
    how the fpc sections have to be linked *on Linux*. This prevents the
    "did you forget -T" warnings from ld, and in general is more correct than
    our previous approach of specifying a complete linker script without -T
    and hoping that there won't be any unexpected interactions with ld's
    built-in linker script (fixed version of r31664, thanks to Alan Modra)
   o use the new -X9 command line option to generate linker scripts that
     are compatible with binutils older than 2.19 (reverts to the old
     behaviour)


Note it is *not* confirmed that this is the actual issue here.  This prior issue relates to something in binutils 2.19,  whereas I have fully confirmed this current issue is between binutils-2.35 and 2.36.   Maybe it's the same problem, or not.
Comment 11 H.J. Lu 2022-02-17 20:10:02 UTC
commit 21401fc7bf67dbf73f4a3eda4bcfc58fa4211584
Author: Alan Modra <amodra@gmail.com>
Date:   Tue Nov 24 23:41:31 2020 +1030

    Duplicate output sections in scripts
    
    Previously, ld merged duplicate output sections if such existed in
    scripts, except for those with a constraint of SPECIAL.  This makes
    scripts with duplicate output section statements create duplicate
    output sections in the linker output file.
    
            * ldlang.c (lang_output_section_statement_lookup): Change "create"
            parameter to a tristate, if 2 then always create a new output
            section statement.  Update all callers, with
            lang_enter_output_section_statement using "2".
            (map_input_to_output_sections): Don't ignore SPECIAL constraint
            here.
            * ldlang.h (lang_output_section_statement_type): Update prototype.
            (lang_output_section_find): Update.

caused:

./ld -b elf64-x86-64 -m elf_x86_64 -Map myprog.map -L. link.t -o x
./ld: warning: link.t contains output sections; did you forget -T?
./ld: final link failed: bad value
Comment 12 H.J. Lu 2022-02-17 20:14:22 UTC
commit de34d42812a0b978b278cd344abeaee7c71fa55c
Author: Alan Modra <amodra@gmail.com>
Date:   Thu Dec 24 15:56:23 2020 +1030

    PR27100, final link failed: bad value
    
    The failure on this PR is due to using the same bfd section for
    multiple output sections.  Commit 21401fc7bf67 managed to create
    duplicate linker script output section statements, but not the actual
    bfd sections.
    
fixed:

./ld: final link failed: bad value

but generated the bad executable.
Comment 13 H.J. Lu 2022-02-17 20:16:13 UTC
Created attachment 13987 [details]
The working link.res
Comment 14 H.J. Lu 2022-02-18 13:45:44 UTC
Bad executable has 2 .data sections:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .note.ABI-tag     NOTE            0000000000400190 001190 000020 00   A  0   0  4
  [ 2] .text             PROGBITS        0000000000401000 002000 020160 00  AX  0   0 16
  [ 3] .data             PROGBITS        0000000000422000 023000 005680 00  WA  0   0 16
  [ 4] .bss              NOBITS          0000000000427680 028680 002228 00  WA  0   0 16
  [ 5] .debug_frame      PROGBITS        0000000000000000 028680 000048 00      0   0  8
  [ 6] .data             PROGBITS        0000000000000190 000190 000030 00  WA  0   0  8
  [ 7] .bss              NOBITS          00000000000001c0 000000 000000 00  WA  0   0  1
  [ 8] .symtab           SYMTAB          0000000000000000 0286c8 00c4f8 18      9  10  8
  [ 9] .strtab           STRTAB          0000000000000000 034bc0 00e303 00      0   0  1
  [10] .shstrtab         STRTAB          0000000000000000 042ec3 000047 00      0   0  1

The second one shouldn't be there.
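
The duplicate can also be spotted mechanically. What follows is a minimal Python sketch, not something posted in this report, that scans a section header listing in the wide layout shown above (the format `readelf -S -W` prints) and reports every section name that occurs more than once. The function name `duplicate_section_names` and the regular expression are illustrative assumptions; the sample rows are copied from the table above.

```python
import re
from collections import Counter

# One row of a wide-format section header table, for example:
#   [ 3] .data   PROGBITS   0000000000422000 023000 005680 00  WA  0   0 16
# Row 0 (the NULL entry, which has no name) deliberately does not match,
# because the token after the "name" must look like a section type.
ROW = re.compile(r"^\s*\[\s*\d+\]\s+(\S+)\s+[A-Z][A-Z0-9_]*\s+[0-9a-fA-F]+\s")


def duplicate_section_names(listing):
    """Return the sorted section names that occur more than once."""
    names = [m.group(1) for m in map(ROW.match, listing.splitlines()) if m]
    return sorted(name for name, count in Counter(names).items() if count > 1)


# Rows 0-7 of the listing above.
SAMPLE = """\
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .note.ABI-tag     NOTE            0000000000400190 001190 000020 00   A  0   0  4
  [ 2] .text             PROGBITS        0000000000401000 002000 020160 00  AX  0   0 16
  [ 3] .data             PROGBITS        0000000000422000 023000 005680 00  WA  0   0 16
  [ 4] .bss              NOBITS          0000000000427680 028680 002228 00  WA  0   0 16
  [ 5] .debug_frame      PROGBITS        0000000000000000 028680 000048 00      0   0  8
  [ 6] .data             PROGBITS        0000000000000190 000190 000030 00  WA  0   0  8
  [ 7] .bss              NOBITS          00000000000001c0 000000 000000 00  WA  0   0  1
"""

print(duplicate_section_names(SAMPLE))
```

On this listing the sketch reports `.bss` as well as `.data`, since both names appear twice in the table.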
Comment 15 H.J. Lu 2022-02-18 13:51:08 UTC
Created attachment 13988 [details]
The fixed link.res
Comment 16 H.J. Lu 2022-02-18 13:55:15 UTC
$ ld -b elf64-x86-64 -m elf_x86_64 -Map myprog.map -L. link.res -o x
ld: warning: link.res contains output sections; did you forget -T?

has given you a clue that -T was missing.  With my attached linker script, I
got

[hjl@gnu-tgl-3 pr28903]$ make
ld -b elf64-x86-64 -m elf_x86_64 -Map myprog.map -L. -T link.res -o x
./x
hello
[hjl@gnu-tgl-3 pr28903]$
Comment 17 H.J. Lu 2022-02-18 14:10:39 UTC
Created attachment 13989 [details]
A simpler linker script

[hjl@gnu-tgl-3 pr28903]$ make
ld -b elf64-x86-64 -m elf_x86_64 -Map myprog.map -L. link.t -o x
./x
hello
[hjl@gnu-tgl-3 pr28903]$
Comment 18 John B Thiel 2022-02-27 17:09:30 UTC
Looks like good progress, HJ Lu, though I don't follow all the internal specifics.

Is this still under diagnosis/investigation, or is it now clear what the fix is?

Any idea what kind of timeframe for the fix to get into a new binutils version?

It would be highly desirable that a new working binutils/ld is released before 2.35 is rolled out of distros like gentoo, debian, etc.
Then FPC users can skip over 2.36/2.37 and jump from 2.35 to the next working version.
Comment 19 H.J. Lu 2022-02-27 17:12:56 UTC
(In reply to John B Thiel from comment #18)
> Looks like good progress, HJ Lu, though I don't follow all the internal
> specifics.
> 
> Is this still under diagnosis/investigation, or is it now clear what the
> fix is?
> 
> Any idea what kind of timeframe for the fix to get into a new binutils
> version?

There is no bug in binutils.

> It would be highly desirable that a new working binutils/ld is released
> before 2.35 is rolled out of distros like gentoo, debian, etc.
> Then FPC users can skip over 2.36/2.37 and jump from 2.35 to the next
> working version.

The bug is in the linker script in FreePascal 2.6.4.
Comment 20 John B Thiel 2022-02-28 14:43:59 UTC
Even granting that the root cause is a bug in the linker script from FPC, the observable end-user fact is that the combination of FPC 2.6.4 + binutils/LD up to 2.35 works, and has for years.  It runs and produces correct output, a working executable.  The newer version of binutils-2.36/37 does not.

So it is a change in binutils/LD version 2.36 and later that "caused" the problem, in this sense.  There were apparently offsetting bugs, where the older binutils/LD accepted and worked with the faulty input script. By tightening up/bugfix/improving on the binutils side, it has rendered the legacy FPC approach inoperable.

From an overall system perspective, it does not do the end-user any good to say, bug there not here.

Because FPC 2.6.4 is a legacy version, the way it produces the linker script is basically "etched in stone".  The FPC team cannot release a new version in that branch.  And it is neither possible for end-user developers to customize the link scripts on-the-fly, to my knowledge.  (and even if so, would require a very expert knowledge level beyond most developers' expertise)

So I think it is fair to consider the onus on binutils/LD developers to make a compensating change, to maintain backwards compatibility.  Because the breaking change has been introduced on this end.

As I see, a couple options:

1) You could rollback or relax whatever got fixed/tightened in binutils/LD-2.36, so it still fully accepts the legacy linker scripts produced by FPC 2.6.4.  This would be most preferable from an FPC end-user perspective, requiring no change in usage.

2) If accepting the faulty linker script in LD is considered very troublesome, like a security hole for example, you could put the backwards compatibility on an option flag.  This would allow usage from FPC via its flag
    -k<x>  Pass <x> to the linker
This is not ideal for FPC devs, because they will encounter the segfaulting executable, and have to research this problem, and update their build parameters.  But at least it provides some workaround, so legacy applications remain buildable.

3) Additional options, suggestions... ?

I stress again, this incompatibility between LD / FPC completely breaks the entire FPC 2.6.4 series toolchain, which is a giant stack of high quality software, including a comprehensive LIBC equivalent, plus cross-platform GUI,  which is widely used for custom business, industrial, scientific, and gaming applications.  

You do not necessarily hear about these applications because they are in specialized fields.  FPC/Lazarus developers and their customers just install some version of Debian, etc and deploy their app.  Also, 1) many of these developers and applications might not be well-connected with the open-source community, or use Windows, etc. and 2) most Linux distros are still using the older versions of binutils.  This won't start majorly affecting downstream until the mainline Debian/Ubuntu, for example, goes past 2.35.  It's a potential tidal wave of damage waiting to happen.

SUMMARY: Binutils/LD is the only working linker for FPC 2.6.4 series toolchain.  It is a relatively tiny piece of a huge stack, and has changed in a way which systemically breaks the compiler and every single application developed with it. Backward compatibility is crucially needed.  Please find a way to support this.
Comment 21 John B Thiel 2022-02-28 15:01:34 UTC
ReOpened.
I noticed it was marked Resolved, but it has not been resolved at all, just diagnosed.  The way FPC 2.6.4 produces linker scripts is "etched in stone" and cannot be changed, since it is a static legacy series.  It is binutils/LD that changed recently to make it incompatible, so a compensating back-compatible change is needed in LD.
Comment 22 Nick Clifton 2022-02-28 16:23:05 UTC
(In reply to John B Thiel from comment #20)
 
> From an overall system perspective, it does not do the end-user any
> good to say, bug there not here.

Well we are not talking to users or developers, we are talking to you.


> Because FPC 2.6.4 is a legacy version, the way it produces the
> linker script is basically "etched in stone".  The FPC team cannot
> release a new version in that branch. 

Really ?  Why not ?  What if a security bug is found in the compiler ?

Plus are you saying that the FPC compiler actually manufactures a
program specific linker script on the fly ?  Ie it does not just have
a script as a single file as part of the FPC package which it uses
whenever it needs to perform a link ?

If the script is manufactured, how is it manufactured ?  Can it be
edited ?  Could a step be inserted between the generation of the
script and the invocation of the linker which performs any necessary
transliterations ?


> And it is neither possible for end-user developers to
> customize the link scripts on-the-fly, to my knowledge.  (and even
> if so, would require a very expert knowledge level beyond most
> developers' expertise)

Well if everything is totally fixed and unchangeable then you really
need to have a set-in-stone version of the linker too, and not be
attempting to use newer versions.  Otherwise problems like this will
arise again when even newer versions of the linker are released.


> So I think it is fair to consider the onus on binutils/LD developers
> to make a compensating change, to maintain backwards compatibility.
> Because the breaking change has been introduced on this end.

No - the breaking change was fixing a bug.  We are not going to
reintroduce that bug just for you.  If you want the old behaviour,
use the old linker.


> 1) You could rollback or relax whatever got fixed/tightened in
> binutils/LD-2.36, so it still fully accepts the legacy linker
> scripts produced by FPC 2.6.4.  This would be most preferable from
> an FPC end-user perspective, requiring no change in usage.

There would still need to be some way for the linker to detect if it
is handling these old legacy scripts, which would involve adding
something - either a new linker command line option or a new keyword
in the linker script.  So I do not think that this alternative will
work.


> 2) If accepting the faulty linker script in LD is considered very
> troublesome, like a security hole for example, you could put the
> backwards compatibility on an option flag.  This would allow usage
> from FPC via its flag
>     -k<x>  Pass <x> to the linker
> This is not ideal for FPC devs, because they will encounter the
> segfaulting executable, and have to research this problem, and
> update their build parameters.  But at least it provides some
> workaround, so legacy applications remain buildable.

Unless of course the FPC compiler automatically adds this option for
the developer without them having to do anything.


> I stress again, this incompatibility between LD / FPC completely
> breaks the entire FPC 2.6.4 series toolchain, which is a giant stack
> of high quality software, including a comprehensive LIBC equivalent,
> plus cross-platform GUI,  which is widely used for custom business,
> industrial, scientific, and gaming applications.

And we will reiterate that if the FPC 2.6.4 compiler cannot be changed
then do not change the version of the binutils that you use with it
either.

I am guessing however that you will tell me that this is not possible.
Ie that the FPC compiler cannot have its own linker and that it has to
use the system provided linker.

So, how to move forward ?

If FPC 2.6.4 cannot change, and we are unwilling to make a change in
the upstream binutils sources, then I think that you are going to have
to talk to the distributions themselves.  (Is this just a Debian
problem or do other distributions support FPC 2.6.4 ?)  Distribution
specific patches to the binutils are certainly possible, so maybe that
is the way to proceed.
Comment 23 Alan Modra 2022-03-01 00:02:24 UTC
The segfaults are due to your linker script setting the value of "dot" to near zero with ". = 0 +  SIZEOF_HEADERS;" then containing a .data output section with additional contents over the standard .data section.  That extra .data section then has a vma in the unmapped page at zero (unmapped to catch NULL pointer dereferences).  Unsurprisingly you get segfaults in the loader.  If the linker uses your script with -T, which seems to be the intent, then the whole binary is mapped low.  Segfaults again.

The linker script is plainly and obviously broken.  Newer linkers are simply doing as asked.  Closing, please don't reopen.
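
To make the diagnosis concrete, here is a rough Python sketch (an illustration, not something posted in this thread) that flags allocated sections whose address falls inside the first page. It assumes the wide section header layout from comment 14 and a 4 KiB page size (`PAGE_SIZE`, an assumption typical of x86-64 Linux); the helper name `low_alloc_sections` is hypothetical. The sample rows are taken from comment 14.

```python
import re

PAGE_SIZE = 0x1000  # assumption: 4 KiB pages

# Captures name, address, size and the trailing "Flg Lk Inf Al" columns.
ROW = re.compile(
    r"^\s*\[\s*\d+\]\s+(\S+)\s+[A-Z][A-Z0-9_]*\s+"
    r"([0-9a-fA-F]+)\s+[0-9a-fA-F]+\s+([0-9a-fA-F]+)\s+[0-9a-fA-F]+\s+(.*)$"
)


def low_alloc_sections(listing, limit=PAGE_SIZE):
    """Return (name, address, size) for allocated sections below `limit`."""
    hits = []
    for line in listing.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        name = m.group(1)
        addr = int(m.group(2), 16)
        size = int(m.group(3), 16)
        tail = m.group(4).split()
        # The flags column is empty for some rows, leaving only Lk/Inf/Al.
        flags = tail[0] if len(tail) == 4 else ""
        if "A" in flags and addr < limit:
            hits.append((name, addr, size))
    return hits


# Rows 3-7 of the section table in comment 14.
SAMPLE = """\
  [ 3] .data             PROGBITS        0000000000422000 023000 005680 00  WA  0   0 16
  [ 4] .bss              NOBITS          0000000000427680 028680 002228 00  WA  0   0 16
  [ 5] .debug_frame      PROGBITS        0000000000000000 028680 000048 00      0   0  8
  [ 6] .data             PROGBITS        0000000000000190 000190 000030 00  WA  0   0  8
  [ 7] .bss              NOBITS          00000000000001c0 000000 000000 00  WA  0   0  1
"""

for name, addr, size in low_alloc_sections(SAMPLE):
    print(f"{name}: address {addr:#x}, size {size:#x}")
```

On these rows it picks out the second `.data` at 0x190 and the empty second `.bss` at 0x1c0, while the unallocated `.debug_frame` at address zero is ignored.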
Comment 24 John B Thiel 2022-03-01 18:14:05 UTC
(In reply to Nick Clifton from comment #22)
> (In reply to John B Thiel from comment #20)
>  
> > Because FPC 2.6.4 is a legacy version, the way it produces the
> > linker script is basically "etched in stone".  The FPC team cannot
> > release a new version in that branch. 
> 
> Really ?  Why not ?  What if a security bug is found in the compiler ?

As I understand, because of resource constraints, they cannot support this version anymore, and will not release any further updates.  I consider this very unfortunate, but it appears to be a hard reality.  This version was last of the 2.x series, and has been super reliable for me, for many years.


> Plus are you saying that the FPC compiler actually manufactures a
> program specific linker script on the fly ?  Ie it does not just have
> a script as a single file as part of the FPC package which it uses
> whenever it needs to perform a link ?

Yes, it generates a custom link script on-the-fly, and executes it when building an executable.  As in my demo attachment above, you can see the link script by using these flags:
      -sh        Generate script to link on host
      -st        Generate script to link on target

With  '-sh' flag for example, it just compiles into .o files, and generates ppas.sh and link.res files, which you can run manually, and inspect to see the exact linking process.

The script is generated line-by-line via code in the compiler.  There is some conditional logic around platform, cpu, compilation mode, what libraries are being included, etc.

So altering this requires recompiling the compiler.
It is definitely not a single boilerplate file you can just swap in.
Code for this generation appears in
  /usr/local/fpc-2.6.4/src/fpc-2.6.4/compiler/systems/t_linux.pas


> Could a step be inserted between the generation of the
> script and the invocation of the linker which performs any necessary
> transliterations ?

I think it would be extremely unwieldy and fragile to post-process the script like that, and it replicates what the compiler already does.
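
For reference, the intermediate step being asked about could be arranged roughly as follows. This is only a Python sketch under assumptions: it relies on the `-sh` behaviour described above (the compiler writes `ppas.sh` and `link.res` without linking), and `patch_link_res` is a hypothetical placeholder, since this thread does not spell out what a corrected script has to contain. Whether such a step is robust enough in practice is exactly the concern raised above.

```python
import subprocess
from pathlib import Path


def patch_link_res(text):
    """Hypothetical placeholder: return the linker script text unchanged.

    A real version would apply whatever correction the script needs;
    that correction is not described in this thread.
    """
    return text


def rewrite_script(path, transform=patch_link_res):
    """Rewrite the linker script in place, keeping a .orig backup."""
    script = Path(path)
    original = script.read_text()
    Path(str(script) + ".orig").write_text(original)
    script.write_text(transform(original))


def build(source, fpc="/usr/local/fpc-2.6.4/bin/fpc", workdir="."):
    """Compile with -sh, patch link.res, then run the generated ppas.sh."""
    subprocess.run([fpc, "-sh", source], cwd=workdir, check=True)
    rewrite_script(Path(workdir) / "link.res")
    subprocess.run(["sh", "ppas.sh"], cwd=workdir, check=True)
```

Calling `build("myprog.pas")` would need the FPC 2.6.4 install at the path used earlier in this report; `rewrite_script` can be tried on its own against any copy of `link.res`.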


> > So I think it is fair to consider the onus on binutils/LD developers
> > to make a compensating change, to maintain backwards compatibility.
> > Because the breaking change has been introduced on this end.
> 
> No - the breaking change was fixing a bug.  We are not going to
> reintroduce that bug just for you.  If you want the old behaviour,
> use the old linker.
> 

While I understand, even agree, with that position, there is also a long-established tradition of maintaining backward-compatibility in applications.

Especially a tool like LD linker, which by definition does essentially nothing useful on its own.  It basically exists to serve the many clients -- compilers and application platforms that rely on it.  FPC 2.6.4 was a major client relying on it, which you are now disregarding, and consequently wholly breaking;  not just some rare outlier case, but every single application.

You frame the recent changes in 2.36+ as a bugfix, which presumably they were, and  someone benefitted. But it's not good service to FPC 264, that's for sure.  From that point of view it's 100% breakage, not a bugfix.

It doesn't matter if the FPC link script was "plainly and obviously broken", per Alan.  The actual fact is, it wasn't broken, because it worked!

And this FPC 264 version works very well on other platforms, on legacy platforms, old versions of Windows, OSX, etc. It just works.  Except now it doesn't work on Linux because of the binutils/LD change.

To my mind, this is a situation where backwards-compatibility ought to be considered.
Then it's a question of how difficult it would be, and how much work, on your end.


> 
> > 1) You could rollback or relax whatever got fixed/tightened in
> > binutils/LD-2.36, so it still fully accepts the legacy linker
> > scripts produced by FPC 2.6.4.  This would be most preferable from
> > an FPC end-user perspective, requiring no change in usage.
> 
> There would still need to be some way for the linker to detect if it
> is handling these old legacy scripts, which would involve adding
> something - either a new linker command line option or a new keyword
> in the linker script.  So I do not think that this alternative will
> work.

Is there a version id in the LD script/response files currently?  If not, could one be added, and then everything without a version would be deemed 2.35 and older  (or whatever a good cut point is).

This would also enable you to support other backward compatible concerns in future with other clients.

To generalize this point: is there a defined standard for the linker response/command language?  Then both linker and compilers can be working to the standard, instead of having to chase each other.  

And then the linker can more easily and definitively support multiple versions of the command language, including support for backward-compatibility, errata and special cases like this, instead of the ever-moving-target of "latest version".



> Well if everything is totally fixed and unchangeable then you really
> need to have a set-in-stone version of the linker too, and not be
> attempting to use newer versions.  Otherwise problems like this will
> arise again when even newer versions of the linker are released.
> 
> And we will reiterate that if the FPC 2.6.4 compiler cannot be changed
> then do not change the version of the binutils that you use with it
> either.
> 
> I am guessing however that you will tell me that this is not possible.
> Ie that the FPC compiler cannot have its own linker and that it has to
> use the system provided linker.

I have likewise considered trying to patch in the older binutils, as another absolute last-ditch solution.
FPC has a provision for invoking a custom linker, via flag:
      -XP<x>     Prepend the binutils names with the prefix <x>

Along this line, I would build my own private version of binutils, and use the linker in there.
I have no idea yet if this is actually possible, and how it would interact with the system binutils.  It seems fraught with complexities and hidden problems, and extremely onerous and heavyweight.  But it might end up being the best/only solution here.
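
A rough sketch of what that could look like, assuming a privately built binutils 2.35 lives under a hypothetical prefix /opt/binutils-2.35 (PRIVATE_BINUTILS and fpc_private_cmd are names made up for illustration).  Whether FPC 2.6.4 accepts a directory path as the -XP prefix is untested here, so the script only prints the command instead of running it:

```shell
#!/bin/sh
# Sketch only: PRIVATE_BINUTILS is a hypothetical install prefix for a
# privately built binutils 2.35; nothing here is verified against FPC 2.6.4.
PRIVATE_BINUTILS="${PRIVATE_BINUTILS:-/opt/binutils-2.35}"

# Build the fpc command line that passes the private bin directory as the
# -XP prefix, so that "ld" would be looked up as "$PRIVATE_BINUTILS/bin/ld".
fpc_private_cmd() {
    printf 'fpc -XP%s/bin/ %s\n' "$PRIVATE_BINUTILS" "$*"
}

# Dry run: print the command instead of executing it.
fpc_private_cmd -Xm -va myprog.pas
```

If -XP turns out not to accept a path, putting the private bin directory first in PATH for the fpc invocation would be another thing to try.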

This general idea is indeed what FPC has moved to -- they have an internal linker which is used on some platforms, and even Linux in later versions.  So it is tightly coupled with the FPC version.

Currently, on Gentoo, I can have multiple binutils installed simultaneously, and use the 'eselect' utility to switch between them.  But only one is ever visible at a time.  And it is a rolling platform, so binutils-2.35 will eventually drop out.


> So, how to move forward ?
> 
> If FPC 2.6.4 cannot change, and we are unwilling to make a change in
> the upstream binutils sources, then I think that you are going to have
> to talk to the distributions themselves.  (Is this just a Debian
> problem or do other distributions support FPC 2.6.4 ?)  Distribution
> specific patches to the binutils are certainly possible. so maybe that
> is the way to proceed.

This issue has almost nothing to do with distros, except that the working binutils-2.35 is still available/mainline in most of them, but will be rolled away soon.

FPC is by design an almost self-contained cross-platform ecosystem with very few dependencies.  On other platforms FPC uses its own internal linker, and I believe also on Linux they have moved to that in later versions.
Comment 25 John B Thiel 2022-03-03 15:02:35 UTC
(In reply to Alan Modra from comment #23)
> The segfaults are due to your linker script setting the value of "dot" to
> near zero with ". = 0 +  SIZEOF_HEADERS;" then containing a .data output
> section with additional contents over the standard .data section.  That
> extra .data section then has a vma in the unmapped page at zero (unmapped to
> catch NULL pointer dereferences).  Unsurprisingly you get segfaults in the
> loader.  If the linker uses your script with -T, which seems to be the
> intent, the the whole binary is mapped low.  Segfaults again.
> 

Thanks much Alan, for this info and explanation.

Can you please point out more exactly what line(s) of the link.res are wrong, and how to correct it?

I had looked at the "fixed link.res" versions HJ Lu attached, but those do not show corrections; they just edit the search paths for your local copy of the object files.  And the last one (13989) just deleted all the actual detail specs, so it falls back to the builtin defaults in LD, I assume.  That might work for this example helloworld, but it might not for a more complex application.  Anyway, it doesn't give me any clue what is actually wrong with the link.res script from FPC.

If you guys absolutely will not fix this, then I will have to consider extreme measures like patching the compiler for myself.  So I need to understand exactly what the problem line(s) are, and how to correct them.

The other reference we have is the change the FPC team applied in the 3.x series, which adapted to the recommended -T approach in the latest LD versions (the commit/4564bffb85e5947cf7bdfa3e2c67bc032775d0c5 I noted above).
I would hope to avoid understanding/backporting that whole concept, at least to start. I just want the minimum patch to get this link.res working.


> The linker script is plainly and obviously broken.  Newer linkers are simply
> doing as asked.  Closing, please don't reopen.

Ok, I thought the status was changed inadvertently.  It's obviously not resolved or invalid: binutils/LD 2.36+ doesn't work for FPC 2.6.4, and nothing so far has given a fix.

You are calling it wontfix, I guess. Which is extremely disappointing, and pushes the problem to me and other end-user developers.  Deciphering arcane linker scripts is not my expertise or responsibility.  I have already put in a substantial effort and contribution in debugging the problem to this point, and submitting multiple bug reports on multiple sites, and trying to motivate and explain the severity of this.