Bug 22831

Summary: ld causes massive thrashing if object files are not fully memory-resident: new algorithm needed
Product: binutils
Component: ld
Reporter: Luke Kenneth Casson Leighton <lkcl>
Assignee: Not yet assigned to anyone <unassigned>
Status: NEW
Severity: enhancement
Priority: P3
Version: 2.31
Target Milestone: ---
Host:
Target:
Build:
Last reconfirmed: 2018-02-11 00:00:00
CC: dion, giovanni.lostumbo, hjl.tools, ian, nickc, sam
Attachments:
  repro test case
  updated version of linker torturer
  attachment-2923945-0.html
  ld.c 3-29-1991 reads object files and libraries 2x to minimize memory

Description Luke Kenneth Casson Leighton 2018-02-10 23:38:43 UTC
ok so this is quite complex / comprehensive, as it's a performance-related
bug that is slowly getting more and more critical as software such as
firefox and chromium (particularly when compiled with debug symbols enabled)
gets inexorably larger and larger.

back in 2008 i decided to add gobject bindings to webkit.  had a really nice
laptop, 2gb of RAM, blah blah processor, blah blah disk space, took an hour
to do the compile phase, couldn't care less, plodded along, did the job.
switched off -g because it just took far too long, and didn't need it.

one day i needed to debug something particularly tricky, and i accidentally
did a recompile and ended up with like 100mb object files instead of the
usual 5mb or whatever they are.

when it came to the linker phase all hell broke loose.  the laptop went into
total meltdown, loadavg shot up to 50 and above, the X server became completely
unresponsive, i had to hold my fingers down on the ctrl-alt-f1 key combination
for over a minute to get to console and recover the machine.

turns out that it had gone into complete and utter swap-space thrashing.

on further investigation and thought i realised that the huge amount of
cross-referencing that goes on at the linker phase basically means that if
*EVEN ONE* object file is not permanently ram-resident the entire system
goes into total thrashing meltdown.

some of debian's smaller build systems now spend SEVERAL DAYS performing the
linker phase for these insanely-large binaries.

now, i completely ignored this because it's not my problem, not my
responsibility, nothing to do with me, blah blah, you're the maintainers
of binutils, you're doing a superb job of coordinating things.

however i don't believe anyone has informed you quite how out-of-hand things
are getting.  firefox now requires SEVEN GIGABYTES of RESIDENT MEMORY to
perform the final linker phase.

that's beyond the memory address space of every single 32-bit processor
on the planet, which is very very serious.  

why is it serious?  well, it's because it makes perfectly good
32-bit hardware look completely useless, and HUNDREDS OF MILLIONS OF
COMPUTERS will END UP IN LANDFILL very very quickly over the next 1-2
years unless you respond and do something about this bug.

in looking at this:
https://sourceware.org/bugzilla/show_bug.cgi?id=12682

whatever it is, it's basically... not enough.  trying to *reduce* memory
overhead is simply not radical enough.  why?  because what happens is,
in 2-3 years time as the inexorable and insane increase in complexity
of software goes up, yet another limit will be hit, and another, and another.

so the entire linker algorithm and the impact on the system needs to be
looked at.

back in 1989 our year was given a very interesting problem to solve.  we
were asked, "if you had to do an ultra-large matrix multiply where the
disk storage capacity far exceeded the available RAM, how would you go about
it?"

whilst most people did a "best guess" algorithm, i analysed the problem and
realised that the time spent *pre-analysing* the problem was purely
CPU-bound, and that the actual multiply would, by virtue of being
I/O bound, be far and away the most time-consuming part.

so i took a brute-force approach.

the algorithm basically said, "ok, if you allow N object files to be
resident at a time until all of their functions are linked with all
other object files, and you allow M object files to be brought in
temporarily to link with the N-sized group, brute-force go through
ALL possible combinations of values of N and M to give the best
estimate for the linker time".

it's clearly going to be a little bit more complex than that, as the
amount of available resident RAM is going to dynamically change depending
on system load.

also, it's not quite N "object files" it's "a size of RAM where object
files happen to sit, resident permanently until all functions are linked"

but you get the general idea, the important thing being to remember that
at the end of the "row" (when any one "N" group is completed) you don't
throw the M group out immediately, you bring in the *new* N group and
link with the current (resident) M group... *then* you throw out the
current M group and bring in a new one.

this saves O(M) time on an O(N*M) algorithm where you have absolutely no
idea in advance whether making M or N large is significant or not... hence
the O(M) time saving cannot be discounted, and, really, this brute-force
style of analysis is the only practical option.  anything else -
a more "elegant" algorithm - *will* be sub-optimal.

it was an interesting practical exercise.

anyway i leave it with you.
Comment 1 H.J. Lu 2018-02-11 13:01:35 UTC
Please try binutils 2.30 with "-Wl,--no-keep-memory".
Comment 2 Luke Kenneth Casson Leighton 2018-02-11 13:55:33 UTC
hi HJ, thanks for that advice - bear in mind that i am not actually
directly involved in any of the projects that are experiencing these
insane levels of thrashing.  we (you and i) are therefore talking
"to the general wider internet".

so, for anyone *finding* this bugreport, HJ is recommending that,
if you are a build maintainer and the linker phase is going into
insane thrashing, you try 2.30 and the option he recommends.

now.

here's the thing, HJ: distros have to "fix" the version of binutils
and make it the default / standard for sometimes up to 18 months.
also, there's no *guarantee* that they will ever hear about this option.

can i recommend, if reports start coming in that it works, that this
option be either enabled *by default*... or... that instead, there be
an "auto-resident-memory detect" option similar to that used in gcc,
where it detects available free resident RAM for compiling and uses
that and that alone?

... and then when *that* is stable... make *that* the default.

what do you think?
Comment 3 H.J. Lu 2018-02-27 20:35:16 UTC
Please try users/hjl/pr18028 branch at

https://github.com/hjl-tools/binutils-gdb
Comment 4 Luke Kenneth Casson Leighton 2018-03-01 01:09:30 UTC
(In reply to H.J. Lu from comment #3)
> Please try users/hjl/pr18028 branch at
> 
> https://github.com/hjl-tools/binutils-gdb

hi hjl, i will point some people at this, it may be some time
as one of them is the debian-riscv team, they may be maintaining
special patches so it might not be as straightforward as just
cloning the above branch.

others severely affected include armhf systems (max 2GB RAM, 32-bit)
and i note that the patch from a couple of days ago mentions
"enabled by default on x86", is that correct?  what options would
be needed to try this out?

also i note from the patch commit message it says "change maximum
page size", how would that stop severe / critical thrashing?  how
would it reduce memory usage to only that which is available on
the actual system (like when gcc performs compiles, it only uses
available memory)?

did i miss something?  is there another patch in that branch which
i did not see?
Comment 5 H.J. Lu 2018-03-01 02:59:34 UTC
(In reply to Luke Kenneth Casson Leighton from comment #4)
> (In reply to H.J. Lu from comment #3)
> > Please try users/hjl/pr18028 branch at
> > 
> > https://github.com/hjl-tools/binutils-gdb
> 
> did i miss something?  is there another patch in that branch which
> i did not see?

Please read my suggestion again and follow it to the letter.
Comment 6 Luke Kenneth Casson Leighton 2018-03-01 03:54:15 UTC
(In reply to H.J. Lu from comment #5)

> Please read my suggestion again and follow it to the letter.

sorry, hjl, i appreciate you're busy so are providing extremely
short responses: please read again what i wrote.  i am *not* the
person installing or running this.  i am acting merely as a
*messenger* after seeing and experiencing reports from at least
FIVE separate teams over the past SIX years of increasingly
difficult build problems due to this increasingly-important bug.

i am NOT the person who will be running any of the suggestions
that you are giving (because my laptop will potentially be damaged
by doing so and i cannot risk that), i will be RELAYING the suggestions
to various people across the internet, making them AWARE that you
are willing to tackle this particular problem.

therefore i require and seek CLARITY on EXACTLY what it is
that i am going to tell people, BEFORE suggesting to them that
they come and look at this bugreport.

is there anything that is unreasonable about that?

if so, please let me know.

https://github.com/hjl-tools/binutils-gdb/commit/9999de060bbcc7cca9dce213deeeec6593887a8e 

ok so after re-reading twice, i eventually spotted the (misordered)
branch name.  can i suggest in future, rather than refer to the
main branch, to instead post people the link *directly* to the
branch, like this?

https://github.com/hjl-tools/binutils-gdb/tree/users/hjl/pr18028

it was a simple mistake; it would have been much more helpful to say "you
missed that i suggested trying a branch named xyz".

now, i took a quick look, and there is an assumption in the
patch, that the problem will *exclusively* occur on 32-bit systems.

this is not the case.

there are actually *two* inter-related problems.

the FIRST is that the amount of memory used for linking e.g. firefox
is so insane that it now requires 7 GIGABYTES of resident memory in
order to avoid thrashing... this is simply impossible to do on a
32-bit system.

the SECOND is that the linker phase GOES INTO THRASHING IN THE FIRST
PLACE and has done for many many years now INCLUDING on 64-BIT SYSTEMS.

if you read the original bug-report you will see that i said that
one 64-bit x86 laptop that i had, 6 years ago, only had 2GB of RAM.

the one that i have now has *16* GB of RAM but because it has an NVMe
SSD and is an ultra-expensive laptop (USD $2500) i cannot risk the NVMe
drive getting damaged, so swap is *DISABLED*.  despite this, it still
goes into total meltdown (loadavg over 100) whenever memory usage
approaches 16GB.

for both these systems - both of them *64-bit* systems *NOT* 32-bit
systems - going into swap-space is an absolute unmitigated disaster,
yet it is now considered NORMAL for a build to go from
taking about 1 hour to link, if it stays below the 100% resident-memory
threshold, to taking SEVERAL DAYS in some cases if it goes even
the TINIEST FRACTION above the available resident memory...
because distros *do not have any choice in the matter*.

this is why i suggested the algorithm above, because the algorithm
above was part of an exercise set by extremely competent lecturers
at Imperial College during an era when available memory
was a tiny fraction of what it is now, and running in virtual memory
was simply flat-out inconceivable because most systems were still
16-bit, let alone 32-bit.

so.

questions:

1) would the proposed patch - which reduces virtual memory usage for
32-bit systems to half that of the available memory - *actually*
fix the problem as described on a *64-bit* system?

2) what would happen if *more than half* of the available virtual
memory is taken up by programs that happen to be running at the
same time as the linker phase?  consider the cases where, in
complex builds, there may be a REALLY LARGE chain of applications
that have spawned any given usage of the ld executable, such that
the expectation that there will *be* half of the total amount of
virtual memory space even *available* is not actually true.
Comment 7 Luke Kenneth Casson Leighton 2018-03-14 05:30:19 UTC
hi hjl,

so how are you getting on with analysing this problem? is there anything
that is unclear that i can assist you with understanding?
Comment 8 H.J. Lu 2018-03-14 11:45:20 UTC
(In reply to Luke Kenneth Casson Leighton from comment #7)
> hi hjl,
> 
> so how are you getting on with analysing this problem? is there anything
> that is unclear that i can assist you with understanding?

Have you tried users/hjl/pr18028 branch?
Comment 9 Luke Kenneth Casson Leighton 2018-03-14 11:52:51 UTC
(In reply to H.J. Lu from comment #8)

> Have you tried users/hjl/pr18028 branch?

no, hj, i have not, because it is a fix for a 32-bit system,
not a 64-bit system.  what do you need to know to make it
clear that this is a problem that occurs on a 64-bit system
as well as a 32-bit system?

is there a particular reason why you are not answering my
questions?  in particular can i refer you to my questions
at the end of comment #6?  i am trying to understand if
there is anything unclear about my questions.
Comment 10 H.J. Lu 2018-03-14 11:59:22 UTC
(In reply to Luke Kenneth Casson Leighton from comment #9)
> (In reply to H.J. Lu from comment #8)
> 
> > Have you tried users/hjl/pr18028 branch?
> 
> no, hj, i have not, because it is a fix for a 32-bit system,
> not a 64-bit system.  what do you need to know to make it
> clear that this is a problem that occurs on a 64-bit system
> as well as a 32-bit system?

What is your main issue? 32-bit system or 64-bit system?
Comment 11 Luke Kenneth Casson Leighton 2018-03-14 12:06:57 UTC
(In reply to H.J. Lu from comment #10)

> What is your main issue?

 i do not (personally) have an issue, hj.  this is a flaw that is
 independent of me (personally).

 do you mean to ask, "what is THE main issue?"


> 32-bit system or 64-bit system?

 there are two issues:

 1.  32-bit system
 2.  64-bit system

 both 32-bit and 64-bit are affected by this issue.

 the patch that you wrote however looks like it only addresses
 32-bit.

 that leaves 64-bit systems still affected.
Comment 12 H.J. Lu 2018-03-14 12:26:22 UTC
(In reply to Luke Kenneth Casson Leighton from comment #11)
> (In reply to H.J. Lu from comment #10)
>  there are two issues:
> 
>  1.  32-bit system
>  2.  64-bit system
> 
>  both 32-bit and 64-bit are affected by this issue.
> 
>  the patch that you wrote however looks like it only addresses
>  32-bit.

True.  My patch is a starting point.  I'd like to know if it helps
32-bit system or not.  If it doesn't address the issue for 32-bit
system, my approach won't for 64-bit system.
 
>  that leaves 64-bit systems still affected.

You can always and should get more RAM for 64-bit system. But 32-bit
system is limited by address space.
Comment 13 Luke Kenneth Casson Leighton 2018-03-14 13:41:31 UTC
(In reply to H.J. Lu from comment #12)
> (In reply to Luke Kenneth Casson Leighton from comment #11)
>> (In reply to H.J. Lu from comment #10)
>>  there are two issues:
>>
>>  1.  32-bit system
>>  2.  64-bit system
>>
>>  both 32-bit and 64-bit are affected by this issue.
>>
>>  the patch that you wrote however looks like it only addresses
>>  32-bit.
>
> True.  My patch is a starting point.  I'd like to know if it helps
> 32-bit system or not.  If it doesn't address the issue for 32-bit
> system, my approach won't for 64-bit system.

 unfortunately i cannot risk damaging my system by carrying out any
tests (because any tests will result in a loadavg over 120, and 30
seconds later it is guaranteed to hard crash).  so we will have to
wait for someone else to test the patch.

>>  that leaves 64-bit systems still affected.
>
> You can always and should get more RAM for 64-bit system.

 i have 16 GB of DDR4 2400 mhz RAM on my laptop... and because when
that system goes into swap (it has an NVMe) its loadavg goes over 120
and it is absolutely guaranteed to crash about 30 seconds later,
adding more RAM is *not* the solution.

 however much more RAM is added, there *will* be a piece of software
within 1-5 years which requires more RAM for the linker phase than any
system provides.

 how does gcc do compilation?  how does it stay within the bounds of
available memory?

l.
Comment 14 H.J. Lu 2018-03-14 16:11:37 UTC
(In reply to Luke Kenneth Casson Leighton from comment #13)
>  i have 16 GB of DDR4 2400 mhz RAM on my laptop... and because when
> that system goes into swap (it has an NVMe) its loadavg goes over 120
> and it is absolutely guaranteed to crash about 30 seconds later,
> adding more RAM is *not* the solution.
> 
>  however much more RAM is added, there *will* be a piece of software
> within 1-5 years which requires more RAM for the linker phase than any
> system provides.
> 

Please try if "-Wl,--no-keep-memory" works.
Comment 15 Luke Kenneth Casson Leighton 2018-03-16 04:12:29 UTC
(In reply to H.J. Lu from comment #14)
> (In reply to Luke Kenneth Casson Leighton from comment #13)
> >  i have 16 GB of DDR4 2400 mhz RAM on my laptop... and because when
> > that system goes into swap (it has an NVMe) its loadavg goes over 120
> > and it is absolutely guaranteed to crash about 30 seconds later,
> > adding more RAM is *not* the solution.
> > 
> >  however much more RAM is added, there *will* be a piece of software
> > within 1-5 years which requires more RAM for the linker phase than any
> > system provides.
> > 
> 
> Please try if "-Wl,--no-keep-memory" works.

i'll alert some people and see if they are in a position to try that.
Comment 16 Luke Kenneth Casson Leighton 2018-06-29 19:53:19 UTC
the following came up in a debian discussion and is copied here:

Florian Weimer <fw@deneb.enyo.de> wrote:
* Luke Kenneth Casson Leighton:

>  that is not a surprise to hear: the massive thrashing caused by the
> linker phase not being possible to be RAM-resident will be absolutely
> hammering the drives beyond reasonable wear-and-tear limits.  which is
> why i'm recommending people try "-Wl,--no-keep-memory".

Note that ld will sometimes stuff everything into a single RWX segment
as a result, which is not desirable.

Unfortunately, without significant investment into historic linker
technologies (with external sorting and that kind of stuff), I don't
think it is viable to build 32-bit software natively in the near
future.  Maybe next year only a few packages will need exceptions, but
the number will grow with each month.  Building on 64-bit kernels will
delay the inevitable because more address space is available to user
space, but that's probably 12 to 18 month extended life-time for
native building.
Comment 17 Luke Kenneth Casson Leighton 2018-12-21 23:53:40 UTC
https://issues.guix.info/issue/33676

so we have a successful report that the advised option helps.

please note: the advised option is **NOT** repeat **NOT** a solution.
destroying all of the memory and throwing away useful information
cannot possibly be called a "solution".

ld *really does* need to make *optimal* use of memory, by restoring
the techniques that were used decades ago, and to use dynamic 
analysis of the amount of available *RESIDENT* memory, so as to
very very specifically avoid swapping.
Comment 18 H.J. Lu 2018-12-22 00:35:48 UTC
(In reply to Luke Kenneth Casson Leighton from comment #17)
> https://issues.guix.info/issue/33676
> 
> so we have a successful report that the advised option helps.
> 

Have you tried my users/hjl/pr18028 branch?
Comment 19 Luke Kenneth Casson Leighton 2018-12-22 01:24:43 UTC
(In reply to H.J. Lu from comment #18)
> (In reply to Luke Kenneth Casson Leighton from comment #17)
> > https://issues.guix.info/issue/33676
> > 
> > so we have a successful report that the advised option helps.
> > 
> 
> Have you tried my users/hjl/pr18028 branch?

 as i mentioned before, i (personally) do not have the resources
 to try anything out: i am acting as a go-between, to find people
 who *can* try out different branches.

 i took a look at the diffs:

 https://github.com/hjl-tools/binutils-gdb/compare/users/hjl/pr18028#diff-e65a96fc956244cba3a031705b7b737aR3484

 some comments:

 bfd/linker.c line 3492 - i see what's going on.  this is great,
 it *in principle* makes sure that the amount of memory used is
 not exceeded.

 bfd/linker.c line 3484 - this is completely arbitrary.  this is
 NOT repeat NOT, as i have already said, and repeat, NOT limited
 to 32 bit.  64-bit systems ALSO HAVE THE EXACT SAME PROBLEM.
 this test needs to be removed.

 ld/ldmain.c: line 275 - specifying half the memory is arbitrary.

 so, as i said: it is not enough.  what if the amount of memory
 used by other programs exceeds half the available memory?

 conditions where that will occur immediately: make -j2.

 one ld process will take half the memory

 the other ld process will take half the memory.

 now BOTH processes will enter thrashing.

 as i said, right in the original report: it is necessary to DYNAMICALLY
 check the amount of available memory, just like gcc does.

 in that way, ld will remain DYNAMICALLY under the limit, it will STAY
 in resident memory.


 ld must be prevented from going into swap space, at all costs, basically.
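
 purely as an illustration of what "dynamically check the amount of available
 memory" could mean on linux (this is not ld or gcc code; MemAvailable is the
 kernel's estimate of memory usable without swapping, and the 0.8 margin is an
 arbitrary example):

def available_resident_bytes():
    # parse /proc/meminfo; the MemAvailable figure is reported in kB
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) * 1024
    return 0

# cap the linker's working set somewhere below that figure, and re-check it
# periodically so that a parallel link started by "make -j2" is noticed
budget = int(available_resident_bytes() * 0.8)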
Comment 20 Luke Kenneth Casson Leighton 2019-01-08 12:01:31 UTC
ok so i spoke to dr stallman a couple of weeks ago, and he confirmed that code that is near-identical to that which i described in the very first comment of this bugreport was REMOVED some time in the late 1990s, by persons not familiar with the type of issues that linking has to deal with.

the original code that dr stallman wrote did two things:

(1) checked to make absolutely sure that it stayed within the bounds of
    RESIDENT available memory, if it could.
(2) that it ONLY loaded into memory the maximum number of object files that
    would ensure that it remained within bounds of resident available
    memory, if it could.

it is essential that this code be researched and its functionality restored.
this is NOT a 32-bit-only problem.
Comment 21 Luke Kenneth Casson Leighton 2019-01-08 12:04:35 UTC
to emphasise that this is strategically becoming an absolutely critical bug:
https://lists.debian.org/debian-devel/2019/01/msg00081.html

here it has been reported that even when using -Wl,--no-keep-memory, firefox
completely fails to build on a 32-bit system.

the debian developers are presently testing cross-compiling 32-bit packages
from 64-bit hosts.

they report that ubuntu has *already* moved over to this procedure.

32 bit distributions are no longer self-hosting.

this bug is now a priority 1 critical bug.
Comment 22 Luke Kenneth Casson Leighton 2019-01-08 12:11:46 UTC
Created attachment 11522 [details]
repro test case

attached is a test file that can generate a Makefile and associated
header and c files that will easily exceed the capacity of a 64-bit
system to cope with.

here are arguments to the script that will cause GNU ld to attempt
to create an EIGHTEEN GIGABYTE executable.

$ python evil_linker_torture.py 3000 400 200 500000

a 32-bit system will be completely unable to cope with this, as 18GB is
roughly 450% of the 4GB resident limit.  when the link was attempted on
a 64-bit system it was necessary to terminate it with prejudice, as
by the time it got to 9.5GB of resident memory it was in danger of putting
the compile host into severe and irrecoverable swap thrashing.
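
for readers who cannot fetch the attachment, a generator in the same spirit
(NOT the attached script - the parameter meanings and names here are purely
illustrative) looks roughly like this:

import random, sys

def generate(n_files, n_funcs, n_calls):
    # emit n_files C files, each defining n_funcs functions, each of which
    # calls n_calls randomly-chosen functions in other files, so the linker
    # must cross-reference every object against (almost) every other object
    for f in range(n_files):
        calls = {g: [(random.randrange(n_files), random.randrange(n_funcs))
                     for _ in range(n_calls)] for g in range(n_funcs)}
        with open("src%d.c" % f, "w") as out:
            targets = {t for lst in calls.values() for t in lst}
            for tf, tg in sorted(targets):
                out.write("extern int fn_%d_%d(void);\n" % (tf, tg))
            for g in range(n_funcs):
                out.write("int fn_%d_%d(void) { int r = 0;\n" % (f, g))
                for tf, tg in calls[g]:
                    out.write("  r += fn_%d_%d();\n" % (tf, tg))
                out.write("  return r;\n}\n")

if __name__ == "__main__":
    generate(int(sys.argv[1]), int(sys.argv[2]), int(sys.argv[3]))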


if the Makefile is modified to include the option
"-Wl,--no-keep-memory", the following output is
generated and the errors result in the link phase terminating
unsuccessfully.


ld: warning: cannot find entry symbol _start; defaulting to 0000000000401000
ld: src9.o: in function `fn_9_0':
/home/lkcl/src/ld_torture/src9.c:3006:(.text+0x27): relocation
truncated to fit: R_X86_64_PLT32 against symbol `fn_1149_322' defined
in .text section in src1149.o
ld: /home/lkcl/src/ld_torture/src9.c:3008:(.text+0x41): relocation
truncated to fit: R_X86_64_PLT32 against symbol `fn_1387_379' defined
in .text section in src1387.o
ld: /home/lkcl/src/ld_torture/src9.c:3014:(.text+0x8f): relocation
truncated to fit: R_X86_64_PLT32 against symbol `fn_1821_295' defined
in .text section in src1821.o
ld: /home/lkcl/src/ld_torture/src9.c:3015:(.text+0x9c): relocation
truncated to fit: R_X86_64_PLT32 against symbol `fn_1082_189' defined
in .text section in src1082.o
ld: /home/lkcl/src/ld_torture/src9.c:3016:(.text+0xa9): relocation
truncated to fit: R_X86_64_PLT32 against symbol `fn_183_330' defined
in .text section in src183.o
ld: /home/lkcl/src/ld_torture/src9.c:3024:(.text+0x111): relocation
truncated to fit: R_X86_64_PLT32 against symbol `fn_162_394' defined
in .text section in src162.o
ld: /home/lkcl/src/ld_torture/src9.c:3026:(.text+0x12b): relocation
truncated to fit: R_X86_64_PLT32 against symbol `fn_132_235' defined
in .text section in src132.o
ld: /home/lkcl/src/ld_torture/src9.c:3028:(.text+0x145): relocation
truncated to fit: R_X86_64_PLT32 against symbol `fn_1528_316' defined
in .text section in src1528.o
ld: /home/lkcl/src/ld_torture/src9.c:3029:(.text+0x152): relocation
truncated to fit: R_X86_64_PLT32 against symbol `fn_1178_357' defined
in .text section in src1178.o
ld: /home/lkcl/src/ld_torture/src9.c:3031:(.text+0x16c): relocation
truncated to fit: R_X86_64_PLT32 against symbol `fn_1180_278' defined
in .text section in src1180.o
ld: /home/lkcl/src/ld_torture/src9.c:3035:(.text+0x1a0): additional
relocation overflows omitted from the output
^Cmake: *** Deleting file `main'
make: *** [main] Interrupt
Comment 23 Nick Clifton 2019-01-09 17:27:34 UTC
(In reply to Luke Kenneth Casson Leighton from comment #22)

Hi Luke,

> $ python evil_linker_torture.py 3000 400 200 500000

Actually this ran OK on my system.  Admittedly it is a fairly big 
machine, and I am sure that you could suggest increased parameters
that would bring it to its knees.

I was a little bit confused as to why the "-g" flag appears three
times in both CFLAGS and LDFLAGS.  Is this really necessary ?

Anyway, my main question is - have you tried using the gold linker
instead of the bfd linker ?  (Ie adding "-fuse-ld=gold" to the final
command line).  The reason being that the bfd linker is very old, and
it is not wholly surprising that it does not cope well with modern,
very large, binaries.  The gold linker on the other hand is new, it
has been designed from the ground up with large ELF programs in mind,
and it does not have any of the cruft that encumbers the bfd linker.

Cheers
  Nick

PS.  Waving my "devil's advocate" flag for a moment.  It could be argued
that not linking these gigantic binaries might actually be a good thing,
as they are getting ridiculously large.  Such binaries are going to take
a huge amount of time (and resources) to link, and if linkers were to
refuse to link them, then the programmers might have to rethink their 
monolithic approach and maybe come up with a more modular design.  Which
might not be a bad thing at all...
Comment 24 Luke Kenneth Casson Leighton 2019-01-09 22:13:30 UTC
hiya nick, thanks for trying out the torture program.  basically the
parameters there generate a 6.1mb object file (with gcc 7.3), and 3000x
that equals an 18 gbytes executable.

so, it's possible to work out what needs to be done: increase the 2nd
or 3rd parameter proportionately so as to ensure that the object
files grow to the point where the available RAM will be exceeded.

regarding ld-gold:
https://lists.debian.org/debian-devel/2019/01/msg00069.html

so no, it doesn't work.  mike hommey tried gnu gold for firefox
on debian 32-bit: everything he's tried has failed.  that leaves
cross-compiling from a 64-bit system as literally *the* only
option (which is completely unacceptable as a band-aid "solution").

regarding "-g -g -g": it increases the amount of debug information,
and consequently is a quick-hack way to increase the size of the
output binary.


regarding the evil idea of letting the limit be hit and weeding out
applications that try it, on the basis that it's pretty insane to
have such massive static executables: i really like it :)

... except... the first casualty is already being hit, and that's
*all* 32-bit hardware.  armhf, armel, i686, MIPS32 and a few more
besides.  all distros supporting 32-bit hardware are currently
going through hell, and/or are *DROPPING* 32-bit support entirely,
whilst 64-bit hardware continues to "accept" the insane inexorable
increase in static executable size.

so, perfectly good 32-bit hardware is being thrown into landfill
because there's absolutely no way they can get hold of a modern
distro that works on it...

... all because of this one bug that dates back to a short-sighted
decision from the late 1990s.

hence why i raised this to priority one critical level a couple of
days ago.
Comment 25 Ian Lance Taylor 2019-01-09 23:48:45 UTC
When using gold the key options are --no-mmap-output-file --no-map-whole-files --no-keep-files-mapped.  Can you confirm that those options--all of them together--were tried with gold?
Comment 26 Luke Kenneth Casson Leighton 2019-01-14 06:45:14 UTC
(In reply to Ian Lance Taylor from comment #25)
> When using gold the key options are --no-mmap-output-file
> --no-map-whole-files --no-keep-files-mapped.  Can you confirm that those
> options--all of them together--were tried with gold?

hi ian,

as i mentioned to hj, i personally do not have safe resources
(resources that would not be damaged by doing so).  i am acting as a
go-between, alerting various people to the nature of this bug.

i will contact them and alert them to the options that you describe.
Comment 27 Luke Kenneth Casson Leighton 2019-01-16 09:59:38 UTC
Created attachment 11540 [details]
updated version of linker torturer

i decided to run an i386 debian chroot, using a variant of evil_linker_torture.py
and running these options:
$ python evil_linker_torture.py 80 40 20 800000000

the results of the link (an error) are below:

$ make -j8 maing
i686-linux-gnu-ld.gold src0.o src1.o src10.o src11.o src12.o src13.o src14.o src15.o src16.o src17.o src18.o src19.o src2.o src20.o src21.o src22.o src23.o src24.o src25.o src26.o src27.o src28.o src29.o src3.o src30.o src31.o src32.o src33.o src34.o src35.o src36.o src37.o src38.o src39.o src4.o src40.o src41.o src42.o src43.o src44.o src45.o src46.o src47.o src48.o src49.o src5.o src50.o src51.o src52.o src53.o src54.o src55.o src56.o src57.o src58.o src59.o src6.o src60.o src61.o src62.o src63.o src64.o src65.o src66.o src67.o src68.o src69.o src7.o src70.o src71.o src72.o src73.o src74.o src75.o src76.o src77.o src78.o src79.o src8.o src9.o -g -g -g --no-mmap-output-file --no-map-whole-files --no-keep-files-mapped --no-keep-memory -o maing
i686-linux-gnu-ld.gold: internal error in convert_types, at ../../gold/gold.h:192

this is with the following version:
$ gold --version
GNU gold (GNU Binutils for Debian 2.28) 1.14

the most likely reason is that the size of the executable is over 6GB, and a 32-bit version of gold cannot cope.

when run on 64-bit it does fine, the only strange thing being that it still requires 7GB of resident RAM to link a 6GB executable.

ld-bfd - with "--no-keep-memory" - only requires 750 MB of resident RAM, to link the exact same 6GB executable.
Comment 28 Luke Kenneth Casson Leighton 2019-01-16 10:13:17 UTC
(In reply to Luke Kenneth Casson Leighton from comment #27)

> ld-bfd - with "--no-keep-memory" - only requires 750 MB of resident RAM, to
> link the exact same 6GB executable.

(and aside from a warning "i686-linux-gnu-ld.bfd: warning: cannot find entry symbol _start; defaulting to 0000000008048094", it succeeded.)

correction: i had added one too many "0s" onto the evil python command.
however after correction, the results are exactly the same:

debian-i386-chroot$ python evil_linker_torture.py 80 40 20 80000000

* make maing FAILs
* make main SUCCEEDs (using only 85mb for the linker phase)

so this is far more complex and involved than it first appears.
Comment 29 Luke Kenneth Casson Leighton 2019-01-16 12:18:21 UTC
i tried the same massive 6GB link as was previously carried out under an
i386 (32-bit) chroot; this time both linkers succeeded.  ld-bfd with
--no-keep-memory succeeded as before with a warning, using only 280mb
during the linker phase (the number of functions called had been increased:
python evil_linker_torture.py 800 400 20 8000000).

ld-gold *also* succeeded, once again requiring 6.5 GB of resident RAM
to carry out the link [on a 64-bit system].

it would appear that the options recommended in comment #25 do
not prevent ld-gold from allocating resident memory equal to the full
size of the target executable.

consequently, attempting to link a 6 GB executable on a 32-bit system
(with an obvious limit of 4GB) is guaranteed to fail.
Comment 30 Luke Kenneth Casson Leighton 2019-01-26 16:31:00 UTC
cross-reference: a priority-critical bug has also been raised in the debian
bugtracker:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=919882
Comment 31 Alan Modra 2022-07-23 02:37:03 UTC
Putting priority and severity back where they belong.
Comment 32 luke.leighton 2022-07-23 10:08:15 UTC
(replying privately)

dealing with this one was deeply unpleasant. i gave up as people were not listening.  i refer people to it frequently whenever they encounter serious build problems.

with the torture demo it is dead easy to autogenerate programs that crash both ld and gold, for both 32 and 64 bit. for 64 bit just keep increasing the parameters until programs exceed 16 gbytes in size and in some cases they won't even link at all.

there are multiple complaints by distro builders that their 128 GB and 256 GB build farms actually kernel panic if they happen to accidentally have e.g. firefox, libreoffice and other massive link jobs running simultaneously, due to thrashing.  with 128 GB of RAM!

i have had my very expensive laptop hit a loadavg of 1,200 due to this problem; it nearly lost me a year's work, and it took 25 minutes to get the cursor to move so i could hold down Ctrl-C and terminate the build.

it's exacerbated significantly by debug builds.

l.





Comment 33 luke.leighton 2022-07-23 10:14:06 UTC
Created attachment 14226 [details]
attachment-2923945-0.html

that was supposed to be a private reply, bugzilla masked the email address "amodra <bugzilla@....>".  the comment still stands though. i apologise for the toppost context.
Comment 34 Alan Modra 2022-07-23 14:04:07 UTC
I'll note that the priority and severity fields in bugzilla are primarily for the use of maintainers, or at least that should be the way they are treated.  They are not for bug reporters to say "this bug is really, really important!"  That said, I've experienced exactly the pain you ran into with a machine swapping like crazy, in fact it used to happen to me quite regularly.  And some things we do, like trying to free memory before exit to pacify people crying "memory leak!" only make things worse when you run into swap.  I had one link take 30 minutes extra just freeing memory..

In putting the severity at "enhancement" I'm merely reflecting reality.  Using more memory than necessary is not a bug, at least not until you run out of memory.  Even with ideal memory usage you will always be able to generate a workload that is just too big to handle.

And "new algorithm needed" is really saying "rewrite the linker".  That's low priority.  Also, there are other linkers, eg. gold and lld, that are much newer than ld.bfd.  They don't do much better at memory usage, do they?
Comment 35 Luke Kenneth Casson Leighton 2022-07-23 14:44:28 UTC
(In reply to Alan Modra from comment #34)

> And "new algorithm needed" is really saying "rewrite the linker".

i mention this very early on in this bugreport: back in the early 90s
it was indeed rewritten,
to remove Dr Stallman's algorithms, on the flawed assumption

      "640k^H^H^H^H 4GB should be enough for anybody".

> That's low
> priority.  Also, there are other linkers, eg. gold and lld, that are much newer
> than ld.bfd.

gold suffers from similar problems - i was able to make it keel over just as
easily.

i've not heard of lld before: if it makes the same flawed assumption
that going into swap is acceptable, it will likewise result in the exact same
problem.

>  They don't do much better at memory usage, do they?

if Dr Stallman's carefully-crafted original algorithms had been
left in place, which, just as in gcc, made *really certain* to only use
*resident* RAM,  we would not be having this conversation as
this bugreport would not need to be raised.

the fundamental flawed assumption is that it's "ok to use swap".

the sheer overwhelming amount of cross-referencing required in a
linker *100% guarantees* that even 10 kbytes over resident RAM
will result in thrashing.

any rewrite or redesign that does not take that into account is 100%
guaranteed to be problematic. this is just how it is: it's basic fundamental
computer science that a linker *has* to jump around across the
entirety of *all* of the objects it's trying to link.  this makes the
"Working Set" *equal* to 100% of the available Swap, which is
unfortunately the very definition of "thrash conditions".
Comment 36 Giovanni Lostumbo 2022-09-17 01:17:02 UTC
Binutils Version 0.001-2.13 (1988-2002)
http://www.mirrorservice.org/sites/sources.redhat.com/pub/binutils/old-releases/

Binutils 2.30 was released in 2018. As Luke mentions in comment #20, he spoke with Stallman back then, who confirmed that the code that helped ld stay within resident memory was removed in the late 1990s. If the source code is indeed in that University of Kent mirror ^ for all the legacy versions of binutils pre-2003, one should quickly be able to locate the last version of binutils with the original code that Stallman used. I cannot interpret code, but I'm entering this discussion from a hardware design viewpoint. Thrashing results in increased power consumption, and quickly depletes battery and disk life, if it is even successful at compiling. 

To help organize a solution to this bug, I propose that the original algorithmic code be identified and analyzed here or somewhere where it can be compared and contrasted to the swap mechanism that replaced it. Then, if the code can be reimplemented (not rewritten, since Luke claims it was already working code- no need to reinvent the wheel here), it can be tested in both 32 bit and 64 bit systems. While I do not understand programming languages, I do understand that there is a possibility that the legacy code had algorithms intrinsic to 32 bit, and may require some adapting for 64 bit. I'm guessing it could also extrapolate to 64 bit, independent of architecture, but that is more of a mathematical question beyond my capabilities.

I can also imagine other use-cases where restoring the original ld algorithms could be immensely efficient/beneficial. Say one is testing an array of new builds, and is modifying a select number of lines in source to test a new functionality or performance. One might develop 20 copies of source, save for a few lines of experimental code. Compiling each may take 24-48 hrs if it goes into swap space. That's 3-7 weeks of running a laptop or desktop. But if it uses code that never runs into swap, it can complete the compile much faster and with much less power. Now multiply that by 1000x users, with a server running 20,000 virtual machines (e.g. Amazon EC2). The kWh can add up very quickly, especially for those who cannot test their device locally, and can't afford to rent a server with that many VMs for 24-48 hrs. Swap space can also be less secure. Sensitive data stored in swap could persist, especially on a remote server, which could experience a power outage and be accessed at a later time. Data is relatively less vulnerable to theft in RAM.

If Stallman's code prevents the thrashing that arose out of the swap mechanism, then this bug report would NOT be an enhancement; it would be restoring the original, concisely operating functionality.
Comment 37 Giovanni Lostumbo 2022-09-19 10:08:34 UTC
Created attachment 14339 [details]
ld.c 3-29-1991 reads object files and libraries 2x to minimize memory
Comment 38 Giovanni Lostumbo 2022-09-19 10:10:08 UTC
I contacted Dr. (h.c.) Richard Stallman the other day to inquire which of the original GNU ld versions he wrote.

He replied (spelling errors included and annotated with "[sic]" by me),  

"The original GNU ld was written ny[sic] me.  I designed it to minimize
total memory usage by reading all the object files and libraries twice
in the right oder[sic].

Others wrote a different ld program in the early 1990s.
I think that was to support additional features and output formats.
However, machines' memory sizes were bigger and they did not preserve
what I had done to reduce the total memory requirement."
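
Purely as an illustration of the two-pass idea described in that reply (this
is not the code from binutils 1.9, and the helper callables are hypothetical):

def two_pass_link(object_files, read_symbols, read_contents, emit):
    # pass 1: read every object file once, keeping only its symbol table,
    # so the global layout can be decided without holding contents in RAM
    layout = {}
    offset = 0
    for path in object_files:
        defined_symbols, size = read_symbols(path)
        for sym in defined_symbols:
            layout[sym] = offset
        offset += size
    # pass 2: read every object file a second time, in the right order,
    # relocate it against the now-known layout, and write it straight out,
    # so only one file's contents need be resident at any moment
    for path in object_files:
        emit(read_contents(path, layout))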

Checking the ld.c file in Binutils 1.9 confirms that Stallman wrote the first version of ld: "/* Written by Richard Stallman with some help from Eric Albert."  He also wrote the 1988 binutils files, which appear to be betas.

The ld file(s) in Binutils 1.94-beta were written by Steve Chamberlain, and from Binutils 2.1 onwards it is a completely different program.

Thus it appears that the only official version of binutils that contains the memory minimizing technique is 1.9. I have attached ld.c here along with the source (171KB). Link: http://www.mirrorservice.org/sites/sources.redhat.com/pub/binutils/old-releases/binutils-1.9.tar.bz2

What are the additional features? From an old GNU ld manual, "ld version 2 January 1994": 

"This version of ld uses the general purpose BFD libraries to operate on object files. This allows ld to read, combine, and write object files in many different formats--for example, COFF or a.out. Different formats may be linked together to produce any available kind of object file. See section BFD, for more information. 

Aside from its flexibility, the GNU linker is more helpful than other linkers in providing diagnostic information. Many linkers abandon execution immediately upon encountering an error; whenever possible, ld continues executing, allowing you to identify other errors (or, in some cases, to get an output file in spite of the error)." Source: https://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_node/ld_1.html#SEC1

Could the original version be adapted to retain the memory-minimizing techniques while supporting additional output formats? I don't know. It seems like most, if not all, of the new features result in additional memory usage (ld.c in 1.9 is 171 KB uncompressed; the ld folder in 2.1 is 696 KB across multiple source files). Whether the original techniques were even tested to work with the new features is something worth exploring.