30724 – cygwin ld performance regression since 014a602b86

Bug 30724 - cygwin ld performance regression since 014a602b86

Summary: cygwin ld performance regression since 014a602b86

Status:	RESOLVED FIXED

Alias:	None

Product:	binutils
Classification:	Unclassified
Component:	ld (show other bugs)
Version:	2.41

Importance:	P2 normal
Target Milestone:	2.42
Assignee:	Alan Modra

URL:
Keywords:

Depends on:
Blocks:

Reported:	2023-08-05 09:16 UTC by Achim
Modified:	2023-08-17 18:18 UTC (History)
CC List:	4 users (show)

See Also:
Host:	cygwin
Target:
Build:
Last reconfirmed:	2023-08-06 00:00:00

Attachments
reinstate seek optimisation (1.54 KB, patch) 2023-08-06 23:32 UTC, Alan Modra	Details \| Diff
revised patch (1.55 KB, patch) 2023-08-07 22:21 UTC, Alan Modra	Details \| Diff
Show Obsolete (1) View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Achim 2023-08-05 09:16:42 UTC

On Cygwin there is a massive performance regression during the link phase with version 2.41.  The degradation is superlinear in the number of objects involved and I have observed link operations that take more than 30× longer than with the previous version.

One of the changes suspected was 38395c77, but that doesn't seem to be the main contributor to the regression (at least not in isolation).

* cygport protobuf-21.12 compile

2.39: 1420.820u 143.747s  3:20.37 780.8%      0+0k 0+0io 41531073pf+0w
2.40: 1429.088u 140.548s  3:18.48 790.8%      0+0k 0+0io 41615637pf+0w
2.41: 1496.555u 524.457s 10:07.31 332.7%      0+0k 0+0io 41570112pf+0w

* only the linking of protobuf-21.12

2.39:   14.212u   2.614s  0:20.54 81.8%       0+0k 0+0io 1909884pf+0w
2.40:   13.371u   0.839s  0:20.46 69.4%       0+0k 0+0io 1910885pf+0w
2.41:   85.507u 373.960s  7:55.39 96.6%       0+0k 0+0io 1905021pf+0w
38395c77/pdb.c w/o call to qsort
        84.933u 363.715s  7:37.01 98.1%       0+0k 0+0io 1906464pf+0w
38395c77/pdb.c full revert
	82.964u 361.461s  7:30.39 98.6%       0+0k 0+0io 1906266pf+0w

Comment 1 Achim 2023-08-05 20:37:36 UTC

AFter a few false starts since it seems one really needs to freshly configure and compile the whole thing each time this got bisected to:

https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=014a602b86f08de96fc80ef3f96a87db6cccad56

Why and how this produces the effect I've reported is a massive head-scratcher, but I've confirmed that with this patch reverted 2.41 does not only link as fast as 2.40 again, it is actually faster:

2.41re:	10.494u   1.370s  0:17.01 69.7%       0+0k 0+0io 1904230pf+0w

Comment 2 Andrew Pinski 2023-08-05 20:42:43 UTC

(In reply to Achim from comment #1)
> AFter a few false starts since it seems one really needs to freshly
> configure and compile the whole thing each time this got bisected to:
> 
> https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;
> h=014a602b86f08de96fc80ef3f96a87db6cccad56
> 
> Why and how this produces the effect I've reported is a massive
> head-scratcher, but I've confirmed that with this patch reverted 2.41 does
> not only link as fast as 2.40 again, it is actually faster:
> 
> 2.41re:	10.494u   1.370s  0:17.01 69.7%       0+0k 0+0io 1904230pf+0w

That would be then a bug in cygwin stdio code I suspect ...

Comment 3 Achim 2023-08-05 21:20:31 UTC

(In reply to Andrew Pinski from comment #2)
> That would be then a bug in cygwin stdio code I suspect ...

Maybe, or maybe it actually has to be that way since the wording in POSIX seems to imply that when certain open modes are in effect, then an fseek requires flushing or the equivalent thereof (when the mode changes, which Cygwin may not be able to detect reliably, so it may have to assume the mode will change).

Comment 4 Achim 2023-08-06 17:21:33 UTC

As I suspected, the newlib fseek/fseeko implementation flushes the file under lock (because it punts to fseek_r/fseeko_r) when in append mode and also for SEEK_CUR before determining the current position (as it has no way of knowing if it was changed by something else).  An fflush forces the next read to go to system as mandated by POSIX.  There is read optimization when having an absolute position (SEEK_SET) and the stream is read-only.
So removing the optimisation for (not) seeking an unchanged file position is not a no-op at least for newlib targets.

Comment 5 Alan Modra 2023-08-06 23:32:48 UTC

Created attachment 15045 [details]
reinstate seek optimisation

I did wonder whether we'd need something like this when I removed the seek optimisation, which BTW was done for cygwin/mingw.  See thread https://sourceware.org/pipermail/binutils/2023-May/127578.html

Comment 6 Hannes Domani 2023-08-07 14:02:47 UTC

(In reply to Alan Modra from comment #5)
> Created attachment 15045 [details]
> reinstate seek optimisation

Ist that the full patch?
Because last_io is never set to bfd_io_read or bfd_io_write.

Comment 7 Achim 2023-08-07 18:49:43 UTC

That patch certainly looks more thorough than just reverting the change that caused the regression… is that on Git somewhere?  I can run a tests with it the upcoming weekend most likely.

Btw, Cygwin and MinGW are two completely different things except for the executasble format.

Comment 8 Alan Modra 2023-08-07 22:21:31 UTC

Created attachment 15046 [details]
revised patch

(In reply to Hannes Domani from comment #6)
> Because last_io is never set to bfd_io_read or bfd_io_write.
Oops, yes, there needs to be an obvious fix..

Comment 9 Alan Modra 2023-08-07 22:29:31 UTC

(In reply to Achim from comment #7)
> is that on Git somewhere?  I can run a tests with it
> the upcoming weekend most likely.
No, the patch is not upstream anywhere yet, I'll wait for someone to test it on cygwin before committing.  It applies with "git am" to either binutils master or binutils-2_41-branch.

Comment 10 Sourceware Commits 2023-08-08 23:25:33 UTC

The master branch has been updated by Alan Modra <amodra@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=f82ee0c8dc4ee32556e23e6cd83ef083618f704f

commit f82ee0c8dc4ee32556e23e6cd83ef083618f704f
Author: Alan Modra <amodra@gmail.com>
Date:   Mon Aug 7 08:28:55 2023 +0930

    PR30724, cygwin ld performance regression since 014a602b86
    
    According to the reporter of this bug the newlib fseek implementation
    is likely slowed down by locking and fflush, only attempting to
    optimise seeks when the file is opened read-only.  Thus when writing
    the output we get a dramatic slowdown due to commit 014a602b86.
    
            PR 30724
            * bfd.c (enum bfd_last_io): New.
            (struct bfd): Add last_io field.
            * bfd-in2.h: Regenerate.
            * bfd-io.c (bfd_bread, bfd_bwrite): Force seek if last_io is
            opposite direction.
            (bfd_seek): Reinstate optimisation for seek to same position.

Comment 11 Sourceware Commits 2023-08-11 03:33:47 UTC

The binutils-2_41-branch branch has been updated by Alan Modra <amodra@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=226f2e6b924612ecbdb7dfe4f3ca611116ed77f4

commit 226f2e6b924612ecbdb7dfe4f3ca611116ed77f4
Author: Alan Modra <amodra@gmail.com>
Date:   Mon Aug 7 08:28:55 2023 +0930

    PR30724, cygwin ld performance regression since 014a602b86
    
    According to the reporter of this bug the newlib fseek implementation
    is likely slowed down by locking and fflush, only attempting to
    optimise seeks when the file is opened read-only.  Thus when writing
    the output we get a dramatic slowdown due to commit 014a602b86.
    
            PR 30724
            * bfd.c (enum bfd_last_io): New.
            (struct bfd): Add last_io field.
            * bfd-in2.h: Regenerate.
            * bfd-io.c (bfd_bread, bfd_bwrite): Force seek if last_io is
            opposite direction.
            (bfd_seek): Reinstate optimisation for seek to same position.
    
    (cherry picked from commit f82ee0c8dc4ee32556e23e6cd83ef083618f704f)

Comment 12 Alan Modra 2023-08-11 07:48:34 UTC

Fixed

Comment 13 Achim 2023-08-17 18:18:44 UTC

Thank you.

Going forward I'd suggest you revisit why you need to turn around the same stream and if it might not be more efficient to just have a read and a write stream (really just append I assume) to the same file and a read barrier that follows the write pointer when flushed.