On Cygwin there is a massive performance regression during the link phase with version 2.41. The degradation is superlinear in the number of objects involved and I have observed link operations that take more than 30× longer than with the previous version. One of the changes suspected was 38395c77, but that doesn't seem to be the main contributor to the regression (at least not in isolation). * cygport protobuf-21.12 compile 2.39: 1420.820u 143.747s 3:20.37 780.8% 0+0k 0+0io 41531073pf+0w 2.40: 1429.088u 140.548s 3:18.48 790.8% 0+0k 0+0io 41615637pf+0w 2.41: 1496.555u 524.457s 10:07.31 332.7% 0+0k 0+0io 41570112pf+0w * only the linking of protobuf-21.12 2.39: 14.212u 2.614s 0:20.54 81.8% 0+0k 0+0io 1909884pf+0w 2.40: 13.371u 0.839s 0:20.46 69.4% 0+0k 0+0io 1910885pf+0w 2.41: 85.507u 373.960s 7:55.39 96.6% 0+0k 0+0io 1905021pf+0w 38395c77/pdb.c w/o call to qsort 84.933u 363.715s 7:37.01 98.1% 0+0k 0+0io 1906464pf+0w 38395c77/pdb.c full revert 82.964u 361.461s 7:30.39 98.6% 0+0k 0+0io 1906266pf+0w
AFter a few false starts since it seems one really needs to freshly configure and compile the whole thing each time this got bisected to: https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=014a602b86f08de96fc80ef3f96a87db6cccad56 Why and how this produces the effect I've reported is a massive head-scratcher, but I've confirmed that with this patch reverted 2.41 does not only link as fast as 2.40 again, it is actually faster: 2.41re: 10.494u 1.370s 0:17.01 69.7% 0+0k 0+0io 1904230pf+0w
(In reply to Achim from comment #1) > AFter a few false starts since it seems one really needs to freshly > configure and compile the whole thing each time this got bisected to: > > https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff; > h=014a602b86f08de96fc80ef3f96a87db6cccad56 > > Why and how this produces the effect I've reported is a massive > head-scratcher, but I've confirmed that with this patch reverted 2.41 does > not only link as fast as 2.40 again, it is actually faster: > > 2.41re: 10.494u 1.370s 0:17.01 69.7% 0+0k 0+0io 1904230pf+0w That would be then a bug in cygwin stdio code I suspect ...
(In reply to Andrew Pinski from comment #2) > That would be then a bug in cygwin stdio code I suspect ... Maybe, or maybe it actually has to be that way since the wording in POSIX seems to imply that when certain open modes are in effect, then an fseek requires flushing or the equivalent thereof (when the mode changes, which Cygwin may not be able to detect reliably, so it may have to assume the mode will change).
As I suspected, the newlib fseek/fseeko implementation flushes the file under lock (because it punts to fseek_r/fseeko_r) when in append mode and also for SEEK_CUR before determining the current position (as it has no way of knowing if it was changed by something else). An fflush forces the next read to go to system as mandated by POSIX. There is read optimization when having an absolute position (SEEK_SET) and the stream is read-only. So removing the optimisation for (not) seeking an unchanged file position is not a no-op at least for newlib targets.
Created attachment 15045 [details] reinstate seek optimisation I did wonder whether we'd need something like this when I removed the seek optimisation, which BTW was done for cygwin/mingw. See thread https://sourceware.org/pipermail/binutils/2023-May/127578.html
(In reply to Alan Modra from comment #5) > Created attachment 15045 [details] > reinstate seek optimisation Ist that the full patch? Because last_io is never set to bfd_io_read or bfd_io_write.
That patch certainly looks more thorough than just reverting the change that caused the regression… is that on Git somewhere? I can run a tests with it the upcoming weekend most likely. Btw, Cygwin and MinGW are two completely different things except for the executasble format.
Created attachment 15046 [details] revised patch (In reply to Hannes Domani from comment #6) > Because last_io is never set to bfd_io_read or bfd_io_write. Oops, yes, there needs to be an obvious fix..
(In reply to Achim from comment #7) > is that on Git somewhere? I can run a tests with it > the upcoming weekend most likely. No, the patch is not upstream anywhere yet, I'll wait for someone to test it on cygwin before committing. It applies with "git am" to either binutils master or binutils-2_41-branch.
The master branch has been updated by Alan Modra <amodra@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=f82ee0c8dc4ee32556e23e6cd83ef083618f704f commit f82ee0c8dc4ee32556e23e6cd83ef083618f704f Author: Alan Modra <amodra@gmail.com> Date: Mon Aug 7 08:28:55 2023 +0930 PR30724, cygwin ld performance regression since 014a602b86 According to the reporter of this bug the newlib fseek implementation is likely slowed down by locking and fflush, only attempting to optimise seeks when the file is opened read-only. Thus when writing the output we get a dramatic slowdown due to commit 014a602b86. PR 30724 * bfd.c (enum bfd_last_io): New. (struct bfd): Add last_io field. * bfd-in2.h: Regenerate. * bfd-io.c (bfd_bread, bfd_bwrite): Force seek if last_io is opposite direction. (bfd_seek): Reinstate optimisation for seek to same position.
The binutils-2_41-branch branch has been updated by Alan Modra <amodra@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=226f2e6b924612ecbdb7dfe4f3ca611116ed77f4 commit 226f2e6b924612ecbdb7dfe4f3ca611116ed77f4 Author: Alan Modra <amodra@gmail.com> Date: Mon Aug 7 08:28:55 2023 +0930 PR30724, cygwin ld performance regression since 014a602b86 According to the reporter of this bug the newlib fseek implementation is likely slowed down by locking and fflush, only attempting to optimise seeks when the file is opened read-only. Thus when writing the output we get a dramatic slowdown due to commit 014a602b86. PR 30724 * bfd.c (enum bfd_last_io): New. (struct bfd): Add last_io field. * bfd-in2.h: Regenerate. * bfd-io.c (bfd_bread, bfd_bwrite): Force seek if last_io is opposite direction. (bfd_seek): Reinstate optimisation for seek to same position. (cherry picked from commit f82ee0c8dc4ee32556e23e6cd83ef083618f704f)
Fixed
Thank you. Going forward I'd suggest you revisit why you need to turn around the same stream and if it might not be more efficient to just have a read and a write stream (really just append I assume) to the same file and a read barrier that follows the write pointer when flushed.