Sources Bugzilla – Bug 10708
"out of file descriptors and couldn't close any" -- probably fd leak
Last modified: 2011-02-27 15:23:15 UTC
QtWebKit builds fine with "normal" ld -- but trying to build it with gold fails:
/usr/bin/ld: fatal error: out of file descriptors and couldn't close any
QtWebKit links 1623 object files plus 12 shared libraries. According to
/proc/sys/fs/file-max, it should be possible to have 370804 fds open.
Actually this is not caused by a shortage of system FDs, so something else is
causing gold to believe it's out of FDs:
# cat /proc/sys/fs/file-nr
6496 1037 370804
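For reference, the three fields of /proc/sys/fs/file-nr are the number of allocated file handles, the number of allocated but unused handles, and the system-wide maximum. A minimal C++ sketch for parsing that line follows; the struct and function names are made up for illustration and have nothing to do with gold itself.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper, not part of gold: the three fields of
// /proc/sys/fs/file-nr.
struct FileNr {
  long allocated;  // file handles currently allocated
  long unused;     // allocated but unused handles
  long max;        // system-wide limit (same value as file-max)
};

// Parse one line such as "6496 1037 370804".
// Returns false if the line does not contain three integers.
bool parse_file_nr(const std::string& line, FileNr* out) {
  assert(out != nullptr);
  std::istringstream in(line);
  return static_cast<bool>(in >> out->allocated >> out->unused >> out->max);
}
```

With the numbers above, only 6496 of 370804 handles are allocated system-wide, which suggests that any limit being hit is a per-process one.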
Can you confirm that this was with the development version of gold? There were
some bugs in this area fixed back in February.
Otherwise, as far as I can see, this can only happen if open returns -1 with
errno set to ENFILE or EMFILE. Please check ulimit -n.
If this is repeatable, is there any chance that you can debug it a bit?
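As a rough aid for that kind of debugging, the sketch below shows how a program can query the same per-process limit that `ulimit -n` reports, and how to tell the two "out of descriptors" errno values apart. The helper names are hypothetical; only getrlimit(), RLIMIT_NOFILE, EMFILE and ENFILE are real POSIX names.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/resource.h>

// Hypothetical helper: true if an open() failure means "out of descriptors".
// EMFILE: the per-process limit was reached.
// ENFILE: the system-wide limit was reached.
bool is_fd_exhaustion(int err) {
  return err == EMFILE || err == ENFILE;
}

// Store the soft per-process descriptor limit (the value `ulimit -n`
// prints) in *out. Returns false if getrlimit() fails.
bool soft_fd_limit(rlim_t* out) {
  assert(out != nullptr);
  struct rlimit rl;
  if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
    return false;
  *out = rl.rlim_cur;
  return true;
}
```

If the soft limit is well below the number of input files, EMFILE is the more likely of the two errors.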
Yes, this is on a fairly current build:
$ ld --version
GNU gold (Linux/GNU Binutils 220.127.116.11.1.20090905) 1.9
Copyright 2008 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.
ulimit -n is 1024, will try again after increasing it to 65536.
It is 100% reproducible, I can do a bit of debugging but don't have a lot of
time right now (insane day job schedule).
ulimit -n 65536 fixes it, but using > 1024 FDs still seems somewhat excessive.
I have tried to recreate this problem, but failed. As far as I can tell, gold
will react correctly to a lack of file descriptors. I will need more
information on what could be causing the issue for you.
gold will try to open as many file descriptors as it needs. If you give it more
than 1024 input files, then it will open more than 1024 descriptors. However,
if an open fails with ENFILE or EMFILE it will close some descriptors, and will
not try to open that many again.
The error you are getting is the error that gold gives if it runs out of file
descriptors but cannot find any to close. It's not reasonable that it would
need to keep 1024 descriptors open--unless perhaps you are running with a very
large number of threads. Are you passing any --thread option to gold?
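The close-and-retry behaviour described above can be sketched roughly as follows. This is not gold's actual code: the class and its method names are invented for illustration, and the "do not try to open that many again" bookkeeping is left out.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <deque>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical descriptor pool; the names are illustrative, not gold's.
class FdPool {
 public:
  FdPool() = default;
  FdPool(const FdPool&) = delete;
  FdPool& operator=(const FdPool&) = delete;
  ~FdPool() {
    for (int fd : idle_) ::close(fd);
  }

  // Open path read-only. On EMFILE/ENFILE, close one idle descriptor and
  // retry. Returns -1 (errno left set by open) if the failure has another
  // cause, or if no idle descriptor is left to close.
  int open_ro(const char* path) {
    assert(path != nullptr);
    for (;;) {
      int fd = ::open(path, O_RDONLY);
      if (fd >= 0) return fd;
      bool exhausted = (errno == EMFILE || errno == ENFILE);
      if (!exhausted || idle_.empty()) return -1;
      ::close(idle_.front());  // give one descriptor back, then retry
      idle_.pop_front();
    }
  }

  // Mark a descriptor as no longer needed; it stays open until evicted.
  void release(int fd) { idle_.push_back(fd); }

  std::size_t idle_count() const { return idle_.size(); }

 private:
  std::deque<int> idle_;  // descriptors that may be closed on demand
};
```

In this model, the error reported in this bug corresponds to the case where open fails with EMFILE or ENFILE while the idle list is empty: every descriptor is still marked as in use, so there is nothing to close.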
I could reproduce the problem (also when linking QtWebKit) until a few months
ago. I'm always using the latest CVS version. At the time I simply used the old
ld when linking Qt instead of increasing the appropriate ulimit.
So it looks like this is fixed now.
Bernhard, are you still seeing the problem?
This is still reproducible while linking QtWebKit:
GNU gold (GNU Binutils for Ubuntu 2.20.51-system.20100710) 1.9
It tries to link ~1800 object files btw.
I've been trying to track down possible sources of file descriptor leakage.
I've found one:
In copy-relocs.cc, Copy_relocs::emit_copy_reloc():
typename elfcpp::Elf_types<size>::Elf_WXword addralign =
This, and probably other similar places where we go back to an ELF file for
some info, seems to be leaking file descriptors. The call to
section_addralign() creates an Object::View, and reopens the file descriptor,
but never releases it. Also, at least in this particular case, we're accessing
a different file from the one we currently have locked (the shared library that
contains the definition of the symbol), and we haven't locked the file. If we
had locked the file here, the descriptor would have been released, but I'm not
sure it's safe to lock the shared library at this point -- we're in a
Scan_relocs task, which isn't necessarily single threaded.
I'm wondering whether it would be better to just find and eradicate places
where we need to read a file outside of the times we normally have the file
locked.
I have no idea whether this is the cause of the problem reported here, but a
good way to tell is if you can rerun the link with -Wl,--debug=task. That would
give us an idea of where it is when you finally run out of file descriptors.
For this leakage to cause real problems, you'll need lots of shared libraries,
and COPY relocations into lots of them. It seems unlikely, but it's worth a
shot. It's also possible that there are other leakages similar to this that
would trigger under different conditions.
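To illustrate the kind of leak described above, here is a small C++ sketch in which a read reopens a descriptor on demand, and only a scoped guard gives it back. It is a simplified model under stated assumptions, not gold's implementation: InputFile and ScopedFileUse are hypothetical names, and the thread coordination that real file locking involves is not modelled.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical stand-in for an input file whose descriptor is opened lazily.
class InputFile {
 public:
  explicit InputFile(const char* path) : path_(path), fd_(-1) {}
  InputFile(const InputFile&) = delete;
  InputFile& operator=(const InputFile&) = delete;
  ~InputFile() { close_fd(); }

  // Any read first makes sure a descriptor is open, reopening if needed.
  bool ensure_open() {
    if (fd_ < 0) fd_ = ::open(path_, O_RDONLY);
    return fd_ >= 0;
  }

  void close_fd() {
    if (fd_ >= 0) {
      ::close(fd_);
      fd_ = -1;
    }
  }

  bool is_open() const { return fd_ >= 0; }

 private:
  const char* path_;
  int fd_;
};

// Scoped guard: reads happen while the guard is alive, and the descriptor
// is released when the guard goes out of scope.
class ScopedFileUse {
 public:
  explicit ScopedFileUse(InputFile* file) : file_(file) {
    assert(file != nullptr);
  }
  ScopedFileUse(const ScopedFileUse&) = delete;
  ScopedFileUse& operator=(const ScopedFileUse&) = delete;
  ~ScopedFileUse() { file_->close_fd(); }

 private:
  InputFile* file_;
};
```

A read done without the guard leaves the descriptor open until something else closes it, which is the pattern that adds up when many files are touched this way.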
I found another leak that will explain the problem -- if you're using the
--no-keep-files-mapped option (or a 32-bit build of gold, for which that's the
default) and --gc-sections and/or --icf.
Can you try the patch below and let me know if it fixes the problem for you?
Index: gold.cc
RCS file: /cvs/src/src/gold/gold.cc,v
retrieving revision 1.85
diff -u -p -r1.85 gold.cc
--- gold.cc 14 Oct 2010 22:10:22 -0000 1.85
+++ gold.cc 3 Nov 2010 23:39:44 -0000
@@ -359,6 +359,7 @@ queue_middle_tasks(const General_options
p != input_objects->relobj_end();
+ Task_lock_obj<Object> tlo(task, *p);
(*p)->layout(symtab, layout, NULL);
(In reply to comment #10)
> I found another leak that will explain the problem -- if you're using the
> --no-keep-files-mapped option (or a 32-bit build of gold, for which that's the
> default) and --gc-sections and/or --icf.
> Can you try the patch below and let me know if it fixes the problem for you?
I just ran into this behaviour when building Chromium on a 32-bit machine with
gold (binutils 2.21) on Fedora 14. Building the same source in a 64-bit
environment (Ubuntu 10.10, binutils 2.21) was OK.
Chromium built OK some weeks ago with the gold linker on Fedora 14 (32-bit).
I tried again, this time linking with the normal ld instead of gold, and the
build was successful.
I have not tried your patch yet, but I will, and I'll let you know.
The patch as proposed in comment #10 works. I can now again build Chromium on
my 32-bit machine.
Module name: src
Changes by: firstname.lastname@example.org 2011-02-27 15:17:29
gold : ChangeLog copy-relocs.cc gold.cc icf.cc
Backport from mainline:
2010-11-05 Cary Coutant <email@example.com>
* copy-relocs.cc (Copy_relocs::emit_copy_reloc): Hold a lock on the
object when reading from the file.
* gold.cc (queue_middle_tasks): Hold a lock on the object when doing
second layout pass.
* icf.cc (preprocess_for_unique_sections): Hold a lock on the object
when reading section contents.
* mapfile.cc (Mapfile::print_discarded_sections): Hold a lock on the
object when reading from the file.
* plugin.cc (Plugin_manager::layout_deferred_objects): Hold a lock on
the object when doing deferred section layout.
Seems to be fixed on mainline and in the upcoming 2.21.1 release.