Bug 25024 - dwz: Multifile temporary files too large
Summary: dwz: Multifile temporary files too large
Status: NEW
Alias: None
Product: dwz
Classification: Unclassified
Component: default
Version: unspecified
Importance: P2 enhancement
Target Milestone: ---
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-09-21 21:13 UTC by Jan Kratochvil
Modified: 2019-11-29 15:31 UTC
CC List: 3 users

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
lldb-experimental.spec (2.72 KB, text/x-rpm-spec)
2019-09-21 21:13 UTC, Jan Kratochvil

Description Jan Kratochvil 2019-09-21 21:13:36 UTC
Created attachment 11998 [details]
lldb-experimental.spec

I tried to check how fast dwz is, but it refuses to run on LLDB debuginfo:

+ dwz -h -q -r -m .dwz/lldb-experimental-10.0.0-0.20190817snap5.fc30.x86_64 -l 1000000000 -L 2000000000 ./opt/lldb-experimental/bin/c-index-test-10.0.0-0.20190817snap5.fc30.x86_64.debug ...debug
dwz: Multifile temporary files too large

real    39m51.803s
user    35m22.622s
sys     1m38.881s
Comment 1 Tom de Vries 2019-09-24 22:58:29 UTC
(In reply to Jan Kratochvil from comment #0)
> Created attachment 11998 [details]
> lldb-experimental.spec
> 
> I tried to check how fast dwz is, but it refuses to run on LLDB debuginfo:
> 
> + dwz -h -q -r -m .dwz/lldb-experimental-10.0.0-0.20190817snap5.fc30.x86_64
> -l 1000000000 -L 2000000000
> ./opt/lldb-experimental/bin/c-index-test-10.0.0-0.20190817snap5.fc30.x86_64.
> debug ...debug
> dwz: Multifile temporary files too large
> 
> real    39m51.803s
> user    35m22.622s
> sys     1m38.881s

I installed a Fedora 30 distro in a VirtualBox VM, and I've been trying to build this spec today, but I ran into:
...
collect2: fatal error: ld terminated with signal 9 [Killed]
compilation terminated.
...

Is there any way you can make available the files required to reproduce this?
Comment 2 Jan Kratochvil 2019-09-26 08:37:02 UTC
I think you just need more RAM; wasn't it OOM-killed? I am running it on a 64 GB host.
Maybe switching from ld to gold would let it build for you.
I can upload it, but it is 30 GB uncompressed, 8 GB with xz -1.

But then, from a higher point of view, I do not think fixing this issue is too important. The build should rather be switched to -fdebug-types-section, as then the linker already no longer has to build some gigantic intermediate files.

It is just that -fdebug-types-section currently crashes GCC, so I could not test that further:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91887
Comment 3 Tom de Vries 2019-09-26 10:15:33 UTC
(In reply to Jan Kratochvil from comment #2)
> I can upload it, but it is 30 GB uncompressed, 8 GB with xz -1.

Yes please.

Please upload to ftp://ftp.suse.com/pub/incoming/ or provide me with a download link.
Comment 4 Jan Kratochvil 2019-09-26 10:39:39 UTC
https://www.jankratochvil.net/t/dwz-too-large.tar.xz
Comment 5 Jan Kratochvil 2019-09-26 10:41:34 UTC
dwz -h -q -r -m merged -l 1000000000 -L 2000000000 `find -name "*.debug"`
Comment 6 Tom de Vries 2019-09-26 12:40:04 UTC
(In reply to Jan Kratochvil from comment #4)
> https://www.jankratochvil.net/t/dwz-too-large.tar.xz
(In reply to Jan Kratochvil from comment #5)
> dwz -h -q -r -m merged -l 1000000000 -L 2000000000 `find -name "*.debug"`

Reproduced:
...
$ dwz -h -q -r -m merged -l 1000000000 -L 2000000000 $(find -name "*.debug")
dwz: Multifile temporary files too large
...

The offending file is /tmp/dwz.debug_info.LhYCFp at ~3.8 GB.
Comment 7 Jan Kratochvil 2019-09-26 12:47:21 UTC
It is just that the limit is dictated by the standard. I think dwz would need to switch to DWARF 5, as that has DW_FORM_ref_sup8; currently the DW_FORM_GNU_*_alt forms are both 32-bit only.
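
As an aside, a minimal sketch of the size difference referred to here, taking the comment at face value that the GNU alt forms are limited to 32-bit offsets while DWARF 5's DW_FORM_ref_sup8 carries an 8-byte offset into the supplementary file. The helper name is hypothetical and this is not dwz code (a real producer fixes the form in the abbreviation entry rather than choosing it per attribute like this):
...
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: number of bytes a reference into the supplementary
   file would occupy, or 0 if no available form can encode the offset.  */
static size_t
sup_ref_size (uint64_t sup_offset, int have_dwarf5_ref_sup8)
{
  if (sup_offset <= UINT32_MAX)
    return 4;                            /* DW_FORM_GNU_ref_alt / DW_FORM_ref_sup4 */
  return have_dwarf5_ref_sup8 ? 8 : 0;   /* DW_FORM_ref_sup8, or no usable form */
}

int
main (void)
{
  printf ("%zu\n", sup_ref_size (3ULL << 30, 0));   /* 4: fits in 32 bits */
  printf ("%zu\n", sup_ref_size (5ULL << 30, 0));   /* 0: > 4 GiB, GNU alt forms cannot express it */
  printf ("%zu\n", sup_ref_size (5ULL << 30, 1));   /* 8: needs DW_FORM_ref_sup8 */
  return 0;
}
...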
Comment 8 Tom de Vries 2019-10-03 07:26:39 UTC
(In reply to Jan Kratochvil from comment #7)
> It is just that the limit is dictated by the standard. I think dwz would need
> to switch to DWARF 5, as that has DW_FORM_ref_sup8; currently the
> DW_FORM_GNU_*_alt forms are both 32-bit only.

You're right about the limitation of DW_FORM_GNU_*_alt, but AFAIU that's not the boundary we're running into here.

In multifile mode, DWZ first processes one file at a time, optimizes it in single-file mode, and dumps multifile-eligible DWARF into temporary files, one per debug section. Subsequently, these temporary files are mmapped, and common DWARF is copied to the multifile.

Here we run into a size limit for the temporary files.  The DW_FORM_GNU_*_alt restriction limits the size of the multifile itself.
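
To make the two phases concrete, here is a very rough sketch of the flow just described. All names are hypothetical stand-ins rather than dwz's real functions; the point is only that every input appends to the same per-section temporary, which is the file that can grow to several GB:
...
#include <stdio.h>

/* Stand-in for optimizing one input in single-file mode.  */
static void
optimize_single_file (const char *file)
{
  printf ("single-file optimize: %s\n", file);
}

/* Stand-in for appending the multifile-eligible DWARF of one input to the
   shared temporaries (one temporary per debug section, e.g. .debug_info).  */
static void
append_multifile_eligible (const char *file, FILE *tmp_debug_info)
{
  fprintf (tmp_debug_info, "shareable DWARF from %s\n", file);
}

int
main (int argc, char **argv)
{
  /* Phase 1: all inputs append to the same per-section temporaries.  */
  FILE *tmp_debug_info = tmpfile ();
  if (tmp_debug_info == NULL)
    return 1;
  for (int i = 1; i < argc; i++)
    {
      optimize_single_file (argv[i]);
      append_multifile_eligible (argv[i], tmp_debug_info);
    }

  /* Phase 2 (not sketched): the temporaries are mmapped and the DWARF that
     is common to several inputs is copied into the supplementary multifile.  */
  fclose (tmp_debug_info);
  return 0;
}
...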
Comment 9 Tom de Vries 2019-11-28 12:02:34 UTC
There are a few things to mention here:
- there is no clear reason why this should be an error, we could also warn
  about not being able to add to the temporary files, and continue the
  multifile optimization
- it should be possible to continue writing at the point where we now run into
  an error, by switching to 64-bit DWARF, but reading 64-bit DWARF is currently
  not supported.
- the setup of collecting all the info for the multifile optimization into
  one temporary file per section is very convenient because it closely follows
  the way single-file optimization is done.  But things don't have to be set up
  like that. We could also have one temporary file per section per input file,
  which would fix the limitation we're running into here (see the sketch at the
  end of this comment).  Of course we'd spend more effort juggling things around
  once we start reading in those files, but there is also the possibility that
  we'd be able to manage memory in a more fine-grained way, which could possibly
  reduce peak memory usage.

For now, I'm classifying this as an enhancement, given that we're running into an implementation-defined limitation, which dwz is correctly reporting.
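
The sketch referred to in the last bullet above, purely illustrative and with hypothetical names: instead of all inputs appending to a single /tmp/dwz.debug_info.XXXXXX, each (input file, section) pair would get its own temporary, so no single temporary file keeps growing as more inputs are processed:
...
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: create a temporary for one debug section of one
   input file, e.g. /tmp/dwz.debug_info.17.XXXXXX for input number 17.  */
static int
make_section_temp (const char *section, unsigned int file_idx)
{
  char templ[128];
  snprintf (templ, sizeof templ, "/tmp/dwz.%s.%u.XXXXXX", section, file_idx);
  int fd = mkstemp (templ);
  if (fd >= 0)
    unlink (templ);   /* keep it anonymous, like dwz's current temporaries */
  return fd;
}

int
main (void)
{
  int fd = make_section_temp ("debug_info", 17);
  return fd < 0;
}
...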
Comment 10 Tom de Vries 2019-11-29 09:41:04 UTC
(In reply to Tom de Vries from comment #9)
> There are a few things to mention here:
> - there is no clear reason why this should be an error, we could also warn
>   about not being able to add to the temporary files, and continue the
>   multifile optimization

I ran this again using --devel-trace to get a bit more information, and found out that actually there's no error.  The "dwz: Multifile temporary files too large" message is a warning, telling the user in a _very_ indirect way that dwz switches off multifile optimization and will continue to process the remaining files in single-file mode.

This is part of a larger issue in dwz where both warnings and errors are generated using a call to 'error', and consequently it's not immediately clear from the source code which is an error and which is a warning.

Furthermore, when write_multifile runs into this warning, it immediately returns with return value 1, suggesting that processing stops, but the return value is actually ignored.
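
A minimal sketch of what making the warning/error distinction explicit could look like (hypothetical wrapper names, not a dwz patch); it relies on the glibc error() behaviour that the process only exits when the status argument is non-zero:
...
#include <error.h>

/* Hypothetical wrappers that make the intent visible at the call site.  */
static void
dwz_warning (const char *msg)
{
  error (0, 0, "warning: %s", msg);   /* status 0: print to stderr, keep going */
}

static void
dwz_fatal (const char *msg)
{
  error (1, 0, "%s", msg);            /* non-zero status: print and exit */
}

int
main (void)
{
  dwz_warning ("Multifile temporary files too large");
  /* ... processing of the remaining files would continue here ...  */
  dwz_fatal ("some unrecoverable condition");   /* terminates with status 1 */
  return 0;
}
...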
Comment 11 Tom de Vries 2019-11-29 12:31:43 UTC
I'm currently trying out the patch:
...
diff --git a/dwz.c b/dwz.c
index 3c886d6..804746c 100644
--- a/dwz.c
+++ b/dwz.c
@@ -11942,7 +11942,7 @@ write_multifile (DSO *dso)
                  < multi_macro_off)
        {
          error (0, 0, "Multifile temporary files too large");
-         multifile = NULL;
          ret = 1;
        }
       else
...
that implements the 'continue multifile optimization' strategy.

Currently /tmp/dwz.debug_info.xxxxxx is 4180719431 bytes, which is at 97% of the maximum:
...
$ lsof | egrep '^COMMAND|/tmp/dwz.debug_info.'
COMMAND       PID     TID       USER   FD      TYPE             DEVICE   SIZE/OFF       NODE NAME
dwz         28874           tdevries    3u      REG                8,2 4180719431   23298280 /tmp/dwz.debug_info.G8DTik (deleted)
...
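As a sanity check on that 97% figure, assuming the maximum being approached is the 4 GiB reachable with 32-bit DWARF offsets: 4180719431 / 4294967296 ≈ 0.973, i.e. about 97%.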
Comment 12 Tom de Vries 2019-11-29 14:07:04 UTC
(In reply to Tom de Vries from comment #11)
> I'm currently trying out the patch:
> that implements the 'continue multifile optimization' strategy.

Result:
...
$ ./reproduce.sh
maxmem: 27212104
real: 5215.06
user: 4418.86
system: 400.27
...
So, this took 87 minutes real (74 minutes user, 7 minutes sys) and 26 GB (on a server with 256GB).

The resulting multifile is 823MB:
...
$ du -h merged
823M    merged
...
Comment 13 Tom de Vries 2019-11-29 15:31:57 UTC
(In reply to Tom de Vries from comment #12)
> (In reply to Tom de Vries from comment #11)
> > I'm currently trying out the patch:
> > that implements the 'continue multifile optimization' strategy.
> 
> Result:
> ...
> $ ./reproduce.sh
> maxmem: 27212104
> real: 5215.06
> user: 4418.86
> system: 400.27
> ...
> So, this took 87 minutes real (74 minutes user, 7 minutes sys) and 26 GB (on
> a server with 256GB).
> 
> The resulting multifile is 823MB:
> ...
> $ du -h merged
> 823M    merged
> ...

Compared to without the patch:
...
maxmem: 17332160
real: 2539.16
user: 2063.37
system: 201.93
...
So, 42 minutes real (34 minutes user, 3 minutes sys) and 16.5 GB.