Created attachment 13381 [details] Debug non-LTO firmware

I recently upgraded my cross-compilation toolchain for ARM Cortex-M4F to these component versions:

  Binutils 2.36.1
  GCC 10.3
  GDB 10.1

For more details, this is the project I am using. It has both a makefile to build the cross-compilation toolchain, and the firmware I am building with it:

  https://github.com/rdiez/JtagDue

I noticed that GDB is now using much more RAM than before, and not just for LTO builds.

For a debug build (non LTO), firmware.elf weighs around 1.5 MB. Most of it is debug information, because the firmware.bin file only weighs 76 kB. A release build (with LTO) has similar sizes.

For the debug build (non LTO):

When GDB needs to load the symbols, because you are touching some C++ source code, there is a noticeable pause, and GDB uses 385 MiB of RAM. That is already too much for this smallish project. I have another, bigger firmware, where using GDB is becoming a pain, due to the longer pauses.

For the release build (LTO):

GDB starts using 100 % CPU time for a long time, and memory usage reaches 5 GiB.

I got confirmation about this excessive CPU load and memory usage in the GDB mailing list. The start of the thread is:

  Greatly increased GDB memory and CPU usage with newest embedded ARM toolchain
  https://sourceware.org/pipermail/gdb/2021-April/049379.html

Reducing the debug information level when compiling with GCC from -g3 to -g2 fixes the problem, at the expense of the extra information that -g3 adds, like preprocessor macros.

I am attaching 2 .elf files to this bug built with -g3, so that you can reproduce the problem at leisure.

You need to build a cross-debugger GDB with:

  configure --target=arm-none-eabi

Then load one of the .elf files in the attachment like this. There is no need to have any ARM CPU available:

  ./arm-none-eabi-gdb firmware-release-lto.elf

Now issue this GDB command:

  print StartOfUserCode

You should see an output like this:

  $1 = {void (void)} 0x866d8 <StartOfUserCode()>

Now you can check GDB's memory usage. The release build (with LTO) is the one that shows a really massive memory usage.
Created attachment 13382 [details] Release LTO firmware

"maint space 1" reports a big difference alright.

Good:

  (gdb) print StartOfUserCode
  $1 = {void (void)} 0x866d8 <StartOfUserCode()>
  Space used: 208310272 (+204247040 for this command)

Bad:

  (gdb) print StartOfUserCode
  Space used: 5955563520 (+5949935616 for this command)
  No symbol "StartOfUserCode" in current context.

For me it also fails, but that's because this symbol isn't defined in the release ELF.

gdb seems to be expanding every CU here.

> That is already too much for this smallish project.

Agreed, though I didn't dig into where it all goes. It definitely seems excessive given that the DWARF only seems to be 240k or so.

The .debug_macro contents seem pretty weird to me. This probably explains both the time and space problems.

For example I see sequences like:

  DW_MACRO_import - offset : 0x0
  DW_MACRO_end_file
  DW_MACRO_import - offset : 0x0
  DW_MACRO_end_file
  DW_MACRO_import - offset : 0x0
  DW_MACRO_end_file

This is importing the macro sequence at offset 0 three times. That one also imports other things.

I suspect this is a compiler bug.

> For me it also fails, but that's because this
> symbol isn't defined in the release ELF.

Yes, I'm sorry, with the release ELF you can try another symbol instead, like this one:

  (gdb) print BareMetalSupport_Reset_Handler
  $1 = {void (void)} 0x86a98 <BareMetalSupport_Reset_Handler()>

That raises the GDB memory usage to a whopping 5.6 GiB.

In fact, I do not understand why StartOfUserCode is not defined in the release build, because it is the same source code after all. The same routine is used in the same way.

I dumped all symbols like this and compared them:

  arm-none-eabi-objdump --syms firmware-debug-non-lto.elf
  arm-none-eabi-objdump --syms firmware-release-lto.elf

The release ELF seems to have lost most C++ symbols, and there are many entries like this:

  00010d6b l .debug_info 00000000
  00010d6b l .debug_info 00000000
  00010d6b l .debug_info 00000000
  00010d6b l .debug_info 00000000

With GCC's -g2 there is no memory problem anymore, but the C++ symbols are still missing, and those weird entries above are still there.

It may be a compiler problem indeed. But I do not know enough about GCC's LTO and GDB to tell. I am really only a user. For example, I do not know what a CU is.
From the email thread:

> My high water mark reported by massif for the non-lto build was 197MB - most
> of that seems to come from decoding the macro information
> (dwarf_decode_macros).
>
> ...
>
> Oh, managed to get the lto case to stop & valgrind to log it.
> Yeah... peaked out over 5GB of memory usage, and:
>
> 74.12% (4,372,282,848B) 0x3A4D42: macro_alloc
>
> So it seems to be something in the way gdb is storing macro info (consistent
> with the "-g2 rather than -g3 significantly reduces the memory usage" data)
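As a quick sanity check on the massif figures quoted above, the following Python sketch works backwards from the macro_alloc share to the total heap size it implies. The two input numbers are copied from the quote; the variable names are made up for illustration.

```python
# Figures copied from the massif excerpt quoted above.
MACRO_ALLOC_BYTES = 4_372_282_848   # bytes attributed to macro_alloc
MACRO_ALLOC_SHARE = 0.7412          # 74.12 % of the heap at the peak

GIB = 1024 ** 3

# If macro_alloc holds 74.12 % of the heap, the whole heap is this large.
implied_peak_bytes = MACRO_ALLOC_BYTES / MACRO_ALLOC_SHARE
implied_peak_gib = implied_peak_bytes / GIB
macro_gib = MACRO_ALLOC_BYTES / GIB

print(f"macro_alloc alone: {macro_gib:.2f} GiB")
print(f"implied heap peak: {implied_peak_gib:.2f} GiB")
```

The implied peak is in line with the "over 5GB" remark in the same message, so macro storage alone accounts for roughly three quarters of GDB's heap in the LTO case.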
Note that .debug_macro is output at the GCC compile-stage, and thus DW_AT_GNU_macro is only present in the "abstract" CUs. That might lead to multiple inclusions of macro sections in case gdb follows DW_AT_abstract_origins from concrete CUs to the abstract ones (and in case it doesn't employ any "caching" of already read macro sections ...)

.debug_macro of the LTO build is

  [15] .debug_macro PROGBITS 00000000 153940 028492

while the non-LTO build shows

  [11] .debug_macro PROGBITS 00000000 07a360 027965

that's almost the same size (as expected). So the issue must be with how gdb interprets this info.

The non-LTO .debug_macro has

  DW_MACRO_import - offset : 0x0
  DW_MACRO_end_file

as well, whatever that means.
I have created a bug against GCC about this issue: GDB has problems reading GCC's debugging info level -g3 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100446
It was stated in the previous comments that symbol StartOfUserCode was not defined in the release ELF (built with LTO). That is not correct. "objdump --syms" does not show that symbol, probably because the routine was inlined, but "readelf --debug-dump" does show it. So it does look like GDB is not reading debug information properly.
(In reply to Richard Biener from comment #6)
> The non-LTO .debug_macro has
>
> DW_MACRO_import - offset : 0x0
> DW_MACRO_end_file
>
> as well, whatever that means.

The problem with LTO is that the output is pathological. For example I see this sequence:

  DW_MACRO_import - offset : 0x0
  DW_MACRO_end_file
  DW_MACRO_import - offset : 0x0
  DW_MACRO_end_file
  DW_MACRO_import - offset : 0x0
  DW_MACRO_end_file

This says to import the macros from offset 0 three times in succession. While this is technically ok, it's also absurd. Is this really intentional? This file imports the unit at offset 0x0 multiple times -- 108 in fact.

We can probably work around it in gdb somehow. My first thought is to have it simply skip multiple imports of the same unit. This could in theory yield the wrong answer sometimes, though.

Looks vaguely related to bug#26303, in the "suspicious import" sense.
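For anyone who wants to check their own binary for this pattern, here is a minimal Python sketch that counts how often each offset is imported in a textual macro dump. It assumes the line format seen in the excerpts in this thread ("DW_MACRO_import - offset : 0x0"); the function name `count_macro_imports` is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as "DW_MACRO_import - offset : 0x0". The format is
# assumed from the dump excerpts quoted in this thread.
IMPORT_RE = re.compile(r"DW_MACRO_import\s*-\s*offset\s*:\s*(0x[0-9a-fA-F]+)")


def count_macro_imports(dump_text: str) -> Counter:
    """Count how often each .debug_macro offset is imported."""
    return Counter(int(m.group(1), 16) for m in IMPORT_RE.finditer(dump_text))


sample = """\
 DW_MACRO_import - offset : 0x0
 DW_MACRO_end_file
 DW_MACRO_import - offset : 0x0
 DW_MACRO_end_file
 DW_MACRO_import - offset : 0x0
 DW_MACRO_end_file
"""

counts = count_macro_imports(sample)
for offset, n in counts.most_common():
    print(f"offset {offset:#x}: imported {n} time(s)")
```

In practice the input text would come from something like `readelf --debug-dump=macro firmware.elf`; a large count for offset 0x0 is the symptom discussed here.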
> My first thought is > to have it simply skip multiple imports of the same unit. This could > in theory yield the wrong answer sometimes, though. I don't think this is actually true.
(In reply to Tom Tromey from comment #9)
> (In reply to Richard Biener from comment #6)
> > The non-LTO .debug_macro has
> >
> > DW_MACRO_import - offset : 0x0
> > DW_MACRO_end_file
> >
> > as well, whatever that means.
>
> The problem with LTO is that the output is pathological.
> For example I see this sequence:
>
> DW_MACRO_import - offset : 0x0
> DW_MACRO_end_file
> DW_MACRO_import - offset : 0x0
> DW_MACRO_end_file
> DW_MACRO_import - offset : 0x0
> DW_MACRO_end_file
>
> This says to import the macros from offset 0 three times in succession.
> While this is technically ok, it's also absurd. Is this really
> intentional? This file imports the unit at offset 0x0 multiple times
> -- 108 in fact.

It does look odd. It appears that it might be a bug with relocation handling - the imports should be resolved from .rel[a].debug_macro. I notice the imports at offset zero all appear at the "end" of .debug_macro. The first CU with offset zero imports is

  Offset: 0x11f01
  Version: 4
  Offset size: 4
  Offset into .debug_line: 0xc814

Referred from

  <0><10d76>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <10d77> DW_AT_producer : (indirect string, offset: 0x5ea6c): GNU C++17 10.3.0 -mcpu=cortex-m3 -mthumb -mfloat-abi=soft -march=armv7-m -g3 -O3 -std=gnu++17 -flto -fno-fat-lto-objects -ffunction-sections -fdata-sections --param=max-inline-insns-single=500
    <10d7b> DW_AT_language : 4 (C++)
    <10d7c> DW_AT_name : (indirect string, offset: 0x5f012): /home/rdiez/rdiez/arduino/JtagDue/Project/BareMetalSupport/Miscellaneous.cpp
    <10d80> DW_AT_comp_dir : (indirect string, offset: 0x80ec4): /home/rdiez/rdiez/arduino/JtagDue/BuildOutput/JtagDue-obj-release
    <10d84> DW_AT_stmt_list : 0xc814
    <10d88> DW_AT_GNU_macros : 0x11f01

The exact same zero offset macro imports happen in the Debug non-LTO firmware btw. (as said, .debug_macro is generated at compile time, not at link time).

That said, a smaller example to reproduce those repeated offset zero imports would be nice to have. Unfortunately "preprocessed source" won't do it ...

It might be that GCC simply misses something here.

> We can probably work around it in gdb somehow. My first thought is
> to have it simply skip multiple imports of the same unit. This could
> in theory yield the wrong answer sometimes, though.
>
> Looks vaguely related to bug#26303, in the "suspicious import" sense.
See https://gcc.gnu.org/PR99618 and bz#27590.
I am encountering something similar, although I don't use LTO. My symptom is that when I do "thread apply all bt", there's a noticeable pause of 2-3 seconds between each thread. I added a bit of logging: that time is spent expanding a single CU. I profiled, and all that time is spent in dwarf_decode_macros.

I'll attach a binary (lt-lttng-sessiond). I am experimenting with:

  $ ./gdb --data-directory=data-directory -nx /home/simark/build/lttng-tools-noasan/src/bin/lttng-sessiond/.libs/lt-lttng-sessiond -ex "maint expand register.c" -batch

Which expands the CU at 0x6a1c9. That reads the macros at 0x2919d in .debug_macro. Here are the first few lines:

  DW_MACRO_import - offset : 0x0
  DW_MACRO_start_file - lineno: 0 filenum: 9 filename: /home/simark/src/lttng-tools/src/bin/lttng-sessiond/register.c
  DW_MACRO_start_file - lineno: 0 filenum: 41 filename: /usr/include/stdc-predef.h
  DW_MACRO_import - offset : 0x0
  DW_MACRO_end_file
  DW_MACRO_start_file - lineno: 0 filenum: 42 filename: ../../../include/config.h
  DW_MACRO_import - offset : 0x0
  DW_MACRO_end_file
  DW_MACRO_start_file - lineno: 10 filenum: 10 filename: /usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include/stddef.h
  DW_MACRO_import - offset : 0x0
  DW_MACRO_end_file
  DW_MACRO_start_file - lineno: 11 filenum: 43 filename: /usr/include/stdlib.h
  DW_MACRO_define_strp - lineno : 24 macro : __GLIBC_INTERNAL_STARTING_HEADER_IMPLEMENTATION
  DW_MACRO_start_file - lineno: 25 filenum: 44 filename: /usr/include/bits/libc-header-start.h
  DW_MACRO_undef_strp - lineno : 31 macro : __GLIBC_INTERNAL_STARTING_HEADER_IMPLEMENTATION
  DW_MACRO_start_file - lineno: 33 filenum: 45 filename: /usr/include/features.h
  DW_MACRO_import - offset : 0x0
  DW_MACRO_start_file - lineno: 473 filenum: 46 filename: /usr/include/sys/cdefs.h
  DW_MACRO_import - offset : 0x0
  DW_MACRO_start_file - lineno: 462 filenum: 47 filename: /usr/include/bits/wordsize.h
  DW_MACRO_import - offset : 0x0
  DW_MACRO_end_file
  DW_MACRO_start_file - lineno: 463 filenum: 48 filename: /usr/include/bits/long-double.h
  ...

The imports at offset 0 look wrong. First, there are a ton of them, one at pretty much every included file. Second, the macro information at offset 0 in .debug_macro is that of a totally unrelated CU.

When I check the relocations in the corresponding .o, I do see:

  Relocation section '.rela.debug_macro' at offset 0x25cb0 contains 412 entries:
  Offset           Info             Type        Symbol's Value   Symbol's Name + Addend
  0000000000000003 0000002b0000000a R_X86_64_32 0000000000000000 .debug_line + 0
  0000000000000008 0000002d0000000a R_X86_64_32 0000000000000000 .debug_macro + 0
  0000000000000013 0000002e0000000a R_X86_64_32 0000000000000000 .debug_macro + 0
  000000000000001c 0000002f0000000a R_X86_64_32 0000000000000000 .debug_macro + 0
  0000000000000025 000000300000000a R_X86_64_32 0000000000000000 .debug_macro + 0

So the relocations are really asking for ".debug_macro + 0".
Created attachment 13443 [details] lt-lttng-sessiond
So, IIUC, this is a GNU ld bug, fixed in git. It doesn't seem to affect lld or gold. Is that accurate? If so, I tend to think we should WONTFIX this bug.
Whether it is really fixed needs to be verified, I'm not sure if the commits didn't handle only the LTO special sections rather than plain .debug_macro that linker broke too.
(In reply to Tom Tromey from comment #15)
> So, IIUC, this is a GNU ld bug, fixed in git.
> It doesn't seem to affect lld or gold.

I probably don't understand how this all works. When I see a relocation ".debug_macro + 0" in a .o file, what does that mean? Should the value be replaced with the start of the .debug_macro section, or with the start of the .debug_macro contribution from that .o?

But you are right, when I link with gold, the imports have sensible offsets and not 0.

> Is that accurate?
> If so, I tend to think we should WONTFIX this bug.

From GDB's point of view, I agree.
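On the question of what ".debug_macro + 0" should resolve to: under the usual ELF rule (symbol value plus addend), a relocation against a section symbol refers to the place where that particular input section lands in the output, so for a debug section it is the start of the object's own contribution and not the start of the merged section. The Python toy model below illustrates just that rule. The object names, sizes and helper functions are invented, and it does not model COMDAT groups or what any real linker does internally.

```python
from dataclasses import dataclass


@dataclass
class Contribution:
    name: str   # hypothetical input object name
    size: int   # size in bytes of its .debug_macro contribution


def layout(contributions):
    """Assign each contribution its start offset in the merged section."""
    offsets, cursor = {}, 0
    for c in contributions:
        offsets[c.name] = cursor
        cursor += c.size
    return offsets


def resolve(offsets, obj_name, addend):
    """Resolve a '.debug_macro + addend' relocation found in obj_name."""
    return offsets[obj_name] + addend


inputs = [
    Contribution("a.o", 0x120),
    Contribution("b.o", 0x80),
    Contribution("c.o", 0x40),
]
offsets = layout(inputs)

# Only the first object's "+ 0" legitimately resolves to offset 0.
for obj in ("a.o", "b.o", "c.o"):
    print(obj, hex(resolve(offsets, obj, 0)))
```

In this model, seeing offset 0 for every import means the per-object start offset was lost somewhere along the way, which matches the observation that gold produces sensible non-zero offsets.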
I just did another test after upgrading the toolchain to these versions:

  BINUTILS_VERSION := 2.37
  GMP_VERSION := 6.2.1
  MPFR_VERSION := 4.1.0
  MPC_VERSION := 1.2.1
  GCC_VERSION := 10.3.0
  NEWLIB_VERSION := 4.1.0
  GDB_VERSION := 11.1

I rebuilt the JtagDue firmware, release LTO -g3, and checked in the build log that -g3 was actually being passed to GCC.

Then I loaded the firmware with:

  "$HOME/rdiez/arduino/JtagDue/SelfTestOutput/CurrentToolchain/bin/arm-none-eabi-gdb" "firmware-release-lto-g3.elf"

The GDB commands I typed were:

  print BareMetalSupport_Reset_Handler
  info macro STACK_SIZE

There was neither a delay nor a high memory consumption. But the #define macros were not visible with GDB. Command "info macro" yields the following error message:

  The symbol `STACK_SIZE' has no definition as a C/C++ preprocessor macro

I then built a debug build (without LTO) and the results were the same: no #define macros were visible with GDB.

Am I doing something wrong? I cannot remember how I viewed C preprocessor macros inside GDB in the past.

I checked, and macros are present in the debug information, as this command confirms:

  readelf --debug-dump "firmware-release-lto-g3.elf" | grep STACK_SIZE

The output is:

  DW_MACRO_define_strp - lineno : 64 macro : STACK_SIZE ( 4 * 1024 )

I'll try to attach the ELF files after this comment.

I cannot test with GDB 11.1 or 11.2 because of the cross-compilation toolchain build problem described in this bug comment:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98324#c7
Created attachment 13871 [details] ELF files for comment 18
(In reply to rdiezmail-binutils from comment #18)
> I just did another test after upgrading the toolchain to these versions:
>
> BINUTILS_VERSION := 2.37
> GMP_VERSION := 6.2.1
> MPFR_VERSION := 4.1.0
> MPC_VERSION := 1.2.1
> GCC_VERSION := 10.3.0
> NEWLIB_VERSION := 4.1.0
> GDB_VERSION := 11.1
>
> I rebuilt the JtagDue firmware, release LTO -g3, and checked in the build
> log that -g3 was actually being passed to GCC.
>
> Then I loaded the firmware with:
>
> "$HOME/rdiez/arduino/JtagDue/SelfTestOutput/CurrentToolchain/bin/arm-none-eabi-gdb" "firmware-release-lto-g3.elf"
>
> The GDB commands I typed were:
>
> print BareMetalSupport_Reset_Handler
> info macro STACK_SIZE
>
> There was neither a delay nor a high memory consumption. But the #define
> macros were not visible with GDB. Command "info macro" yields the following
> error message:
>
> The symbol `STACK_SIZE' has no definition as a C/C++ preprocessor macro
>
> I then built a debug build (without LTO) and the results were the same: no
> #define macros were visible with GDB.
>
> Am I doing something wrong? I cannot remember how I viewed C preprocessor
> macros inside GDB in the past.
>
> I checked, and macros are present in the debug information, as this command
> confirms:
>
> readelf --debug-dump "firmware-release-lto-g3.elf" | grep STACK_SIZE
>
> The output is:
>
> DW_MACRO_define_strp - lineno : 64 macro : STACK_SIZE ( 4 * 1024 )
>
> I'll try to attach the ELF files after this comment.
>
> I cannot test with GDB 11.1 or 11.2 because of the cross-compilation
> toolchain build problem described in this bug comment:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98324#c7

This works:

  $ ./gdb -nx -q --data-directory=data-directory /tmp/yo/firmware-debug-g3.elf -ex "list SysTick_Handler" -ex "info macro STACK_SIZE" -batch
  297 /home/rdiez/rdiez/arduino/JtagDue/Project/JtagFirmware/Main.cpp: No such file or directory.
  Defined at /home/rdiez/rdiez/arduino/JtagDue/Project/JtagFirmware/Main.cpp:64
  #define STACK_SIZE ( 4 * 1024 )

Your "print BareMetalSupport_Reset_Handler" doesn't set the "current location". I don't see a DWARF symbol for that, just an ELF symbol. "list SysTick_Handler" does it.

It doesn't work with the LTO binary; I don't have time to dig into it:

  $ ./gdb -nx -q --data-directory=data-directory /tmp/yo/firmware-release-lto-g3.elf -ex "list SysTick_Handler" -ex "info macro STACK_SIZE" -batch
  file: "/home/rdiez/rdiez/arduino/JtagDue/Project/JtagFirmware/Main.cpp", line number: 342, symbol: "SysTick_Handler"
  337 /home/rdiez/rdiez/arduino/JtagDue/Project/JtagFirmware/Main.cpp: No such file or directory.
  file: "/home/rdiez/rdiez/arduino/xdk-asf-3.51.0/thirdparty/CMSIS/Include/cmsis_gcc.h", line number: 142, symbol: "SysTick_Handler()"
  137 /home/rdiez/rdiez/arduino/xdk-asf-3.51.0/thirdparty/CMSIS/Include/cmsis_gcc.h: No such file or directory.
  The symbol `STACK_SIZE' has no definition as a C/C++ preprocessor macro
  at <user-defined>:-1
It looks like for LTO builds the .debug_macro section is missing:

  $ gcc -g3 -flto -ogdb-28589-g3-lto.exe gdb-28589.c
  $ objdump --section=.debug_macro --headers gdb-28589-g3-lto.exe

  gdb-28589-g3-lto.exe: file format pei-x86-64

  Sections:
  Idx Name Size VMA LMA File off Algn
  objdump: section '.debug_macro' mentioned in a -j option, but not found in any input file

It's there without -flto:

  $ gcc -g3 -ogdb-28589-g3.exe gdb-28589.c
  $ objdump --section=.debug_macro --headers gdb-28589-g3.exe

  gdb-28589-g3.exe: file format pei-x86-64

  Sections:
  Idx Name          Size      VMA               LMA               File off  Algn
   18 .debug_macro  0000687f  0000000140046000  0000000140046000  0003ba00  2**0
                    CONTENTS, READONLY, DEBUGGING
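To script the same check across several binaries, the section listing can be scanned for a row naming .debug_macro. The Python sketch below assumes the `objdump --headers` layout shown above (an index column followed by the section name); the helper `has_section` and the sample strings are illustrative only.

```python
def has_section(headers_text: str, section: str) -> bool:
    """Return True if `section` is listed in `objdump --headers` output."""
    for line in headers_text.splitlines():
        fields = line.split()
        # Section rows look like: "18 .debug_macro 0000687f ..."
        if len(fields) >= 2 and fields[0].isdigit() and fields[1] == section:
            return True
    return False


# Sample listings modelled on the two outputs shown above.
non_lto = """\
Sections:
Idx Name          Size      VMA               LMA               File off  Algn
 18 .debug_macro  0000687f  0000000140046000  0000000140046000  0003ba00  2**0
                  CONTENTS, READONLY, DEBUGGING
"""
lto = """\
Sections:
Idx Name          Size      VMA               LMA               File off  Algn
"""

print("non-LTO has .debug_macro:", has_section(non_lto, ".debug_macro"))
print("LTO has .debug_macro:", has_section(lto, ".debug_macro"))
```

Matching on the exact second field avoids false positives from sections that merely share the prefix.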
*** Bug 29188 has been marked as a duplicate of this bug. ***
*** Bug 29702 has been marked as a duplicate of this bug. ***
Maybe we should consider working around this in gdb. Maybe it could reject import of 0x0 and issue some sort of warning? I'm not sure.
As more users keep hitting this problem, I suggest mentioning it in some "known problems" or "caveats" section in the documentation.
I concur with Tom and rdiezmail-binutils@yahoo.de:

1. A workaround in gdb would be very helpful.
2. Yes, this issue should be documented appropriately, sooner rather than later.
(In reply to Tom Tromey from comment #10)
> > My first thought is
> > to have it simply skip multiple imports of the same unit. This could
> > in theory yield the wrong answer sometimes, though.
>
> I don't think this is actually true.

Did you mean that you no longer think that "this could in theory yield the wrong answer sometimes, though."?

This issue just came up again in a discussion Simon and I were having, and I was also wondering whether we could make GDB skip reimporting the same offset. It wouldn't fix the bad macro debug info in the pathological case (no worse than today), but at least it'd get us past the performance degradation.
(In reply to Pedro Alves from comment #27) > Did you mean that you no longer think that "this could in theory yield the > wrong answer sometimes, though." ? I don't really remember what I meant, but it seems to me that, while ignoring imports of 0x0 is maybe technically wrong, on the other hand it's a workaround for what appears to be a reasonably common problem. > This issue just came up again in a discussion Simon and I were having, and I > was also wondering whether we could make GDB skip reimporting the same > offset. Maybe just ignoring 0x0 is enough and we don't need to worry about the other cases? Anyway it seems fine to me to do this.
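To make the "skip reimporting the same offset" idea concrete, here is a rough Python sketch. It is not GDB code: the data model (a dict mapping an offset to a list of entries) and the function `expand_macros` are assumptions made purely for illustration. It compares expanding a unit that imports offset 0x0 108 times, as reported earlier in this thread, with and without skipping repeats.

```python
def expand_macros(units, start, skip_repeats=True):
    """Flatten the macro entries reachable from `start`, following imports.

    `units` maps a .debug_macro offset to a list of entries; an entry is
    either ("define", text) or ("import", offset). This data model is an
    assumption made for the sketch.
    """
    seen = set()
    out = []

    def walk(offset):
        if skip_repeats and offset in seen:
            return  # already imported once: do not expand it again
        seen.add(offset)
        for kind, value in units[offset]:
            if kind == "import":
                walk(value)
            else:
                out.append(value)

    walk(start)
    return out


# Hypothetical input: a 1000-macro unit at offset 0x0, imported 108 times.
units = {
    0x0: [("define", f"M{i} 1") for i in range(1000)],
    0x100: [("import", 0x0)] * 108 + [("define", "STACK_SIZE ( 4 * 1024 )")],
}

naive = expand_macros(units, 0x100, skip_repeats=False)
deduped = expand_macros(units, 0x100, skip_repeats=True)
print(len(naive), len(deduped))
```

Skipping repeats keeps the work proportional to the number of distinct units. The trade-off is the one raised earlier in this thread: if a unit is deliberately imported again after an intervening #undef, skipping the second import could change the result. Ignoring only imports of 0x0 would be a narrower variant of the same check.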