Summary: | gas-2.39 started adding 0-sized DIEs to functions without .size | ||
---|---|---|---|
Product: | binutils | Reporter: | Sergei Trofimovich <slyich> |
Component: | gas | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | jbeulich, mark, mliska, nickc, sam |
Priority: | P2 | ||
Version: | 2.39 | ||
Target Milestone: | --- | ||
See Also: | https://sourceware.org/bugzilla/show_bug.cgi?id=29450 | ||
Host: | Target: | ||
Build: | Last reconfirmed: |
Description
Sergei Trofimovich
2022-08-06 08:10:16 UTC
Started with: commit 591cc9fbbfd6d51131c0f1d4a92e7893edcc7a28 Author: Jan Beulich <jbeulich@suse.com> Date: Thu Apr 7 08:18:00 2022 +0200 gas/Dwarf: record functions (In reply to Sergei Trofimovich from comment #0) > Discovered in https://sourceware.org/PR29450 where gas-2.38 did not attach > 0-sized DIE for glibc's _init assembly-written function and gas-2.39 did. > > <1><22>: Abbrev Number: 2 (DW_TAG_subprogram) > <23> DW_AT_name : (strp) (offset: 0x49): _init > <27> DW_AT_external : (flag) 1 > <28> DW_AT_low_pc : (addr) 0x0 > <2c> DW_AT_high_pc : (addr) 0x0 > > Would it be fair to say it's a bug to assign zero size here via DW_AT_low_pc > / DW_AT_high_pc? Yes it is clearly a bug. Both low_pc and high_pc are invalid values. The definition of low_pc/high_pc is: The value of the DW_AT_low_pc attribute is the address of the first instruction associated with the entity. If the value of the DW_AT_high_pc is of class address, it is the address of the first location past the last instruction associated with the entity; So this doesn't even describe a valid contiguous range. (In reply to Sergei Trofimovich from comment #0) > <28> DW_AT_low_pc : (addr) 0x0 > <2c> DW_AT_high_pc : (addr) 0x0 > Would it be fair to say it's a bug to assign zero size here via DW_AT_low_pc > / DW_AT_high_pc? Actually it *is* fair to give them 0 values. (The values are addresses, not sizes). The reason is that the address of the start and end of the __x86.get_pc_thunk.bx function has not been assigned yet. (This is an object file, not a fully linked executable). If you look at the relocations for the crti.o file you find: Relocation section '.rel.debug_info' at offset 0x288 contains 12 entries: Offset Info Type Sym. Value Symbol's Name [...] 00000036 00000901 R_386_32 00000000 __x86.get_pc_thunk.bx 0000003a 00000901 R_386_32 00000000 __x86.get_pc_thunk.bx So when this file is linked in with object files and these relocations are resolved the correct values for the __x86.get_pc_thunk.bx symbol will be installed into the .debug_info section, and everything should work. (In reply to Nick Clifton from comment #3) > (In reply to Sergei Trofimovich from comment #0) > > > <28> DW_AT_low_pc : (addr) 0x0 > > <2c> DW_AT_high_pc : (addr) 0x0 > > > Would it be fair to say it's a bug to assign zero size here via DW_AT_low_pc > > / DW_AT_high_pc? > > Actually it *is* fair to give them 0 values. (The values are addresses, not > sizes). The reason is that the address of the start and end of the > __x86.get_pc_thunk.bx function has not been assigned yet. (This is an > object file, not a fully linked executable). > > If you look at the relocations for the crti.o file you find: > > Relocation section '.rel.debug_info' at offset 0x288 contains 12 entries: > Offset Info Type Sym. Value Symbol's Name > [...] > 00000036 00000901 R_386_32 00000000 __x86.get_pc_thunk.bx > 0000003a 00000901 R_386_32 00000000 __x86.get_pc_thunk.bx > > So when this file is linked in with object files and these relocations are > resolved the correct values for the __x86.get_pc_thunk.bx symbol will be > installed into the .debug_info section, and everything should work. In https://sourceware.org/PR29450 we observed zero-size on a final executable. Is zero a reasonable value here? `__x86.get_pc_thunk.bx` lives in a separate section compared to `_init`. Nick, are you sure you are looking at the `_init` debug relocations and not `__x86.get_pc_thunk.bx`? I think both are present and are slightly different. Looking at the specific offsets: $ readelf -aW --debug-dump crti.o ... Contents of the .debug_info section: Compilation Unit @ offset 0x0: Length: 0x3b (32-bit) Version: 2 Abbrev Offset: 0x0 Pointer Size: 4 <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit) <c> DW_AT_stmt_list : (data4) 0x0 <10> DW_AT_ranges : (data4) 0x0 <14> DW_AT_name : (strp) (offset: 0x0): crti.S.S <18> DW_AT_comp_dir : (strp) (offset: 0x9): /tmp <1c> DW_AT_producer : (strp) (offset: 0xe): GNU AS 2.39 <20> DW_AT_language : (data2) 32769 (MIPS assembler) <1><22>: Abbrev Number: 2 (DW_TAG_subprogram) <23> DW_AT_name : (strp) (offset: 0x1a): _init <27> DW_AT_external : (flag) 1 <28> DW_AT_low_pc : (addr) 0x0 ; <<<--- <2c> DW_AT_high_pc : (addr) 0x0 ; <<<--- ... The concern is `DW_AT_low_pc / DW_AT_high_pc` at 0x28/0xac. $ objdump -Dr crti.o 00000000 <.debug_info>: ... 26: 00 01 add %al,(%ecx) ... 28: R_386_32 _init ; <<<--- 2c: R_386_32 _init ; <<<--- 30: 02 20 add (%eax),%ah These are DW_AT_low_pc=_init, DW_AT_high_pc=_init. AFAIU DW_AT_high_pc should be somehting like _init+2 to at least account for `ud2`. >> So when this file is linked in with object files and these relocations are >> resolved the correct values for the __x86.get_pc_thunk.bx symbol will be >> installed into the .debug_info section, and everything should work. > > In https://sourceware.org/PR29450 we observed zero-size on a final executable. Maybe it is useful to attach the final binary to this bug and show how to generate it. Given the code that introduced this issues (commit 591cc9fbbfd6d51131c0f1d4a92e7893edcc7a28 see comment #1) it might also be interesting to show the symbol table entry for _init. Also maybe try building with -gdwarf-4 or -gdwarf-5 to see what/how high_pc is encoded then (as offset from low_pc). > These are DW_AT_low_pc=_init, DW_AT_high_pc=_init. AFAIU DW_AT_high_pc > should be somehting like _init+2 to at least account for `ud2`. Right, the DW_AT_high_pc should at least be one bigger than low_pc or it doesn't encode a valid single range. (In reply to Mark Wielaard from comment #5) > >> So when this file is linked in with object files and these relocations are > >> resolved the correct values for the __x86.get_pc_thunk.bx symbol will be > >> installed into the .debug_info section, and everything should work. > > > > In https://sourceware.org/PR29450 we observed zero-size on a final executable. > > Maybe it is useful to attach the final binary to this bug and show how to > generate it. > > Given the code that introduced this issues (commit > 591cc9fbbfd6d51131c0f1d4a92e7893edcc7a28 see comment #1) it might also be > interesting to show the symbol table entry for _init. Note before the revision there was no debug info entry for _init: $ as --version && gcc /tmp/crti.S -c -g --verbose -dwarf4 && readelf --debug-dump crti.o GNU assembler (GNU Binutils; openSUSE Tumbleweed) 2.38.20220525-6 ... <0><c>: Abbrev Number: 1 (DW_TAG_compile_unit) <d> DW_AT_stmt_list : 0x0 <11> DW_AT_ranges : 0xc <15> DW_AT_name : (indirect string, offset: 0x0): /tmp/crti.S <19> DW_AT_comp_dir : (indirect string, offset: 0xc): /home/marxin <1d> DW_AT_producer : (indirect string, offset: 0x19): GNU AS 2.38 <21> DW_AT_language : 32769 (MIPS assembler) (no change with -dwarf4 option). And the symbol has size == 0 in symbol table: $ readelf -s crti.o Symbol table '.symtab' contains 11 entries: Num: Value Size Type Bind Vis Ndx Name ... 9: 0000000000000000 0 FUNC GLOBAL HIDDEN 5 _init With 2.39 one gets: $ as --version && gcc /tmp/crti.S -c -g --verbose -dwarf4 && readelf --debug-dump crti.o GNU assembler (GNU Binutils; home:marxin:branches:devel:gcc / openSUSE_Factory) 2.39.0.20220808-0 ... <0><c>: Abbrev Number: 1 (DW_TAG_compile_unit) <d> DW_AT_stmt_list : 0x0 <11> DW_AT_ranges : 0xc <15> DW_AT_name : (indirect string, offset: 0x0): /tmp/crti.S <19> DW_AT_comp_dir : (indirect string, offset: 0xc): /home/marxin/Programming/binutils <1d> DW_AT_producer : (indirect string, offset: 0x2e): GNU AS 2.39.0 <21> DW_AT_language : 32769 (MIPS assembler) <1><23>: Abbrev Number: 2 (DW_TAG_subprogram) <24> DW_AT_name : (indirect string, offset: 0x3c): _init <28> DW_AT_external : 1 <28> DW_AT_low_pc : 0x0 <30> DW_AT_high_pc : 0 <1><31>: Abbrev Number: 2 (DW_TAG_subprogram) <32> DW_AT_name : (indirect string, offset: 0x42): __x86.get_pc_thunk.bx <36> DW_AT_external : 1 <36> DW_AT_low_pc : 0x0 <3e> DW_AT_high_pc : 2 <1><3f>: Abbrev Number: 0 and the symbol size is also 0 in the table: $ readelf -s crti.o Symbol table '.symtab' contains 11 entries: Num: Value Size Type Bind Vis Ndx Name ... 9: 0000000000000000 0 FUNC GLOBAL HIDDEN 5 _init > and the symbol size is also 0 in the table: > $ readelf -s crti.o > > Symbol table '.symtab' contains 11 entries: > Num: Value Size Type Bind Vis Ndx Name > ... > 9: 0000000000000000 0 FUNC GLOBAL HIDDEN 5 _init So if the size really is zero than high_pc should be one larger than low_pc (if expressed as addr in DWARF < 4) and 1 if expressed as unsigned constant (for DWARF >=4) Having the addresses equal or high_pc zero does not express the zero sized region. It is a little questionable that the low_pc address is also zero, is that because it doesn't have relocations applied yet? (In reply to Mark Wielaard from comment #7) > > and the symbol size is also 0 in the table: > > $ readelf -s crti.o > > > > Symbol table '.symtab' contains 11 entries: > > Num: Value Size Type Bind Vis Ndx Name > > ... > > 9: 0000000000000000 0 FUNC GLOBAL HIDDEN 5 _init > > So if the size really is zero than high_pc should be one larger than low_pc > (if expressed as addr in DWARF < 4) and 1 if expressed as unsigned constant > (for DWARF >=4) > > Having the addresses equal or high_pc zero does not express the zero sized > region. Doesn't high_pc being one larger than low_pc express a 1-byte region? The commit in question actually tries to avoid emitting zero-sized regions, so the question is why if (S_GET_SIZE (symp) == 0) { if (!IS_ELF || symbol_get_obj (symp)->size == NULL) continue; } doesn't have the intended effect in the case at hand. With no .size directive I don't see why / how ->size could be non-NULL. (In reply to Jan Beulich from comment #9) > The commit in question actually tries to avoid emitting zero-sized regions, > so the question is why > > if (S_GET_SIZE (symp) == 0) > { > if (!IS_ELF || symbol_get_obj (symp)->size == NULL) > continue; > } From the debugging session, I can confirm the continue branch is taken for: (gdb) p *symp $2 = { flags = { local_symbol = 0, written = 0, resolved = 0, resolving = 0, used_in_reloc = 0, used = 0, volatil = 0, forward_ref = 0, forward_resolved = 0, mri_common = 0, weakrefr = 0, weakrefd = 0, removed = 0, multibyte_warned = 0 }, hash = 3929827014, name = 0x609c50 "_init", frag = 0x60b6f8, bsym = 0x5daed8, x = 0x609c88 } (gdb) p symbol_get_obj (symp)->size $3 = (expressionS *) 0x0 > > doesn't have the intended effect in the case at hand. With no .size > directive I don't see why / how ->size could be non-NULL. (In reply to Jan Beulich from comment #8) > (In reply to Mark Wielaard from comment #7) > > > and the symbol size is also 0 in the table: > > > $ readelf -s crti.o > > > > > > Symbol table '.symtab' contains 11 entries: > > > Num: Value Size Type Bind Vis Ndx Name > > > ... > > > 9: 0000000000000000 0 FUNC GLOBAL HIDDEN 5 _init > > > > So if the size really is zero than high_pc should be one larger than low_pc > > (if expressed as addr in DWARF < 4) and 1 if expressed as unsigned constant > > (for DWARF >=4) > > > > Having the addresses equal or high_pc zero does not express the zero sized > > region. > > Doesn't high_pc being one larger than low_pc express a 1-byte region? Yes, you are right. Technically since high_pc is one past the last address it describes just a single address (of low_pc). To describe a DIE without size you should only emit a low_pc attribute and no high_pc attribute. According to the DWARF spec a DIE that has a machine code address or range of machine code addresses associated has: • A DW_AT_low_pc attribute for a single address, • A DW_AT_low_pc and DW_AT_high_pc pair of attributes for a single contiguous range of addresses, or • A DW_AT_ranges attribute for a non-contiguous range of addresses. Where the high_pc expresses the first location past the last instruction associated with the entity. Yeah, I can certainly see my thinko. Making a patch ... The master branch has been updated by Jan Beulich <jbeulich@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=d7abcbcea5ddd40a3bf28758b62f35933c59f996 commit d7abcbcea5ddd40a3bf28758b62f35933c59f996 Author: Jan Beulich <jbeulich@suse.com> Date: Wed Aug 10 10:30:46 2022 +0200 gas/Dwarf: properly skip zero-size functions PR gas/29451 While out_debug_abbrev() properly skips such functions, out_debug_info() mistakenly didn't. It needs to calculate the high_pc expression ahead of time, in order to skip emitting any data for the function if the value is zero. The one case which would still leave a zero-size entry is when symbol_get_obj(symp)->size ends up evaluating to zero. I hope we can expect that to not be the case, otherwise we'd need to have a way to post-process .debug_info contents between resolving expressions and actually writing the data out to the file. Even then it wouldn't be entirely obvious in which way to alter the data. The binutils-2_39-branch branch has been updated by Jan Beulich <jbeulich@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=e8cf73215187b0c08679d726a5cc7c019fa3ea2e commit e8cf73215187b0c08679d726a5cc7c019fa3ea2e Author: Jan Beulich <jbeulich@suse.com> Date: Wed Aug 10 10:34:22 2022 +0200 gas/Dwarf: properly skip zero-size functions PR gas/29451 While out_debug_abbrev() properly skips such functions, out_debug_info() mistakenly didn't. It needs to calculate the high_pc expression ahead of time, in order to skip emitting any data for the function if the value is zero. The one case which would still leave a zero-size entry is when symbol_get_obj(symp)->size ends up evaluating to zero. I hope we can expect that to not be the case, otherwise we'd need to have a way to post-process .debug_info contents between resolving expressions and actually writing the data out to the file. Even then it wouldn't be entirely obvious in which way to alter the data. (cherry picked from commit d7abcbcea5ddd40a3bf28758b62f35933c59f996) Should be taken care of then. |