Bug 25755 - Means to not keep decls in symtab
Status: NEW
Alias: None
Product: gdb
Classification: Unclassified
Component: symtab
Version: HEAD
Importance: P2 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-04-01 08:13 UTC by Tom de Vries
Modified: 2020-04-02 23:14 UTC
0 users

See Also:
Host:
Target:
Build:
Last reconfirmed:


Description Tom de Vries 2020-04-01 08:13:07 UTC
I.

Consider the following test-case using source files test.c:
...
extern int aaa;

int
main (void)
{
  return 0;
}
...
and test2.c:
...
int aaa = 33;
...

If we compile with debug info, we can print the value of aaa using the proper type with both gdb and lldb:
...
$ gcc test.c test2.c -g
$ gdb -batch a.out -ex "print aaa"
$1 = 33
$ lldb -batch a.out -o "print aaa"
(lldb) target create "a.out"
Current executable set to 'a.out' (x86_64).
(lldb) print aaa
(int) $0 = 33
...


II.

If we compile without debug info, we can print the value with gdb provided we cast to the proper type:
...
$ gcc test.c test2.c
$ gdb -batch a.out -ex "print aaa"
'aaa' has unknown type; cast it to its declared type
$ gdb -batch a.out -ex "print (int)aaa"
$1 = 33
...
and with lldb we get a typeless value:
...
$ lldb -batch a.out -o "print aaa"
(lldb) target create "a.out"
Current executable set to 'a.out' (x86_64).
(lldb) print aaa
(void *) $0 = 0x0000000000000021
...
and we can also cast it to a type (which seems to require long int rather than int):
...
$ lldb -batch a.out -o "print (int)aaa"
(lldb) target create "a.out"
Current executable set to 'a.out' (x86_64).
(lldb) print (int)aaa
error: warning: got name from symbols: aaa
error: cast from pointer to smaller type 'int' loses information
$ lldb -batch a.out -o "print (long int)aaa"
(lldb) target create "a.out"
Current executable set to 'a.out' (x86_64).
(lldb) print (long int)aaa
(long) $0 = 33
...


III.

Now consider compiling with debug info only for test.c:
...
$ gcc -c test.c -g; gcc -c test2.c; gcc test.o test2.o -g
...

Gdb manages to print with the proper type:
...
$ gdb -batch a.out -ex "print aaa"
$1 = 33
...
while lldb falls back on the typeless print:
...
(lldb) target create "a.out"
Current executable set to 'a.out' (x86_64).
(lldb) print aaa
(void *) $0 = 0x0000000000000021
...

This is a feature of gdb (which apparently at least this version of lldb doesn't have), where gdb keeps track of variable declarations:
...
Blockvector:

block #000, object at 0x555570721f70, 1 syms/buckets in 0x400497..0x4004a2
 int aaa; unresolved
 int main(void); block object 0x555570721e60, 0x400497..0x4004a2 section .text
  block #001, object at 0x555570721ec0 under 0x555570721f70, 1 syms/buckets in 0x400497..0x4004a2
   typedef int int; 
    block #002, object at 0x555570721e60 under 0x555570721ec0, 0 syms/buckets in 0x400497..0x4004a2, function main
...
and combines those with addresses found in minimal symbol info:
...
$ nm a.out | grep aaa
0000000000601028 D aaa
...
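
As a rough illustration of that combination (a minimal sketch, not gdb's actual code): the struct and function names below are hypothetical, simplified stand-ins for gdb's symbol and minimal symbol structures. The decl supplies the name and type, the minimal symbol supplies the address, and the lookup joins the two by name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a decl from debug info: name and type,
   but no address.  */
struct decl_symbol
{
  const char *name;
  const char *type_name;
};

/* Hypothetical stand-in for a minimal symbol from the ELF symbol
   table: name and address, but no type.  */
struct minimal_symbol
{
  const char *name;
  unsigned long address;
};

/* What is needed to print a value with its proper type.  */
struct resolved_symbol
{
  const char *type_name;
  unsigned long address;
};

/* Resolve DECL by looking up its name in MSYMS.  Return 1 and fill in
   *RESULT on success, 0 if no minimal symbol has that name.  */
int
resolve_decl (const struct decl_symbol *decl,
              const struct minimal_symbol *msyms, size_t n_msyms,
              struct resolved_symbol *result)
{
  assert (decl != NULL && result != NULL);

  for (size_t i = 0; i < n_msyms; i++)
    if (strcmp (msyms[i].name, decl->name) == 0)
      {
        result->type_name = decl->type_name;
        result->address = msyms[i].address;
        return 1;
      }

  return 0;
}
```

With the decl "int aaa" from test.c and the minimal symbol aaa at 0x601028 shown by nm, the sketch yields type int at address 0x601028, which is enough to print the value with the proper type.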


IV.

This is a nice feature, but comes with a few issues.

There's a recently fixed issue where a decl using an incomplete type shadows the def using the complete type (fixed in commit 93e55f0a03 "[gdb/symtab] Prefer var def over decl"). [ And there's an open issue to fix this better: PR25260 - "Handle decl before def more robustly"  ( https://sourceware.org/bugzilla/show_bug.cgi?id=25260 ). ]

And there's the open issue PR 24985 - "Cannot print value of global variable because decl in one CU shadows def in other"  ( https://sourceware.org/bugzilla/show_bug.cgi?id=24985 ).

Furthermore, keeping track of the decls costs memory, even in cases where they are not useful.


V.

Consider a simpler test-case, test3.c:
...
extern int aaa;

int aaa;

int
main (void)
{
  return 0;
}
...
compiled with debug info, with an older gcc:
...
$ gcc-4.8 -g test3.c
...

There's just one DIE describing the variable:
...
 <1><118>: Abbrev Number: 4 (DW_TAG_variable)
    <119>   DW_AT_name        : aaa
    <11d>   DW_AT_decl_file   : 1
    <11e>   DW_AT_decl_line   : 3
    <11f>   DW_AT_type        : <0x111>
    <123>   DW_AT_external    : 1
    <123>   DW_AT_location    : 9 byte block: 3 2c 10 60 0 0 0 0 0      (DW_OP_addr: 60102c)
...

But with a more recent gcc (7.5.0), we have a def and a decl:
...
 <1><f4>: Abbrev Number: 2 (DW_TAG_variable)
    <f5>   DW_AT_name        : aaa
    <f9>   DW_AT_decl_file   : 1
    <fa>   DW_AT_decl_line   : 1
    <fb>   DW_AT_type        : <0xff>
    <ff>   DW_AT_external    : 1
    <ff>   DW_AT_declaration : 1
 <1><106>: Abbrev Number: 4 (DW_TAG_variable)
    <107>   DW_AT_specification: <0xf4>
    <10b>   DW_AT_decl_line   : 3
    <10c>   DW_AT_location    : 9 byte block: 3 2c 10 60 0 0 0 0 0      (DW_OP_addr: 60102c)
...

This more accurately describes the source, but gdb makes a symbol for both the def and the decl:
...
Blockvector:

block #000, object at 0x560017e71f40, 1 syms/buckets in 0x400497..0x4004a2
 int aaa; unresolved
 int aaa; static at 0x60102c section .bss
 int main(void); block object 0x560017e71e30, 0x400497..0x4004a2 section .text
  block #001, object at 0x560017e71e90 under 0x560017e71f40, 1 syms/buckets in 0x400497..0x4004a2
   typedef int int; 
    block #002, object at 0x560017e71e30 under 0x560017e71e90, 0 syms/buckets in 0x400497..0x4004a2, function main
...
which is not useful at all, since the def already provides both the type and the address.
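
A minimal sketch of how such a redundant decl could be filtered out, assuming a flat list of symbols per block. The names (sym_class, sym, drop_redundant_decls) are hypothetical and only loosely modelled on the LOC_UNRESOLVED versus resolved distinction in the dump above; this is not how gdb's symbol reader is structured:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified symbol classes: a decl without address
   ("unresolved" in the dump) versus a symbol with a location.  */
enum sym_class { SYM_UNRESOLVED, SYM_DEFINED };

struct sym
{
  const char *name;
  enum sym_class aclass;
};

/* Return 1 if SYMS contains a definition named NAME, 0 otherwise.  */
static int
has_definition (const struct sym *syms, size_t n, const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (syms[i].aclass == SYM_DEFINED && strcmp (syms[i].name, name) == 0)
      return 1;
  return 0;
}

/* Copy IN to OUT, skipping each unresolved decl for which IN also
   holds a definition with the same name.  OUT must have room for N
   entries.  Return the number of entries written to OUT.  */
size_t
drop_redundant_decls (const struct sym *in, size_t n, struct sym *out)
{
  assert (in != NULL && out != NULL);

  size_t kept = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (in[i].aclass == SYM_UNRESOLVED
          && has_definition (in, n, in[i].name))
        continue;  /* The def makes this decl redundant.  */
      out[kept++] = in[i];
    }
  return kept;
}
```

Applied to the three symbols in block #000 above (the unresolved aaa, the static aaa and main), the sketch keeps only the static aaa and main. A decl without a matching def, as in section III, is kept.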


VI.

This situation is further aggravated by -flto, which for a test-case test4.c:
...
int aaa;

int
main (void)
{
  return 0;
}
... 
compiled like this:
...
$ gcc-8 -O0 test4.c -g -flto -flto-partition=none -ffat-lto-objects
...
generates a def and a decl:
...
 <0><d2>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <d8>   DW_AT_name        : <artificial>
 <1><110>: Abbrev Number: 4 (DW_TAG_variable)
    <111>   DW_AT_abstract_origin: <0x13d>
    <115>   DW_AT_location    : 9 byte block: 3 2c 10 60 0 0 0 0 0      (DW_OP_addr: 60102c)
 <0><12b>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <131>   DW_AT_name        : test4.c
 <1><13d>: Abbrev Number: 2 (DW_TAG_variable)
    <13e>   DW_AT_name        : aaa
    <142>   DW_AT_decl_file   : 1
    <143>   DW_AT_decl_line   : 1
    <144>   DW_AT_decl_column : 5
    <145>   DW_AT_type        : <0x149>
    <149>   DW_AT_external    : 1
...
even though there's no separate decl in the file, and gdb again keeps two entries in the symbol tables:
...
Symtab for file test4.c

Blockvector:

block #000, object at 0x555e7afafb70, 1 syms/buckets in 0x0..0x0
 int aaa; unresolved
  block #001, object at 0x555e7afafac0 under 0x555e7afafb70, 1 syms/buckets in 0x0..0x0
   typedef int int; 


Symtab for file <artificial>

Blockvector:

block #000, object at 0x555e7afaf7d0, 1 syms/buckets in 0x400492..0x40049e
 int main(void); block object 0x555e7afaf6c0, 0x400492..0x40049e section .text
 int aaa; static at 0x60102c section .bss
  block #001, object at 0x555e7afaf770 under 0x555e7afaf7d0, 0 syms/buckets in 0x400492..0x40049e
    block #002, object at 0x555e7afaf6c0 under 0x555e7afaf770, 0 syms/buckets in 0x400492..0x40049e, function main
...


VII.

In conclusion: the feature has some known issues and costs memory, and that cost is possibly getting worse with recent gcc and with lto executables. It would therefore be good to have a means to switch off the feature (by not keeping the decls in the symbol table), say:
...
(gdb) maint set symbol-store-decls off
...

This would allow us to:
- more easily identify problems related to the feature
- work around such problems
- assess memory impact of feature
- more fairly compare memory usage with lldb versions that do not
  support this feature.
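
To illustrate the idea (a sketch only, under the assumption that the setting boils down to a single boolean consulted when a symbol is about to be stored), the hypothetical helper below counts what would be stored and what would be skipped. The names symtab_stats and record_symbol are made up for this example and do not exist in gdb:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counters, useful for assessing the impact of the
   proposed setting.  */
struct symtab_stats
{
  size_t defs_stored;
  size_t decls_stored;
  size_t decls_skipped;
};

/* Account for one symbol read from debug info.  STORE_DECLS models
   the value of the proposed "symbol-store-decls" setting: when it is
   false, decls are not added to the symbol table.  */
void
record_symbol (struct symtab_stats *stats, bool is_declaration,
               bool store_decls)
{
  assert (stats != NULL);

  if (!is_declaration)
    stats->defs_stored++;
  else if (store_decls)
    stats->decls_stored++;
  else
    stats->decls_skipped++;
}
```

Running the same symbol stream through this helper with the setting on and off gives the number of decls that would no longer be kept, which is a first approximation of the memory that could be saved.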
Comment 1 Tom de Vries 2020-04-01 13:29:35 UTC
(In reply to Tom de Vries from comment #0)

I've filed a PR to ignore the useless decl symbols described in sections V and VI: PR25759 - "Remove useless decls from symtab".
Comment 2 Tom de Vries 2020-04-02 12:01:15 UTC
Yet another issue related to this feature: PR25764 - "LOC_UNRESOLVED symbol missing from partial symtab".

I'm starting to wonder if switching this feature off by default would be a large inconvenience for gdb users.
Comment 3 Tom de Vries 2020-04-02 23:14:20 UTC
I tried out this patch and ran the testsuite:
...
diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
index f94c66b4f1..3d13e00554 100644
--- a/gdb/dwarf2/read.c
+++ b/gdb/dwarf2/read.c
@@ -20267,6 +20267,8 @@ new_symbol (struct die_info *die, struct type *type, struct dwarf2_cu *cu,
                       ? cu->get_builder ()->get_global_symbols ()
                       : cu->list_in_scope);
 
+                 suppress_add = 1;
+
                  SYMBOL_ACLASS_INDEX (sym) = LOC_UNRESOLVED;
                }
              else if (!die_is_declaration (die, cu))
...

These were the new fails:
...
FAIL: gdb.base/symbol-alias.exp: p g_var_s_alias
FAIL: gdb.dwarf2/dw2-bad-unresolved.exp: ptype var
FAIL: gdb.dwarf2/dw2-bad-unresolved.exp: print var
FAIL: gdb.dwarf2/dw2-cu-size.exp: ptype noloc
FAIL: gdb.dwarf2/dw2-linkage-name-trust.exp: p c.membername
FAIL: gdb.dwarf2/dw2-linkage-name-trust.exp: p c.membername ()
FAIL: gdb.dwarf2/dw2-noloc.exp: no-run: print file_extern_locno_resolvable
FAIL: gdb.dwarf2/dw2-noloc.exp: no-run: ptype file_extern_locno_resolvable
FAIL: gdb.dwarf2/dw2-noloc.exp: in-main: print file_extern_locno_resolvable
FAIL: gdb.dwarf2/dw2-noloc.exp: in-main: ptype file_extern_locno_resolvable
FAIL: gdb.dwarf2/dw2-noloc.exp: print main_extern_locno_resolvable
FAIL: gdb.dwarf2/dw2-noloc.exp: ptype main_extern_locno_resolvable
FAIL: gdb.dwarf2/dw2-unresolved.exp: print/d var
FAIL: gdb.dwarf2/opaque-type-lookup.exp: p variable_a
FAIL: gdb.dwarf2/opaque-type-lookup.exp: p variable_b
...