Bug 31295 - Slow symbol lookup due to missing DW_AT_const_value attribute
Summary: Slow symbol lookup due to missing DW_AT_const_value attribute
Status: UNCONFIRMED
Alias: None
Product: gdb
Classification: Unclassified
Component: symtab (show other bugs)
Version: 14.1
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks: 29366
  Show dependency treegraph
 
Reported: 2024-01-25 16:24 UTC by Dmitry Neverov
Modified: 2024-02-16 02:32 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
pack.tgz (1.78 KB, application/x-compressed-tar)
2024-01-26 12:28 UTC, Tom de Vries
Details
repro without grepping symtabs (8.12 KB, application/x-compressed)
2024-01-26 13:28 UTC, Dmitry Neverov
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Dmitry Neverov 2024-01-25 16:24:29 UTC
DWARF for npos in the following program 

class wxString
{
public:
  static const size_t npos;
};

const size_t wxString::npos = (size_t) -1;

int main() {
  return 0;
}

doesn't have the DW_AT_const_value attribute:

< 2><0x000005cd>      DW_TAG_variable
                        DW_AT_name                  npos
                        DW_AT_decl_file             0x00000001 /path/to/main.cpp
                        DW_AT_decl_line             0x00000006
                        DW_AT_decl_column           0x00000017
                        DW_AT_linkage_name          _ZN8wxString4nposE
                        DW_AT_type                  <0x0000003a>
                        DW_AT_external              yes(1)
                        DW_AT_accessibility         DW_ACCESS_public
                        DW_AT_declaration           yes(1)

If the wxString is used in many CUs, listing fields of the
wxString variable will lookup the wxString::npos name, which
triggers CU expansion. But the dwarf2_add_field (read.c:11699)
doesn't create a symbol if the DW_AT_const_value is missing, so
expansion is done, but the symbol is not found. As a result all
CUs with the symbol are expanded and symbol lookup takes around a
minute on a 500Mb .so file.
Comment 1 Dmitry Neverov 2024-01-25 18:58:39 UTC
There is no slowdown when gdb-indexes are used.
Comment 2 Tom de Vries 2024-01-26 05:22:44 UTC
I build the gdb-14.1-release tag.  Then I tried constructing a 2-CU example out of comment 0.  But I don't manage to reproduce the described behaviour, I see only one CU being expanded.  I tried compiling with both gcc-12 and clang 15.0.7:
...
$ cat test.h
#include <cstddef>

class wxString
{
public:
  static const size_t npos;
};
$ cat test.cpp
#include "test.h"

const size_t wxString::npos = (size_t) -1;

extern int foo (void);

int
main (void)
{
  return foo ();
}
$ cat test-2.cpp
#include "test.h"

extern int foo (void);

int
foo (void)
{
  return wxString::npos + 1;
}
$ g++-12 test.cpp test-2.cpp -g
$ gdb -q -batch a.out -ex "set trace-commands on" -ex "maint info symtab" -ex "ptype wxString" -ex "maint info symtab"
+maint info symtab
+ptype wxString
type = class wxString {
  public:
    static const size_t npos;
}
+maint info symtab
{ objfile /data/vries/gdb/a.out ((struct objfile *) 0x48beb90)
  { ((struct compunit_symtab *) 0x48b62a0)
    debugformat DWARF 4
    producer GNU C++17 12.3.0 -mtune=generic -march=x86-64 -g
    name test.cpp
    dirname /home/vries/gdb
    blockvector ((struct blockvector *) 0x48f82a0)
    user ((struct compunit_symtab *) (null))
	{ symtab test.cpp ((struct symtab *) 0x48b6320)
	  fullname (null)
	  linetable ((struct linetable *) 0x48f82d0)
	}
	{ symtab /usr/include/c++/12/x86_64-suse-linux/bits/c++config.h ((struct symtab *) 0x48b6360)
	  fullname (null)
	  linetable ((struct linetable *) 0x0)
	}
	{ symtab /usr/lib64/gcc/x86_64-suse-linux/12/include/stddef.h ((struct symtab *) 0x48b63a0)
	  fullname (null)
	  linetable ((struct linetable *) 0x0)
	}
	{ symtab /usr/include/c++/12/cstddef ((struct symtab *) 0x48b63e0)
	  fullname (null)
	  linetable ((struct linetable *) 0x0)
	}
	{ symtab test.h ((struct symtab *) 0x48b6420)
	  fullname (null)
	  linetable ((struct linetable *) 0x0)
	}
  }
}
...
Comment 3 Dmitry Neverov 2024-01-26 09:31:02 UTC
Sorry, I didn't describe it properly, the setup is more
complicated. The wxString should be in shared library.

$ cat str.h
#ifndef STR_H
#define STR_H
#include <cstdlib>

class wxString
{
public:
    static const size_t npos;
    char *data;
    size_t len;
};

#endif //STR_H

$ cat str.cpp
#include "str.h"

const size_t wxString::npos = (size_t) -1;

$ cat util.h
#ifndef UTIL_H
#define UTIL_H

int f();

#endif //UTIL_H

$ cat util.cpp
#include "util.h"
#include "str.h"

int f() {
  wxString s = {};
  return s.len == wxString::npos ? 1 : 0;
}

$ cat main.cpp
#include <iostream>
#include "str.h"
#include "util.h"

int main() {
  wxString str = {};
  std::cout << wxString::npos << std::endl;
  std::cout << f();
  return 0;
}

$ g++ -g -fPIC -c str.cpp
$ g++ -shared -o libstr.so str.o
$ g++ -g -c util.cpp
$ g++ -g -I. -L. -o main main.cpp util.o -lstr
$ gdb -q --batch main -ex "set trace-commands on" -ex "set logging enabled on" -ex "b main.cpp:7" -ex "run" -ex "maint info symtab" -ex "print str" -ex "maint info symtab"

The last command shows that "print str" at main.cpp:7 loads
util.cpp.
Comment 4 Tom de Vries 2024-01-26 12:28:37 UTC
Created attachment 15334 [details]
pack.tgz

(In reply to Dmitry Neverov from comment #3)
> The last command shows that "print str" at main.cpp:7 loads
> util.cpp.

Still can't reproduce that.

I've attached source files, script, and gdb output files in a .tgz file.

I had to add -rpath to allow the exec to find the shared lib, I assume that doesn't influence results.
Comment 5 Dmitry Neverov 2024-01-26 13:28:05 UTC
Created attachment 15335 [details]
repro without grepping symtabs

Maybe I'm misinterpreting results, but I think I can reproduce it if I drop grep in your script: util.cpp appears after 'print str'.
Comment 6 Tom de Vries 2024-01-26 15:43:25 UTC
(In reply to Dmitry Neverov from comment #5)
> Created attachment 15335 [details]
> repro without grepping symtabs
> 
> Maybe I'm misinterpreting results, but I think I can reproduce it if I drop
> grep in your script: util.cpp appears after 'print str'.

Thanks for posting your logs, that helped me figure it out.

FWIW, AFAIU the grep was not the problem (but dropping the grep gave me the info I needed).

Your three logs all use gcc 12, as far as I can tell from the producer strings.

Since gcc-11, dwarf-5 is the default.  But the gcc-12 package I'm using defaults to dwarf-4.

So, after hardcoding -gdwarf-5, I was able to reproduce it (with trunk btw).

I've looked at the difference, and with dwarf4 for main we have:
...
 <1><2445>: Abbrev Number: 10 (DW_TAG_class_type)
    <2446>   DW_AT_name        : wxString
    <244a>   DW_AT_byte_size   : 16
 <2><2452>: Abbrev Number: 80 (DW_TAG_member)
    <2453>   DW_AT_name        : npos
    <245a>   DW_AT_type        : <0xd2d>
    <245e>   DW_AT_external    : 1
    <245e>   DW_AT_accessibility: 1     (public)
    <245f>   DW_AT_declaration : 1
...
and with dwarf5:
...
 <1><2a96>: Abbrev Number: 27 (DW_TAG_class_type)
    <2a97>   DW_AT_name        : wxString
    <2a9b>   DW_AT_byte_size   : 16
 <2><2aa3>: Abbrev Number: 28 (DW_TAG_variable)
    <2aa4>   DW_AT_name        : npos
    <2aab>   DW_AT_linkage_name: _ZN8wxString4nposE
    <2aaf>   DW_AT_type        : <0x2510>
    <2ab3>   DW_AT_external    : 1
    <2ab3>   DW_AT_accessibility: 1     (public)
    <2ab4>   DW_AT_declaration : 1
...

There are two difference: DW_TAG_member vs DW_TAG_variable, and the presence/absence of DW_AT_linkage_name.

So, with dwarf4, we have no npos entries in the cooked index:
...
$ gdb -q -batch main -ex "maint print objfiles" | grep npos
$
...
and with dwarf5, we do:
...
$ gdb -q -batch main -ex "maint print objfiles" | grep npos
    name:       _ZN8wxString4nposE
    canonical:  _ZN8wxString4nposE
    qualified:  _ZN8wxString4nposE
    name:       npos
    canonical:  npos
    qualified:  wxString::npos
    name:       _ZN8wxString4nposE
    canonical:  _ZN8wxString4nposE
    qualified:  _ZN8wxString4nposE
    name:       npos
    canonical:  npos
    qualified:  wxString::npos
$
...

FWIW, in both cases we have the linkage symbol in main, dwarf4:
...
$ nm main | grep npos
0000000000401da0 D _ZN8wxString4nposE
...
and dwarf5:
...
$ nm main | grep npos
0000000000401da0 D _ZN8wxString4nposE
...
though it's a dynamic one.
Comment 7 Tom Tromey 2024-01-28 20:23:23 UTC
Between this and the discrepancy fixed in my domain rewrite series,
I think it's probably time to implement some kind of
cooked index checker that compares the result against the full symtab.
It's slightly hard to do that though because symbol names differ,
some discussion of this in bug #29398.
Comment 8 Tom Tromey 2024-01-29 01:10:30 UTC
This approach to DW_TAG_variable in the full reader seems weird.
It takes different paths when the tag appears inside a class
and when it appears outside, even if in the latter case
it uses the specification to point back into the class.
Comment 9 Tom Tromey 2024-02-11 17:52:21 UTC
Bug#11534 is vaguely related.
Comment 10 Tom Tromey 2024-02-11 20:37:00 UTC
While working on this I wrote a patch that was
nearly a backout of:

commit f4c4456eb4d826f39abb2575ce5c2c4640bb16f3
Author: Tom de Vries <tdevries@suse.de>
Date:   Fri Jul 21 08:25:25 2023 +0200

    [gdb/symtab] Add optimized out static var to cooked index
Comment 11 Tom Tromey 2024-02-11 21:15:33 UTC
This commit also touches this area:

commit 317d2668d01c7ddc9545029bf56d2b8c4d2bbfeb
Author: Tom de Vries <tdevries@suse.de>
Date:   Wed Apr 22 08:38:44 2020 +0200

    [gdb/symtab] Store external var decls in psymtab


I am starting to think we need to look more carefully
at exactly which variables ought to be supported.
Comment 12 Tom de Vries 2024-02-12 15:44:38 UTC
(In reply to Tom Tromey from comment #11)
> This commit also touches this area:
> 
> commit 317d2668d01c7ddc9545029bf56d2b8c4d2bbfeb
> Author: Tom de Vries <tdevries@suse.de>
> Date:   Wed Apr 22 08:38:44 2020 +0200
> 
>     [gdb/symtab] Store external var decls in psymtab
> 
> 
> I am starting to think we need to look more carefully
> at exactly which variables ought to be supported.

In PR25755 I proposed to make keeping decls in the psymtab conditional.
Comment 13 Tom Tromey 2024-02-12 20:28:16 UTC
BTW my simple WIP patch here also regressed the Ada pragma import/export
test.

I'm sympathetic to bug#25755.  I was planning to propose essentially
the same: a variable that is just a declaration should not show
up in the index or the symtab.  If this means that a variable
that's in the minsyms doesn't have a type -- then we just say that's
fine, since trying to cope with the missing debuginfo is expensive.

The Ada test puts a wrinkle in this though.
Comment 14 Tom Tromey 2024-02-16 02:32:35 UTC
We probably can't totally eliminate full symbols for
declarations, due to C code like

void f() {
  int g;
  {
     extern int g;
     ... use g


Here the inner 'g' is in scope but will be a decl.