This is the mail archive of the gdb-prs@sourceware.org mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug symtab/23288] Symbol loading performance regression.


https://sourceware.org/bugzilla/show_bug.cgi?id=23288

--- Comment #3 from Michael Sartain <mikesart at fastmail dot com> ---
Some notes for this chromium test case.

There are 11,680,768 total symbol names
4,749,945 of these are single char symbol names like '\001', '\003', '\026'.
Seem invalid? Skipping these saves ~1.5 seconds.

That leaves 6,930,823 total symbol names
There are 20,700,321 total symbol names when you include sub-components.

Removing the cp_find_first_component inner loop and only adding full symbol
names saves ~10 seconds.

The std::sort takes ~7 seconds.

These symbols are a hell case for string compares.
  ~12k symbols start with "base::BindOnce<void ("
  ~1400 symbols start with "absl::optional_internal::optional_data_base"
  ~160 start with "(anonymous namespace)::GenerateValue<(lambda at
gen/chrome/service/service_process_catalog_source.cc"

There are 72,028 symbols over 500 bytes.
The longest symbol is 6,181 bytes, and has 5 components.

Couple friends (hat tip Fabian/Charles/Sean) recommended looking at multi-key
sorts and adaptive radix trees. I've printed the symbols out to files and will
try implementing these in a test app. Also thinking/hoping going wild matching
for C++ will allow us to eliminate the cp_find_first_component() calls.

void mapped_index_base::build_name_components ()
{
...

  auto count = this->symbol_name_count ();
  for (offset_type idx = 0; idx < count; idx++)
    {
      if (this->symbol_name_slot_invalid (idx))
        continue;

      const char *name = this->symbol_name_at (idx);

      //$$$ mikesart: 4,749,945 of 11,680,768 total symbols are single char
non-alpha?
      // This condition saves ~1.5 seconds
      if (!name[0] || (!name[1] && !isalpha(name[0])))
        continue;

      /* Add each name component to the name component table.  */
      unsigned int previous_len = 0;

      //$$$ mikesart: skipping this loop saves ~10 seconds
      for (unsigned int current_len = cp_find_first_component (name);
           name[current_len] != '\0';
           current_len += cp_find_first_component (name + current_len))
        {
          gdb_assert (name[current_len] == ':');

          this->name_components.push_back ({previous_len, idx});

          /* Skip the '::'.  */
          current_len += 2;
          previous_len = current_len;
        }

      this->name_components.push_back ({previous_len, idx});
    }

...

  //$$$ total sort time is ~7 seconds
  std::sort (this->name_components.begin (),
             this->name_components.end (),
             name_comp_compare);

-- 
You are receiving this mail because:
You are on the CC list for the bug.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]