This is the mail archive of the
gdb-prs@sourceware.org
mailing list for the GDB project.
[Bug symtab/23288] Symbol loading performance regression.
- From: "mikesart at fastmail dot com" <sourceware-bugzilla at sourceware dot org>
- To: gdb-prs at sourceware dot org
- Date: Wed, 20 Jun 2018 16:44:19 +0000
- Subject: [Bug symtab/23288] Symbol loading performance regression.
- Auto-submitted: auto-generated
- References: <bug-23288-4717@http.sourceware.org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=23288
--- Comment #3 from Michael Sartain <mikesart at fastmail dot com> ---
Some notes for this chromium test case.
There are 11,680,768 total symbol names
4,749,945 of these are single char symbol names like '\001', '\003', '\026'.
Seem invalid? Skipping these saves ~1.5 seconds.
That leaves 6,930,823 total symbol names
There are 20,700,321 total symbol names when you include sub-components.
Removing the cp_find_first_component inner loop and only adding full symbol
names saves ~10 seconds.
The std::sort takes ~7 seconds.
These symbols are a hell case for string compares.
~12k symbols start with "base::BindOnce<void ("
~1400 symbols start with "absl::optional_internal::optional_data_base"
~160 start with "(anonymous namespace)::GenerateValue<(lambda at
gen/chrome/service/service_process_catalog_source.cc"
There are 72,028 symbols over 500 bytes.
The longest symbol is 6,181 bytes, and has 5 components.
Couple friends (hat tip Fabian/Charles/Sean) recommended looking at multi-key
sorts and adaptive radix trees. I've printed the symbols out to files and will
try implementing these in a test app. Also thinking/hoping going wild matching
for C++ will allow us to eliminate the cp_find_first_component() calls.
void mapped_index_base::build_name_components ()
{
...
auto count = this->symbol_name_count ();
for (offset_type idx = 0; idx < count; idx++)
{
if (this->symbol_name_slot_invalid (idx))
continue;
const char *name = this->symbol_name_at (idx);
//$$$ mikesart: 4,749,945 of 11,680,768 total symbols are single char
non-alpha?
// This condition saves ~1.5 seconds
if (!name[0] || (!name[1] && !isalpha(name[0])))
continue;
/* Add each name component to the name component table. */
unsigned int previous_len = 0;
//$$$ mikesart: skipping this loop saves ~10 seconds
for (unsigned int current_len = cp_find_first_component (name);
name[current_len] != '\0';
current_len += cp_find_first_component (name + current_len))
{
gdb_assert (name[current_len] == ':');
this->name_components.push_back ({previous_len, idx});
/* Skip the '::'. */
current_len += 2;
previous_len = current_len;
}
this->name_components.push_back ({previous_len, idx});
}
...
//$$$ total sort time is ~7 seconds
std::sort (this->name_components.begin (),
this->name_components.end (),
name_comp_compare);
--
You are receiving this mail because:
You are on the CC list for the bug.