This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: [PATCH 26/40] Optimize .gdb_index symbol name searching

From: Pedro Alves <palves at redhat dot com>
To: Keith Seitz <keiths at redhat dot com>, gdb-patches at sourceware dot org
Date: Wed, 8 Nov 2017 16:14:39 +0000
Subject: Re: [PATCH 26/40] Optimize .gdb_index symbol name searching
Authentication-results: sourceware.org; auth=none
References: <1496406158-12663-1-git-send-email-palves@redhat.com> <1496406158-12663-27-git-send-email-palves@redhat.com> <e81dbc08-c253-6547-a3fa-07225242ab50@redhat.com>
On 08/08/2017 09:31 PM, Keith Seitz wrote:
> On 06/02/2017 05:22 AM, Pedro Alves wrote:
> 
>> I got, before the previous patch (-O2, x86-64):
>>
>>  real    0m1.773s
>>  user    0m1.737s
>>  sys     0m0.040s
>>
>> and after this patch:
>>
>>  real    0m1.361s
>>  user    0m1.315s
>>  sys     0m0.040s
> 
> The results on my computer are slightly more dramatic, running with no
> optimization, your test case (using Fedora 21 system gdb w/index debuginfo)
> goes from about 15 seconds down to about 2.5 seconds. Very nice!
> 
>> That resulted in 1351355 name_components.  Each entry takes 8 bytes,
>> so that's 10810840 bytes (ignoring std::vector overhead), or ~10.3 MB.
>> That's IMO too small to worry about, given GDB was using over 7400MB
>> total at that point.  I.e., we're talking about 0.1% increase.
> 
> Indeed. I'd sacrifice that kind of memory for the kind of speed increase
> you've achieved -- in a heartbeat!
> 
>> with only 8-bit and 32-bit tables, that'd be:
>>
>>  1349057 * 1 + 2298 * 4 + 4 * 1351355 = 6763669 bytes, or ~6.5MB.
>>
>> I don't think we need to bother though.
> 
> I'm all for memory usage optimization and whatnot, but since the benefit is
> so small (55% of these new tables saved but only 0.06% of total memory),
> I prefer simplicity. So you won't get anything but agreement from me on this.
> 
>> I also timed:
>>
>>  $ time gdb --batch -q -p `pidof firefox`
>>  $ time gdb --batch -q -p `pidof firefox` -ex "b main"
>>  $ time gdb --batch -q -p `pidof firefox` -ex "set max-completion unlimited" -ex "complete b "
> 
> I'd like to reproduce this, but my computer is incapable of running this test.
> I'll take your word for it. ;-)

:-)

> 
>> gdb/ChangeLog:
>> yyyy-mm-dd  Pedro Alves  <palves@redhat.com>
>>
>> 	* dwarf2read.c 
>> 	(mapped_index::name_components): New field.
>> 	(mapped_index::symbol_name_at): New method.
> 
> Silly nit: Isn't the form most are using "(tag name) <field>: New field."?
> I know I've relied on this several times to find changes in the ChangeLog.

That made sense in C, since <part> is used to specify the part
of (foo) that changed:

  https://www.gnu.org/prep/standards/html_node/Indicating-the-Part-Changed.html#Indicating-the-Part-Changed

... and '::' does not exist in C.  But it exists in C++, and it
just feels natural to use it, since that's the way in C++
you'd name the 'foo' being referred to in '(foo)'.  If you look
at the GCCs ChangeLogs, you'll see they've been using this
form too:

 grep "::" gcc/ChangeLog  libstdc++-v3/ChangeLog

>> 	(create_addrmap_from_index): Call mapped_index ctor.
> 
> I don't see any changes to this function in the patch -- attributed to wrong
> function?

Indeed.  It should have been dwarf2_read_index.  Fixed, thanks.

> 
>> diff --git a/gdb/dwarf2read.c b/gdb/dwarf2read.c
>> index f523326..e955131 100644
>> --- a/gdb/dwarf2read.c
>> +++ b/gdb/dwarf2read.c
>> @@ -178,6 +178,51 @@ DEF_VEC_I (offset_type);
> [snip]
>> +
>> +/* An index into a (C++) symbol name component in a symbol name as
>> +   recorded in the mapped_index's symbol table.  For each C++ symbol
>> +   in the symbol table, we record one entry for the start of each
>> +   component in the symbol in a table of name components, and then
>> +   sort the table, in order to be able to binary search symbol names,
>> +   ignoring leading namespaces, both completion and regular look up.
>> +   For example, for symbol "A::B::C", we'll have an entry that points
>> +   to "A::B::C", another that points to "B::C", and another for "C".
>> +   Note that function symbols in GDB index have no parameter
>> +   information, just the function/method names.  You can convert a
>> +   name_component to a "const char *" using the
>> +   'mapped_index::symbol_name_at(offset_type)' method.  */
> 
> missing nl?
> 

Fixed.

>> +struct name_component
>> +{
>> +  /* Offset in the symbol name where the component starts.  Stored as
>> +     a (32-bit) offset instead of a pointer to save memory and improve
>> +     locality on 64-bit architectures.  */
>> +  offset_type name_offset;
>> +
>> +  /* The symbol's index in the symbol and constant pool tables of a
>> +     mapped_index.  */
>> +  offset_type idx;
>> +};
>> +
>>  /* A description of the mapped index.  The file format is described in
>>     a comment by the code that writes the index.  */
>>  struct mapped_index
>> @@ -3390,6 +3424,7 @@ dwarf2_read_index (struct objfile *objfile)
>>    create_addrmap_from_index (objfile, &local_map);
>>  
>>    map = XOBNEW (&objfile->objfile_obstack, struct mapped_index);
>> +  map = new (map) mapped_index ();
>>    *map = local_map;
>>  
>>    dwarf2_per_objfile->index_table = map;
> 
> This function (dwarf2_read_index) is not mentioned in the ChangeLog. Could
> this actually be incorrectly listed in the ChangeLog under
> create_addrmap_from_index?

Indeed.  Fixed.

> 
>> @@ -4095,6 +4134,22 @@ gdb_index_symbol_name_matcher::matches (const char *symbol_name)
>>  }
>>  
>>  static void
>> +dw2_expand_marked_cus
>> +  (mapped_index &index, offset_type idx,
>> +   struct objfile *objfile,
>> +   gdb::function_view<expand_symtabs_file_matcher_ftype> file_matcher,
>> +   gdb::function_view<expand_symtabs_exp_notify_ftype> expansion_notify,
>> +   search_domain kind);
>> +
>> +static void
>> +dw2_expand_symtabs_matching_symbol
>> +  (mapped_index &index,
>> +   const lookup_name_info &lookup_name_in,
>> +   gdb::function_view<expand_symtabs_symbol_matcher_ftype> symbol_matcher,
>> +   enum search_domain kind,
>> +   gdb::function_view<void (offset_type)> on_match);
> 
> Isn't it rather unusual for us to have forward decls in the middle of a file?
> 

Right, I should have mentioned -- I'd like to move
the new dw2_expand_symtabs_matching_symbol and dw2_expand_marked_cus
above dw2_expand_symtabs_matching, but I didn't do that here to
make the patch easier to read and maintain.  Likewise with the
reindenting dw2_expand_marked_cus where I had marked with:

  /* XXX reindent code below before pushing.  */

I'll move the code around in the following patch (a new patch),
getting rid of the forward declarations.  Since I'm doing that
as a separate patch, it was also easier to leave the
reindentation to that patch too.

>> +
>> +static void
>>  dw2_expand_symtabs_matching
>>    (struct objfile *objfile,
>>     gdb::function_view<expand_symtabs_file_matcher_ftype> file_matcher,
>> @@ -4186,30 +4239,214 @@ dw2_expand_symtabs_matching
> [snip]
>> +static void
>> +dw2_expand_symtabs_matching_symbol
>> +  (mapped_index &index,
>> +   const lookup_name_info &lookup_name,
>> +   gdb::function_view<expand_symtabs_symbol_matcher_ftype> symbol_matcher,
>> +   enum search_domain kind,
>> +   gdb::function_view<void (offset_type)> match_callback)
>> +{
> [snip]
>> +
>> +      /* Sort name_comp elements by name.   */
> 
> I presume that "name_comp" is really "name_components"?

Indeed.  Fixed.

> I admit, I'm a little surprised by the number of steps involved here: push
> every element in the range into a vector, sort, then de-dup & perform callback.
> Had I implemented this, my initial attempt would have been to use a htab_up
> and take the sorting time on insertion.
> I can imagine that for very large ranges, your approach could outperform
> an htab; do you expect these ranges to be that large? Have I overlooked
> something? [I'm just curious. Not suggesting any changes need to be made.]

Yeah, this is a hot path in completion lookup, and I'm trying to squeeze
every cycle here.

But the complexity is also needed because for completion, we need the symbols
sorted, while an htab doesn't have a predictable iteration order.
We want to find the lower / upper bounds starting with a needle that
doesn't exist in the name_components vector/set.  E.g., looking for what 
symbols complete "fun", when you have symbols named 
{... "bar", "func", "function", "xfunc" ... },
we want to consider "func", "function" as potential matches without
wasting time with "bar" and "xfunc" (and others).  

So while using something like a std::set instead of a std::vector + sort
would take the sorting out of view, I don't think that'd simplify that
much, while it'd certainly perform worse.
(See <https://stackoverflow.com/questions/15638024/stdsort-vs-inserting-into-an-stdset>
for example.)

For the record, below's what I pushed.

>From 3f563c840a2c891ec2868b3e08bfaecb6f7aa57f Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Wed, 8 Nov 2017 14:22:32 +0000
Subject: [PATCH] Optimize .gdb_index symbol name searching

As mentioned in the previous patch, .gdb_index name lookup got
significantly slower with the previous patch.

This patch addresses that, and in the process makes .gdb_index name
searching faster than what we had before the previous patch, even.
Using the same test:

 $ cat script.cmd
 set pagination off
 set $count = 0
 while $count < 400
   complete b string_prin
   printf "count = %d\n", $count
   set $count = $count + 1
 end

 $ time gdb --batch -q ./gdb-with-index -ex "source script.cmd"

I got, before the previous patch (-O2, x86-64):

 real    0m1.773s
 user    0m1.737s
 sys     0m0.040s

and after this patch:

 real    0m1.361s
 user    0m1.315s
 sys     0m0.040s

The basic idea here is simple: instead of always iterating over all
the symbol names in the index, we build an accelerator/sorted name
table and binary search names in it.

Later in the series, we'll want to support wild matching for C++ too,
so this mechanism already considers that.  For example, say that
you're looking up functions/methods named "func", no matter the
containing namespace/class.  If we sorted the table by qualified name,
then we obviously wouldn't be able to find those symbols with a binary
search:

  func
  ns1::a::b::func
  ns1::b::func
  ns2::func

(function symbol names in .gdb_index have no parameter info, like psymbols)

To address that out, we put an entry for each name component in the
sorted table.  something like this:

  Table Entry       Actual symbol
  ---------------------------------
  func              func

  func              ns1::a::b::func
  b::func           ns1::a::b::func
  a::b::func        ns1::a::b::func
  ns1::a::b::func   ns1::a::b::func

  func              ns1::b::func
  b::func           ns1::b::func
  ns1::b::func      ns1::b::func

  func              ns2::func
  ns2::func         ns2::func

Which sorted results in this:

  Table Entry       Actual symbol
  ---------------------------------
  a::b::func        ns1::a::b::func
  b::func           ns1::a::b::func
  b::func           ns1::b::func
  func              func
  func              ns1::a::b::func
  func              ns1::b::func
  func              ns2::func
  ns1::a::b::func   ns1::a::b::func
  ns1::b::func      ns1::b::func
  ns2::func         ns2::func

And we can binary search this.

Note that a binary search approach works for both completion and
regular lookup, while a name hashing approach only works for normal
symbol looking, since obviously "fun" and "func" have different
hashes.

At first I was a bit wary of these tables potentially growing GDB's
memory significantly.  But I did an experiment that convinced it's not
a worry at all.  I hacked gdb to count the total number of entries in
all the tables, attached that gdb to my system/Fedora's Firefox
(Fedora's debug packages uses .gdb_index), did "set max-completions
unlimited", and then hit "b [TAB]" to cause everything to expand.

That resulted in 1351355 name_components.  Each entry takes 8 bytes,
so that's 10810840 bytes (ignoring std::vector overhead), or ~10.3 MB.
That's IMO too small to worry about, given GDB was using over 7400MB
total at that point.  I.e., we're talking about 0.1% increase.

dw2_expand_symtabs_matching unit tests covering this will be added in
a follow up patch.

If the size of this table turns out to be a concern, I have an idea to
reduce the size of the table further at the expense of a bit more code
-- the vast majority of the name offsets are either 0 or fit in
8-bits:

 total name_component = 1351355, of which,
 name_component::name_offset instances need  0 bits = 679531
 name_component::name_offset instances need  8 bits = 669526
 name_component::name_offset instances need 16 bits = 2298
 name_component::name_offset instances need 32 bits = 0
 name_component::idx instances need 0 bits  = 51
 name_component::idx instances need 8 bits  = 8361
 name_component::idx instances need 16 bits = 280329
 name_component::idx instances need 32 bits = 1062614

so we could have separate tables for 0 name_offset, 8-bit name_offset
and 32-bit name_offset.  That'd give us roughly:

 679531 * 0 + 669526 * 1 + 2298 * 4 + 1062614 * 4 = 4929174, or ~4.7MB

with only 8-bit and 32-bit tables, that'd be:

 1349057 * 1 + 2298 * 4 + 4 * 1351355 = 6763669 bytes, or ~6.5MB.

I don't think we need to bother though.

I also timed:

 $ time gdb --batch -q -p `pidof firefox`
 $ time gdb --batch -q -p `pidof firefox` -ex "b main"
 $ time gdb --batch -q -p `pidof firefox` -ex "set max-completion unlimited" -ex "complete b "

and compared before previous patch vs this patch, and I didn't see a
significant difference, seemingly because time to read debug info
dominates.  The "complete b " variant of the test takes ~2min
currently...  (I have a follow up series that speeds that up
somewhat.)

gdb/ChangeLog:
2017-11-08  Pedro Alves  <palves@redhat.com>

	* dwarf2read.c (byte_swap, MAYBE_SWAP): Move higher up in file.
	(struct name_component): New.
	(mapped_index::name_components): New field.
	(mapped_index::symbol_name_at): New method.
	(dwarf2_read_index): Call mapped_index ctor.
	(dw2_map_matching_symbols): Add comment about name_components
	table.
	(dw2_expand_symtabs_matching): Factor part to...
	(dw2_expand_symtabs_matching_symbol): ... this new function.
	Build name components table, and lookup symbols in it before
	calling the name matcher.
	(dw2_expand_marked_cus): New, factored out from
	dw2_expand_symtabs_matching.
	(dwarf2_per_objfile_free): Call the mapped_index's dtor.
---
 gdb/ChangeLog    |  17 +++
 gdb/dwarf2read.c | 323 +++++++++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 298 insertions(+), 42 deletions(-)

diff --git a/gdb/ChangeLog b/gdb/ChangeLog
index 06ababc..deb1614 100644
--- a/gdb/ChangeLog
+++ b/gdb/ChangeLog
@@ -1,3 +1,20 @@
+2017-11-08  Pedro Alves  <palves@redhat.com>
+
+	* dwarf2read.c (byte_swap, MAYBE_SWAP): Move higher up in file.
+	(struct name_component): New.
+	(mapped_index::name_components): New field.
+	(mapped_index::symbol_name_at): New method.
+	(dwarf2_read_index): Call mapped_index ctor.
+	(dw2_map_matching_symbols): Add comment about name_components
+	table.
+	(dw2_expand_symtabs_matching): Factor part to...
+	(dw2_expand_symtabs_matching_symbol): ... this new function.
+	Build name components table, and lookup symbols in it before
+	calling the name matcher.
+	(dw2_expand_marked_cus): New, factored out from
+	dw2_expand_symtabs_matching.
+	(dwarf2_per_objfile_free): Call the mapped_index's dtor.
+
 2017-11-08   Pedro Alves  <palves@redhat.com>
 
 	* ada-lang.c (ada_encode): Rename to ..
diff --git a/gdb/dwarf2read.c b/gdb/dwarf2read.c
index f37d51f..6f88091 100644
--- a/gdb/dwarf2read.c
+++ b/gdb/dwarf2read.c
@@ -182,6 +182,53 @@ DEF_VEC_I (offset_type);
     GDB_INDEX_CU_SET_VALUE((cu_index), (value)); \
   } while (0)
 
+#if WORDS_BIGENDIAN
+
+/* Convert VALUE between big- and little-endian.  */
+
+static offset_type
+byte_swap (offset_type value)
+{
+  offset_type result;
+
+  result = (value & 0xff) << 24;
+  result |= (value & 0xff00) << 8;
+  result |= (value & 0xff0000) >> 8;
+  result |= (value & 0xff000000) >> 24;
+  return result;
+}
+
+#define MAYBE_SWAP(V)  byte_swap (V)
+
+#else
+#define MAYBE_SWAP(V) static_cast<offset_type> (V)
+#endif /* WORDS_BIGENDIAN */
+
+/* An index into a (C++) symbol name component in a symbol name as
+   recorded in the mapped_index's symbol table.  For each C++ symbol
+   in the symbol table, we record one entry for the start of each
+   component in the symbol in a table of name components, and then
+   sort the table, in order to be able to binary search symbol names,
+   ignoring leading namespaces, both completion and regular look up.
+   For example, for symbol "A::B::C", we'll have an entry that points
+   to "A::B::C", another that points to "B::C", and another for "C".
+   Note that function symbols in GDB index have no parameter
+   information, just the function/method names.  You can convert a
+   name_component to a "const char *" using the
+   'mapped_index::symbol_name_at(offset_type)' method.  */
+
+struct name_component
+{
+  /* Offset in the symbol name where the component starts.  Stored as
+     a (32-bit) offset instead of a pointer to save memory and improve
+     locality on 64-bit architectures.  */
+  offset_type name_offset;
+
+  /* The symbol's index in the symbol and constant pool tables of a
+     mapped_index.  */
+  offset_type idx;
+};
+
 /* A description of the mapped index.  The file format is described in
    a comment by the code that writes the index.  */
 struct mapped_index
@@ -206,6 +253,15 @@ struct mapped_index
 
   /* A pointer to the constant pool.  */
   const char *constant_pool;
+
+  /* The name_component table (a sorted vector).  See name_component's
+     description above.  */
+  std::vector<name_component> name_components;
+
+  /* Convenience method to get at the name of the symbol at IDX in the
+     symbol table.  */
+  const char *symbol_name_at (offset_type idx) const
+  { return this->constant_pool + MAYBE_SWAP (this->symbol_table[idx]); }
 };
 
 typedef struct dwarf2_per_cu_data *dwarf2_per_cu_ptr;
@@ -2160,26 +2216,6 @@ line_header_eq_voidp (const void *item_lhs, const void *item_rhs)
 }
 
 
-#if WORDS_BIGENDIAN
-
-/* Convert VALUE between big- and little-endian.  */
-static offset_type
-byte_swap (offset_type value)
-{
-  offset_type result;
-
-  result = (value & 0xff) << 24;
-  result |= (value & 0xff00) << 8;
-  result |= (value & 0xff0000) >> 8;
-  result |= (value & 0xff000000) >> 24;
-  return result;
-}
-
-#define MAYBE_SWAP(V)  byte_swap (V)
-
-#else
-#define MAYBE_SWAP(V) static_cast<offset_type> (V)
-#endif /* WORDS_BIGENDIAN */
 
 /* Read the given attribute value as an address, taking the attribute's
    form into account.  */
@@ -3444,6 +3480,7 @@ dwarf2_read_index (struct objfile *objfile)
   create_addrmap_from_index (objfile, &local_map);
 
   map = XOBNEW (&objfile->objfile_obstack, struct mapped_index);
+  map = new (map) mapped_index ();
   *map = local_map;
 
   dwarf2_per_objfile->index_table = map;
@@ -4070,7 +4107,11 @@ dw2_map_matching_symbols (struct objfile *objfile,
 
      Since each language has its own symbol name matching algorithm,
      and we don't know which language is the right one, we must match
-     each symbol against all languages.
+     each symbol against all languages.  This would be a potential
+     performance problem if it were not mitigated by the
+     mapped_index::name_components lookup table, which significantly
+     reduces the number of times we need to call into this matcher,
+     making it a non-issue.
 
    - Symbol names in the index have no overload (parameter)
      information.  I.e., in C++, "foo(int)" and "foo(long)" both
@@ -4148,6 +4189,22 @@ gdb_index_symbol_name_matcher::matches (const char *symbol_name)
 }
 
 static void
+dw2_expand_marked_cus
+  (mapped_index &index, offset_type idx,
+   struct objfile *objfile,
+   gdb::function_view<expand_symtabs_file_matcher_ftype> file_matcher,
+   gdb::function_view<expand_symtabs_exp_notify_ftype> expansion_notify,
+   search_domain kind);
+
+static void
+dw2_expand_symtabs_matching_symbol
+  (mapped_index &index,
+   const lookup_name_info &lookup_name_in,
+   gdb::function_view<expand_symtabs_symbol_matcher_ftype> symbol_matcher,
+   enum search_domain kind,
+   gdb::function_view<void (offset_type)> on_match);
+
+static void
 dw2_expand_symtabs_matching
   (struct objfile *objfile,
    gdb::function_view<expand_symtabs_file_matcher_ftype> file_matcher,
@@ -4158,14 +4215,12 @@ dw2_expand_symtabs_matching
 {
   int i;
   offset_type iter;
-  struct mapped_index *index;
 
   dw2_setup (objfile);
 
   /* index_table is NULL if OBJF_READNOW.  */
   if (!dwarf2_per_objfile->index_table)
     return;
-  index = dwarf2_per_objfile->index_table;
 
   if (file_matcher != NULL)
     {
@@ -4239,30 +4294,212 @@ dw2_expand_symtabs_matching
 	}
     }
 
-  gdb_index_symbol_name_matcher lookup_name_matcher (lookup_name);
+  mapped_index &index = *dwarf2_per_objfile->index_table;
 
-  for (iter = 0; iter < index->symbol_table_slots; ++iter)
+  dw2_expand_symtabs_matching_symbol (index, lookup_name,
+				      symbol_matcher,
+				      kind, [&] (offset_type idx)
     {
-      offset_type idx = 2 * iter;
-      const char *name;
-      offset_type *vec, vec_len, vec_idx;
-      int global_seen = 0;
+      dw2_expand_marked_cus (index, idx, objfile, file_matcher,
+			     expansion_notify, kind);
+    });
+}
 
-      QUIT;
+/* Helper for dw2_expand_symtabs_matching that works with a
+   mapped_index instead of the containing objfile.  This is split to a
+   separate function in order to be able to unit test the
+   name_components matching using a mock mapped_index.  For each
+   symbol name that matches, calls MATCH_CALLBACK, passing it the
+   symbol's index in the mapped_index symbol table.  */
 
-      if (index->symbol_table[idx] == 0 && index->symbol_table[idx + 1] == 0)
-	continue;
+static void
+dw2_expand_symtabs_matching_symbol
+  (mapped_index &index,
+   const lookup_name_info &lookup_name,
+   gdb::function_view<expand_symtabs_symbol_matcher_ftype> symbol_matcher,
+   enum search_domain kind,
+   gdb::function_view<void (offset_type)> match_callback)
+{
+  gdb_index_symbol_name_matcher lookup_name_matcher
+    (lookup_name);
+
+  auto *name_cmp = case_sensitivity == case_sensitive_on ? strcmp : strcasecmp;
+
+  /* Build the symbol name component sorted vector, if we haven't yet.
+     The code below only knows how to break apart components of C++
+     symbol names (and other languages that use '::' as
+     namespace/module separator).  If we add support for wild matching
+     to some language that uses some other operator (E.g., Ada, Go and
+     D use '.'), then we'll need to try splitting the symbol name
+     according to that language too.  Note that Ada does support wild
+     matching, but doesn't currently support .gdb_index.  */
+  if (index.name_components.empty ())
+    {
+      for (size_t iter = 0; iter < index.symbol_table_slots; ++iter)
+	{
+	  offset_type idx = 2 * iter;
+
+	  if (index.symbol_table[idx] == 0
+	      && index.symbol_table[idx + 1] == 0)
+	    continue;
+
+	  const char *name = index.symbol_name_at (idx);
+
+	  /* Add each name component to the name component table.  */
+	  unsigned int previous_len = 0;
+	  for (unsigned int current_len = cp_find_first_component (name);
+	       name[current_len] != '\0';
+	       current_len += cp_find_first_component (name + current_len))
+	    {
+	      gdb_assert (name[current_len] == ':');
+	      index.name_components.push_back ({previous_len, idx});
+	      /* Skip the '::'.  */
+	      current_len += 2;
+	      previous_len = current_len;
+	    }
+	  index.name_components.push_back ({previous_len, idx});
+	}
 
-      name = index->constant_pool + MAYBE_SWAP (index->symbol_table[idx]);
+      /* Sort name_components elements by name.  */
+      auto name_comp_compare = [&] (const name_component &left,
+				    const name_component &right)
+	{
+	  const char *left_qualified = index.symbol_name_at (left.idx);
+	  const char *right_qualified = index.symbol_name_at (right.idx);
+
+	  const char *left_name = left_qualified + left.name_offset;
+	  const char *right_name = right_qualified + right.name_offset;
+
+	  return name_cmp (left_name, right_name) < 0;
+	};
+
+      std::sort (index.name_components.begin (),
+		 index.name_components.end (),
+		 name_comp_compare);
+    }
+
+  const char *cplus
+    = lookup_name.cplus ().lookup_name ().c_str ();
 
-      if (!lookup_name_matcher.matches (name)
-	  || (symbol_matcher != NULL && !symbol_matcher (name)))
+  /* Comparison function object for lower_bound that matches against a
+     given symbol name.  */
+  auto lookup_compare_lower = [&] (const name_component &elem,
+				   const char *name)
+    {
+      const char *elem_qualified = index.symbol_name_at (elem.idx);
+      const char *elem_name = elem_qualified + elem.name_offset;
+      return name_cmp (elem_name, name) < 0;
+    };
+
+  /* Comparison function object for upper_bound that matches against a
+     given symbol name.  */
+  auto lookup_compare_upper = [&] (const char *name,
+				   const name_component &elem)
+    {
+      const char *elem_qualified = index.symbol_name_at (elem.idx);
+      const char *elem_name = elem_qualified + elem.name_offset;
+      return name_cmp (name, elem_name) < 0;
+    };
+
+  auto begin = index.name_components.begin ();
+  auto end = index.name_components.end ();
+
+  /* Find the lower bound.  */
+  auto lower = [&] ()
+    {
+      if (lookup_name.completion_mode () && cplus[0] == '\0')
+	return begin;
+      else
+	return std::lower_bound (begin, end, cplus, lookup_compare_lower);
+    } ();
+
+  /* Find the upper bound.  */
+  auto upper = [&] ()
+    {
+      if (lookup_name.completion_mode ())
+	{
+	  /* The string frobbing below won't work if the string is
+	     empty.  We don't need it then, anyway -- if we're
+	     completing an empty string, then we want to iterate over
+	     the whole range.  */
+	  if (cplus[0] == '\0')
+	    return end;
+
+	  /* In completion mode, increment the last character because
+	     we want UPPER to point past all symbols names that have
+	     the same prefix.  */
+	  std::string after = cplus;
+
+	  gdb_assert (after.back () != 0xff);
+	  after.back ()++;
+
+	  return std::upper_bound (lower, end, after.c_str (),
+				   lookup_compare_upper);
+	}
+      else
+	return std::upper_bound (lower, end, cplus, lookup_compare_upper);
+    } ();
+
+  /* Now for each symbol name in range, check to see if we have a name
+     match, and if so, call the MATCH_CALLBACK callback.  */
+
+  /* The same symbol may appear more than once in the range though.
+     E.g., if we're looking for symbols that complete "w", and we have
+     a symbol named "w1::w2", we'll find the two name components for
+     that same symbol in the range.  To be sure we only call the
+     callback once per symbol, we first collect the symbol name
+     indexes that matched in a temporary vector and ignore
+     duplicates.  */
+  std::vector<offset_type> matches;
+  matches.reserve (std::distance (lower, upper));
+
+  for (;lower != upper; ++lower)
+    {
+      const char *qualified = index.symbol_name_at (lower->idx);
+
+      if (!lookup_name_matcher.matches (qualified)
+	  || (symbol_matcher != NULL && !symbol_matcher (qualified)))
 	continue;
 
-      /* The name was matched, now expand corresponding CUs that were
-	 marked.  */
-      vec = (offset_type *) (index->constant_pool
-			     + MAYBE_SWAP (index->symbol_table[idx + 1]));
+      matches.push_back (lower->idx);
+    }
+
+  std::sort (matches.begin (), matches.end ());
+
+  /* Finally call the callback, once per match.  */
+  ULONGEST prev = -1;
+  for (offset_type idx : matches)
+    {
+      if (prev != idx)
+	{
+	  match_callback (idx);
+	  prev = idx;
+	}
+    }
+
+  /* Above we use a type wider than idx's for 'prev', since 0 and
+     (offset_type)-1 are both possible values.  */
+  static_assert (sizeof (prev) > sizeof (offset_type), "");
+}
+
+/* Helper for dw2_expand_matching symtabs.  Called on each symbol
+   matched, to expand corresponding CUs that were marked.  IDX is the
+   index of the symbol name that matched.  */
+
+static void
+dw2_expand_marked_cus
+  (mapped_index &index, offset_type idx,
+   struct objfile *objfile,
+   gdb::function_view<expand_symtabs_file_matcher_ftype> file_matcher,
+   gdb::function_view<expand_symtabs_exp_notify_ftype> expansion_notify,
+   search_domain kind)
+{
+  const char *name;
+  offset_type *vec, vec_len, vec_idx;
+  bool global_seen = false;
+
+      vec = (offset_type *) (index.constant_pool
+			     + MAYBE_SWAP (index.symbol_table[idx + 1]));
       vec_len = MAYBE_SWAP (vec[0]);
       for (vec_idx = 0; vec_idx < vec_len; ++vec_idx)
 	{
@@ -4278,7 +4515,7 @@ dw2_expand_symtabs_matching
 	     and indices >= 7 may elide them for certain symbols
 	     (gold does this).  */
 	  int attrs_valid =
-	    (index->version >= 7
+	    (index.version >= 7
 	     && symbol_kind != GDB_INDEX_SYMBOL_KIND_NONE);
 
 	  /* Work around gold/15646.  */
@@ -4287,7 +4524,7 @@ dw2_expand_symtabs_matching
 	      if (!is_static && global_seen)
 		continue;
 	      if (!is_static)
-		global_seen = 1;
+		global_seen = true;
 	    }
 
 	  /* Only check the symbol's kind if it has one.  */
@@ -4338,7 +4575,6 @@ dw2_expand_symtabs_matching
 		}
 	    }
 	}
-    }
 }
 
 /* A helper for dw2_find_pc_sect_compunit_symtab which finds the most specific
@@ -23362,6 +23598,9 @@ dwarf2_per_objfile_free (struct objfile *objfile, void *d)
 
   if (data->dwz_file && data->dwz_file->dwz_bfd)
     gdb_bfd_unref (data->dwz_file->dwz_bfd);
+
+  if (data->index_table != NULL)
+    data->index_table->~mapped_index ();
 }
 
 
-- 
2.5.5
Follow-Ups:
- [pushed] Reorder/reindent dw2_expand_symtabs_matching & friends (Re: [PATCH 26/40] Optimize .gdb_index symbol name searching)
  - From: Pedro Alves
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]