
[committed] Avoid htab_expand for off_htab



Hi,

Every time the number of elements in a hash table from hashtab.c grows beyond
75% occupancy, the hash table grows in size.
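
For reference, here's a minimal sketch of that check (an approximation, not
the actual hashtab.c code):
...
#include <stddef.h>

/* Approximation of the insertion-time check: once more than 3/4 of the
   slots are in use, the table is expanded before the new element is
   stored.  */
static int
needs_expand (size_t n_elements, size_t size)
{
  /* n_elements / size >= 3 / 4, written without division.  */
  return n_elements * 4 >= size * 3;
}
...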

Using a debug patch to print this growth for off_htab, we get:
...
$ dwz cc1 -o 1 -lnone
Allocating htab with size            131071
Growing htab from size   131071 to   524287
Growing htab from size   524287 to  2097143
Growing htab from size  2097143 to  4194301
Growing htab from size  4194301 to 16777213
$ dwz vmlinux -o 1 -lnone
Allocating htab with size            131071
Growing htab from size   131071 to   524287
Growing htab from size   524287 to  2097143
Growing htab from size  2097143 to  4194301
Growing htab from size  4194301 to 16777213
Growing htab from size 16777213 to 67108859
...

There is a cost associated with this growth.

Every time the hash table grows:
- a new array is allocated, after which the old array is freed.  This can
  cause memory fragmentation.
- the elements need to be copied from the old to the new array.

Also, the hash table collision rate will be higher, because on average we're
operating on a hash table with higher occupancy.
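
To illustrate, here's a rough sketch of what a single growth step does (not
the actual htab_expand from hashtab.c; the hash function and probing scheme
are placeholders):
...
#include <stdlib.h>

struct sketch_htab
{
  void **entries;      /* slot array, NULL means empty */
  size_t size;         /* number of slots */
  size_t n_elements;   /* number of live entries */
};

/* Placeholder hash; the real table hashes the DIE offset.  */
static size_t
sketch_hash (void *entry, size_t size)
{
  return (size_t) entry % size;
}

/* One growth step: allocate a larger array, re-insert every live entry,
   free the old array.  Both the allocation/free churn and the per-element
   copy are paid on every step.  */
static int
sketch_expand (struct sketch_htab *htab, size_t new_size)
{
  void **old_entries = htab->entries;
  size_t old_size = htab->size;
  void **new_entries = calloc (new_size, sizeof (void *));
  size_t i;

  if (new_entries == NULL)
    return 0;

  for (i = 0; i < old_size; i++)
    if (old_entries[i] != NULL)
      {
	size_t slot = sketch_hash (old_entries[i], new_size);
	while (new_entries[slot] != NULL)
	  slot = (slot + 1) % new_size;	/* linear probing */
	new_entries[slot] = old_entries[i];
      }

  htab->entries = new_entries;
  htab->size = new_size;
  free (old_entries);
  return 1;
}
...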

In the case of off_htab, we can estimate the final size and use that as the
initial size, which avoids the growth costs.
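
As a hypothetical illustration of how such an estimate could be computed (the
committed patch uses the emulate_htab helper for this; the sketch below is
not that function, and next_htab_size merely stands in for hashtab.c's real
size progression):
...
#include <stddef.h>

/* Stand-in for the real size progression; hashtab.c picks sizes from a
   table of primes, plain doubling is used here only to keep the sketch
   self-contained.  */
static size_t
next_htab_size (size_t size)
{
  return size * 2 + 1;
}

/* Replay the growth rule: starting from INITIAL_SIZE, keep growing until
   NR_ELEMENTS stays below the 3/4 occupancy threshold, and return the size
   the table would end up at.  Using that as the initial size means the
   intermediate growth steps never happen.  */
static size_t
estimate_final_htab_size (size_t initial_size, size_t nr_elements)
{
  size_t size = initial_size;
  while (nr_elements * 4 >= size * 3)
    size = next_htab_size (size);
  return size;
}
...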

I've tested this on cc1 and vmlinux, and the gain is ~6%:
...
cc1:
real:  mean:  6985.50  100.00%  stddev:  34.55
       mean:  6531.40   93.50%  stddev:  18.82
user:  mean:  6816.70  100.00%  stddev:  30.84
       mean:  6374.00   93.51%  stddev:  28.43
sys:   mean:   167.20  100.00%  stddev:  17.77
       mean:   156.80   93.78%  stddev:  17.05

vmlinux:
real:  mean:  30748.60  100.00%  stddev:  377.31
       mean:  28773.90   93.58%  stddev:   61.33
user:  mean:  30338.40  100.00%  stddev:  139.17
       mean:  28499.70   93.94%  stddev:   49.44
sys:   mean:    336.50  100.00%  stddev:  102.51
       mean:    272.00   80.83%  stddev:   23.55
...

As for the collisions (measured using another debug patch), there's a clear
reduction in collision rate for both cc1 (-0.13):
...
 $ dwz cc1 -o 1 -lnone 2>&1 | sort -V | tail -n 1
-searches: 34256212, collisions: 0.684419, occupancy: 0.607308, line: 11031
+searches: 34256212, collisions: 0.551075, occupancy: 0.607308, line: 11086
...
as well as vmlinux (-0.57):
...
 $ dwz vmlinux -o 1 -lnone 2>&1 | sort -V | tail -n 1
-searches: 83198411, collisions: 0.672008, occupancy: 0.260028, line: 11031
+searches: 83198411, collisions: 0.100454, occupancy: 0.260028, line: 11086
...

To confirm, I've tried an even larger example, clang (with cc1 at 10.2 million
DIEs, vmlinux at 17.5 million DIEs, and clang at 111.4 million DIEs):
...
real:  mean:  127261.67  100.00%  stddev:  422.04
       mean:  112483.33   88.39%  stddev:  443.20
user:  mean:  116865.67  100.00%  stddev:  101.59
       mean:  102581.33   87.78%  stddev:  344.42
sys:   mean:    5881.33  100.00%  stddev:  261.95
       mean:    5218.00   88.72%  stddev:  144.81
...
Here we see an even higher performance gain of ~11%.

We don't do this optimization when operating in low-mem or optimize_multifile
mode.  In either mode, not all DIEs will be present in off_htab after parsing
the entire .debug_info, so it's hard to estimate what the final size will be.

Also, we don't do this optimization if we suspect that we'll run into the
low-mem limit (again using the estimated number of DIEs).  There's no need
to allocate that much memory if we don't think we'll end up using all of it.

Likewise, we don't do this optimization if we suspect that we'll run into the
max-die limit (again using the estimated number of DIEs).

Committed to trunk.

Thanks,
- Tom

Avoid htab_expand for off_htab

2019-11-20  Tom de Vries  <tdevries@suse.de>

	* dwz.c (off_htab_add_die): Allocate off_htab with estimated final
	size.

---
 dwz.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/dwz.c b/dwz.c
index db134ab..84a7080 100644
--- a/dwz.c
+++ b/dwz.c
@@ -1279,7 +1279,21 @@ off_htab_add_die (dw_cu_ref cu, dw_die_ref die)
 
   if (off_htab == NULL)
     {
-      off_htab = htab_try_create (100000, off_hash, off_eq, NULL);
+      unsigned int estimated_nr_dies = estimate_nr_dies ();
+      size_t default_initial_size = 100000;
+      size_t initial_size;
+      if (low_mem
+	  || op_multifile
+	  || estimated_nr_dies >= low_mem_die_limit
+	  || estimated_nr_dies >= max_die_limit)
+	initial_size = default_initial_size;
+      else
+	{
+	  size_t estimated_final_hashtab_size
+	    = emulate_htab (default_initial_size, estimated_nr_dies);
+	  initial_size = estimated_final_hashtab_size;
+	}
+      off_htab = htab_try_create (initial_size, off_hash, off_eq, NULL);
       if (off_htab == NULL)
 	dwz_oom ();
       if (rd_multifile)