Bug 11843 - ld long link times due to compute_bucket_count() choosing hash table size
Summary: ld long link times due to compute_bucket_count() choosing hash table size
Status: RESOLVED FIXED
Alias: None
Product: binutils
Classification: Unclassified
Component: binutils (show other bugs)
Version: 2.21
: P2 normal
Target Milestone: ---
Assignee: unassigned
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-07-26 21:33 UTC by Todd Veldhuizen
Modified: 2010-08-12 16:24 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Todd Veldhuizen 2010-07-26 21:33:20 UTC
In bfd/elflink.c, in compute_bucket_count(), there is a $\Theta(n^2)$ loop (with
n=nsyms) to find a good hash table size.  This is okay for n=50, but we have
links taking 15 minutes for a library with n=162099 symbols.  The code in
compute_bucket_count() is a bit pointless for large values of n.  

Perhaps a different strategy for n > 1024 is needed, e.g., just return size
nsyms/2, or choose the best of (say) sizes n0...n0+k where n0 is the target size
(with k=4 for example, just to eliminate the possibility that the first choice
is astronomically unlucky).
Comment 1 H.J. Lu 2010-07-28 14:41:34 UTC
Can you provide a testcase?
Comment 2 Todd Veldhuizen 2010-07-28 22:05:52 UTC
Here is a C++ program that will create a bunch of .c files, compile and then
link them, hitting the bug.  It links very quickly without the -O1 option, and
with the -O1 option it takes a very long time.


---==== cut-here

#include <stdlib.h>
#include <stdio.h>
#include <string>
#include <map>

int main()
{
   const int num_symbols = 199933;
   const int ref_symbol_delta = 37199;
   const int num_files = 25;
   char tbuf[128];

   srand(1);

   std::string objfiles;
   int ref_symbol = 0;
   int i = 0;
   for (int j=0; j < num_files; ++j)
   {
      int symbols_ceil = (num_symbols*(j+1))/num_files;
      sprintf(tbuf, "many_symbols%03d.c", j+1);
      FILE* dst = fopen(tbuf, "w");

      std::map<int,int> sites;

      for (; i < symbols_ceil; ++i)
      {
         ref_symbol = (ref_symbol + ref_symbol_delta) % num_symbols;
         sites[i] = ref_symbol;
         fprintf(dst, "void foo%d();\n", ref_symbol);
      }

      for (std::map<int,int>::iterator iter = sites.begin();
         iter != sites.end(); ++iter)
      {
         fprintf(dst, "void foo%d() { foo%d(); }\n", iter->first,
            iter->second);
      }
      fclose(dst);

      sprintf(tbuf, "gcc -c -fPIC many_symbols%03d.c -o many_symbols%03d.o",
         j+1, j+1);
      printf("%s\n", tbuf);
      system(tbuf);

      sprintf(tbuf, "many_symbols%03d.o", j+1);
      objfiles += ((j > 0) ? " " : "") + std::string(tbuf);
   }

   system("time ld -O1 many_symbols???.o -shared -o libxxx.so");
}

---==== cut-here
Comment 3 Todd Veldhuizen 2010-07-28 22:09:47 UTC
Here some tracing for compute_bucket_count() that illustrates the futility of
searching large numbers of hash table sizes.

Index: bfd/elflink.c
===================================================================
RCS file: /cvs/src/src/bfd/elflink.c,v
retrieving revision 1.372
diff -r1.372 elflink.c
31a32
> #include <sys/time.h>
5345a5347,5354
> 
> static double wallTime(void)
> {
>    struct timeval t;
>    gettimeofday(&t, 0);
>    return t.tv_sec + t.tv_usec/1000000.0;
> }
> 
5361a5371,5373
>   double startTime;
> 
>   startTime = wallTime();
5368a5381
>       long insert_count;
5377a5391,5392
>       insert_count=0;
> 
5419a5435,5436
>           insert_count += nsyms;
> 
5458a5476,5481
>               printf("elapsed=%7.4lfs size=%ld max=%lf (%lf%%) after %ld
inserts\n", 
>                  wallTime() - startTime,
>                  (long)i, 
>                  (double)max, ((double)max-best_chlen)/best_chlen*100.0, 
>                  insert_count);
>               insert_count=0;
5461a5485,5491
>           else if (insert_count > 1000000000L)
>           {
>             printf("elapsed=%7.4lfs No improvement after %ld inserts\n",
>                wallTime() - startTime,
>                insert_count);
>             insert_count=0;
>           }




Its output for the test case, on my machine, shows that after trying the first
three hash table sizes, no improvements of more than 1% occur, and after the
first 200 hash table sizes, no improvements whatsoever occur.

-bash-3.2$ ld -O1 many_symbols*.o -shared -o libxxx.so 
elapsed= 0.0045s size=99984 max=43186095512.000000 (-100.000000%) after 399936
inserts
elapsed= 0.0087s size=99985 max=34095909512.000000 (-21.048872%) after 399936
inserts
elapsed= 0.0129s size=99986 max=33926687032.000000 (-0.496313%) after 399936 inserts
elapsed= 0.0170s size=99987 max=33836351808.000000 (-0.266266%) after 399936 inserts
elapsed= 0.0586s size=99997 max=33830992776.000000 (-0.015838%) after 3999360
inserts
elapsed= 0.0628s size=99998 max=33828995144.000000 (-0.005905%) after 399936 inserts
elapsed= 0.0836s size=100003 max=33823059872.000000 (-0.017545%) after 1999680
inserts
elapsed= 0.1293s size=100014 max=33802046320.000000 (-0.062128%) after 4399296
inserts
elapsed= 0.1625s size=100022 max=33791424296.000000 (-0.031424%) after 3199488
inserts
elapsed= 0.1833s size=100027 max=33740062104.000000 (-0.151998%) after 1999680
inserts
elapsed= 0.2167s size=100035 max=33725848184.000000 (-0.042128%) after 3199488
inserts
elapsed= 0.4254s size=100085 max=33716109728.000000 (-0.028875%) after 19996800
inserts
elapsed= 0.5258s size=100109 max=33682995136.000000 (-0.098216%) after 9598464
inserts
elapsed= 0.6515s size=100139 max=33669972112.000000 (-0.038663%) after 11998080
inserts
elapsed= 0.7270s size=100157 max=33644252600.000000 (-0.076387%) after 7198848
inserts
elapsed=11.1804s No improvement after 1000239936 inserts
elapsed=21.6568s No improvement after 1000239936 inserts
elapsed=32.1744s No improvement after 1000239936 inserts
elapsed=42.7346s No improvement after 1000239936 inserts
elapsed=53.3344s No improvement after 1000239936 inserts
elapsed=63.9556s No improvement after 1000239936 inserts
elapsed=74.6303s No improvement after 1000239936 inserts
elapsed=85.3426s No improvement after 1000239936 inserts
elapsed=96.0896s No improvement after 1000239936 inserts
elapsed=106.8647s No improvement after 1000239936 inserts
elapsed=117.6750s No improvement after 1000239936 inserts
elapsed=128.5248s No improvement after 1000239936 inserts
elapsed=139.4285s No improvement after 1000239936 inserts
elapsed=150.3337s No improvement after 1000239936 inserts
elapsed=161.2662s No improvement after 1000239936 inserts
elapsed=172.2233s No improvement after 1000239936 inserts
elapsed=183.2224s No improvement after 1000239936 inserts
elapsed=194.2606s No improvement after 1000239936 inserts
elapsed=205.3407s No improvement after 1000239936 inserts
elapsed=216.4314s No improvement after 1000239936 inserts
elapsed=227.5811s No improvement after 1000239936 inserts
elapsed=238.7598s No improvement after 1000239936 inserts
elapsed=249.9598s No improvement after 1000239936 inserts
etc.

Comment 4 Todd Veldhuizen 2010-07-28 22:23:05 UTC
Here is a conservative patch.

Index: bfd/elflink.c
===================================================================
RCS file: /cvs/src/src/bfd/elflink.c,v
retrieving revision 1.372
diff -u -r1.372 elflink.c
--- bfd/elflink.c       14 Apr 2010 08:29:33 -0000      1.372
+++ bfd/elflink.c       28 Jul 2010 22:21:53 -0000
@@ -5374,6 +5374,7 @@
       const struct elf_backend_data *bed = get_elf_backend_data (dynobj);
       unsigned long int *counts;
       bfd_size_type amt;
+      unsigned int no_improvement_count = 0;
 
       /* Possible optimization parameters: if we have NSYMS symbols we say
         that the hashing table must at least have NSYMS/4 and at most
@@ -5458,7 +5459,10 @@
            {
              best_chlen = max;
              best_size = i;
+              no_improvement_count = 0;
            }
+          else if (++no_improvement_count == 100)
+            break;
        }
 
       free (counts);
Comment 5 H.J. Lu 2010-07-30 22:35:53 UTC
(In reply to comment #4)
> Here is a conservative patch.
> 
> Index: bfd/elflink.c
> ===================================================================
> RCS file: /cvs/src/src/bfd/elflink.c,v
> retrieving revision 1.372
> diff -u -r1.372 elflink.c
> --- bfd/elflink.c       14 Apr 2010 08:29:33 -0000      1.372
> +++ bfd/elflink.c       28 Jul 2010 22:21:53 -0000
> @@ -5374,6 +5374,7 @@
>        const struct elf_backend_data *bed = get_elf_backend_data (dynobj);
>        unsigned long int *counts;
>        bfd_size_type amt;
> +      unsigned int no_improvement_count = 0;
>  
>        /* Possible optimization parameters: if we have NSYMS symbols we say
>          that the hashing table must at least have NSYMS/4 and at most
> @@ -5458,7 +5459,10 @@
>             {
>               best_chlen = max;
>               best_size = i;
> +              no_improvement_count = 0;
>             }
> +          else if (++no_improvement_count == 100)
> +            break;
>         }
>  
>        free (counts);
> 

It looks good to me. Please post it to binutils mailing list. Thanks.
Comment 6 H.J. Lu 2010-07-30 23:11:01 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > Here is a conservative patch.
> > 
> > Index: bfd/elflink.c
> > ===================================================================
> > RCS file: /cvs/src/src/bfd/elflink.c,v
> > retrieving revision 1.372
> > diff -u -r1.372 elflink.c
> > --- bfd/elflink.c       14 Apr 2010 08:29:33 -0000      1.372
> > +++ bfd/elflink.c       28 Jul 2010 22:21:53 -0000
> > @@ -5374,6 +5374,7 @@
> >        const struct elf_backend_data *bed = get_elf_backend_data (dynobj);
> >        unsigned long int *counts;
> >        bfd_size_type amt;
> > +      unsigned int no_improvement_count = 0;
> >  
> >        /* Possible optimization parameters: if we have NSYMS symbols we say
> >          that the hashing table must at least have NSYMS/4 and at most
> > @@ -5458,7 +5459,10 @@
> >             {
> >               best_chlen = max;
> >               best_size = i;
> > +              no_improvement_count = 0;
> >             }
> > +          else if (++no_improvement_count == 100)
> > +            break;
> >         }
> >  
> >        free (counts);
> > 
> 
> It looks good to me. Please post it to binutils mailing list. Thanks.

My second though. This patch gives up way too early.
Comment 7 H.J. Lu 2010-07-31 16:13:55 UTC
(In reply to comment #6)
> > 
> > It looks good to me. Please post it to binutils mailing list. Thanks.
> 
> My second though. This patch gives up way too early.

100 is good enough. We may spend more time with little gain.
Comment 8 Sourceware Commits 2010-08-12 16:23:50 UTC
Subject: Bug 11843

CVSROOT:	/cvs/src
Module name:	src
Changes by:	nickc@sourceware.org	2010-08-12 16:23:33

Modified files:
	bfd            : ChangeLog elflink.c 

Log message:
	PR ld/11843
	* elflink.c (compute_bucket_count): Avoid futile long searches for
	the best bucket size.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/src/bfd/ChangeLog.diff?cvsroot=src&r1=1.5101&r2=1.5102
http://sourceware.org/cgi-bin/cvsweb.cgi/src/bfd/elflink.c.diff?cvsroot=src&r1=1.373&r2=1.374

Comment 9 Nick Clifton 2010-08-12 16:24:47 UTC
Hi Todd,

  I have checked in your patch - thanks very much for creating it.

Cheers
  Nick

bfd/ChangeLog
2010-08-12  Todd Veldhuizen  <todd.veldhuizen@logicblox.com>

	PR ld/11843
	* elflink.c (compute_bucket_count): Avoid futile long searches for
	the best bucket size.