This is the mail archive of the libc-alpha@sourceware.cygnus.com mailing list for the glibc project.



[Various] libc/1712: Performance loss with glibc-2.1.[23] vs 2.1.1



Hi,

we've got the appended bug report, and I don't really understand the
slowdown.  I can reproduce it on my system as well (Pentium III,
500 MHz):

The installed glibc 2.1.3:
$ time ./pr1712 
 F(n)=  1.44779895E+09  2.72745061E+09  1.94683947E+09

real    1m2.668s
user    2m2.200s
sys     0m0.060s

Current glibc 2.2 CVS version:
$ time LD_LIBRARY_PATH=.:math:elf elf/ld-linux.so.2  /tmp/pr/pr1712
 F(n)=  1.44779895E+09  2.72745061E+09  1.94683947E+09

real    0m20.489s
user    0m39.720s
sys     0m0.060s

Does anybody have an idea what's broken, or what else to test?

Andreas



Topics:
   libc/1712: Performance loss with glibc-2.1.[23] vs 2.1.1
   Re: libc/1712: Performance loss with glibc-2.1.[23] vs 2.1.1


----------------------------------------------------------------------

Date: Sun, 30 Apr 2000 09:32:35 +0200
From: Milan Hodoscek <milan@ala.cmm.ki.si>
To: bugs@gnu.org
Subject: libc/1712: Performance loss with glibc-2.1.[23] vs 2.1.1
Message-Id: <200004300732.e3U7WZc07141@ala.cmm.ki.si>


>Number:         1712
>Category:       libc
>Synopsis:       Performance loss with glibc-2.1.[23] vs 2.1.1
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    libc-gnats
>State:          open
>Class:          sw-bug
>Submitter-Id:   unknown
>Arrival-Date:   Sun Apr 30 03:40:02 EDT 2000
>Last-Modified:
>Originator:     Milan Hodoscek
>Organization:
>
>Release:        libc-2.1.3
>Environment:
	
Host type: i386-pc-linux-gnu
System: Linux ala 2.3.99-pre6 #1 Thu Apr 27 12:44:42 CEST 2000 i686 unknown
Architecture: i686

Addons: crypt linuxthreads nss-v1

Build CC: gcc
Compiler version: 2.95.2 20000220 (Debian GNU/Linux)
Kernel headers: UTS_RELEASE
Symbol versioning: yes
Build static: yes
Build shared: yes
Build pic-default: no
Build profile: yes
Build omitfp: no
Build bounded: no
Build static-nss: no
Stdio: libio

>Description:
	

I am working with the program CHARMM (~0.5M lines, molecular dynamics)
and I noticed there are some performance problems with it when using
both the 2.1.2 and 2.1.3 versions of glibc. With glibc-2.1.1 it works at
full speed (3 times faster!!). Since the program is pretty big and
there is no easy model of this performance problem, I had to profile it,
and I found a few routines which show this performance problem with the
new library (2.1.[23] vs 2.1.1). I picked the smallest one and
simplified it so it can be run as a standalone program. I am sure it
is possible to optimize this small model program, but it would be more
difficult in the real situation. I just want to show that there is a
problem with the library no matter how badly the program is written.

This is what I tested:

I am using Debian (woody), and the machine I tested on is a PII-450MHz. I
also tried an Athlon-700MHz and the performance difference is similar. I
also tried a variety of optimization options and different compilers
(pgcc). The difference is always significant, and it appears only when I
switch the libraries. I also recompiled glibc-2.1.3 myself with all the
optimization flags on, and the problem is still there.

>How-To-Repeat:
	

So if you run the program below with the glibc-2.1.1 then the timing
is like this:

ala:~/test $ time gs1
 F(n)=  1.44779895E+09  2.72745061E+09  1.94683947E+09

real	0m22.105s
user	0m22.100s
sys	0m0.010s

If it runs with glibc-2.1.3, then it is:

ala:~/test $ time gs1
 F(n)=  1.44779895E+09  2.72745061E+09  1.94683947E+09

real	1m8.860s
user	1m7.720s
sys	0m0.760s


I compiled the program with the following script:

#! /bin/sh -x

FFLAGS="-O6 -m486 -malign-double -ffast-math -fomit-frame-pointer -funroll-loops -funroll-all-loops -mcpu=pentiumpro -march=pentiumpro -ffloat-store -fforce-mem -frerun-cse-after-loop -fexpensive-optimizations -fugly-complex -fno-backslash -fno-globals -Wno-globals"

g77 -o gs1 $FFLAGS gs1.f

And the program itself is this:

C     This is to figure out why libc6-2.1.2 is so slow ????
C     The most simplified version made after GRAD_SUM from nbonds/pme.src
C
      implicit none
C
      INTEGER ORDER,max,iseed
      parameter (max=100,order=200)
      REAL*8 ZERO,RECIP(9)
      REAL*8 FX(max),FY(max),FZ(max)
      REAL*8 THETA1(ORDER,max),THETA2(ORDER,max),
     $     THETA3(ORDER,max),CHARGE(max)
      REAL*8 DTHETA1(ORDER,max),DTHETA2(ORDER,max),
     $     DTHETA3(ORDER,max)
      REAL*8 Q(max)
C
      integer ig,igood,nfft1,nfft2,nfft3
      REAL*8 VAL1,VAL2,VAL3,VAL1A,VAL2A,VAL3A
      INTEGER IPT1,IPT2,IPT3
C
      INTEGER N,ITH1,ITH2,ITH3,I,J,K
      REAL*8 F1,F2,F3,TERM,CFACT,random
C
      CFACT=300.0d0
      igood=max
      zero=0.0d0
      nfft1=64
      nfft2=64
      nfft3=64
      iseed=1
C
C     Initialize all the arrays with the random numbers...
C
      do i=1,9
         recip(i)=random(iseed)
      enddo
C
      do i=1,max
         q(i)=random(iseed)
         charge(i)=random(iseed)
         do j=1,order
            theta1(j,i)=random(iseed)
            dtheta1(j,i)=random(iseed)
            theta2(j,i)=random(iseed)
            dtheta2(j,i)=random(iseed)
            theta3(j,i)=random(iseed)
            dtheta3(j,i)=random(iseed)
         enddo
      enddo
C
      ipt2=0
      do ig = 1,igood
         n=ig
         F1 = ZERO
         F2 = ZERO
         F3 = ZERO
         ipt2=ipt2+1
C     
         DO ITH3 = 1,ORDER
            VAL1A = CHARGE(N) * NFFT1 * THETA3(ITH3,ig)
            VAL2A = CHARGE(N) * NFFT2 * THETA3(ITH3,ig)
            VAL3A = CHARGE(N) * NFFT3 * DTHETA3(ITH3,ig)
C     
            DO ITH2 = 1,ORDER
C     
               VAL1= VAL1A * THETA2(ITH2,ig)
               VAL2= VAL2A * DTHETA2(ITH2,ig)
               VAL3= VAL3A * THETA2(ITH2,ig)
C     
               DO ITH1 = 1,ORDER
C
                  F1 = F1 - VAL1 * Q(IPT2) * DTHETA1(ITH1,ig)
                  F2 = F2 - VAL2 * Q(IPT2) * THETA1(ITH1,ig)
                  F3 = F3 - VAL3 * Q(IPT2) * THETA1(ITH1,ig)
C     
               ENDDO 
            ENDDO
         ENDDO
C     
         FX(N) = FX(N) - CFACT*(RECIP(1)*F1+RECIP(4)*F2+RECIP(7)*F3)
         FY(N) = FY(N) - CFACT*(RECIP(2)*F1+RECIP(5)*F2+RECIP(8)*F3)
         FZ(N) = FZ(N) - CFACT*(RECIP(3)*F1+RECIP(6)*F2+RECIP(9)*F3)
      ENDDO
C
      write(*,*)'F(n)=',fx(max),fy(max),fz(max)
C
      RETURN
      END
C
      REAL*8 FUNCTION RANDOM(ISEED)
      implicit none
      INTEGER ISEED
      REAL*8 DSEED,DIVIS,DENOM,MULTIP
C
      DATA  DIVIS/2147483647.D0/
      DATA  DENOM /2147483711.D0/
      DATA  MULTIP/16807.D0/
C
      IF(ISEED.LE.1) ISEED=314159
      DSEED=MULTIP*ISEED
      DSEED=MOD(DSEED,DIVIS)
      RANDOM=DSEED/DENOM
      ISEED=DSEED
C
      RETURN
      END
>Fix:
	
>Audit-Trail:
>Unformatted:


------------------------------

Date: 30 Apr 2000 13:53:43 +0200
From: Andreas Jaeger <aj@arthur.rhein-neckar.de>
To: Milan Hodoscek <milan@ala.cmm.ki.si>
Cc: bugs@gnu.org
Subject: Re: libc/1712: Performance loss with glibc-2.1.[23] vs 2.1.1
Message-ID: <u8vh0z4xfc.fsf@gromit.rhein-neckar.de>
References: <200004300732.e3U7WZc07141@ala.cmm.ki.si>
Content-Type: text/plain; charset=us-ascii

>>>>> Milan Hodoscek writes:

>> Number:         1712
>> Category:       libc
>> Synopsis:       Performance loss with glibc-2.1.[23] vs 2.1.1
[...]	

 > I am working with the program CHARMM (~0.5M lines, molecular dynamics)
 > and I noticed there are some performance problems with it when using
 > both the 2.1.2 and 2.1.3 versions of glibc. With glibc-2.1.1 it works at
 > full speed (3 times faster!!). Since the program is pretty big and
 > there is no easy model of this performance problem, I had to profile it,
 > and I found a few routines which show this performance problem with the
 > new library (2.1.[23] vs 2.1.1). I picked the smallest one and
 > simplified it so it can be run as a standalone program. I am sure it
 > is possible to optimize this small model program, but it would be more
 > difficult in the real situation. I just want to show that there is a
 > problem with the library no matter how badly the program is written.

 > This is what I tested:

 > I am using Debian (woody), and the machine I tested on is a PII-450MHz. I
 > also tried an Athlon-700MHz and the performance difference is similar. I
 > also tried a variety of optimization options and different compilers
 > (pgcc). The difference is always significant, and it appears only when I
 > switch the libraries. I also recompiled glibc-2.1.3 myself with all the
 > optimization flags on, and the problem is still there.

I don't have both libraries installed on my system and therefore can't
test this.

To see which functions are the culprit, I profiled your program's use
of libc.so and libm.so with:

$ LD_PROFILE=libc.so.6 ./pr1712
$ sprof /lib/libc.so.6 /var/tmp/libc.so.6.profile
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
100.00      0.02     0.02   120212     0.17           isnan
  0.00      0.02     0.00       56     0.00           __overflow
  0.00      0.02     0.00       51     0.00           strncmp
  0.00      0.02     0.00       19     0.00           __errno_location
  0.00      0.02     0.00       16     0.00           flockfile
  0.00      0.02     0.00       16     0.00           funlockfile
[...]

(libm showed even less output)

So most of the time is actually spent in your program, not in
glibc.

It might be that the stack is somehow misaligned for doubles and
that you therefore get such a slowdown, but looking at our glibc changes
between glibc 2.1.2 and 2.1.3 I don't see any change which might
affect this.

I would appreciate it if you could try to locate the exact cause of this
slowdown.

I'll forward your email and my comments to the glibc list.

Andreas
- - 
 Andreas Jaeger
  SuSE Labs aj@suse.de
   private aj@arthur.rhein-neckar.de




