This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Csum and csum copyroutines benchmark


>>>>> "Denis" == Denis Vlasenko <vda@port.imtp.ilyichevsk.odessa.ua> writes:

Denis> /me said:
>> I'm experimenting with different csum_ routines in userspace now.

Denis> Short conclusion: 
Denis> 1. It is possible to speed up csum routines for AMD processors by 30%.
Denis> 2. It is possible to speed up csum_copy routines for both AMD and Intel
Denis>    three times or more. Roy, do you like that? ;)

Additional data point:

Short summary:
1. Checksum - kernelpii_csum is ~19% faster
2. Copy - lernelpii_csum is ~6% faster

Dual Pentium III, 1266Mhz, 512K cache, 2G SDRAM (133Mhz, ECC)

The only changes I made were to decrease the buffer size to 1K (as I
think this is more representative to a network packet size, correct me
if I'm wrong) and increase the runs to 1024. Max values are worthless
indeed.


Csum benchmark program
buffer size: 1 K
Each test tried 1024 times, max and min CPU cycles are reported.
Please disregard max values. They are due to system interference only.
csum tests:
                     kernel_csum - took   941 max,  740 min cycles per kb. sum=0x44000077
                     kernel_csum - took   748 max,  742 min cycles per kb. sum=0x44000077
                     kernel_csum - took 60559 max,  742 min cycles per kb. sum=0x44000077
                  kernelpii_csum - took 52804 max,  601 min cycles per kb. sum=0x44000077
                kernelpiipf_csum - took 12930 max,  601 min cycles per kb. sum=0x44000077
                        pfm_csum - took 10161 max, 1402 min cycles per kb. sum=0x44000077
                       pfm2_csum - took   864 max,  838 min cycles per kb. sum=0x44000077
copy tests:
                     kernel_copy - took   339 max,  239 min cycles per kb. sum=0x44000077
                     kernel_copy - took   239 max,  239 min cycles per kb. sum=0x44000077
                     kernel_copy - took   239 max,  239 min cycles per kb. sum=0x44000077
                  kernelpii_copy - took   244 max,  225 min cycles per kb. sum=0x44000077
                      ntqpf_copy - took 10867 max,  512 min cycles per kb. sum=0x44000077
                     ntqpfm_copy - took   710 max,  403 min cycles per kb. sum=0x44000077
                        ntq_copy - took  4535 max,  443 min cycles per kb. sum=0x44000077
                     ntqpf2_copy - took   563 max,  555 min cycles per kb. sum=0x44000077
Done


HOWEVER ...

sometimes (say 1/30) I get the following output:

Csum benchmark program
buffer size: 1 K
Each test tried 1024 times, max and min CPU cycles are reported.
Please disregard max values. They are due to system interference only.
csum tests:
                     kernel_csum - took   958 max,  740 min cycles per kb. sum=0x44000077
                     kernel_csum - took   748 max,  740 min cycles per kb. sum=0x44000077
                     kernel_csum - took   752 max,  740 min cycles per kb. sum=0x44000077
                  kernelpii_csum - took   624 max,  600 min cycles per kb. sum=0x44000077
                kernelpiipf_csum - took 877211 max,  601 min cycles per kb. sum=0x44000077
Bad sum
Aborted

which is to say that pfm_csum and pfm2_csum results are not to be
trusted (at least on PIII (or my kernel CONFIG_MPENTIUMIII=y
config?)).

~velco


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]