*From*: Momchil Velikov <velco at fadata dot bg>
*Date*: 25 Oct 2002 10:48:10 +0300
*Subject*: Re: Csum and csum copyroutines benchmark

>>>>> "Denis" == Denis Vlasenko <vda@port.imtp.ilyichevsk.odessa.ua> writes:

Denis> /me said:
>> I'm experimenting with different csum_ routines in userspace now.

Denis> Short conclusion:
Denis> 1. It is possible to speed up csum routines for AMD processors by 30%.
Denis> 2. It is possible to speed up csum_copy routines for both AMD and Intel
Denis>    three times or more. Roy, do you like that? ;)

Additional data point:

Short summary:
1. Checksum - kernelpii_csum is ~19% faster (601 vs. 742 min cycles)
2. Copy - kernelpii_copy is ~6% faster (225 vs. 239 min cycles)

Dual Pentium III, 1266 MHz, 512K cache, 2G SDRAM (133 MHz, ECC)

The only changes I made were to decrease the buffer size to 1K (as I
think this is more representative of a network packet size, correct me
if I'm wrong) and to increase the runs to 1024. Max values are worthless
indeed.

Csum benchmark program
buffer size: 1 K
Each test tried 1024 times, max and min CPU cycles are reported.
Please disregard max values. They are due to system interference only.
csum tests:
kernel_csum - took 941 max, 740 min cycles per kb. sum=0x44000077
kernel_csum - took 748 max, 742 min cycles per kb. sum=0x44000077
kernel_csum - took 60559 max, 742 min cycles per kb. sum=0x44000077
kernelpii_csum - took 52804 max, 601 min cycles per kb. sum=0x44000077
kernelpiipf_csum - took 12930 max, 601 min cycles per kb. sum=0x44000077
pfm_csum - took 10161 max, 1402 min cycles per kb. sum=0x44000077
pfm2_csum - took 864 max, 838 min cycles per kb. sum=0x44000077
copy tests:
kernel_copy - took 339 max, 239 min cycles per kb. sum=0x44000077
kernel_copy - took 239 max, 239 min cycles per kb. sum=0x44000077
kernel_copy - took 239 max, 239 min cycles per kb. sum=0x44000077
kernelpii_copy - took 244 max, 225 min cycles per kb. sum=0x44000077
ntqpf_copy - took 10867 max, 512 min cycles per kb. sum=0x44000077
ntqpfm_copy - took 710 max, 403 min cycles per kb. sum=0x44000077
ntq_copy - took 4535 max, 443 min cycles per kb. sum=0x44000077
ntqpf2_copy - took 563 max, 555 min cycles per kb. sum=0x44000077
Done

HOWEVER ... sometimes (say 1 run in 30) I get the following output:

Csum benchmark program
buffer size: 1 K
Each test tried 1024 times, max and min CPU cycles are reported.
Please disregard max values. They are due to system interference only.
csum tests:
kernel_csum - took 958 max, 740 min cycles per kb. sum=0x44000077
kernel_csum - took 748 max, 740 min cycles per kb. sum=0x44000077
kernel_csum - took 752 max, 740 min cycles per kb. sum=0x44000077
kernelpii_csum - took 624 max, 600 min cycles per kb. sum=0x44000077
kernelpiipf_csum - took 877211 max, 601 min cycles per kb. sum=0x44000077
Bad sum
Aborted

which is to say that the pfm_csum and pfm2_csum results are not to be
trusted (at least on a PIII, or perhaps with my kernel's
CONFIG_MPENTIUMIII=y config?).

~velco
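For anyone who wants to re-derive the percentages above or reproduce the "Bad sum" style of check, here is a minimal C sketch. It is not the benchmark's actual source, which is not shown in this thread. The names summarize, percent_faster, ref_csum and sum_ok are hypothetical, and ref_csum is a plain 16-bit ones'-complement sum in the style of RFC 1071, not the kernel's 32-bit csum_partial result printed above.

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest and largest cycle count seen across repeated runs. */
struct cycle_stats {
    uint64_t min;
    uint64_t max;
};

/* Hypothetical helper: reduce n per-run cycle counts (n > 0) to min and max.
 * Only the min is meaningful; the max mostly reflects system interference. */
struct cycle_stats summarize(const uint64_t *samples, size_t n)
{
    struct cycle_stats s = { samples[0], samples[0] };
    for (size_t i = 1; i < n; i++) {
        if (samples[i] < s.min)
            s.min = samples[i];
        if (samples[i] > s.max)
            s.max = samples[i];
    }
    return s;
}

/* Percentage by which new_min improves on base_min (both in cycles). */
double percent_faster(uint64_t base_min, uint64_t new_min)
{
    return 100.0 * ((double)base_min - (double)new_min) / (double)base_min;
}

/* Assumed reference checksum: 16-bit ones'-complement sum over big-endian
 * 16-bit words, folded but not inverted. Used only for cross-checking. */
uint16_t ref_csum(const unsigned char *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (i < len)                      /* odd trailing byte, zero padded */
        sum += (uint32_t)buf[i] << 8;
    while (sum >> 16)                 /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Returns 1 if the candidate routine agrees with the reference on this
 * buffer, 0 otherwise. A 0 here is the analogue of "Bad sum". */
int sum_ok(uint16_t (*candidate)(const unsigned char *, size_t),
           const unsigned char *buf, size_t len)
{
    return candidate(buf, len) == ref_csum(buf, len);
}
```

With the min values reported above, percent_faster(742, 601) comes out at about 19 and percent_faster(239, 225) at about 6. Calling sum_ok after every timed run, rather than once per routine, is one way to catch a routine that only occasionally returns a wrong sum.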

