This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Massive performance regression of glibc string functions
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: Petr Baudis <pasky at suse dot cz>
- Cc: drepper at sourceware dot org, libc-alpha at sourceware dot org, matz at suse dot de
- Date: Fri, 6 Nov 2009 10:20:41 -0700
- Subject: Re: Massive performance regression of glibc string functions
- References: <20091106130409.GH3708@machine.or.cz>
I am using the rdtsc timing in the glibc string tests. Here is strlen data on an
Intel(R) Xeon(R) CPU X3350 @ 2.66GHz. The three columns are rdtsc tick counts for
strlen_2_11, builtin_strlen, and strlen in glibc 2.9:
LAT: Pos 1, alignment 0: 8 16 16
LAT: Pos 2, alignment 0: 8 24 16
LAT: Pos 3, alignment 0: 8 24 16
LAT: Pos 4, alignment 0: 8 24 16
LAT: Pos 5, alignment 0: 8 24 16
LAT: Pos 6, alignment 0: 8 24 24
LAT: Pos 7, alignment 0: 8 24 16
LAT: Pos 1, alignment 1: 8 16 8
LAT: Pos 2, alignment 2: 8 24 16
LAT: Pos 3, alignment 3: 8 24 16
LAT: Pos 4, alignment 4: 8 32 24
LAT: Pos 5, alignment 5: 8 32 24
LAT: Pos 6, alignment 6: 16 32 24
LAT: Pos 7, alignment 7: 16 32 24
LAT: Pos 4, alignment 0: 8 24 16
LAT: Pos 4, alignment 1: 16 24 16
LAT: Pos 8, alignment 0: 8 24 16
LAT: Pos 8, alignment 1: 8 40 32
LAT: Pos 16, alignment 0: 16 24 24
LAT: Pos 16, alignment 1: 16 40 32
LAT: Pos 32, alignment 0: 16 32 24
LAT: Pos 32, alignment 1: 16 48 40
LAT: Pos 64, alignment 0: 24 40 40
LAT: Pos 64, alignment 1: 24 56 56
LAT: Pos 128, alignment 0: 32 64 64
LAT: Pos 128, alignment 1: 32 80 80
LAT: Pos 256, alignment 0: 56 136 128
LAT: Pos 256, alignment 1: 56 152 136
LAT: Pos 512, alignment 0: 96 264 256
LAT: Pos 512, alignment 1: 96 272 264
LAT: Pos 1024, alignment 0: 224 512 504
LAT: Pos 1024, alignment 1: 224 528 520
LAT: Pos 1, alignment 0: 8 16 16
LAT: Pos 2, alignment 0: 8 24 16
LAT: Pos 3, alignment 0: 8 24 16
LAT: Pos 4, alignment 0: 8 24 16
LAT: Pos 5, alignment 0: 8 24 16
LAT: Pos 6, alignment 0: 8 24 24
LAT: Pos 7, alignment 0: 8 24 16
LAT: Pos 1, alignment 1: 16 16 8
LAT: Pos 2, alignment 2: 8 24 16
LAT: Pos 3, alignment 3: 8 24 16
LAT: Pos 4, alignment 4: 8 32 24
LAT: Pos 5, alignment 5: 16 32 24
LAT: Pos 6, alignment 6: 8 32 24
LAT: Pos 7, alignment 7: 16 32 24
LAT: Pos 4, alignment 0: 8 24 16
LAT: Pos 4, alignment 1: 8 24 16
LAT: Pos 8, alignment 0: 8 24 16
LAT: Pos 8, alignment 1: 8 40 32
LAT: Pos 16, alignment 0: 16 24 24
LAT: Pos 16, alignment 1: 16 40 32
LAT: Pos 32, alignment 0: 16 32 24
LAT: Pos 32, alignment 1: 16 48 40
LAT: Pos 64, alignment 0: 24 40 40
LAT: Pos 64, alignment 1: 24 56 56
LAT: Pos 128, alignment 0: 32 64 64
LAT: Pos 128, alignment 1: 32 80 80
LAT: Pos 256, alignment 0: 56 136 128
LAT: Pos 256, alignment 1: 56 152 136
LAT: Pos 512, alignment 0: 96 264 256
LAT: Pos 512, alignment 1: 96 272 264
LAT: Pos 1024, alignment 0: 224 512 504
LAT: Pos 1024, alignment 1: 224 528 520
LAT: Pos 0, alignment 0: 8 16 16
LAT: Pos 1, alignment 0: 8 16 16
LAT: Pos 1, alignment 1: 8 16 8
LAT: Pos 2, alignment 0: 8 24 16
LAT: Pos 2, alignment 1: 16 24 8
LAT: Pos 2, alignment 2: 8 24 16
LAT: Pos 3, alignment 0: 8 24 16
LAT: Pos 3, alignment 1: 8 24 16
LAT: Pos 3, alignment 2: 16 24 16
LAT: Pos 3, alignment 3: 16 24 16
LAT: Pos 4, alignment 0: 8 24 16
LAT: Pos 4, alignment 1: 8 24 16
LAT: Pos 4, alignment 2: 16 24 16
LAT: Pos 4, alignment 3: 8 24 16
LAT: Pos 4, alignment 4: 16 32 24
LAT: Pos 5, alignment 0: 8 24 16
LAT: Pos 5, alignment 1: 8 32 24
LAT: Pos 5, alignment 2: 16 32 24
LAT: Pos 5, alignment 3: 16 32 24
LAT: Pos 5, alignment 4: 16 32 24
LAT: Pos 5, alignment 5: 8 32 24
LAT: Pos 6, alignment 0: 8 24 24
LAT: Pos 6, alignment 1: 16 32 24
LAT: Pos 6, alignment 2: 16 32 24
LAT: Pos 6, alignment 3: 8 32 24
LAT: Pos 6, alignment 4: 16 32 24
LAT: Pos 6, alignment 5: 16 32 24
LAT: Pos 6, alignment 6: 16 32 24
LAT: Pos 7, alignment 0: 8 24 16
LAT: Pos 7, alignment 1: 8 40 32
LAT: Pos 7, alignment 2: 16 32 32
LAT: Pos 7, alignment 3: 16 32 24
LAT: Pos 7, alignment 4: 8 32 24
LAT: Pos 7, alignment 5: 16 32 24
LAT: Pos 7, alignment 6: 8 32 24
LAT: Pos 7, alignment 7: 16 32 24
LAT: Pos 8, alignment 0: 8 24 16
LAT: Pos 8, alignment 1: 8 40 32
LAT: Pos 8, alignment 2: 16 32 32
LAT: Pos 8, alignment 3: 16 32 24
LAT: Pos 8, alignment 4: 8 32 32
LAT: Pos 8, alignment 5: 8 32 24
LAT: Pos 8, alignment 6: 8 32 24
LAT: Pos 8, alignment 7: 16 24 24
LAT: Pos 8, alignment 8: 16 24 16
LAT: Pos 9, alignment 0: 8 24 16
LAT: Pos 9, alignment 1: 16 40 32
LAT: Pos 9, alignment 2: 8 40 32
LAT: Pos 9, alignment 3: 16 32 24
LAT: Pos 9, alignment 4: 8 32 32
LAT: Pos 9, alignment 5: 16 32 24
LAT: Pos 9, alignment 6: 8 32 24
LAT: Pos 9, alignment 7: 16 24 16
LAT: Pos 9, alignment 8: 16 24 16
LAT: Pos 9, alignment 9: 8 40 32
LAT: Pos 10, alignment 0: 8 24 16
LAT: Pos 10, alignment 1: 16 40 32
LAT: Pos 10, alignment 2: 8 40 32
LAT: Pos 10, alignment 3: 16 40 32
LAT: Pos 10, alignment 4: 16 32 32
LAT: Pos 10, alignment 5: 8 32 24
LAT: Pos 10, alignment 6: 16 32 16
LAT: Pos 10, alignment 7: 16 24 24
LAT: Pos 10, alignment 8: 16 24 16
LAT: Pos 10, alignment 9: 16 40 32
LAT: Pos 10, alignment 10: 16 40 32
LAT: Pos 11, alignment 0: 8 24 16
LAT: Pos 11, alignment 1: 8 40 32
LAT: Pos 11, alignment 2: 8 40 32
LAT: Pos 11, alignment 3: 8 40 32
LAT: Pos 11, alignment 4: 8 32 32
LAT: Pos 11, alignment 5: 16 32 24
LAT: Pos 11, alignment 6: 16 32 24
LAT: Pos 11, alignment 7: 16 24 24
LAT: Pos 11, alignment 8: 16 24 16
LAT: Pos 11, alignment 9: 16 40 32
LAT: Pos 11, alignment 10: 16 40 32
LAT: Pos 11, alignment 11: 16 40 32
LAT: Pos 12, alignment 0: 8 24 16
LAT: Pos 12, alignment 1: 8 40 32
LAT: Pos 12, alignment 2: 8 40 32
LAT: Pos 12, alignment 3: 8 40 32
LAT: Pos 12, alignment 4: 16 32 32
LAT: Pos 12, alignment 5: 16 32 24
LAT: Pos 12, alignment 6: 16 32 24
LAT: Pos 12, alignment 7: 16 24 24
LAT: Pos 12, alignment 8: 16 24 16
LAT: Pos 12, alignment 9: 16 40 40
LAT: Pos 12, alignment 10: 16 40 32
LAT: Pos 12, alignment 11: 16 40 32
LAT: Pos 12, alignment 12: 16 32 32
LAT: Pos 13, alignment 0: 8 24 24
LAT: Pos 13, alignment 1: 8 40 40
LAT: Pos 13, alignment 2: 8 40 32
LAT: Pos 13, alignment 3: 16 32 32
LAT: Pos 13, alignment 4: 16 32 32
LAT: Pos 13, alignment 5: 16 32 24
LAT: Pos 13, alignment 6: 16 32 24
LAT: Pos 13, alignment 7: 16 24 24
LAT: Pos 13, alignment 8: 16 24 16
LAT: Pos 13, alignment 9: 16 40 40
LAT: Pos 13, alignment 10: 16 40 32
LAT: Pos 13, alignment 11: 16 32 32
LAT: Pos 13, alignment 12: 8 32 32
LAT: Pos 13, alignment 13: 16 32 24
LAT: Pos 14, alignment 0: 8 24 24
LAT: Pos 14, alignment 1: 16 40 32
LAT: Pos 14, alignment 2: 16 40 32
LAT: Pos 14, alignment 3: 16 32 32
LAT: Pos 14, alignment 4: 16 32 32
LAT: Pos 14, alignment 5: 16 32 24
LAT: Pos 14, alignment 6: 16 32 24
LAT: Pos 14, alignment 7: 16 32 24
LAT: Pos 14, alignment 8: 16 32 24
LAT: Pos 14, alignment 9: 16 40 32
LAT: Pos 14, alignment 10: 16 40 32
LAT: Pos 14, alignment 11: 16 40 32
LAT: Pos 14, alignment 12: 16 32 32
LAT: Pos 14, alignment 13: 16 32 24
LAT: Pos 14, alignment 14: 16 32 24
LAT: Pos 15, alignment 0: 8 24 24
LAT: Pos 15, alignment 1: 16 40 32
LAT: Pos 15, alignment 2: 16 40 32
LAT: Pos 15, alignment 3: 16 40 32
LAT: Pos 15, alignment 4: 16 32 32
LAT: Pos 15, alignment 5: 16 32 32
LAT: Pos 15, alignment 6: 16 32 32
LAT: Pos 15, alignment 7: 16 24 24
LAT: Pos 15, alignment 8: 16 24 24
LAT: Pos 15, alignment 9: 16 40 32
LAT: Pos 15, alignment 10: 16 40 32
LAT: Pos 15, alignment 11: 16 40 32
LAT: Pos 15, alignment 12: 8 32 32
LAT: Pos 15, alignment 13: 16 32 32
LAT: Pos 15, alignment 14: 16 32 32
LAT: Pos 15, alignment 15: 16 32 24
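For anyone who wants to produce numbers of this kind, here is a minimal sketch of
rdtsc-style timing of strlen for a given length and alignment. It is not the
harness used in the glibc string tests. The helper names (read_ticks, make_string,
time_strlen), the 64-byte base boundary, and the clock_gettime fallback on non-x86
targets are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>          /* __rdtsc() with GCC/Clang */
#else
#include <time.h>               /* fallback only; not rdtsc */
#endif

/* Read an increasing tick counter: TSC on x86, nanoseconds elsewhere. */
uint64_t read_ticks(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

/* Build a string of `len` non-NUL bytes that starts `align` bytes past a
   64-byte boundary (assumed base alignment). *base_out receives the
   pointer that must later be passed to free(). */
char *make_string(size_t len, size_t align, void **base_out)
{
    assert(align < 64);
    char *base = malloc(len + align + 64 + 1);
    if (base == NULL)
        return NULL;
    uintptr_t p = ((uintptr_t)base + 63) & ~(uintptr_t)63;
    char *s = (char *)p + align;
    memset(s, 'a', len);
    s[len] = '\0';
    *base_out = base;
    return s;
}

/* Smallest tick count seen for one strlen() call over `iters` runs.
   Returns UINT64_MAX on allocation failure or a wrong result. */
uint64_t time_strlen(size_t len, size_t align, int iters)
{
    void *base = NULL;
    char *s = make_string(len, align, &base);
    if (s == NULL)
        return UINT64_MAX;

    /* Call through a volatile pointer so the compiler uses the library
       routine instead of folding or inlining the call. */
    size_t (*volatile fn)(const char *) = strlen;

    uint64_t best = UINT64_MAX;
    for (int i = 0; i < iters; i++) {
        uint64_t t0 = read_ticks();
        size_t n = fn(s);
        uint64_t t1 = read_ticks();
        if (n != len) {
            free(base);
            return UINT64_MAX;
        }
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    free(base);
    return best;
}
```

Calling time_strlen(256, 1, 2048) times a 256-byte string that starts one byte
past the boundary. That would correspond to the "Pos 256, alignment 1" row if Pos
denotes the position of the terminating NUL. Since rdtsc is not a serializing
instruction, small values should be read as approximate.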
Data on memcmp and strcmp show similar results. The new ones
in glibc 2.11 are much better than the old ones in glibc 2.9.
If you believe there is a regression, please provide the lengths as well
as the alignments of the input data. I will take a look.
Thanks.
H.J.
----
On Fri, Nov 6, 2009 at 6:04 AM, Petr Baudis <pasky@suse.cz> wrote:
>  Hi!
>
>  I have been doing some benchmarking of several string functions and
> discovered that some of them are *much* slower than in the past; the
> regressions are measured against glibc-2.9. I'm testing on small
> strings (4..128, though for 128 much bigger sample of calls would be
> needed for good comparison), following the common wisdom that operations
> on small strings are the bulk of the calls.
>
>  In case of strlen(), there seems to be regression only with very small
> strings on AMD, so this is probably fine.
>
>  In case of memcmp(), strcmp() and strncmp(), glibc-2.10.1 seems to
> improve performance somewhat especially for larger strings, but
> glibc-2.11 has massive performance drop across all vendors!
> (Interestingly, glibc-2.10.1 is also slightly slower than glibc-2.9 in
> these functions on Core i7.)
>
>  In case of strcmp(), strncmp(), glibc-2.10.1 seems to improve performance
> somewhat especially for larger strings, but glibc-2.11 has massive
> performance drop on all vendors.
>
>  I'd like to ask how the string routine changes were benchmarked,
> for what architectures and string sizes are they supposed to be
> optimized and why. I think it would be good to do something about this
> regression. ;-)
>
>  For the benchmarking, I'm using
>
>        http://pasky.or.cz/~pasky/dev/glibc/strbench/
>
> that I quickly hacked together. Here is the data I have collected
> on various x86_64 systems, running with 2048 iterations; apply
> reasonable error margins, of course:
>
>
> model name      : AMD Opteron (tm) Processor 848
> cache size      : 1024 KB
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow rep_good nopl
>
> func,size       2.9-vanilla     2.10.1-vanilla  2.11-vanilla    2.11-amd
> strlen4         5.630000        6.890000        7.060000        5.660000
> strlen8         4.940000        3.580000        3.700000        4.170000
> strlen32        2.220000        1.340000        1.490000        2.310000
> strlen128       1.220000        0.830000        0.900000        1.330000
> memcmp4         3.350000        3.330000        4.400000        3.310000
> memcmp8         1.840000        1.740000        2.660000        2.140000
> memcmp32        0.970000        0.800000        1.770000        1.300000
> memcmp128       0.330000        0.310000        1.050000        0.650000
> strcmp4         2.400000        2.290000        5.620000        2.470000
> strcmp8         1.600000        1.280000        3.260000        1.560000
> strcmp32        0.950000        0.600000        1.630000        0.870000
> strcmp128       0.350000        0.210000        1.010000        0.310000
> strncmp4        2.560000        2.250000        5.880000        2.960000
> strncmp8        1.400000        1.410000        3.230000        1.700000
> strncmp32       0.710000        0.770000        1.370000        0.940000
> strncmp128      0.270000        0.270000        0.670000        0.350000
>
>
> model name      : Dual Core AMD Opteron(tm) Processor 165
> cache size      : 1024 KB
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy
>
> func,size       2.9-vanilla     2.10.1-vanilla  2.11-vanilla    2.11-amd
> strlen4         6.780000        8.350000        8.580000        6.850000
> strlen8         5.920000        4.300000        4.420000        5.010000
> strlen32        2.570000        1.440000        1.430000        2.660000
> strlen128       1.260000        0.910000        0.850000        1.240000
> memcmp4         3.960000        4.040000        5.160000        2.840000
> memcmp8         2.020000        2.060000        3.000000        1.890000
> memcmp32        0.770000        0.720000        1.350000        0.980000
> memcmp128       0.260000        0.240000        0.540000        0.430000
> strcmp4         2.740000        2.750000        6.790000        2.910000
> strcmp8         1.410000        1.410000        3.600000        1.620000
> strcmp32        0.630000        0.580000        1.260000        0.700000
> strcmp128       0.200000        0.180000        0.620000        0.230000
> strncmp4        3.080000        2.720000        7.180000        3.540000
> strncmp8        1.580000        1.440000        3.940000        1.880000
> strncmp32       0.720000        0.670000        1.310000        0.840000
> strncmp128      0.240000        0.220000        0.550000        0.280000
>
>
> model name      : Intel(R) Xeon(R) CPU           X3220  @ 2.40GHz
> cache size      : 4096 KB
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
>
> func,size       2.9-vanilla     2.10.1-vanilla  2.11-vanilla    2.11-amd
> strlen4         3.870000        3.050000        3.270000        3.870000
> strlen8         2.370000        1.530000        1.640000        3.450000
> strlen32        1.040000        0.480000        0.470000        1.520000
> strlen128       0.600000        0.290000        0.280000        0.680000
> memcmp4         2.080000        2.260000        2.680000        1.800000
> memcmp8         1.040000        1.130000        1.460000        1.860000
> memcmp32        0.270000        0.270000        0.350000        0.770000
> memcmp128       0.070000        0.070000        0.090000        0.190000
> strcmp4         1.910000        1.910000        3.480000        1.920000
> strcmp8         0.960000        0.950000        1.200000        0.960000
> strcmp32        0.240000        0.240000        0.290000        0.240000
> strcmp128       0.060000        0.060000        0.080000        0.060000
> strncmp4        2.030000        1.690000        4.240000        2.810000
> strncmp8        1.020000        0.850000        1.610000        1.410000
> strncmp32       0.260000        0.210000        0.380000        0.360000
> strncmp128      0.070000        0.060000        0.100000        0.080000
>
>
> model name      : Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
> cache size      : 6144 KB
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority
>
> func,size       2.9-vanilla     2.10.1-vanilla  2.11-vanilla    2.11-amd
> strlen4         3.090000        2.960000        2.750000        3.450000
> strlen8         1.890000        1.230000        1.360000        3.140000
> strlen32        0.810000        0.370000        0.340000        1.220000
> strlen128       0.460000        0.220000        0.200000        0.660000
> memcmp4         2.160000        1.820000        2.500000        1.800000
> memcmp8         1.100000        0.910000        1.500000        1.170000
> memcmp32        0.310000        0.220000        0.320000        0.380000
> memcmp128       0.090000        0.060000        0.090000        0.110000
> strcmp4         1.860000        1.910000        3.530000        1.570000
> strcmp8         0.960000        0.960000        1.170000        0.840000
> strcmp32        0.280000        0.250000        0.300000        0.270000
> strcmp128       0.050000        0.050000        0.090000        0.070000
> strncmp4        1.740000        1.750000        3.790000        2.840000
> strncmp8        0.940000        0.850000        1.380000        1.380000
> strncmp32       0.220000        0.220000        0.320000        0.400000
> strncmp128      0.050000        0.050000        0.090000        0.080000
>
>
> model name      : Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz
> cache size      : 8192 KB
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm ida
>
> func,size       2.9-vanilla     2.10.1-vanilla  2.11-vanilla    2.11-amd
> strlen4         3.440000        3.500000        2.780000        3.320000
> strlen8         2.260000        1.750000        1.440000        2.220000
> strlen32        0.850000        0.500000        0.380000        0.900000
> strlen128       0.470000        0.260000        0.200000        0.500000
> memcmp4         2.180000        2.060000        2.500000        1.840000
> memcmp8         1.100000        1.050000        1.320000        1.060000
> memcmp32        0.270000        0.260000        0.350000        0.330000
> memcmp128       0.080000        0.070000        0.090000        0.090000
> strcmp4         1.660000        1.930000        2.250000        1.640000
> strcmp8         0.830000        0.970000        1.140000        0.840000
> strcmp32        0.210000        0.240000        0.240000        0.210000
> strcmp128       0.050000        0.070000        0.080000        0.060000
> strncmp4        1.740000        1.830000        2.490000        2.570000
> strncmp8        0.870000        0.920000        1.220000        1.300000
> strncmp32       0.220000        0.230000        0.260000        0.320000
> strncmp128      0.050000        0.050000        0.090000        0.080000
>
>
>  * numbers after function names indicate string sizes
>  ** 2.11-amd is very old AMD-provided x86_64 string routines patch
> (it doesn't implement some of the new things like bounded pointers
> checks support) that we still use in SUSE glibc:
>
>        http://pasky.or.cz/~pasky/dev/glibc/amd64-string-2.11.diff
>
> (If the regression against 2.10.1 is fixed, it is probably not very
> interesting; it performs better only at very short memcmp()s.)
>
>  *** I can't seem to find newer AMD processors to test on right now,
> sorry. If you have any, feel free to run the benchmark there - just
> get the /strbench/ directory and run `./strbench.sh outfile`.
>
>  Kind regards,
>
> --
>                                Petr "Pasky" Baudis
> A lot of people have my books on their bookshelves.
> That's the problem, they need to read them. -- Don Knuth
>
--
H.J.