Bug 25860

Summary:	stress-ng tsearch regressed ~14% with upgrade glibc from 2.29 to 2.30
Product:	glibc	Reporter:	yu.ma
Component:	libc	Assignee:	Not yet assigned to anyone <unassigned>
Status:	RESOLVED INVALID
Severity:	normal	CC:	adhemerval.zanella, drepper.fsp, fw
Priority:	P2	Flags:	fw: security-
Version:	2.30
Target Milestone:	---
Host:	cascade lake	Target:	clearlinux
Build:	32580	Last reconfirmed:	2020-04-21 00:00:00

Description yu.ma 2020-04-21 02:59:08 UTC

Tried on Intel CLX (Cascade Lake) running Clear Linux: stress-ng-1.2.2 in phoronix-test-suite regressed by ~14% after upgrading glibc from 2.29 to 2.30.

Comment 1 Florian Weimer 2020-04-21 09:53:52 UTC

I do not think there have been any tsearch changes between the two versions.

Have you discussed this with the Clear Linux developers?

Comment 2 yu.ma 2020-04-28 06:15:45 UTC

We verified that it is not related to Clear Linux local patches: at the regression point no local patches were merged, and the only change was the upstream glibc upgrade.

Comment 3 Adhemerval Zanella 2020-04-28 13:04:18 UTC

The stress-ng from phoronix-test-suite has multiple options that stress different implementations:

Stress-NG 0.11.07:
    pts/stress-ng-1.3.0
    System Test Configuration
        1:  CPU Stress
        2:  Crypto
        3:  Memory Copying
        4:  Glibc Qsort Data Sorting
        5:  Glibc C String Functions
        6:  Vector Math
        7:  Matrix Math
        8:  Forking
        9:  System V Message Passing
        10: Semaphores
        11: Socket Activity
        12: Context Switching
        13: Atomic
        14: CPU Cache
        15: Malloc
        16: MEMFD
        17: MMAP
        18: NUMA
        19: RdRand
        20: SENDFILE


Which one shows the regression, and, from profiling, which glibc symbols does it stress?

Comment 4 yu.ma 2020-04-29 00:40:15 UTC

It is stress-ng-1.2.2, sub-test: Tsearch, unit: Bogo Ops/s.

Comment 5 Adhemerval Zanella 2020-04-29 16:23:09 UTC

Assuming you are evaluating with the default phoronix-test-suite options (-t 30 --metrics-brief --cpu 0 --tsearch 0), this in fact seems to be an issue with scheduling pressure.  On 2.29, three consecutive runs give different results:

stress-ng: info:  [386425] dispatching hogs: 8 cpu, 8 tsearch
stress-ng: info:  [386425] successful run completed in 30.08s
stress-ng: info:  [386425] stressor       bogo ops real time  usr time  sys time   bogo ops/s   bogo ops/s
stress-ng: info:  [386425]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: info:  [386425] cpu               30473     30.02    115.61      0.01      1014.98       263.56
stress-ng: info:  [386425] tsearch            1614     30.03    117.01      0.00        53.75        13.79

stress-ng: info:  [390680] dispatching hogs: 8 cpu, 8 tsearch
stress-ng: info:  [390680] successful run completed in 30.10s
stress-ng: info:  [390680] stressor       bogo ops real time  usr time  sys time   bogo ops/s   bogo ops/s
stress-ng: info:  [390680]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: info:  [390680] cpu               31081     30.04    118.73      0.00      1034.82       261.78
stress-ng: info:  [390680] tsearch            1747     30.03    118.68      0.10        58.18        14.71

stress-ng: info:  [390726] dispatching hogs: 8 cpu, 8 tsearch
stress-ng: info:  [390726] successful run completed in 30.06s
stress-ng: info:  [390726] stressor       bogo ops real time  usr time  sys time   bogo ops/s   bogo ops/s
stress-ng: info:  [390726]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: info:  [390726] cpu               31284     30.02    118.59      0.01      1042.11       263.78
stress-ng: info:  [390726] tsearch            1668     30.02    118.49      0.08        55.55        14.07
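
The run-to-run noise in the three unpinned 2.29 runs above can be quantified with a quick calculation. This is only a sketch: the numbers are the tsearch "bogo ops/s (real time)" values copied from the output above, and the helper name `relative_spread` is made up for illustration. Three samples are too few for real statistics, so the result is a rough indication at best.

```python
from statistics import mean

# tsearch "bogo ops/s (real time)" from the three unpinned 2.29 runs above.
unpinned_229 = [53.75, 58.18, 55.55]

def relative_spread(samples):
    """Return (max - min) / mean, a crude measure of run-to-run noise."""
    return (max(samples) - min(samples)) / mean(samples)

spread = relative_spread(unpinned_229)
print(f"mean = {mean(unpinned_229):.2f} bogo ops/s, spread = {spread:.1%}")
```

With these inputs the spread comes out at roughly 8% on a single glibc version, which is a sizeable fraction of the reported ~14%.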

And when binding with --taskset 0,1,2,3,4,5,6,7, the results seem more predictable:

stress-ng: info:  [391052] dispatching hogs: 8 cpu, 8 tsearch 
stress-ng: info:  [391052] successful run completed in 30.07s 
stress-ng: info:  [391052] stressor       bogo ops real time  usr time  sys time   bogo ops/s   bogo ops/s
stress-ng: info:  [391052]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: info:  [391052] cpu               31143     30.02    118.17      0.03      1037.31       263.48
stress-ng: info:  [391052] tsearch            1700     30.03    118.58      0.05        56.62        14.33

stress-ng: info:  [391102] dispatching hogs: 8 cpu, 8 tsearch 
stress-ng: info:  [391102] successful run completed in 30.09s 
stress-ng: info:  [391102] stressor       bogo ops real time  usr time  sys time   bogo ops/s   bogo ops/s
stress-ng: info:  [391102]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: info:  [391102] cpu               30881     30.03    117.65      0.06      1028.19       262.35
stress-ng: info:  [391102] tsearch            1698     30.03    118.87      0.07        56.55        14.28

And using the same options with binding on 2.30 shows no significant performance difference:

stress-ng: info:  [391146] dispatching hogs: 8 cpu, 8 tsearch
stress-ng: info:  [391146] successful run completed in 30.09s
stress-ng: info:  [391146] stressor       bogo ops real time  usr time  sys time   bogo ops/s   bogo ops/s
stress-ng: info:  [391146]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: info:  [391146] cpu               31133     30.04    118.06      0.04      1036.39       263.62
stress-ng: info:  [391146] tsearch            1735     30.05    119.09      0.11        57.74        14.56
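
For reference, the pinned numbers above can be compared directly. The sketch below uses the tsearch "bogo ops/s (real time)" values from the two pinned 2.29 runs and the single pinned 2.30 run. With only one 2.30 sample it shows the direction of the difference and nothing more.

```python
from statistics import mean

# tsearch "bogo ops/s (real time)" with --taskset 0,1,2,3,4,5,6,7.
pinned_229 = [56.62, 56.55]
pinned_230 = [57.74]  # single run, so no spread can be estimated

baseline = mean(pinned_229)
change = (mean(pinned_230) - baseline) / baseline
print(f"2.29 = {baseline:.2f}, 2.30 = {mean(pinned_230):.2f}, change = {change:+.1%}")
```

On these figures 2.30 is about 2% faster than 2.29, not slower, which is within the noise seen in the unpinned runs.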

And profiling does not show any significant difference between the two versions:

x86_64-linux-gnu-2.29$ perf report --stdio -d libc.so
[...]
# Overhead  Command          Symbol                           
# ........  ...............  .................................
#
    10.88%  stress-ng-tsear  [.] __tfind
    10.58%  stress-ng-tsear  [.] __tdelete
     7.31%  stress-ng-tsear  [.] maybe_split_for_insert.isra.0
     5.70%  stress-ng-tsear  [.] __tsearch

x86_64-linux-gnu-2.30$ perf report --stdio -d libc.so
[...]
# Overhead  Command          Symbol
# ........  ...............  .................................
# 
    10.90%  stress-ng-tsear  [.] __tfind
    10.52%  stress-ng-tsear  [.] __tdelete
     7.38%  stress-ng-tsear  [.] maybe_split_for_insert.isra.0
     5.81%  stress-ng-tsear  [.] __tsearch
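
The two profiles above can be compared symbol by symbol. The following is a minimal sketch that takes the overhead percentages from the two perf reports and prints the per-symbol difference in percentage points. The dictionary names are illustrative only.

```python
# Overhead (%) per libc.so symbol, copied from the perf reports above.
profile_229 = {
    "__tfind": 10.88,
    "__tdelete": 10.58,
    "maybe_split_for_insert.isra.0": 7.31,
    "__tsearch": 5.70,
}
profile_230 = {
    "__tfind": 10.90,
    "__tdelete": 10.52,
    "maybe_split_for_insert.isra.0": 7.38,
    "__tsearch": 5.81,
}

# Difference in percentage points (2.30 minus 2.29) for each symbol.
deltas = {sym: profile_230[sym] - profile_229[sym] for sym in profile_229}
worst = max(deltas, key=lambda sym: abs(deltas[sym]))

for sym, delta in deltas.items():
    print(f"{sym:32s} {delta:+.2f}")
print(f"largest shift: {worst} ({deltas[worst]:+.2f} points)")
```

The largest shift is about 0.11 percentage points, on __tsearch, so the distribution of time across the tree functions is essentially unchanged.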

I don't think there is a regression here.

Comment 6 yu.ma 2020-04-30 05:41:39 UTC

Here is the default tsearch command in PTS, which does not bind to any CPU set:
./stress-ng-clr -t 30 --metrics-brief --tsearch 0 

It will start as many stressor instances as there are CPUs in the system.
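
Note that this command differs from the one assumed in Comment 5, which also passed --cpu 0. The sketch below estimates the resulting load under the assumption that a count of 0 expands to one instance per CPU for each stressor, as described above and as the "dispatching hogs: 8 cpu, 8 tsearch" line in Comment 5 suggests. The CPU count of 8 is inferred from that line and applies to the Comment 5 machine only. The function name `total_hogs` is hypothetical.

```python
def total_hogs(zero_count_stressors, cpus):
    """Total instances started, assuming a count of 0 means one per CPU."""
    return len(zero_count_stressors) * cpus

cpus = 8  # inferred from "dispatching hogs: 8 cpu, 8 tsearch" in Comment 5

comment5 = total_hogs(["cpu", "tsearch"], cpus)  # --cpu 0 --tsearch 0
comment6 = total_hogs(["tsearch"], cpus)         # --tsearch 0

print(f"Comment 5 options: {comment5} hogs, {comment5 / cpus:.0f} per CPU")
print(f"Comment 6 options: {comment6} hogs, {comment6 / cpus:.0f} per CPU")
```

Under that assumption the Comment 5 runs oversubscribe each CPU with two hogs, while the PTS default runs one tsearch instance per CPU, so the two sets of measurements are not directly comparable.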

Comment 7 Florian Weimer 2020-05-04 06:56:08 UTC

Again, we did not change tsearch at all between the two releases.

Have you verified that the changed performance is not the result of instruction alignment differences?