Summary: | stress-ng tsearch regressed ~14% with upgrade glibc from 2.29 to 2.30 | ||
---|---|---|---|
Product: | glibc | Reporter: | yu.ma |
Component: | libc | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | RESOLVED INVALID | ||
Severity: | normal | CC: | adhemerval.zanella, drepper.fsp, fw |
Priority: | P2 | Flags: | fw: security- |
Version: | 2.30 | ||
Target Milestone: | --- | ||
Host: | cascade lake | Target: | clearlinux |
Build: | 32580 | Last reconfirmed: | 2020-04-21 00:00:00 |
Description
yu.ma
2020-04-21 02:59:08 UTC
I do not think there have been any tsearch changes between the two versions. Have you discussed this with the Clear Linux developers?

We verified it is not related to Clear Linux local patches: at the regression point no local patches were merged, and the only change is the upstream glibc upgrade...

The stress-ng from phoronix-test-suite has multiple options that stress different implementations:

    Stress-NG 0.11.07:
        pts/stress-ng-1.3.0
        System Test Configuration
            1: CPU Stress
            2: Crypto
            3: Memory Copying
            4: Glibc Qsort Data Sorting
            5: Glibc C String Functions
            6: Vector Math
            7: Matrix Math
            8: Forking
            9: System V Message Passing
            10: Semaphores
            11: Socket Activity
            12: Context Switching
            13: Atomic
            14: CPU Cache
            15: Malloc
            16: MEMFD
            17: MMAP
            18: NUMA
            19: RdRand
            20: SENDFILE

Which one are you seeing regressions with, and when profiling, which glibc symbols does it stress?

It is stress-ng-1.2.2, sub-test: Tsearch, unit: Bogo Ops/s.

Assuming you are evaluating with the default Phoronix Test Suite options (-t 30 --metrics-brief --cpu 0 --tsearch 0), it seems to be an issue with scheduling pressure in fact.
On 2.29, running 3 times, I see different results:

    stress-ng: info: [386425] dispatching hogs: 8 cpu, 8 tsearch
    stress-ng: info: [386425] successful run completed in 30.08s
    stress-ng: info: [386425] stressor       bogo ops real time  usr time  sys time   bogo ops/s   bogo ops/s
    stress-ng: info: [386425]                           (secs)    (secs)    (secs)  (real time) (usr+sys time)
    stress-ng: info: [386425] cpu               30473     30.02    115.61      0.01      1014.98       263.56
    stress-ng: info: [386425] tsearch            1614     30.03    117.01      0.00        53.75        13.79

    stress-ng: info: [390680] dispatching hogs: 8 cpu, 8 tsearch
    stress-ng: info: [390680] successful run completed in 30.10s
    stress-ng: info: [390680] stressor       bogo ops real time  usr time  sys time   bogo ops/s   bogo ops/s
    stress-ng: info: [390680]                           (secs)    (secs)    (secs)  (real time) (usr+sys time)
    stress-ng: info: [390680] cpu               31081     30.04    118.73      0.00      1034.82       261.78
    stress-ng: info: [390680] tsearch            1747     30.03    118.68      0.10        58.18        14.71

    stress-ng: info: [390726] dispatching hogs: 8 cpu, 8 tsearch
    stress-ng: info: [390726] successful run completed in 30.06s
    stress-ng: info: [390726] stressor       bogo ops real time  usr time  sys time   bogo ops/s   bogo ops/s
    stress-ng: info: [390726]                           (secs)    (secs)    (secs)  (real time) (usr+sys time)
    stress-ng: info: [390726] cpu               31284     30.02    118.59      0.01      1042.11       263.78
    stress-ng: info: [390726] tsearch            1668     30.02    118.49      0.08        55.55        14.07

And when binding with --taskset 0,1,2,3,4,5,6,7, the result seems more predictable:

    stress-ng: info: [391052] dispatching hogs: 8 cpu, 8 tsearch
    stress-ng: info: [391052] successful run completed in 30.07s
    stress-ng: info: [391052] stressor       bogo ops real time  usr time  sys time   bogo ops/s   bogo ops/s
    stress-ng: info: [391052]                           (secs)    (secs)    (secs)  (real time) (usr+sys time)
    stress-ng: info: [391052] cpu               31143     30.02    118.17      0.03      1037.31       263.48
    stress-ng: info: [391052] tsearch            1700     30.03    118.58      0.05        56.62        14.33

    stress-ng: info: [391102] dispatching hogs: 8 cpu, 8 tsearch
    stress-ng: info: [391102] successful run completed in 30.09s
    stress-ng: info: [391102] stressor       bogo ops real time  usr time  sys time   bogo ops/s   bogo ops/s
    stress-ng: info: [391102]                           (secs)    (secs)    (secs)  (real time) (usr+sys time)
    stress-ng: info: [391102] cpu               30881     30.03    117.65      0.06      1028.19       262.35
    stress-ng: info: [391102] tsearch            1698     30.03    118.87      0.07        56.55        14.28

And using the same options with binding on 2.30 shows no significant performance difference:

    stress-ng: info: [391146] dispatching hogs: 8 cpu, 8 tsearch
    stress-ng: info: [391146] successful run completed in 30.09s
    stress-ng: info: [391146] stressor       bogo ops real time  usr time  sys time   bogo ops/s   bogo ops/s
    stress-ng: info: [391146]                           (secs)    (secs)    (secs)  (real time) (usr+sys time)
    stress-ng: info: [391146] cpu               31133     30.04    118.06      0.04      1036.39       263.62
    stress-ng: info: [391146] tsearch            1735     30.05    119.09      0.11        57.74        14.56

And profiling does not show any significant difference in the output:

    x86_64-linux-gnu-2.29$ perf report --stdio -d libc.so
    [...]
    # Overhead  Command          Symbol
    # ........  ...............  .................................
    #
        10.88%  stress-ng-tsear  [.] __tfind
        10.58%  stress-ng-tsear  [.] __tdelete
         7.31%  stress-ng-tsear  [.] maybe_split_for_insert.isra.0
         5.70%  stress-ng-tsear  [.] __tsearch

    x86_64-linux-gnu-2.30$ perf report --stdio -d libc.so
    [...]
    # Overhead  Command          Symbol
    # ........  ...............  .................................
    #
        10.90%  stress-ng-tsear  [.] __tfind
        10.52%  stress-ng-tsear  [.] __tdelete
         7.38%  stress-ng-tsear  [.] maybe_split_for_insert.isra.0
         5.81%  stress-ng-tsear  [.] __tsearch

I don't think there is a regression here.

Here is the default command for tsearch in PTS, not bound to any CPU set:

    ./stress-ng-clr -t 30 --metrics-brief --tsearch 0

and it will start as many threads as there are CPUs.

Again, we did not change tsearch at all between the two releases. Have you verified that the changed performance is not the result of instruction alignment differences?
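
For reference, the run-to-run noise in the tsearch figures quoted in this report can be put next to the reported ~14% with a few lines of Python. This is only a rough sketch: the numbers are the "bogo ops/s (real time)" values copied from the runs above, and the helper names `spread_pct` and `delta_pct` are made up for illustration; they are not part of stress-ng or PTS. With only two or three samples per configuration it indicates the size of the noise and is not a statistical test.

```python
from statistics import mean

# tsearch "bogo ops/s (real time)" values copied from the runs quoted in this report.
unpinned_229 = [53.75, 58.18, 55.55]  # glibc 2.29, no CPU binding
pinned_229 = [56.62, 56.55]           # glibc 2.29, --taskset 0,1,2,3,4,5,6,7
pinned_230 = [57.74]                  # glibc 2.30, --taskset 0,1,2,3,4,5,6,7


def spread_pct(samples):
    """Return (max - min) as a percentage of the sample mean (hypothetical helper)."""
    return (max(samples) - min(samples)) / mean(samples) * 100.0


def delta_pct(baseline, candidate):
    """Return the change of mean(candidate) relative to mean(baseline), in percent."""
    return (mean(candidate) - mean(baseline)) / mean(baseline) * 100.0


if __name__ == "__main__":
    print(f"2.29 unpinned spread: {spread_pct(unpinned_229):.1f}%")
    print(f"2.29 pinned spread:   {spread_pct(pinned_229):.1f}%")
    print(f"2.30 vs 2.29, pinned: {delta_pct(pinned_229, pinned_230):+.1f}%")
```

On these figures the unpinned 2.29 runs differ by roughly 8% among themselves, the pinned 2.29 runs by about 0.1%, and the single pinned 2.30 run is about 2% above the pinned 2.29 mean, so a good part of a ~14% swing could plausibly come from scheduling noise alone.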