Bug 27651 - Performance regression after updating to 2.33
Summary: Performance regression after updating to 2.33
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: libc
Version: 2.33
Importance: P1 critical
Target Milestone: 2.34
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-03-25 22:17 UTC by Fulalas
Modified: 2021-04-28 10:21 UTC
CC List: 3 users

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Description Fulalas 2021-03-25 22:17:48 UTC
I tested this many times to make sure: after upgrading to 2.33 I noticed a 15% performance drop in the 'Shadow of the Tomb Raider' benchmark. Switching back to 2.32 (and nothing else!) brings the performance back to normal.

Although I'm on Slackware current, I also tried the glibc 2.33 package from Arch, and the behavior is precisely the same.

I want to be clear about this: I didn't change anything else apart from glibc. Same drivers, same kernel, same everything! I'm 100% sure this is glibc's fault.

Not sure if this helps, but I tried to track down which files are causing the performance issue, and it turns out that it's something inside the /lib64 folder. I believe it's one of the following files, or a combination of them:
ld-2.33.so
libc-2.33.so
libdl-2.33.so
libpthread-2.33.so
libthread_db-1.0.so
libutil-2.33.so

Thanks!
Comment 1 Adhemerval Zanella 2021-03-26 13:45:45 UTC
Could you provide a perf profile from before and after the update to glibc 2.33, to show where the possible performance regression might be?

I guess that the 'Shadow of the Tomb Raider' benchmark is a closed-source binary that is not easily distributable, so unless we get more information (either a way to reproduce it without actually buying the game, or profile dumps) this issue will most likely be ignored.
Comment 2 Fulalas 2021-03-27 00:58:16 UTC
I'll be glad to help! But I've never done a perf profile before. Would you mind telling me how I should do it?

BTW, there's a free demo of 'Shadow of the Tomb Raider', and it has a built-in benchmark, just like the full version.
Comment 3 Adhemerval Zanella 2021-03-29 12:36:10 UTC
The performance profile can be obtained with the 'perf' tool [1], which is supported and distributed by most distributions (it is a kernel project hosted within the Linux kernel tree).

The simplest usage is record/report, where you issue:

  $ perf record <program>

  $ perf report --stdio

The first command generates a large file containing the performance data collected by the kernel, while the second parses it and dumps to the console the per-symbol CPU cycles spent (the default counter).

Doing this with both glibc 2.32 and glibc 2.33 should give us an idea of what is happening.
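
For example (a sketch only; the output file names below are just a suggestion), the two runs could be captured and saved for comparison like this:

  $ # with glibc 2.32 installed
  $ perf record -o perf-2.32.data <program>
  $ perf report -i perf-2.32.data --stdio > 2.32.txt

  $ # after switching to glibc 2.33
  $ perf record -o perf-2.33.data <program>
  $ perf report -i perf-2.33.data --stdio > 2.33.txt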

My wild guess is that it may be something related to memory routine selection; I need to check whether anything has changed on the x86_64 side in that regard.

[1] https://perf.wiki.kernel.org/index.php/Main_Page
Comment 4 Fulalas 2021-03-30 09:13:46 UTC
Thanks! Since this is the first time I'm doing this, please let me know if I did something wrong:

2.32:
https://www.mediafire.com/file/wkgrv36b5nsl2ac/2.32.txt

2.33:
https://www.mediafire.com/file/wkgrv36b5nsl2ac/2.32.txt
Comment 5 Fulalas 2021-03-30 09:42:56 UTC
Oops! I pasted the wrong link for 2.33, sorry. Here it goes:
https://www.mediafire.com/file/9lgaiskstinquxw/2.33.txt
Comment 6 Adhemerval Zanella 2021-03-30 21:08:26 UTC
Thanks, this was helpful.  It seems that on glibc 2.33 the binary is issuing a lot more select calls:

     1.83%  ShadowOfTheTomb  libc-2.33.so                   [.] __select
     1.79%  ShadowOfTheTomb  libc-2.33.so                   [.] __libc_disable_asynccancel
     1.75%  ShadowOfTheTomb  libc-2.33.so                   [.] __libc_enable_asynccancel

Although I can't really tell from the data what is actually generating them.  Could you rerun the 2.33 profile with the perf record '-g' option? It generates call-graph records that can show where __select is being called from.
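
Concretely, something like this should do it (the output file name is just a suggestion):

  $ perf record -g <program>

  $ perf report --stdio > 2.33-g.txt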
Comment 7 Fulalas 2021-03-31 08:56:14 UTC
Cool! :)

Here it goes: https://www.mediafire.com/file/oii46d32zyx9qoq/2.33-g.txt
Comment 8 Adhemerval Zanella 2021-03-31 12:32:09 UTC
I think I know what is happening here: commit 2433d39b69743 changed select to use pselect6 on all architectures.  However, on architectures with __NR_select (x86_64, for instance) the kernel normalizes the passed timeout instead of rejecting an invalid one. For instance, the passed timeval { 0, 5000000 } is interpreted as { 5, 0 }.

Now that we use pselect we return EINVAL for such inputs.  The kernel normalization is not really portable (POSIX does not specify that the timeout should be normalized) and it is subject to integer overflow (tv_sec can overflow). Another issue is that the normalization was done only on architectures that originally used __NR_select or __NR__newselect (the latter being the wrapper for __NR_select on architectures with compat mode, such as i686).

I think we need to keep doing the normalization, at least on architectures that used to define __NR_select/__NR__newselect.
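
To illustrate the kind of call that triggers this (a hypothetical, standalone sketch, not code from the game), consider a caller that passes an unnormalized timeout and does not check the return value:

  #include <stdio.h>
  #include <sys/select.h>

  int
  main (void)
  {
    fd_set rfds;
    FD_ZERO (&rfds);
    FD_SET (0, &rfds);

    /* tv_usec is larger than 999999, i.e. the timeval is not normalized.  */
    struct timeval tv = { 0, 5000000 };

    /* Via __NR_select the kernel treats this as a 5 second timeout;
       via pselect6 the call fails immediately with EINVAL, so a program
       that ignores the return value ends up calling select in a tight
       loop instead of sleeping.  */
    if (select (1, &rfds, NULL, NULL, &tv) < 0)
      perror ("select");

    return 0;
  }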
Comment 9 Fulalas 2021-03-31 20:42:28 UTC
I can only understand some fragments of what you said. I hope there's a solution for the performance regression. And if I can help somehow, please let me know :)
Comment 10 Adhemerval Zanella 2021-03-31 20:46:49 UTC
It would be helpful if you could check whether the patch fixes the performance regression.
Comment 11 Fulalas 2021-03-31 23:16:12 UTC
I'll be happy to test it! If I git clone https://sourceware.org/git/glibc.git am I going to have your patch?
Comment 12 Adhemerval Zanella 2021-04-01 19:12:14 UTC
It is not upstream yet, but I created a personal branch based on master with the fix applied [1].  If you could check this branch it would be helpful.

[1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/bz27651-select
Comment 13 Fulalas 2021-04-01 20:42:12 UTC
I'm happy to inform that your patch fixed the issue! Thank you very much! :)
Comment 14 Fulalas 2021-04-01 20:42:43 UTC
Fixed
Comment 15 cvs-commit@gcc.gnu.org 2021-04-12 22:01:17 UTC
The master branch has been updated by Adhemerval Zanella <azanella@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=9d7c5cc38e58fb0923e88901f87174a511b61552

commit 9d7c5cc38e58fb0923e88901f87174a511b61552
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Wed Mar 31 13:53:34 2021 -0300

    linux: Normalize and return timeout on select (BZ #27651)
    
    The commit 2433d39b697, which added time64 support to select, changed
    the function to use __NR_pselect6 (or __NR_pselect6_time64) on all
    architectures.  However, on architectures where the symbol was
    implemented with __NR_select the kernel normalizes the passed timeout
    instead of returning EINVAL.  For instance, the input timeval
    { 0, 5000000 } is interpreted as { 5, 0 }.

    And as indicated by BZ #27651, this semantic seems to be expected,
    and changing it results in some performance issues (most likely the
    program does not check the return code and keeps issuing select with
    an unnormalized tv_usec argument).

    To avoid different semantics depending on which syscall the
    architecture uses, select now always normalizes the timeout input.
    This is a slight semantic change for some ABIs (for instance aarch64).
    
    Checked on x86_64-linux-gnu and i686-linux-gnu.
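
The normalization itself is simple arithmetic; a rough sketch of the idea (a hypothetical helper, not the exact code in the commit) looks like this:

  /* Fold excess microseconds into seconds, mirroring the old
     __NR_select kernel behaviour.  */
  static void
  normalize_timeval (struct timeval *tv)
  {
    if (tv->tv_usec >= 1000000)
      {
        tv->tv_sec += tv->tv_usec / 1000000;
        tv->tv_usec %= 1000000;
      }
  }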
Comment 16 Adhemerval Zanella 2021-04-13 13:11:00 UTC
Fixed on 2.33 and I will backport to 2.32 as well.
Comment 17 Adhemerval Zanella 2021-04-13 13:13:52 UTC
My confusion, I meant fixed on 2.34 and I will backport to 2.33.
Comment 18 cvs-commit@gcc.gnu.org 2021-04-13 21:03:18 UTC
The release/2.33/master branch has been updated by Adhemerval Zanella <azanella@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=8380ca5833ef2a11bf0162f2290f4f8c85ce3b90

commit 8380ca5833ef2a11bf0162f2290f4f8c85ce3b90
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Wed Mar 31 13:53:34 2021 -0300

    linux: Normalize and return timeout on select (BZ #27651)
    
    The commit 2433d39b697, which added time64 support to select, changed
    the function to use __NR_pselect6 (or __NR_pselect6_time64) on all
    architectures.  However, on architectures where the symbol was
    implemented with __NR_select the kernel normalizes the passed timeout
    instead of returning EINVAL.  For instance, the input timeval
    { 0, 5000000 } is interpreted as { 5, 0 }.

    And as indicated by BZ #27651, this semantic seems to be expected,
    and changing it results in some performance issues (most likely the
    program does not check the return code and keeps issuing select with
    an unnormalized tv_usec argument).

    To avoid different semantics depending on which syscall the
    architecture uses, select now always normalizes the timeout input.
    This is a slight semantic change for some ABIs (for instance aarch64).
    
    Checked on x86_64-linux-gnu and i686-linux-gnu.
    
    (cherry picked from commit 9d7c5cc38e58fb0923e88901f87174a511b61552)