Bug 30179 - Generic fmod and remainder are slower than x87 implementations
Summary: Generic fmod and remainder are slower than x87 implementations
Status: NEW
Alias: None
Product: glibc
Classification: Unclassified
Component: math
Version: 2.38
Importance: P2 normal
Target Milestone: ---
Assignee: H.J. Lu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-02-27 16:35 UTC by H.J. Lu
Modified: 2023-02-27 16:35 UTC
CC List: 0 users

See Also:
Host:
Target: x86-64
Build:
Last reconfirmed:


Attachments
A testcase (747 bytes, application/octet-stream)
2023-02-27 16:35 UTC, H.J. Lu

Description H.J. Lu 2023-02-27 16:35:12 UTC
Created attachment 14722 [details]
A testcase

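The performance of the generic fmod and remainder implementations depends
heavily on the input values, and the x87 (fprem/fprem1) implementations can be
much faster.  On Intel Coffee Lake, the attached testcase takes 3.15s when
linked against libm (sse) and 0.25s with the x87 version (x87):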

[hjl@gnu-cfl-3 fmod-2]$ make
gcc -O2   -c -o test.o test.c
gcc    -c -o x87.o x87.S
gcc -static -o x87 test.o x87.o
gcc -static -o sse test.o -lm
time ./sse
3.15user 0.00system 0:03.15elapsed 99%CPU (0avgtext+0avgdata 684maxresident)k
0inputs+0outputs (0major+39minor)pagefaults 0swaps
time ./x87
0.25user 0.00system 0:00.25elapsed 99%CPU (0avgtext+0avgdata 680maxresident)k
0inputs+0outputs (0major+37minor)pagefaults 0swaps
[hjl@gnu-cfl-3 fmod-2]$
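
The attached test.c and x87.S are not reproduced in this report. As a rough illustration of what an fprem-based fmod looks like, here is a minimal sketch in C with GCC inline assembly. It is not the attachment: the names fmod_x87 and time_fmod are hypothetical, the code assumes GCC or Clang on x86/x86-64, and it makes no attempt to match every special case or errno behaviour that glibc's fmod handles.

```c
#include <assert.h>  /* assert */
#include <math.h>    /* fmod, so a caller can time libm for comparison */
#include <time.h>    /* clock, CLOCKS_PER_SEC */

/* Hypothetical helper, not taken from the attachment: fmod via x87 fprem.
   fprem computes a partial remainder of st(0) by st(1) and sets C2 in the
   FPU status word while the reduction is incomplete, so it is repeated
   until C2 is clear. */
double fmod_x87(double x, double y)
{
    double r;
    __asm__ volatile(
        "1:\n\t"
        "fprem\n\t"
        "fnstsw %%ax\n\t"       /* copy the FPU status word into AX */
        "testb $4, %%ah\n\t"    /* C2 is bit 10 of the word, bit 2 of AH */
        "jnz 1b"
        : "=t"(r)               /* result is left in st(0) */
        : "0"(x), "u"(y)        /* x in st(0), y in st(1) */
        : "ax", "cc");
    return r;
}

/* Hypothetical helper: CPU seconds spent calling fn(x, y) iters times. */
double time_fmod(double (*fn)(double, double), double x, double y, long iters)
{
    assert(iters > 0);
    volatile double sink = 0.0; /* keeps the calls from being optimized away */
    clock_t t0 = clock();
    for (long i = 0; i < iters; i++)
        sink += fn(x, y);
    clock_t t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Calling time_fmod(fmod, x, y, n) and time_fmod(fmod_x87, x, y, n) with the same arguments (linking with -lm) gives a rough side-by-side comparison. Since the generic implementation's cost depends on the inputs, it is worth trying pairs where x is many orders of magnitude larger than y as well as pairs of similar magnitude.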