Szabolcs Nagy [Wed, 4 Jul 2018 10:09:39 +0000 (11:09 +0100)]
Change the return type of converttoint and document the semantics
The roundtoint and converttoint internal functions are only called with small
values, so 32 bit result is enough for converttoint and it is a signed int
conversion so the natural return type is int32_t.
The original idea was to help the compiler keeping the result in uint64_t,
then it's clear that no sign extension is needed and there is no accidental
undefined or implementation defined signed int arithmetics.
But it turns out gcc does a good job with inlining so changing the type has
no overhead and the semantics of the conversion is less surprising this way.
Since we want to allow the asuint64 (x + 0x1.8p52) style conversion, the top
bits were never usable and the existing code ensures that only the bottom
32 bits of the conversion result are used.
In newlib with default settings only aarch64 is affected and there is no
significant code generation change with gcc after the patch.
Szabolcs Nagy [Tue, 3 Jul 2018 11:54:36 +0000 (12:54 +0100)]
Fix code style and comments of new math code
Synchronize code style and comments with Arm Optimized Routines, there
are no code changes in this patch. This ensures different projects using
the same code have consistent code style so bug fix patches can be applied
more easily.
Our local ntsecapi.h wrapper corrects a bug in the definition of
SystemFunction036 which otherwise leads to crashes on 32 bit when
using RtlGenRandom. The fhandler_socket_local.cc file accidentally
included the incorrect w32api version of that file, rather than the
local wrapper. Fix it.
Alexandre Oliva [Sat, 30 Jun 2018 02:49:28 +0000 (23:49 -0300)]
Introduce @unless/@endunless and postbootstrap Makefile targets
This patch turns dependencies of non-bootstrap targets on bootstrap
targets for bootstrap builds into dependencies on stage_last. This
arrangement gets stage1-bubble to run from stage_last if we haven't
started a bootstrap yet, and to use the current stage otherwise. This
was already the case of target libs, just not of non-bootstrapped host
modules.
In order to retain preexisting dependencies in non-bootstrap builds,
or in gcc-less builds, this introduces support for @unless/@endunless
pairs in Makefile.in.
There is a remaining possibility of problem if activating, in a tree
configured for bootstrap, a parallel build of two or more modules, at
least one bootstrapped and one not. In this case, make might decide
to build stage_current and stage_last in parallel, the latter will
start a submake to build stage1 while the initial make, having
satisfied stage_current, proceeds to build the bootstrapped module in
non-bootstrapped configurations. The two builds will overlap and will
likely conflict. This situation does NOT arise in normal settings,
however: a post-bootstrap build of all-host all-target will indeed
activate such targets concurrently, but only after building all
bootstrapped modules successfully, and it will have both stage_last
and stage_current targets already satisfied, so the potential race
between builds will not arise.
Another remaining problem, that is slightly expanded with this patch,
is that of an interrupted build in a tree configured for bootstrap,
continued with a non-bootstrapped target. Target modules that were
not bootstrapped would already fail to complete the current stage when
activated explicitly in the command line for a retry; host modules,
however, would attempt to build their bootstrapped dependencies, which
is what led to the problem of concurrent builds addressed with this
patch. An interrupted or failed build might still recover correctly,
if the non-bootstrapped target is activated in both builds, because
then make will remove stage_last when its build command is
interrupted, so that it will attempt to recreate it with stage1-bubble
in the second try. A bootstrap build, however, will not be attempting
to build stage_last, so the file will remain and the retry won't go
through stage1-bubble. We have lived with that for target modules, so
we can probably live with that for host modules too.
Another undesirable consequence of this change is that non-boostrapped
host modules, in a tree configured for bootstrap, when activated as
make all-<module>, will build all of stage1 instead of only the
module's usual dependencies. This is intentional and necessary to fix
the parallel-build problem. If it's not desirable, disabling the
unnecessary bootstrap configuration will suffice to restore the
original set of dependencies.
for ChangeLog
* configure.ac: Introduce support for @unless/@endunless.
* Makefile.tpl (dep-kind): Rewrite with cond; return
postbootstrap in some cases.
(make-postboot-dep, postboot-targets): New.
(dependencies): Do not output postbootstrap dependencies at
first. Output non-target ones changed for configure to depend
on stage_last @if gcc-bootstrap, and the original deps @unless
gcc-bootstrap.
* configure.in, Makefile.in: Rebuilt.
Corinna Vinschen [Fri, 29 Jun 2018 13:29:36 +0000 (15:29 +0200)]
Cygwin: tape: Handle non-standard "no medium" error code
Certain tape drives (known example: QUANTUM_ULTRIUM-HH6) return
the non-standard ERROR_NOT_READY rather than ERROR_NO_MEDIA_IN_DRIVE
if no media is present. ERROR_NOT_READY is not documented as valid
return code from GetTapeStatus. Without handling this error code
Cygwin's tape code can't report an offline state to user space.
Fix this by converting ERROR_NOT_READY to ERROR_NO_MEDIA_IN_DRIVE
where appropriate.
Add a debug_printf to mtinfo_drive::get_status to allow requesting
user info without having to rebuild the DLL.
Corinna Vinschen [Wed, 27 Jun 2018 15:56:59 +0000 (17:56 +0200)]
Cygwin: Implement pthread_tryjoin_np and pthread_timedjoin_np
- Move pthread_join to thread.cc to have all `join' calls in
the same file (pthread_timedjoin_np needs pthread_convert_abstime
which is static inline in thread.cc)
- Bump API version
Szabolcs Nagy [Tue, 26 Jun 2018 15:25:12 +0000 (16:25 +0100)]
New pow implementation
The new implementation is provided under !__OBSOLETE_MATH, it uses
ISO C99 code. With default settings the worst case error in nearest
rounding mode is 0.54 ULP with inlined fma and fma contraction. It uses
a 4 KB lookup table in addition to the table in exp_data.c, on aarch64
.text+.rodata size of libm.a is increased by 2295 bytes.
Improvements on Cortex-A72:
latency: 3.3x
thruput: 4.9x
Szabolcs Nagy [Tue, 26 Jun 2018 15:06:54 +0000 (16:06 +0100)]
New log2 implementation
The new implementation is provided under !__OBSOLETE_MATH, it uses
ISO C99 code. With default settings the worst case error in nearest
rounding mode is 0.547 ULP with inlined fma and fma contraction. It uses
a 1 KB lookup table, on aarch64 .text+.rodata size of libm.a is increased
by 1584 bytes.
Note that the math.h header defines log2(x) to be log(x)/Ln2, this is
not changed, so the new code is only used if that macro is suppressed.
Improvements on Cortex-A72:
latency: 2.0x
thruput: 2.2x
Szabolcs Nagy [Tue, 26 Jun 2018 14:27:50 +0000 (15:27 +0100)]
New log implementation
The new implementations are provided under !__OBSOLETE_MATH, it uses
ISO C99 code. With default settings the worst case error in nearest
rounding mode is 0.519 ULP with inlined fma and fma contraction. It uses
a 2 KB lookup table, on aarch64 .text+.rodata size of libm.a is increased
by 1703 bytes. The w_log.c wrapper is disabled since error handling is
inline in the new code.
New __HAVE_FAST_FMA and __HAVE_FAST_FMA_DEFAULT feature macros were
added to enable selecting between the code path that uses fma and the
one that does not. Targets supposed to set __HAVE_FAST_FMA_DEFAULT
if they have single instruction fma and the compiler can actually
inline it (gcc has __FP_FAST_FMA macro but that does not guarantee
inlining with -fno-builtin-fma).
Improvements on Cortex-A72:
latency: 1.9x
thruput: 2.3x
Szabolcs Nagy [Fri, 22 Jun 2018 17:12:26 +0000 (18:12 +0100)]
New exp and exp2 implementations
The new implementations are provided under !__OBSOLETE_MATH, they use
ISO C99 code. There are several settings, with the default one the
worst case error in nearest rounding mode is 0.509 ULP for exp and
0.507 ULP for exp2 when a multiply and add is contracted into an fma.
They use a shared 2 KB lookup table, on aarch64 .text+.rodata size
of libm.a is increased by 1868 bytes. The w_*.c wrappers are disabled
for the new code as it takes care of error handling inline.
The old exp2(x) code used to be just pow(2,x) so the speedup there
is more significant.
The file name has no special prefix to avoid any name collision with
existing files.
Szabolcs Nagy [Mon, 25 Jun 2018 16:39:27 +0000 (17:39 +0100)]
Use uint32_t sign argument to math error functions
This change is equivalent to the commit
https://github.com/ARM-software/optimized-routines/commit/c65db17340782d647c49e17cbba244862dc38402
and only affects code that is from the Arm optimized-routines project.
It does not affect the observable behaviour, but the code generation
can be different on 64bit targets. The intention is to make the
portable semantics of the code obvious by using a fixed size type.
Takashi Yano [Mon, 25 Jun 2018 04:34:47 +0000 (13:34 +0900)]
Fix Unicode table.
* (mkcategories): Fix a bug that outputs incorrect Unicode category
table for code point ranges.
* (categories.t): Rebuild it using the bug-fixed mkcategories.
This fixes the problem reported in the following post.
https://cygwin.com/ml/cygwin/2018-06/msg00248.html
Takashi Yano [Thu, 21 Jun 2018 14:11:49 +0000 (23:11 +0900)]
Fix the handling of out-of-band (OOB) data in a socket.
* fhandler.h (class fhandler_socket_inet): Add variable bool oobinline.
* fhandler_socket_inet.cc (fhandler_socket_inet::fhandler_socket_inet):
Initialize variable oobinline.
(fhandler_socket_inet::recv_internal): Make the handling of OOB data
as consistent with POSIX as possible. Add simulation of inline mode
for OOB data as a workaround for broken winsock behavior.
(fhandler_socket_inet::setsockopt): Ditto.
(fhandler_socket_inet::getsockopt): Ditto.
(fhandler_socket_wsock::ioctl): Fix return value of SIOCATMARK command.
The return value of SIOCATMARK of winsock is almost opposite to
expectation.
* fhandler_socket_local.cc (fhandler_socket_local::recv_internal):
Remove the handling of OOB data from AF_LOCAL domain socket. Operation
related to OOB data will result in an error like Linux does.
(fhandler_socket_local::sendto): Ditto.
(fhandler_socket_local::sendmsg): Ditto.
This fixes the issue reported in following post.
https://cygwin.com/ml/cygwin/2018-06/msg00143.html
Wilco Dijkstra [Wed, 20 Jun 2018 12:07:22 +0000 (12:07 +0000)]
Improve performance of sinf/cosf/sincosf
Here is the correct patch with both filenames and int cast fixed:
This patch is a complete rewrite of sinf, cosf and sincosf. The new version
is significantly faster, as well as simple and accurate.
The worst-case ULP is 0.56072, maximum relative error is 0.5303p-23 over all
4 billion inputs. In non-nearest rounding modes the error is 1ULP.
The algorithm uses 3 main cases: small inputs which don't need argument
reduction, small inputs which need a simple range reduction and large inputs
requiring complex range reduction. The code uses approximate integer
comparisons to quickly decide between these cases - on some targets this may
be slow, so this can be configured to use floating point comparisons.
The small range reducer uses a single reduction step to handle values up to
120.0. It is fastest on targets which support inlined round instructions.
The large range reducer uses integer arithmetic for simplicity. It does a
32x96 bit multiply to compute a 64-bit modulo result. This is more than
accurate enough to handle the worst-case cancellation for values close to
an integer multiple of PI/4. It could be further optimized, however it is
already much faster than necessary.
Simple benchmark showing speedup factor on AArch64 for various ranges:
range 0.7853982 sinf 1.7 cosf 2.2 sincosf 2.8
range 1.570796 sinf 1.9 cosf 1.9 sincosf 2.7
range 3.141593 sinf 2.0 cosf 2.0 sincosf 3.5
range 6.283185 sinf 2.3 cosf 2.3 sincosf 4.2
range 125.6637 sinf 2.9 cosf 3.0 sincosf 5.1
range 1.1259e15 sinf 26.8 cosf 26.8 sincosf 45.2
Wilco Dijkstra [Mon, 18 Jun 2018 17:59:37 +0000 (17:59 +0000)]
Improve performance of sinf/cosf/sincosf
This patch is a complete rewrite of sinf, cosf and sincosf. The new version
is significantly faster, as well as simple and accurate.
The worst-case ULP is 0.56072, maximum relative error is 0.5303p-23 over all
4 billion inputs. In non-nearest rounding modes the error is 1ULP.
The algorithm uses 3 main cases: small inputs which don't need argument
reduction, small inputs which need a simple range reduction and large inputs
requiring complex range reduction. The code uses approximate integer
comparisons to quickly decide between these cases - on some targets this may
be slow, so this can be configured to use floating point comparisons.
The small range reducer uses a single reduction step to handle values up to
120.0. It is fastest on targets which support inlined round instructions.
The large range reducer uses integer arithmetic for simplicity. It does a
32x96 bit multiply to compute a 64-bit modulo result. This is more than
accurate enough to handle the worst-case cancellation for values close to
an integer multiple of PI/4. It could be further optimized, however it is
already much faster than necessary.
Simple benchmark showing speedup factor on AArch64 for various ranges:
range 0.7853982 sinf 1.7 cosf 2.2 sincosf 2.8
range 1.570796 sinf 1.9 cosf 1.9 sincosf 2.7
range 3.141593 sinf 2.0 cosf 2.0 sincosf 3.5
range 6.283185 sinf 2.3 cosf 2.3 sincosf 4.2
range 125.6637 sinf 2.9 cosf 3.0 sincosf 5.1
range 1.1259e15 sinf 26.8 cosf 26.8 sincosf 45.2
Ken Brown [Wed, 6 Jun 2018 15:45:57 +0000 (11:45 -0400)]
Cygwin: Remove workaround in environ.cc
Commit ebd645e on 2001-10-03 made environ.cc:_addenv() add unneeded
space at the end of the environment block to "work around problems
with some buggy applications." This clutters the code and is
presumably no longer needed.
Thanks to Ken Harris <Ken.Harris@mathworks.com> for the diagnosis.
When backing up tail to handle a "..", the code only checked that
it didn't underrun the destination buffer while removing path
components. It did *not* take into account that the first backslash
in the path had to be kept intact. Example path to trigger the
problem: "C:\A..\..\..\B'
Fix this by moving the dst pointer to the first backslash so subsequent
tests cannot underrun this position. Also make sure that we always
*have* a backslash.
Jeff Johnston [Fri, 25 May 2018 03:53:15 +0000 (23:53 -0400)]
Fix issue with malloc_extend_top
- when calculating a correction to align next brk to page boundary,
ensure that the correction is less than a page size
- if allocating the correction fails, ensure that the top size is
set to brk + sbrk_size (minus any front alignment made)
Signed-off-by: Jeff Johnston <jjohnstn@redhat.com>
Freddie Chopin [Tue, 15 May 2018 18:58:08 +0000 (20:58 +0200)]
Fix 32-bit overflow in mktime() when time_t is 64-bits long
When converting number of days since epoch (32-bits) to seconds,
calculations using 32-bit `long` overflow for years above 2038. Solve
this by casting number of days to `time_t` just before final
multiplication.
There are systems with a MaximumProcessorCount not
reflecting the actually available CPUs. The ActiveProcessorCount
is correct though. So we use ActiveProcessorCount rather than
MaximumProcessorCount per group to set group affinity correctly.
At this point, aadj is 2529648000.0 in our example. The conversion to
32 bit %eax results in a negative int value, thus the conversion is
invalid. With feenableexcept (FE_INVALID), a SIGFPE is raised.
Fix this by always using 64 bit ints here if double is not a 32 bit type
to avoid this type of FP exceptions.
Corinna Vinschen [Sun, 18 Mar 2018 19:46:43 +0000 (20:46 +0100)]
Cygwin: AF_UNIX: Redesign various aspects
* Change set_socket_type/get_socket_type to virtual methods
* Move various variables into af_unix_shmem_t
* Change sun_name_t to match new usage pattern
* Move shut_state definition and add a name for the 0 value
* Allow marking packet as administrative packet. This allows
filtering out info packets exchange between peers and tweak
data accordingly.
* Rename send_my_name to send_sock_info and send credentials
if not called from bind (so the socket was already connected)
* Handle SO_PASSCRED in setsockopt/getsockopt
* Add input size checking to setsockopt/getsockopt
* Use NT functions where appropriate
Corinna Vinschen [Sun, 18 Mar 2018 17:46:15 +0000 (18:46 +0100)]
Cygwin: AF_UNIX: Use spinlock rather than SRWLOCKs
We need to share socket info between threads *and* processes.
SRWLOCKs are single-process only, unfortunately. Provide a
sharable low-profile spinlock instead.
Hakan Lindqvist [Mon, 12 Mar 2018 13:55:01 +0000 (14:55 +0100)]
Reduce qsort stack consumption
Classical function call recursion wastes a lot of stack space.
Each recursion level requires a full stack frame comprising all
local variables and additional space as dictated by the
processor calling convention.
This implementation instead stores the variables that are unique
for each recursion level in a parameter stack array, and uses
iteration to emulate recursion. Function call recursion is not
used until the array is full.
To ensure the stack consumption isn't worsened by this design, the
size of the parameter stack array is chosen to be similar to the
stack frame excluding the array. Each function call recursion level
can handle 8 iterative recursion levels.
Stack consumption will worsen when sorting tiny arrays that do not
need recursion (of 6 elements or less). It will be about equal for
up to 15 elements, and be an improvement for larger arrays. The best
case improvement is a stack size reduction down to about one quarter
of the stack consumption before the change.
A design where the parameter stack array is large enough for the
worst case recursion level was rejected because it would worsen
the stack consumption when sorting arrays smaller than about 1500
elements. The worst case is 31 levels on a 32-bit system.
A design with a dynamic parameter array size was rejected because
of limitations in some compilers.
Hakan Lindqvist [Mon, 12 Mar 2018 12:51:07 +0000 (13:51 +0100)]
Ensure qsort recursion depth is bounded
The qsort algorithm splits the input array in three parts. The
left and right parts may need further sorting. One of them is
sorted by recursion, the other by iteration. This update ensures
that it is the smaller part that is chosen for recursion.
By choosing the smaller part, each recursion level will handle
less than half the array of the previous recursion level. Hence
the recursion depth is bounded to be less than log2(n) i.e. 1
level per significant bit in the array size n.
The update also includes code comments explaining the algorithm.
Richard Earnshaw [Thu, 15 Mar 2018 09:55:11 +0000 (09:55 +0000)]
[arm] Fix syscalls.c for newlib embedded syscalls builds
Newlib has a build configuration where syscalls can be directly
embedded in the newlib library rather than relying on libgloss.
This configuration was broken recently by an update to the libgloss
support for Arm that was not propagated to the syscalls interface in
newlib itself. This patch restores the build. It's essentially a
copy of https://sourceware.org/ml/newlib/2018/msg00128.html but there
are some other minor cleanups and changes that I've made at the same
time. None of those cleanups affect functionality.
The prototypes of the following functions have been updated: _link,
_sbrk, _getpid, _write, _swiwrite, _lseek, _swilseek, _read and
_swiread.
Signed-off-by: Richard Earnshaw <Richard.Earnshaw@arm.com>
Corinna Vinschen [Wed, 14 Mar 2018 09:36:34 +0000 (10:36 +0100)]
ctype: align size of category bit fields to small targets needs
E.g. arm ABI requires -fshort-enums for bare-metal toolchains.
Given there are only 29 category enums, the compiler chooses an
8 bit enum type, so a size of 11 bits for the bitfield leads to
a compile time error:
error: width of 'cat' exceeds its type
enum category cat: 11;
^~~
Fix this by aligning the size of the category members to byte
borders.
Thomas Wolff [Tue, 13 Mar 2018 17:26:19 +0000 (18:26 +0100)]
fix/enhance Unicode table generation scripts
Scripts do not try to acquire Unicode data by best-effort magic anymore.
Options supported:
-h for help
-i to copy Unicode data from /usr/share/unicode/ucd first
-u to download Unicode data from unicode.org first
If (despite of -i or -u if given) the necessary Unicode files are not
available locally, table generation is skipped, but no error code is
returned, so not to obstruct the build process if called from a Makefile.
Corinna Vinschen [Wed, 14 Mar 2018 09:36:34 +0000 (10:36 +0100)]
ctype: align size of category bit fields to small targets needs
E.g. arm ABI requires -fshort-enums for bare-metal toolchains.
Given there are only 29 category enums, the compiler chooses an
8 bit enum type, so a size of 11 bits for the bitfield leads to
a compile time error:
error: width of 'cat' exceeds its type
enum category cat: 11;
^~~
Fix this by aligning the size of the category members to byte
borders.
Corinna Vinschen [Mon, 12 Mar 2018 14:26:12 +0000 (15:26 +0100)]
Cygwin: AF_UNIX: store per-socket info in shared memory
Per-socket info in fhandler isn't correctly shared between multiple
instances of th same descriptor. Implement a basic shared info which
is shared between all instances of a socket.
This also requires to move the fhandler_socket status bits into
fhandler_socket_wsock since the data is moved to the shared region
for AF_UNIX sockets.
Also, drop backing file requirement for socketpair server socket.
This will be handled differently in recvmsg/sendmsg.
Thomas Wolff [Fri, 9 Mar 2018 12:30:33 +0000 (13:30 +0100)]
use generated character data
The tow* functions use an included case conversion table which can be
generated from Unicode data.
The isw* functions use a character categories table (provided by
categories.c) which can be generated from Unicode data.
Delegation between current-locale and specific-locale-dependent functions
was reverted towards the generic locale-dependent functions (*_l.c);
this is however only relevant on systems with non-Unicode wide character
locales, thus not on Cygwin.
Thomas Wolff [Sun, 25 Feb 2018 15:30:27 +0000 (16:30 +0100)]
generated character category data, Unicode 10.0
Table categories.t and tag enumeration categories.cat provide
character class data for most of the isw* functions.
These data are generated from Unicode data.