The new implementation is provided under !__OBSOLETE_MATH, it uses
ISO C99 code. With default settings the worst case error in nearest
rounding mode is 0.54 ULP with inlined fma and fma contraction. It uses
a 4 KB lookup table in addition to the table in exp_data.c, on aarch64
.text+.rodata size of libm.a is increased by 2295 bytes.
Improvements on Cortex-A72:
latency: 3.3x
thruput: 4.9x
The new implementation is provided under !__OBSOLETE_MATH, it uses
ISO C99 code. With default settings the worst case error in nearest
rounding mode is 0.547 ULP with inlined fma and fma contraction. It uses
a 1 KB lookup table, on aarch64 .text+.rodata size of libm.a is increased
by 1584 bytes.
Note that the math.h header defines log2(x) to be log(x)/Ln2, this is
not changed, so the new code is only used if that macro is suppressed.
Improvements on Cortex-A72:
latency: 2.0x
thruput: 2.2x
The new implementations are provided under !__OBSOLETE_MATH, it uses
ISO C99 code. With default settings the worst case error in nearest
rounding mode is 0.519 ULP with inlined fma and fma contraction. It uses
a 2 KB lookup table, on aarch64 .text+.rodata size of libm.a is increased
by 1703 bytes. The w_log.c wrapper is disabled since error handling is
inline in the new code.
New __HAVE_FAST_FMA and __HAVE_FAST_FMA_DEFAULT feature macros were
added to enable selecting between the code path that uses fma and the
one that does not. Targets supposed to set __HAVE_FAST_FMA_DEFAULT
if they have single instruction fma and the compiler can actually
inline it (gcc has __FP_FAST_FMA macro but that does not guarantee
inlining with -fno-builtin-fma).
Improvements on Cortex-A72:
latency: 1.9x
thruput: 2.3x
The new implementations are provided under !__OBSOLETE_MATH, they use
ISO C99 code. There are several settings, with the default one the
worst case error in nearest rounding mode is 0.509 ULP for exp and
0.507 ULP for exp2 when a multiply and add is contracted into an fma.
They use a shared 2 KB lookup table, on aarch64 .text+.rodata size
of libm.a is increased by 1868 bytes. The w_*.c wrappers are disabled
for the new code as it takes care of error handling inline.
The old exp2(x) code used to be just pow(2,x) so the speedup there
is more significant.
The file name has no special prefix to avoid any name collision with
existing files.
Improvements on Cortex-A72:
exp latency: 3.2x
exp thruput: 4.1x
exp2 latency: 7.8x
exp2 thruput: 18.8x
This change is equivalent to the commit
c65db17340
and only affects code that is from the Arm optimized-routines project.
It does not affect the observable behaviour, but the code generation
can be different on 64bit targets. The intention is to make the
portable semantics of the code obvious by using a fixed size type.
* (mkcategories): Fix a bug that outputs incorrect Unicode category
table for code point ranges.
* (categories.t): Rebuild it using the bug-fixed mkcategories.
This fixes the problem reported in the following post.
https://cygwin.com/ml/cygwin/2018-06/msg00248.html
* fhandler.h (class fhandler_socket_inet): Add variable bool oobinline.
* fhandler_socket_inet.cc (fhandler_socket_inet::fhandler_socket_inet):
Initialize variable oobinline.
(fhandler_socket_inet::recv_internal): Make the handling of OOB data
as consistent with POSIX as possible. Add simulation of inline mode
for OOB data as a workaround for broken winsock behavior.
(fhandler_socket_inet::setsockopt): Ditto.
(fhandler_socket_inet::getsockopt): Ditto.
(fhandler_socket_wsock::ioctl): Fix return value of SIOCATMARK command.
The return value of SIOCATMARK of winsock is almost opposite to
expectation.
* fhandler_socket_local.cc (fhandler_socket_local::recv_internal):
Remove the handling of OOB data from AF_LOCAL domain socket. Operation
related to OOB data will result in an error like Linux does.
(fhandler_socket_local::sendto): Ditto.
(fhandler_socket_local::sendmsg): Ditto.
This fixes the issue reported in following post.
https://cygwin.com/ml/cygwin/2018-06/msg00143.html
Here is the correct patch with both filenames and int cast fixed:
This patch is a complete rewrite of sinf, cosf and sincosf. The new version
is significantly faster, as well as simple and accurate.
The worst-case ULP is 0.56072, maximum relative error is 0.5303p-23 over all
4 billion inputs. In non-nearest rounding modes the error is 1ULP.
The algorithm uses 3 main cases: small inputs which don't need argument
reduction, small inputs which need a simple range reduction and large inputs
requiring complex range reduction. The code uses approximate integer
comparisons to quickly decide between these cases - on some targets this may
be slow, so this can be configured to use floating point comparisons.
The small range reducer uses a single reduction step to handle values up to
120.0. It is fastest on targets which support inlined round instructions.
The large range reducer uses integer arithmetic for simplicity. It does a
32x96 bit multiply to compute a 64-bit modulo result. This is more than
accurate enough to handle the worst-case cancellation for values close to
an integer multiple of PI/4. It could be further optimized, however it is
already much faster than necessary.
Simple benchmark showing speedup factor on AArch64 for various ranges:
range 0.7853982 sinf 1.7 cosf 2.2 sincosf 2.8
range 1.570796 sinf 1.9 cosf 1.9 sincosf 2.7
range 3.141593 sinf 2.0 cosf 2.0 sincosf 3.5
range 6.283185 sinf 2.3 cosf 2.3 sincosf 4.2
range 125.6637 sinf 2.9 cosf 3.0 sincosf 5.1
range 1.1259e15 sinf 26.8 cosf 26.8 sincosf 45.2
ChangeLog:
2018-05-18 Wilco Dijkstra <wdijkstr@arm.com>
* newlib/libm/common/Makefile.in: Regenerated.
* newlib/libm/common/Makefile.am: Add sinf.c, cosf.c, sincosf.c
sincosf.h, sincosf_data.c. Add -fbuiltin -fno-math-errno to CFLAGS.
* newlib/libm/common/math_config.h: Add HAVE_FAST_ROUND, HAVE_FAST_LROUND,
roundtoint, converttoint, force_eval_float, force_eval_double, eval_as_float,
eval_as_double, likely, unlikely.
* newlib/libm/common/cosf.c: New file.
* newlib/libm/common/sinf.c: Likewise.
* newlib/libm/common/sincosf.h: Likewise.
* newlib/libm/common/sincosf.c: Likewise.
* newlib/libm/common/sincosf_data.c: Likewise.
* newlib/libm/math/sf_cos.c: Add #if to build conditionally.
* newlib/libm/math/sf_sin.c: Likewise.
* newlib/libm/math/wf_sincos.c: Likewise.
--
This patch is a complete rewrite of sinf, cosf and sincosf. The new version
is significantly faster, as well as simple and accurate.
The worst-case ULP is 0.56072, maximum relative error is 0.5303p-23 over all
4 billion inputs. In non-nearest rounding modes the error is 1ULP.
The algorithm uses 3 main cases: small inputs which don't need argument
reduction, small inputs which need a simple range reduction and large inputs
requiring complex range reduction. The code uses approximate integer
comparisons to quickly decide between these cases - on some targets this may
be slow, so this can be configured to use floating point comparisons.
The small range reducer uses a single reduction step to handle values up to
120.0. It is fastest on targets which support inlined round instructions.
The large range reducer uses integer arithmetic for simplicity. It does a
32x96 bit multiply to compute a 64-bit modulo result. This is more than
accurate enough to handle the worst-case cancellation for values close to
an integer multiple of PI/4. It could be further optimized, however it is
already much faster than necessary.
Simple benchmark showing speedup factor on AArch64 for various ranges:
range 0.7853982 sinf 1.7 cosf 2.2 sincosf 2.8
range 1.570796 sinf 1.9 cosf 1.9 sincosf 2.7
range 3.141593 sinf 2.0 cosf 2.0 sincosf 3.5
range 6.283185 sinf 2.3 cosf 2.3 sincosf 4.2
range 125.6637 sinf 2.9 cosf 3.0 sincosf 5.1
range 1.1259e15 sinf 26.8 cosf 26.8 sincosf 45.2
ChangeLog:
2018-06-18 Wilco Dijkstra <wdijkstr@arm.com>
* newlib/libm/common/Makefile.in: Regenerated.
* newlib/libm/common/Makefile.am: Add sinf.c, cosf.c, sincosf.c
sincosf.h, sincosf_data.c. Add -fbuiltin -fno-math-errno to CFLAGS.
* newlib/libm/common/math_config.h: Add HAVE_FAST_ROUND, HAVE_FAST_LROUND,
roundtoint, converttoint, force_eval_float, force_eval_double, eval_as_float,
eval_as_double, likely, unlikely.
* newlib/libm/common/cosf.c: New file.
* newlib/libm/common/sinf.c: Likewise.
* newlib/libm/common/sincosf.h: Likewise.
* newlib/libm/common/sincosf.c: Likewise.
* newlib/libm/common/sincosf_data.c: Likewise.
* newlib/libm/math/sf_cos.c: Add #if to build conditionally.
* newlib/libm/math/sf_sin.c: Likewise.
* newlib/libm/math/wf_sincos.c: Likewise.
--
Previously, "test 1 2 3 -a -b -c" was permuted to "test -a -b -c 1 2 3",
but "test 1 2 3 -abc" was left as "test 1 2 3 -abc".
Signed-off-by: Thomas Kindler <mail+newlib@t-kindler.de>
Commit ebd645e on 2001-10-03 made environ.cc:_addenv() add unneeded
space at the end of the environment block to "work around problems
with some buggy applications." This clutters the code and is
presumably no longer needed.
Thanks to Ken Harris <Ken.Harris@mathworks.com> for the diagnosis.
When backing up tail to handle a "..", the code only checked that
it didn't underrun the destination buffer while removing path
components. It did *not* take into account that the first backslash
in the path had to be kept intact. Example path to trigger the
problem: "C:\A..\..\..\B'
Fix this by moving the dst pointer to the first backslash so subsequent
tests cannot underrun this position. Also make sure that we always
*have* a backslash.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
Thanks to Ken Harris <Ken.Harris@mathworks.com> for the diagnosis
which led to a buffer underrun in this loop.
Revert before release.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
- when calculating a correction to align next brk to page boundary,
ensure that the correction is less than a page size
- if allocating the correction fails, ensure that the top size is
set to brk + sbrk_size (minus any front alignment made)
Signed-off-by: Jeff Johnston <jjohnstn@redhat.com>
When converting number of days since epoch (32-bits) to seconds,
calculations using 32-bit `long` overflow for years above 2038. Solve
this by casting number of days to `time_t` just before final
multiplication.
Signed-off-by: Freddie Chopin <freddie.chopin@gmail.com>
Previously, hw exception handler stub and interrupt handler stub for microbaze were unable to
be overwritten. Change to weak to fix this.
Signed-off-by: Ben Levinsky <ben.levinsky@xilinx.com>
GCC 7 is able to see straight through this trick, so use a more formal
method to avoid the warning.
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
- From: Cesar Philippidis <cesar@codesourcery.com>
Date: Tue, 10 Apr 2018 14:43:42 -0700
Subject: [PATCH] nvptx port
This port adds support for Nvidia GPU's, which are primarily used as
offload accelerators in OpenACC and OpenMP.
There are systems with a MaximumProcessorCount not
reflecting the actually available CPUs. The ActiveProcessorCount
is correct though. So we use ActiveProcessorCount rather than
MaximumProcessorCount per group to set group affinity correctly.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
The gdtoa implementation uses the type long, defined as Long, in lots
of code. For historical reason newlib defines Long as int32_t instead.
This works fine, as long as floating point exceptions are not enabled.
The conversion to 32 bit int can lead to a FE_INVALID situation.
Example:
const char *str = "121645100408832000.0";
char *ptr;
feenableexcept (FE_INVALID);
strtod (str, &ptr);
This leads to the following situation in strtod
double aadj;
Long L;
[...]
L = (Long)aadj;
For instance, on x86_64 the code here is
cvttsd2si %xmm0,%eax
At this point, aadj is 2529648000.0 in our example. The conversion to
32 bit %eax results in a negative int value, thus the conversion is
invalid. With feenableexcept (FE_INVALID), a SIGFPE is raised.
Fix this by always using 64 bit ints here if double is not a 32 bit type
to avoid this type of FP exceptions.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>