* create new function __get_cpus_per_group to evaluate # of CPU groups
* Call from format_proc_cpuinfo and sched_getcpu
* Bump API minor version
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
fnstenv MUST be followed by fldenv in fegetenv(), as the former disables all
exceptions in the x87 FPU, which is not appropriate here (fegetenv() ).
fldenv after fnstenv should reload the x87 FPU w/ the configuration that was
saved by fnstenv, i.e. a configuration that might have exceptions enabled.
Note: x86_64 uses SSE for floating-point, not the x87 FPU. However, because
feraiseexcept() attempts to provoke an exception using the x87 FPU, the bug
in fegetenv() will make this attempt futile here (x86_64).
Note: WoW uses the x87 FPU for floating-point, not SSE. Here anything that
would normally result in triggering an exception, not only feraiseexcept(),
will not be able to, as result of the bug in fegetenv().
Introduce new host configuration variable "have_init_fini" which is set
to "yes" by default. Override it for RISC-V to "no".
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Restore FreeBSD compatibility for __alloc_size() and __alloc_align().
This is a follow-up to commit e494b56035.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
At least on GCC7 calling __alloc_size(x) twice is not equivalent to
calling using the attribute once with two arguments. The later is the
documented use in GCC documentation so add a new alloc_size(n, x)
alternative to cover for the few places where it is used: basically:
calloc(3), reallocarray(3) and mallocarray(9).
Submitted by: Mark Millard
MFC after: 3 days
Reference:
http://docs.freebsd.org/cgi/mid.cgi?F227842D-6BE2-4680-82E7-07906AF61CD7
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
GCC only activates C11 keywords in C mode, not C++ mode. This means
that when targeting an older C++ standard, we cannot fall back to using
_Static_assert(). In this case, do define _Static_assert() as a macro
that uses a typedef'ed array.
Discussed in: r322875 commit thread
Reported by: Mark MIllard
MFC after: 1 month
The previous version genenerated the following GCC note:
towctrans_l.c:44:1: note: offset of packed bit-field 'diff' has changed in GCC 4.4
caseconv_table [] = {
^~~~~~~~~~~~~~
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
The commit 46ba1675c4 accidently changed a
bit-field from signed to unsigned. The caseconv_entry::delta must be a
signed integer, see also "newlib/libc/ctype/caseconv.t".
Unfortunately, a standard GCC/Newlib build is done without
-Wsign-conversion. Using this warning option would have helped to avoid
this bug:
caseconv.t:2:22: warning: unsigned conversion from 'int' to 'unsigned int:17' changes value from '-32' to '131040' [-Wsign-conversion]
{0x0061, 25, TOUP, -32},
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Updates to misc files to integrate AIO into the Cygwin source tree.
Much of it has to be done when adding any new syscalls. There are
some updates to limits.h for AIO-specific limits. And some doc mods.
This code is where the AIO implementation is wired into existing Cygwin
mechanisms for file and device I/O: the fhandler* functions. It makes
use of an existing internal routine prw_open to supply a "shadow fd"
that permits asynchronous operations on a file the user app accesses
via its own fd. This allows AIO to read or write at arbitrary locations
within a file without disturbing the app's file pointer. (This was
already the case with normal pread|pwrite; we're just adding "async"
to the mix.)
This is the core of the AIO implementation: aio.cc and aio.h. The
latter is used within the Cygwin DLL by aio.cc and the fhandler* modules,
as well as by user programs wanting the AIO functionality.
Make getfacl print two colons instead of one after "other" and "mask".
Change the help text for setfacl to indicate that there can be either
one colon or two.
This prevents errors like this:
newlib/libc/ctype/categories.c:6:3: error: width of 'first' exceeds its type
unsigned int first: 24;
^
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Exotic RTEMS targets can define this back to int32_t as an exception if
there are good reasons.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Replace the simple byte-wise compare in the misaligned case with a
dword compare with page boundary checks in place. For simplicity I've
chosen a 4K page boundary so that we don't have to query the actual
page size on the system.
This results in up to 3x improvement in performance in the unaligned
case on falkor and about 2.5x improvement on mustang as measured using
bench-strcmp in glibc.
This improved memcmp provides a fast path for compares up to 16 bytes
and then compares 16 bytes at a time, thus optimizing loads from both
sources. The glibc memcmp microbenchmark retains performance (with an
error of ~1ns) for smaller compare sizes and reduces up to 31% of
execution time for compares up to 4K on the APM Mustang. On Qualcomm
Falkor this improves to almost 48%, i.e. it is almost 2x improvement
for sizes of 2K and above.
The mutually misaligned inputs on aarch64 are compared with a simple
byte copy, which is not very efficient. Enhance the comparison
similar to strcmp by loading a double-word at a time. The peak
performance improvement (i.e. 4k maxlen comparisons) due to this on
the strncmp microbenchmark in glibc is as follows:
falkor: 3.5x (up to 72% time reduction)
cortex-a73: 3.5x (up to 71% time reduction)
cortex-a53: 3.5x (up to 71% time reduction)
All mutually misaligned inputs from 16 bytes maxlen onwards show
upwards of 15% improvement and there is no measurable effect on the
performance of aligned/mutually aligned inputs.
Bug in current ARM64 WOW64: GetNativeSystemInfo returns
PROCESSOR_ARCHITECTURE_INTEL rather than PROCESSOR_ARCHITECTURE_ARM64.
Provide for this.
Make code better readable.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
The current SYS_EXIT has a bug that when making the call it always uses
the v2 calling convention. This is undefined behavior according to the
semihosting specification:
https://developer.arm.com/docs/100863/latest/semihosting-operations/sys_exit-0x18
This patch fixes it by making sure v1 passes the argument directly in the register instead
of in a block. And for v2 it does the same if the v2 extension isn't supported.
The sequence generated now is
12424: ebfffecd bl 11f60 <_has_ext_exit_extended>
12428: e3500000 cmp r0, #0
1242c: 11a0500d movne r5, sp
12430: 059d5000 ldreq r5, [sp]
12434: e1a00004 mov r0, r4
12438: e1a01005 mov r1, r5
1243c: ef00f000 svc 0x0000f000
Signed-off-by: Tamar Christina <tamar.christina@arm.com>
PREFER_FLOAT_COMPARISON setting was not correct as it could raise
spurious exceptions. Fixing it is easy: just use ISLESS(x, y) instead
of abstop12(x) < abstop12(y) with appropriate non-signaling definition
for ISLESS. However it seems this setting is not very useful (there is
only minor performance difference on various architectures), so remove
this option for now.
Guard the entire operation with the FastPebLock critical section
used by RtlSetCurrentDirectory_U as well, thus eliminating the
race between concurrent chdir/fchdir/SetCurrentDirectory calls.
Streamline comment explaining the fallback method.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
Update to the latest versions of config.sub (2018-07-03) and
config.guess (2018-06-26).
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
* fhandler_socket_local.cc (get_inet_addr_local): Change type from
'static int' to 'int' to be callable from syslog.cc.
* syslog.cc (connect_syslogd): Use get_inet_addr_local() instead of
getsockname() to retrieve name information of the syslogd socket.
The !HAVE_FAST_FMA code path split r = z/c - 1 into r = rhi + rlo such
that when z = 1-tiny and c = 1 then rlo and rhi could have much larger
magnitude than r which later caused large rounding errors.
So do a nearest rounding instead of truncation at the split.
In newlib with default settings this was observable on some arm targets
that enable the new math code but has no fma.
The roundtoint and converttoint internal functions are only called with small
values, so 32 bit result is enough for converttoint and it is a signed int
conversion so the natural return type is int32_t.
The original idea was to help the compiler keeping the result in uint64_t,
then it's clear that no sign extension is needed and there is no accidental
undefined or implementation defined signed int arithmetics.
But it turns out gcc does a good job with inlining so changing the type has
no overhead and the semantics of the conversion is less surprising this way.
Since we want to allow the asuint64 (x + 0x1.8p52) style conversion, the top
bits were never usable and the existing code ensures that only the bottom
32 bits of the conversion result are used.
In newlib with default settings only aarch64 is affected and there is no
significant code generation change with gcc after the patch.
Synchronize code style and comments with Arm Optimized Routines, there
are no code changes in this patch. This ensures different projects using
the same code have consistent code style so bug fix patches can be applied
more easily.