Load balance sockets with new SO_REUSEPORT_LB option
This patch adds a new socket option, SO_REUSEPORT_LB, which allow multiple
programs or threads to bind to the same port and incoming connections will be
load balanced using a hash function.
Most of the code was copied from a similar patch for DragonflyBSD.
However, in DragonflyBSD, load balancing is a global on/off setting and can not
be set per socket. This patch allows for simultaneous use of both the current
SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system.
Required changes to structures
Globally change so_options from 16 to 32 bit value to allow for more options.
Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets.
Limitations
As DragonflyBSD, a load balance group is limited to 256 pcbs
(256 programs or threads sharing the same socket).
Add 32-bit compat for ioctls that take struct ifgroupreq.
Use an accessor to access ifgr_group and ifgr_groups.
Use an macro CASE_IOC_IFGROUPREQ(cmd) in place of case statements such
as "case SIOCAIFGROUP:". This avoids poluting the switch statements
with large numbers of #ifdefs.
brooks [Fri, 30 Mar 2018 18:50:13 +0000 (18:50 +0000)]
Use an accessor function to access ifr_data.
This fixes 32-bit compat (no ioctl command defintions are required
as struct ifreq is the same size). This is believed to be sufficent to
fully support ifconfig on 32-bit systems.
jeff [Thu, 29 Mar 2018 02:54:50 +0000 (02:54 +0000)]
Implement several enhancements to NUMA policies.
Add a new "interleave" allocation policy which stripes pages across
domains with a stride or width keeping contiguity within a multi-page
region.
Move the kernel to the dedicated numbered cpuset #2 making it possible
to assign kernel threads and memory policy separately from user. This
also eliminates the need for the complicated interrupt binding code.
Add a sysctl API for viewing and manipulating domainsets. Refactor some
of the cpuset_t manipulation code using the generic bitset type so that
it can be used for both. This probably belongs in a dedicated subr file.
brooks [Tue, 27 Mar 2018 18:26:50 +0000 (18:26 +0000)]
Fix access to ifru_buffer on freebsd32.
Make all kernel accesses to ifru_buffer go via access functions
which take the process ABI into account and use an appropriate union
to access members in the correct place in struct ifreq.
kib [Tue, 27 Mar 2018 15:29:32 +0000 (15:29 +0000)]
Allow to specify PCP on packets not belonging to any VLAN.
According to 802.1Q-2014, VLAN tagged packets with VLAN id 0 should be
considered as untagged, and only PCP and DEI values from the VLAN tag
are meaningful. See for instance
https://www.cisco.com/c/en/us/td/docs/switches/connectedgrid/cg-switch-sw-master/software/configuration/guide/vlan0/b_vlan_0.html.
Make it possible to specify PCP value for outgoing packets on an
ethernet interface. When PCP is supplied, the tag is appended, VLAN
id set to 0, and PCP is filled by the supplied value. The code to do
VLAN tag encapsulation is refactored from the if_vlan.c and moved into
if_ethersubr.c.
Drivers might have issues with filtering VID 0 packets on
receive. This bug should be fixed for each driver.
jtl [Thu, 22 Mar 2018 09:40:08 +0000 (09:40 +0000)]
Add the "TCP Blackbox Recorder"
which we discussed at the developer summits at BSDCan and BSDCam in 2017.
The TCP Blackbox Recorder allows you to capture events on a TCP connection
in a ring buffer. It stores metadata with the event. It optionally stores
the TCP header associated with an event (if the event is associated with a
packet) and also optionally stores information on the sockets.
It supports setting a log ID on a TCP connection and using this to correlate
multiple connections that share a common log ID.
You can log connections in different modes. If you are doing a coordinated
test with a particular connection, you may tell the system to put it in
mode 4 (continuous dump). Or, if you just want to monitor for errors, you
can put it in mode 1 (ring buffer) and dump all the ring buffers associated
with the connection ID when we receive an error signal for that connection
ID. You can set a default mode that will be applied to a particular ratio
of incoming connections. You can also manually set a mode using a socket
option.
This commit includes only basic probes. rrs@ has added quite an abundance
of probes in his TCP development work. He plans to commit those soon.
There are user-space programs which we plan to commit as ports. These read
the data from the log device and output pcapng files, and then let you
analyze the data (and metadata) in the pcapng files.
brooks [Fri, 16 Mar 2018 22:23:04 +0000 (22:23 +0000)]
Add _IOC_NEWLEN() and _IOC_NEWTYPE() macros.
These macros take an existing ioctl(2) command and replace the length
with the specified length or length of the specified type respectively.
These can be used to define commands for 32-bit compatibility with fewer
opportunities for cut-and-paste errors then a whole new definition.
pkelsey [Mon, 26 Feb 2018 02:53:22 +0000 (02:53 +0000)]
This is an implementation of the client side of TCP Fast Open (TFO)
[RFC7413]. It also includes a pre-shared key mode of operation in which
the server requires the client to be in possession of a shared secret in
order to successfully open TFO connections with that server.
The names of some existing fastopen sysctls have changed (e.g.,
net.inet.tcp.fastopen.enabled -> net.inet.tcp.fastopen.server_enable).
ae@FreeBSD.org [Fri, 15 Dec 2017 12:37:32 +0000 (12:37 +0000)]
Follow the RFC6980 and silently ignore following IPv6 NDP messages
that had the IPv6 fragmentation header:
o Neighbor Solicitation
o Neighbor Advertisement
o Router Solicitation
o Router Advertisement
o Redirect
Introduce M_FRAGMENTED mbuf flag, and set it after IPv6 fragment reassembly
is completed. Then check the presence of this flag in correspondig ND6
handling routines.
pfg [Mon, 27 Nov 2017 15:01:59 +0000 (15:01 +0000)]
sys/sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
pfg [Mon, 20 Nov 2017 19:45:28 +0000 (19:45 +0000)]
include: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
pfg [Mon, 20 Nov 2017 19:43:44 +0000 (19:43 +0000)]
sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
kib [Tue, 7 Nov 2017 09:46:26 +0000 (09:46 +0000)]
Use hardware timestamps to report packet timestamps
for SO_TIMESTAMP and other similar socket options.
Provide new control message SCM_TIME_INFO to supply information about
timestamp. Currently it indicates that the timestamp was
hardware-assisted and high-precision, for software timestamps the
message is not returned. Reserved fields are added to ABI to report
additional info about it, it is expected that raw hardware clock value
might be useful for some applications.
kib [Tue, 7 Nov 2017 09:29:14 +0000 (09:29 +0000)]
Add a place for a driver to report rx timestamps
in nanoseconds from boot for the received packets.
The rcv_tstmp field overlaps the place of Ln header length indicators,
not used by received packets. The basic pkthdr rearrangement change
in sys/mbuf.h was provided by gallatin.
There are two accompanying M_ flags: M_TSTMP means that there is the
timestamp (and it was generated by hardware).
Another flag M_TSTMP_HPREC indicates that the timestamp is
high-precision. Practically M_TSTMP_HPREC means that hardware
provided additional precision comparing with the stamps when the flag
is not set. E.g., for ConnectX all packets are stamped by hardware
when PCIe transaction to write out the completion descriptor is
performed, but PTP packet are stamped on port. For Intel cards, when
PTP assist is enabled, only PTP packets are stamped in the limited
number of registers, so if Intel cards ever start support this
mechanism, they would always set M_TSTMP | M_TSTMP_HPREC if hardware
timestamp is present for the given packet.
Add IFCAP_HWRXTSTMP interface capability to indicate the support for
hardware rx timestamping, and ifconfig(8) command to toggle it.
Based on the patch by: gallatin
Reviewed by: gallatin (previous version), hselasky
Sponsored by: Mellanox Technologies
MFC after: 2 weeks (? mbuf KBI issue)
X-Differential revision: https://reviews.freebsd.org/D12638
if: Add ioctls to get RSS key and hash type/function.
It will be needed by hn(4) to configure its RSS key and hash
type/function in the transparent VF mode in order to match VF's
RSS settings. The description of the transparent VF mode and
the RSS hash value issue are here:
https://svnweb.freebsd.org/base?view=revision&revision=322299
https://svnweb.freebsd.org/base?view=revision&revision=322485
These are generic enough to promise two independent IOCs instead
of abusing SIOCGDRVSPEC.
Setting RSS key and hash type/function is a different story,
which probably requires more discussion.
Comment about UDP_{IPV4,IPV6,IPV6_EX} were only in the patch
in the review request; these hash types are standardized now.
kib [Sat, 24 Jun 2017 17:01:11 +0000 (17:01 +0000)]
Implement address space guards.
Guard, requested by the MAP_GUARD mmap(2) flag, prevents the reuse of
the allocated address space, but does not allow instantiation of the
pages in the range. It is useful for more explicit support for usual
two-stage reserve then commit allocators, since it prevents accidental
instantiation of the mapping, e.g. by mprotect(2).
Use guards to reimplement stack grow code. Explicitely track stack
grow area with the guard, including the stack guard page. On stack
grow, trivial shift of the guard map entry and stack map entry limits
makes the stack expansion. Move the code to detect stack grow and
call vm_map_growstack(), from vm_fault() into vm_map_lookup().
As result, it is impossible to get random mapping to occur in the
stack grow area, or to overlap the stack guard page.
glebius [Thu, 8 Jun 2017 21:30:34 +0000 (21:30 +0000)]
Listening sockets improvements.
o Separate fields of struct socket that belong to listening from
fields that belong to normal dataflow, and unionize them. This
shrinks the structure a bit.
- Take out selinfo's from the socket buffers into the socket. The
first reason is to support braindamaged scenario when a socket is
added to kevent(2) and then listen(2) is cast on it. The second
reason is that there is future plan to make socket buffers pluggable,
so that for a dataflow socket a socket buffer can be changed, and
in this case we also want to keep same selinfos through the lifetime
of a socket.
- Remove struct struct so_accf. Since now listening stuff no longer
affects struct socket size, just move its fields into listening part
of the union.
- Provide sol_upcall field and enforce that so_upcall_set() may be called
only on a dataflow socket, which has buffers, and for listening sockets
provide solisten_upcall_set().
o Remove ACCEPT_LOCK() global.
- Add a mutex to socket, to be used instead of socket buffer lock to lock
fields of struct socket that don't belong to a socket buffer.
- Allow to acquire two socket locks, but the first one must belong to a
listening socket.
- Make soref()/sorele() to use atomic(9). This allows in some situations
to do soref() without owning socket lock. There is place for improvement
here, it is possible to make sorele() also to lock optionally.
- Most protocols aren't touched by this change, except UNIX local sockets.
See below for more information.
o Reduce copy-and-paste in kernel modules that accept connections from
listening sockets: provide function solisten_dequeue(), and use it in
the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4),
infiniband, rpc.
o UNIX local sockets.
- Removal of ACCEPT_LOCK() global uncovered several races in the UNIX
local sockets. Most races exist around spawning a new socket, when we
are connecting to a local listening socket. To cover them, we need to
hold locks on both PCBs when spawning a third one. This means holding
them across sonewconn(). This creates a LOR between pcb locks and
unp_list_lock.
- To fix the new LOR, abandon the global unp_list_lock in favor of global
unp_link_lock. Indeed, separating these two locks didn't provide us any
extra parralelism in the UNIX sockets.
- Now call into uipc_attach() may happen with unp_link_lock hold if, we
are accepting, or without unp_link_lock in case if we are just creating
a socket.
- Another problem in UNIX sockets is that uipc_close() basicly did nothing
for a listening socket. The vnode remained opened for connections. This
is fixed by removing vnode in uipc_close(). Maybe the right way would be
to do it for all sockets (not only listening), simply move the vnode
teardown from uipc_detach() to uipc_close()?
imp [Tue, 28 Feb 2017 23:42:47 +0000 (23:42 +0000)]
Renumber copyright clause 4
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.
Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96
ed@FreeBSD.org [Wed, 3 Aug 2016 06:33:04 +0000 (06:33 +0000)]
mprotect(): Change prototype to comply to POSIX.
Our mprotect() function seems to take a "const void *" address to the
pages whose permissions need to be adjusted. POSIX uses "void *". Simply
stick to the POSIX one to prevent us from writing unportable code.
kib [Sun, 28 Feb 2016 17:52:33 +0000 (17:52 +0000)]
Implement process-shared locks support
for libthr.so.3, without breaking the ABI. Special value is stored in
the lock pointer to indicate shared lock, and offline page in the shared
memory is allocated to store the actual lock.
Reviewed by: vangyzen (previous version)
Discussed with: deischen, emaste, jhb, rwatson,
Martin Simmons <martin@lispworks.com>
Tested by: pho
Sponsored by: The FreeBSD Foundation
jhb [Thu, 4 Jun 2015 19:41:15 +0000 (19:41 +0000)]
Add a new file operations hook for mmap
operations. File type-specific logic is now placed in the mmap hook
implementation rather than requiring it to be placed in
sys/vm/vm_mmap.c. This hook allows new file types to support mmap() as
well as potentially allowing mmap() for existing file types that do not
currently support any mapping.
The vm_mmap() function is now split up into two functions. A new
vm_mmap_object() function handles the "back half" of vm_mmap() and accepts
a referenced VM object to map rather than a (handle, handle_type) tuple.
vm_mmap() is now reduced to converting a (handle, handle_type) tuple to a
a VM object and then calling vm_mmap_object() to handle the actual mapping.
The vm_mmap() function remains for use by other parts of the kernel
(e.g. device drivers and exec) but now only supports mapping vnodes,
character devices, and anonymous memory.
The mmap() system call invokes vm_mmap_object() directly with a NULL object
for anonymous mappings. For mappings using a file descriptor, the
descriptors fo_mmap() hook is invoked instead. The fo_mmap() hook is
responsible for performing type-specific checks and adjustments to
arguments as well as possibly modifying mapping parameters such as flags
or the object offset. The fo_mmap() hook routines then call
vm_mmap_object() to handle the actual mapping.
The fo_mmap() hook is optional. If it is not set, then fo_mmap() will
fail with ENODEV. A fo_mmap() hook is implemented for regular files,
character devices, and shared memory objects (created via shm_open()).
While here, consistently use the VM_PROT_* constants for the vm_prot_t
type for the 'prot' variable passed to vm_mmap() and vm_mmap_object()
as well as the vm_mmap_vnode() and vm_mmap_cdev() helper routines.
Previously some places were using the mmap()-specific PROT_* constants
instead. While this happens to work because PROT_xx == VM_PROT_xx,
using VM_PROT_* is more correct.
to add type-specific information to struct kinfo_file. - Move the
various fill_*_info() methods out of kern_descrip.c and into the various
file type implementations. - Rework the support for kinfo_ofile to
generate a suitable kinfo_file object for each file and then convert
that to a kinfo_ofile structure rather than keeping a second, different
set of code that directly manipulates type-specific file information. -
Remove the shm_path() and ksem_info() layering violations.
kib [Thu, 19 Jun 2014 05:00:39 +0000 (05:00 +0000)]
Add MAP_EXCL flag for mmap(2).
It should be combined with MAP_FIXED, and prevents the request from
deleting existing mappings in the region, failing instead.
Reviewed by: alc
Discussed with: jhb
Tested by: markj, pho (previous version, as part of the bigger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
to request that a mapping use an address in the first 2GB of the
process's address space. This flag should have the same semantics as the
same flag on Linux.
To facilitate this, add a new parameter to vm_map_find() that specifies an
optional maximum virtual address. While here, fix several callers of
vm_map_find() to use a VMFS_* constant for the findspace argument instead of
TRUE and FALSE.
kib [Wed, 21 Aug 2013 17:45:00 +0000 (17:45 +0000)]
Implement read(2)/write(2) and neccessary lseek(2)
for posix shmfd. Add MAC framework entries for posix shm read and write.
Do not allow implicit extension of the underlying memory segment past
the limit set by ftruncate(2) by either of the syscalls. Read and
write returns short i/o, lseek(2) fails with EINVAL when resulting
offset does not fit into the limit.
Discussed with: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
Sebastian Huber [Thu, 23 Aug 2018 10:13:02 +0000 (12:13 +0200)]
Add __nl_item to <sys/_types.h> and use it
Add __nl_item to <sys/_types.h> for FreeBSD compatibility. Use it in
<langinfo.h> and the Cygwin <nl_types.h>. Make the enum __nl_item in
<langinfo.h> anonymous.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Sebastian Huber [Mon, 20 Aug 2018 11:20:06 +0000 (13:20 +0200)]
RTEMS: Add __tls_get_addr() to crt0
Add __tls_get_addr() for all targets to crt0. This is not only used on
ARM. In particular, it is used on RISC-V. This helps to adequately
support the GCC libgomp.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de
Replacing page_size() with allocation_granularity() was incorrect.
The values returned by get_mem_values() are # of pages of size
page_size(). Multiplying with allocation_granularity() here
results in values 16 times too big.
Masamichi Hosoda [Thu, 16 Aug 2018 00:18:50 +0000 (09:18 +0900)]
Fix strtod ("nan") and strtold ("nan") returns wrong negative NaN
The definition of qNaN for x86_64 and i386 was wrong.
strto{d|ld} ("nan") returned wrong negative NaN
instead of correct positive NaN
since it used the wrong definition.
On the other hand, strtof ("nan") returns correct positive NaN
since it uses nanf ("") instead of the wrong definition.
This commit makes strto{d|ld} ("nan") uses {nan|nanl} ("")
like strtof ("nan") using.
So strto{d|ld} ("nan") returns positive NaN.
Wilco Dijkstra [Thu, 16 Aug 2018 10:39:35 +0000 (10:39 +0000)]
Improve sincosf comments
Improve comments in sincosf implementation to make the code easier
to understand. Rename the constant pi64 to pi63 since it's actually
PI * 2^-63. Add comments for fields of sincos_t structure. Add comments
describing implementation details to reduce_fast.
Keep the denormal-operand exception masked; modify FE_ALL_EXCEPT accordingly.
By excluding the denormal-operand exception from FE_ALL_EXCEPT, it will not
be possible anymore to UNmask this exception by means of the API defined by
/usr/include/fenv.h
Note: terminology has changed since IEEE Std 854-1987; denormalized numbers
are called subnormal numbers nowadays.
This modification has basically been motivated by the fact that it is also
not possible on Linux to manipulate the denormal-operand exception by means
of the interface as defined by /usr/include/fenv.h. This has been the state
of affairs on Linux since 2001 (Andreas Jaeger).
The exceptions required by the standard (IEEE Std 754), in case they can be
supported by the implementation, are:
FE_INEXACT, FE_UNDERFLOW, FE_OVERFLOW, FE_DIVBYZERO and FE_INVALID.
Although it is allowed to define additional exceptions, there is no reason
to support the "denormal-operand exception" in this case (fenv.h), because
the subnormal numbers can be handled almost as fast the normalized numbers
by the hardware of the x86/x86_64 architecture. Said differently, a reason
to trap on the input of subnormal numbers does not exist. At least that is
what William Kahan and others at Intel asserted around 2000.
(that is William Kahan of the K-C-S draft, the precursor to the standard)
This commit modifies winsup/cygwin/include/fenv.h as follows:
- redefines FE_ALL_EXCEPT from 0x3f to 0x3d
- removes the definition for FE_DENORMAL
- introduces __FE_DENORM (0x2) (enum in Linux also uses __FE_DENORM)
- introduces FE_ALL_EXCEPT_X86 (0x3f), i.e. ALL x86/x86_64 FP exceptions
wordexp uses fprintf in a dangerous way. It uses an unchecked
input string as format string, rather than as parameter to a %s.
Replace fprintf with fputs.
fnstenv MUST be followed by fldenv in fegetenv(), as the former disables all
exceptions in the x87 FPU, which is not appropriate here (fegetenv() ).
fldenv after fnstenv should reload the x87 FPU w/ the configuration that was
saved by fnstenv, i.e. a configuration that might have exceptions enabled.
Note: x86_64 uses SSE for floating-point, not the x87 FPU. However, because
feraiseexcept() attempts to provoke an exception using the x87 FPU, the bug
in fegetenv() will make this attempt futile here (x86_64).
Note: WoW uses the x87 FPU for floating-point, not SSE. Here anything that
would normally result in triggering an exception, not only feraiseexcept(),
will not be able to, as result of the bug in fegetenv().
pfg [Sun, 21 Jan 2018 20:27:47 +0000 (20:27 +0000)]
Define a new __alloc_size2 attribute to complement the exiting support.
At least on GCC7 calling __alloc_size(x) twice is not equivalent to
calling using the attribute once with two arguments. The later is the
documented use in GCC documentation so add a new alloc_size(n, x)
alternative to cover for the few places where it is used: basically:
calloc(3), reallocarray(3) and mallocarray(9).
Submitted by: Mark Millard
MFC after: 3 days
Reference:
http://docs.freebsd.org/cgi/mid.cgi?F227842D-6BE2-4680-82E7-07906AF61CD7
pfg [Mon, 20 Nov 2017 19:43:44 +0000 (19:43 +0000)]
sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
ed@FreeBSD.org [Mon, 28 Aug 2017 09:35:17 +0000 (09:35 +0000)]
Make _Static_assert() work with GCC in older C++ standards.
GCC only activates C11 keywords in C mode, not C++ mode. This means
that when targeting an older C++ standard, we cannot fall back to using
_Static_assert(). In this case, do define _Static_assert() as a macro
that uses a typedef'ed array.
Sebastian Huber [Fri, 27 Jul 2018 07:16:53 +0000 (09:16 +0200)]
ctype: Fix integer type for caseconv_entry::delta
The commit 46ba1675c457324b0eeef4670a09101ef3f34c50 accidently changed a
bit-field from signed to unsigned. The caseconv_entry::delta must be a
signed integer, see also "newlib/libc/ctype/caseconv.t".
Unfortunately, a standard GCC/Newlib build is done without
-Wsign-conversion. Using this warning option would have helped to avoid
this bug:
caseconv.t:2:22: warning: unsigned conversion from 'int' to 'unsigned int:17' changes value from '-32' to '131040' [-Wsign-conversion]
{0x0061, 25, TOUP, -32},
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Mark Geisert [Tue, 24 Jul 2018 05:31:59 +0000 (22:31 -0700)]
POSIX Asynchronous I/O support: other files
Updates to misc files to integrate AIO into the Cygwin source tree.
Much of it has to be done when adding any new syscalls. There are
some updates to limits.h for AIO-specific limits. And some doc mods.
Mark Geisert [Tue, 24 Jul 2018 05:31:58 +0000 (22:31 -0700)]
POSIX Asynchronous I/O support: fhandler files
This code is where the AIO implementation is wired into existing Cygwin
mechanisms for file and device I/O: the fhandler* functions. It makes
use of an existing internal routine prw_open to supply a "shadow fd"
that permits asynchronous operations on a file the user app accesses
via its own fd. This allows AIO to read or write at arbitrary locations
within a file without disturbing the app's file pointer. (This was
already the case with normal pread|pwrite; we're just adding "async"
to the mix.)
Mark Geisert [Tue, 24 Jul 2018 05:31:57 +0000 (22:31 -0700)]
POSIX Asynchronous I/O support: aio files
This is the core of the AIO implementation: aio.cc and aio.h. The
latter is used within the Cygwin DLL by aio.cc and the fhandler* modules,
as well as by user programs wanting the AIO functionality.
Ken Brown [Mon, 23 Jul 2018 14:10:03 +0000 (10:10 -0400)]
getfacl and setfacl: Align with Linux
Make getfacl print two colons instead of one after "other" and "mask".
Change the help text for setfacl to indicate that there can be either
one colon or two.
strcmp.S: Improve performance for misaligned strings
Replace the simple byte-wise compare in the misaligned case with a
dword compare with page boundary checks in place. For simplicity I've
chosen a 4K page boundary so that we don't have to query the actual
page size on the system.
This results in up to 3x improvement in performance in the unaligned
case on falkor and about 2.5x improvement on mustang as measured using
bench-strcmp in glibc.
This improved memcmp provides a fast path for compares up to 16 bytes
and then compares 16 bytes at a time, thus optimizing loads from both
sources. The glibc memcmp microbenchmark retains performance (with an
error of ~1ns) for smaller compare sizes and reduces up to 31% of
execution time for compares up to 4K on the APM Mustang. On Qualcomm
Falkor this improves to almost 48%, i.e. it is almost 2x improvement
for sizes of 2K and above.