Cygwin: next_unicode_mbs: return full char prefix from multibyte
Convert multibyte string to wint_t string and return the length
of the multibyte prefix constituting a full unicode char, taking
collating elements into account. Also return the resulting wint_t
string.
Allow to specify if the regex files in newlib/libc/posix are built
or not. Cygein is about to switch to MUSL regex, and the BSD regex
files collide with the definitions in MUSL's regex.h.
Take the opportunity to follow FreeBSD's and Linux's lead in recasting
macro inline code as calls to static inline functions. This allows the
macros to be type-safe. In addition, added a lower bound check to the
functions that use a cpu number to avoid a potential buffer underrun on
a bad argument. h/t to Corinna for the advice on recasting.
Takashi Yano [Sun, 5 Mar 2023 09:17:39 +0000 (18:17 +0900)]
Cygwin: ctty: Remove old 'kludge' code.
Remove old 'kludge' code which does not seem necessary anymore. The
comment of the 'kludge' is as follows.
* syscalls.cc (setsid): On second thought, in the spirit of keeping
things kludgy, set ctty to -2 here as a special flag, and...
(open): ...only eschew setting O_NOCTTY when that case is detected.
Takashi Yano [Sun, 5 Mar 2023 10:02:13 +0000 (19:02 +0900)]
Cygwin: ctty: Replace ctty constant with more descriptive macros.
This patch replaces ctty constants with more descriptive macros
(CTTY_UNINITIALIZED and CTTY_RELEASED) rather than -1 and -2 as
well as checking sign with CTTY_IS_VALID().
Fixes: 3b7df69aaa57 (Cygwin: ctty: Add comments for the special values: -1 and -2.) Suggested-by: Corinna Vinschen <corinna@vinschen.de> Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>
Cygwin: glob: fix conversion from UTF-32 to multibyte
The conversion function g_Ctoc missed to drop the flag
values from the wint_t value. That wasn't noticable with
the original version because it used a 64 bit Char type
and the flags were in the upper 32 bit region. So the
flag values were silently dropped when wcrtomb was called.
After converting Char to wint_t, we have to do drop the
flags explicitely.
arm: Restrict processor mode change when in hypervisor mode
If a CPU implements EL2 as its highest exception level then programs
using newlib may start in hypervisor mode. In that state it is not
trivial to switch into the various EL1 modes to configure the
individual exception stacks, so do not try.
Cygwin: fnmatch: support collating symbols in [. .] brackets
This requires quite a few changes in how fnmatch operates.
It always operates on wint_t strings now, just like regex and glob,
and it always keeps a pointer on the character inside the string,
rather than operating on a single character.
As a result, just drop the ifdef's for Cygwin. The code is
non-portable now anyway...
Cygwin: mbsnrtowci: like mbsnrtowcs, just for wint_t
Deviation from standard: If the input is broken, the output will be
broken. I. e., we just copy the current byte over into the wint_t
destination and try to pick up on the next byte. This is in line
with the way fnmatch works.
Corinna Vinschen [Tue, 28 Feb 2023 15:45:52 +0000 (16:45 +0100)]
Cygwin: fnmatch: drop static variable
fnmatch calls fnmatch1 with a static mbstate_t. This breaks
calling fnmatch from multiple threads. Fix it by folding
fnmatch1 into fnmatch and moving all mbstates to local variables.
The comment that the first arg must be the pattern was added
during development, before it turned out that __wscollate_range_cmp
can be implemented in an order independent way.
Better explain why this function uses pointers to strings.
Corinna Vinschen [Sun, 26 Feb 2023 19:14:54 +0000 (20:14 +0100)]
Cygwin: fetch-lc-def-codesets-from-linux: fix locale name handling
As the former call to `locale -av' has the unwanted side effect
to shorten the locale name to <= 15 chars, don't use it. Use
`locale -a' instead and fetch the codeset from another call to
`locale' for each locale.
Corinna Vinschen [Sun, 26 Feb 2023 16:17:33 +0000 (17:17 +0100)]
Cygwin: locale: fix devanagari modifier
Effectively revert commit 57bac33359db. The fact that the
devanagari modifier was called devanagar (missing the trailing 'i')
is a result of `locale -av' shortening the locale name to a maximum
of 15 characters.
Corinna Vinschen [Sun, 26 Feb 2023 16:04:03 +0000 (17:04 +0100)]
Cygwin: introduce /proc/codesets and /proc/locales
So far locale(1) had to have knowledge how to construct, thus
duplicating the effort how Cygwin handles locale strings.
Move locale list and codeset list generation into Cygwin by
providing /proc/codesets and /proc/locales files. /proc/locales
does not list aliases, those are still handled in locale(1).
locale(1) opens the files and ueses that info for printing,
like any other application can do now.
Corinna Vinschen [Sat, 25 Feb 2023 15:11:54 +0000 (16:11 +0100)]
Cygwin: locale: Set default charset from Linux locale -> codeset mapping
Generate lc_def_codeset.h header containing the default mapping from
locale to codeset on Linux. Use this mapping in __set_charset_from_locale
in the first place.
For every locale not covered by this table, just map Windows codepages
to equivalent codesets used on Linux/Unix, getting rid of LCIDs entirely.
Corinna Vinschen [Sat, 25 Feb 2023 15:06:34 +0000 (16:06 +0100)]
Cygwin: locale: new script creating linux default codeset mapping
New script creating a mapping table from locale to default codeset
for this locale. We use that in Cygwin now to generate the own
default codeset mapping based on Linux locale names.
Previously, SNDCTL_DSP_POST and SNDCTL_DSP_SYNC were implemented
wrongly. Due to this issue, module-oss of pulseaudio generates
choppy sound when SNDCTL_DSP_POST is called. This patch fixes that.
Corinna Vinschen [Fri, 24 Feb 2023 15:37:44 +0000 (16:37 +0100)]
Cygwin: convert Windows locale handling from LCID to ISO5646 strings
Since Windows Vista, locale handling is converted from using numeric
locale identifiers (LCID) to using ISO5646 locale strings. In the
meantime Windows introduced new locales which don't even have a LCID
attached. Those were unusable in Cygwin because locale information
for these locales required to call the new locale functions taking
a locale string.
Convert Cygwin to drop LCIDs and use Windows ISO5646 locales instead.
The last place using LCIDs is the __set_charset_from_locale function.
Checking numerically is easier and uslay faster than checking strings.
However, this function is clearly a TODO
Corinna Vinschen [Wed, 22 Feb 2023 23:22:56 +0000 (00:22 +0100)]
Cygwin: locale(1): drop using LCID, use Windows locale names
LCIDs are deprecated since Windows Vista. Worse, lots of new locales
have been added in the meantime which have no LCID attached. They
are only available by locale name.
As first step, rearrange the locale(1) tool to use Windows locale
names, rather than LCIDs, so we can now enumerate *all* locales
available in more recent Windows versions.
Hau Hsu [Wed, 22 Feb 2023 01:59:07 +0000 (09:59 +0800)]
RISC-V: Use the new libm code if possible
Set __OBSOLETE_MATH_DEFAULT to 0 if 'd' extension is supported (i.e.
__riscv_flen == 64).
Base on the comment for __OBSOLETE_MATH_DEFAULT:
> ... it assumes that the toolchain has ISO C99 support (hexfloat
> literals, standard fenv semantics), the target has IEEE-754 conforming
> binary32 float and binary64 double (not mixed endian) representation,
> standard SNaN representation, double and single precision arithmetics
> has similar latency and it has no legacy SVID matherr support, only
> POSIX errno and fenv exception based error handling.
Most locales using latin characters ignore case while sorting.
This is what wcscoll does (correctly so). However, there's an
internal order of collating sequences compared to the base
character, which is case-sensitive, at least in GLibc.
There's no way to express this in Windows, because CompareString
and LCMapString *always* use case-insensitivity in those locales,
even if none of the *IGNORECASE sorting flags are used.
We want to follow glibc's behaviour more closely, so we add an
extra check for the case and make sure upper and lower cased
letters don't comapre as identical.
Corinna Vinschen [Wed, 22 Feb 2023 09:47:54 +0000 (10:47 +0100)]
Cygwin: glob: perform ignore_case_with_glob on input
Rather than converting single chars on the fly to lowercase
in case ignore_case_with_glob is set, perform the conversion
on the entire input (pattern and filenames).
Corinna Vinschen [Mon, 20 Feb 2023 21:47:17 +0000 (22:47 +0100)]
Cygwin: glob: convert internal character datatype to wint_t
uint_fast64_t doesn't allow easy string handling, so convert
the internal "Char" type to wint_t. Given that UTF-32 only
needs 21 bits, we're well off with 28 usable character bits.
Corinna Vinschen [Mon, 20 Feb 2023 21:38:41 +0000 (22:38 +0100)]
Cygwin: nlsfuncs.cc: introduce collating elements and helper functions
lc_collelem.h: autogenerated table of collating element, taken
from glibc
is_unicode_coll_elem: Check if a UTF-32 string is a collating element
next_unicode_char: return length of prefix from a string constituting
a complete character in the current locale, taking
collating elements into acocunt.
After FreeBSD eventually picked up the bugreport from within
only 5 years, rename __collate_range_cmp to __wcollate_range_cmp
as suggested all along, and make it type safe (wint_t instead of
wchar_t for hopefully obvious reasons...)
While at it, drop __collate_load_error and fix the checks for
it in glob and fnmatch.
Takashi Yano [Tue, 14 Feb 2023 13:55:10 +0000 (22:55 +0900)]
Cygwin: dsp: Fix SNDCTL_DSP_GET[IO]SPACE before read()/write().
Even with the commit 3a4c740f59c0, SNDCTL_DSP_GET[IO]SPACE ioctl()
does not return the fragment set by SNDCTL_DSP_SETFRAGMENT if it
is issued before read()/write(). This patch fixes the issue.
Corinna Vinschen [Wed, 15 Feb 2023 21:00:39 +0000 (22:00 +0100)]
Cygwin: is_unicode_equiv: implement Unicode equivalence class check
is_unicode_equiv compares two UTF-32 values and returns 1 if
both are member of the same Unicode equivalence class, 0 otherwise.
Note that this function only works with precomposed characters
per Unicode normalization form C. It doesn't handle decomposed
characters, just like its counterpart in glibc. I.e., equivalence
class comparison using decomposed chars won't work. Example:
Corinna Vinschen [Wed, 15 Feb 2023 13:11:45 +0000 (14:11 +0100)]
Cygwin: glob: handle named character classes
Handle [:<character-class>:] expressions in range brackets.
TODO: Collating symbols [.<collsym>'.] and Equivalence class
expressions [=<equiv-class>=] are recognized but skipped as if
they are not present at all.
Corinna Vinschen [Tue, 14 Feb 2023 19:22:54 +0000 (20:22 +0100)]
Cygwin: fnmatch: handle named character classes
Handle [:<character-class>:] expressions in range brackets.
TODO: Collating symbols [.<collsym>'.] and Equivalence class
expressions [=<equiv-class>=] are recognized but skipped as if
they are not present at all.
So far the input to __collate_range_cmp was handled as a wchar_t.
Change that to handle it as wint_t holding a UTF-32 value and
add creating surrogate pairs for the call to wcscoll.
Corinna Vinschen [Tue, 14 Feb 2023 11:20:20 +0000 (12:20 +0100)]
Cygwin: mbrtowi: define replacement for mbrtowc, returning UTF-32 value
Given how UTF-16 isn't capable to hold all Unicode chars in a single
wchar_t, we need a function returning a wint_t value representing
a UTF-32 value for comparison functions. Fortunately the important
wide character functions like towupper/towlower, isw<class>, iswctype,
etc, already take wint_t values and newlib handles them as UTF-32.
If only we had switched wchar_t to 32 bit way back when... sigh.
Corinna Vinschen [Sat, 11 Feb 2023 11:53:34 +0000 (12:53 +0100)]
Cygwin: cygcheck: fix dependency search
Spaces are filtered out by PathMatchSpecA so they can't
be used as pattern anchors. Overwrite all spaces with
commas and fix the search expresion accordingly.
Cygwin: get_posix_access: Make mode_t parameter mandatory
Avoid the mistake fixed in the preceeding commit by passing
the mode_t argument by reference. This also affects a couple
other functions calling get_posix_access in turn.
Cygwin: chmod: don't drop default ACEs from directory ACLs
commit bc444e5aa4ca introduced a call to get_posix_access()
with a NULL pointer for the mode_t parameter because the value
is not needed later on... entirely ignoring the fact that the
mode_t bits are checked for the object being a directory.
In turn, the get_posix_access() call never checked for default
ACEs and returned only the standard ACEs. Thus, every chmod call
on a directory dropped the default ACEs from its permissions, as
well as the default NULL deny-ACE used to store specific bits.
It got also impossible to set the sgid bit on directories.
Cygwin: mkdir: use correct default permissions filtered by umask
Older coreutils created directories with mode bits filtered through
umask. Newer coreutils creates directories with full permissions,
0777 by default.
This new coreutils behaviour uncovered the fact that default ACEs for
newly created directories were not filtered by umask starting with
commit bc444e5aa4ca.
setlocale: create LC_ALL string when changing locale
This patch is for the sake of gnulib.
gnulib implements some form of a thread-safe setlocale variant
called setlocale_null_r, which is supposed to return the locale
strings in a thread-safe manner. This only succeeds if the system's
setlocale already handles this thread-safe, otherwise gnulib adds
some locking on its own.
Newlib's setlocale always writes the global string array holding the
LC_ALL value anew on each invocation of setlocale(LC_ALL, NULL).
Since that doesn't allow to call setlocale(LC_ALL, NULL) in a
thread-safe manner, so locking in gnulib is required.
And here's the problem...
The lock is decorated as dllexport when building for Cygwin. This
collides with the default behaviour of ld to export all symbols.
If it finds one decorated symbol, it will only export this symbol
to the DLL import lib.
Change setlocale so that it writes the global string array
holding the LC_ALL value at the time the locale gets changed.
On setlocale(LC_ALL, NULL), just return the pointer to the
global LC_ALL string array, just as in GLibc. The burden of
doing so is negligibly for all targets, but adds thread-safety
for gnulib's setlocal_null_r() function, and gnulib can drop
the lock entirely when building for Cygwin.
When compiling Newlib for arm targets with GCC 12.1 onward, the
passing of architecture extension information to the assembler is
automatic, making the use of .fpu and .arch_extension directives
in assembly files redundant.
With older versions of GCC, however, these directives must be
hard-coded into the `arm/setjmp.S' file to allow the assembly of
instructions concerning the storage and subsequent reloading of the
floating point registers to/from the jump buffer, respectively.
This patch conditionally adds the `.fpu vfpxd' and `.arch_extension
mve' directives based on compile-time preprocessor macros concerning
GCC version and target architectural features, such that both the
assembly and linking of setjmp.S succeeds for older versions of
Newlib.
dumper: avoid linker problem when `libbfd` depends on `libsframe`
A recent binutils version introduced `libsframe` and made it a
dependency of `libbfd`. This caused a linker problem in the MSYS2
project, and once Cygwin upgrades to that binutils version it would
cause the same problems there.
Let's preemptively detect the presence of `libsframe` and if detected,
link to it in addition to `libbfd`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Corinna Vinschen [Sun, 29 Jan 2023 13:13:25 +0000 (14:13 +0100)]
Cygwin: cygcheck: package info / available package search, take 2
- if the user has no perms to write to /etc/setup, don't try
to fetch user homedir from Cygwin (crashes galore). Use
LOCALAPPDATA path instead.
- info is more rpm like
- print info of installed package
- added info selectors --inst, --curr, --prev, --test
- add installation date
TODO:
- Human-readable filesize
- url and license needs to be added to setup.ini yet
-