character class "alpha"
Bruno Haible
bruno@clisp.org
Mon Jul 31 21:37:08 GMT 2023
Brian Inglis wrote:
> It seems to me that most application developers needing to support
> non-Western-European languages might want a non-POSIX interpretation of digits.
Sure. GNU libunistring has dedicated API for this:
- https://www.gnu.org/software/libunistring/manual/html_node/Object-oriented-API.html
  UC_DECIMAL_DIGIT_NUMBER
- https://www.gnu.org/software/libunistring/manual/html_node/Decimal-digit-value.html
- https://www.gnu.org/software/libunistring/manual/html_node/Digit-value.html
- https://www.gnu.org/software/libunistring/manual/html_node/Properties-as-objects.html
  UC_PROPERTY_DECIMAL_DIGIT
- https://www.gnu.org/software/libunistring/manual/html_node/Properties-as-functions.html
  uc_is_property_decimal_digit
I'm sure ICU4C has similar APIs too.
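To make the difference between the POSIX and the Unicode notion of "digit" concrete, here is a small sketch. It does not use libunistring. It uses Python's standard unicodedata module as a stand-in, because that module exposes the same Unicode data: the general category "Nd" (what libunistring calls UC_DECIMAL_DIGIT_NUMBER), the decimal digit value, and the more general digit value. The helper names are hypothetical and chosen for illustration only.

```python
import unicodedata


def posix_digits(text):
    """Characters in the POSIX 'digit' class: only ASCII '0'..'9'."""
    return [ch for ch in text if "0" <= ch <= "9"]


def decimal_digits(text):
    """(char, value) pairs for Unicode general category Nd (decimal digit number)."""
    return [(ch, unicodedata.decimal(ch)) for ch in text
            if unicodedata.category(ch) == "Nd"]


def digit_values(text):
    """(char, value) pairs for every character that has a Unicode digit value.

    This also covers non-decimal digits such as superscripts (category No).
    """
    return [(ch, unicodedata.digit(ch)) for ch in text
            if unicodedata.digit(ch, None) is not None]


# DIGIT SEVEN, ARABIC-INDIC DIGIT THREE, DEVANAGARI DIGIT FIVE, SUPERSCRIPT TWO
sample = "7\u0663\u096b\u00b2"

for ch, value in digit_values(sample):
    # Print code points and names only, so the output stays ASCII.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: digit value {value}")
```

Only the ASCII '7' is a POSIX digit. Three of the four characters are decimal digits (Nd), and the superscript has a digit value but is not a decimal digit. This mirrors the split between the "Decimal digit value" and "Digit value" pages linked above.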
> Are the Unicode character attribute classes supported for those application use
> cases that need more than POSIX limitations allow?
POSIX allows the libc to define additional character classes. But these will be
platform- and locale-dependent, and I don't know of any application that makes
use of such additional character classes via wctype() and iswctype().
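The following rough sketch shows how an application would query a class by name through wctype() and iswctype(). It calls the C library from Python via ctypes and assumes a glibc-style system, where wctype_t is an unsigned long and wint_t is an unsigned int. Other platforms, Cygwin included, may use different types. The in_class helper is hypothetical. wctype() returns 0 for a class name the current LC_CTYPE locale does not define, and that is where the platform and locale dependence shows up.

```python
import ctypes
import ctypes.util
import locale

# Assumption: glibc-like libc. wctype_t = unsigned long, wint_t = unsigned int.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.wctype.argtypes = [ctypes.c_char_p]
libc.wctype.restype = ctypes.c_ulong
libc.iswctype.argtypes = [ctypes.c_uint, ctypes.c_ulong]
libc.iswctype.restype = ctypes.c_int

try:
    # Use the environment's LC_CTYPE, as a C program calling setlocale() would.
    locale.setlocale(locale.LC_CTYPE, "")
except locale.Error:
    pass  # keep whatever locale is already active


def in_class(ch, class_name):
    """True/False if the current locale knows class_name, else None.

    Hypothetical helper: wctype() returns 0 for class names the
    current LC_CTYPE locale does not define.
    """
    desc = libc.wctype(class_name.encode("ascii"))
    if desc == 0:
        return None
    return libc.iswctype(ord(ch), desc) != 0


for name in ("alpha", "digit", "no-such-class"):
    print(name, in_class("a", name), in_class("5", name))
```

Only the standard names ("alpha", "digit", and so on) are portable. Any extra name is whatever the particular libc and locale choose to provide.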
> I know that I sometimes want to see some alternative numeric digit forms and
> expect to be able to find those with an appropriate grep expression.
I think you can do so with GNU 'grep' if it was built with PCRE support, by using its -P option.
PCRE includes support for Unicode character classes.
<https://www.pcre.org/current/doc/html/pcre2pattern.html>
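Here is a sketch of the kind of search meant, e.g. matching Unicode decimal digits roughly as a PCRE pattern like \p{Nd} would. Python's standard re module has no \p{...} syntax, so this substitutes a different mechanism. In str patterns, Python's \d matches any character in Unicode category Nd, and the re.ASCII flag narrows it to [0-9], the POSIX digit set. The grep helper and the sample lines are hypothetical.

```python
import re

# \d in a str pattern: any Unicode decimal digit (category Nd).
UNICODE_DIGIT = re.compile(r"\d")
# With re.ASCII, \d is just [0-9], like the POSIX 'digit' class.
ASCII_DIGIT = re.compile(r"\d", re.ASCII)


def grep(pattern, lines):
    """Return the lines in which pattern matches somewhere, like grep."""
    return [line for line in lines if pattern.search(line)]


lines = [
    "price: 42",
    "\u0662\u0660\u0662\u0663",       # ARABIC-INDIC digits 2 0 2 3
    "no digits here",
    "\u00b2 is superscript two",      # category No, so \d does not match it
]

print(len(grep(UNICODE_DIGIT, lines)), len(grep(ASCII_DIGIT, lines)))
```

The Unicode-aware search finds the Arabic-Indic line as well as the ASCII one, while the ASCII-only search finds just the latter. Neither matches the superscript, because it is not a decimal digit.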
Bruno