This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



Unicode Regex



Hello,

I understand that people are interested in upgrading regex to work with
Unicode.  I would like to suggest that you consider the most excellent
preg library from Larry Wall found in Perl 5.6.0.  It has full Unicode
support and is an extremely powerful regex library.  Size-wise, at least
on my system, the non-Unicode version is only about 10% bigger than the
regular regex library by Henry Spencer.

The regex engine for Perl supports not only Unicode but also locales for
case mapping and character ranges.  From what I can tell from
http://www.perl.com/pub/doc/manual/html/pod/perlre.html and from using
preg myself, it is Version 8 RegEx compatible.  Also, it is
cross-platform and, being so widely used, is heavily tested.

The nice thing about including this library is that it is very fast and
has much expanded functionality; for me it really is an indispensable
tool for text parsing.  One of the compile flags could be 'PCRE' for
the Perl extended mode.  As I understand it, PCRE is a superset of POSIX
regex, and although it might take a bit of twiddling to get it to conform
fully to POSIX 1003.2, it certainly would appear to be easier than
starting from scratch.

As I am not subscribed to this list, please CC: me on any responses.

Ciao,

Andreas Pour

http://www.kde.com/ :  Everything KDE
http://apps.kde.com/:  The Latest in KDE Applications
