coding neutral regexec to do UTF-8 ranges

Florian Weimer fweimer@redhat.com
Sat Jun 24 15:17:00 GMT 2017


On 06/23/2017 11:50 AM, Joël Krähemann wrote:
> My application is not able to apply any gettext translations. Here is
> a sample of an expression my application uses:
> 
> static const char *chars_pattern =
> "^(([0-9])|(\xC2\xB7)|((\xCC[\x80-\xBF])|(\xCD[\x80-\xAF]))|((\xE2\x80\xBF)|(\xE2\x81\x80)))";
> 
> Whenever a UTF-8 locale is active on my system, the expression above
> causes the program to fail, because the ranges are interpreted as
> multi-byte sequences, which is definitely wrong in this situation.

I think you need to switch to the POSIX (C) locale with uselocale around
your regcomp and regexec calls, because evidently the string you want to
match is nowhere near valid UTF-8.

Thanks,
Florian
