Using a regular expression range like [C-a] works fine if compiled with regcomp() with just the REG_EXTENDED flag, but if the REG_ICASE flag is added as well, regcomp() fails with the error "Invalid range end".

Testing other ranges with REG_ICASE reveals:

  [A-Z^-z] is invalid: Invalid range end (11)
  [A-Z^_`a-z] is ok
  [C-a] is invalid: Invalid range end (11)
  [C-f] is ok
  [_-a] is invalid: Invalid range end (11)
  [<-a] is ok
  [z-{] is ok

It appears that regcomp() is capitalizing the range endpoints when the REG_ICASE flag is used, so [C-a] becomes [C-A], and since A comes before C, the range is invalid. Likewise, in locales that match ASCII, ^ comes before z but after Z, so [A-Z^-z] becomes invalid, and _ comes after A but before a, so [_-a] becomes invalid.

If this is not considered a bug, then at the very least the regex(3) man page should note the side effects of using REG_ICASE.
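For readers without access to the attachment, the results above could be reproduced with something like the following. This is a minimal sketch, not the attached test case: the helper names try_compile() and report_ranges() are made up for illustration, and the output depends on the libc version and locale in use.

```c
#include <assert.h>
#include <locale.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: compile `pattern` with `cflags` and return the
 * regcomp() result code (0 on success). On failure, the regerror()
 * message is copied into errbuf when one is supplied. */
int try_compile(const char *pattern, int cflags, char *errbuf, size_t errlen)
{
    regex_t re;
    int rc;

    assert(pattern != NULL);
    rc = regcomp(&re, pattern, cflags);
    if (rc != 0) {
        if (errbuf != NULL && errlen > 0)
            regerror(rc, &re, errbuf, errlen);
    } else {
        regfree(&re);   /* only free a successfully compiled regex */
    }
    return rc;
}

/* Hypothetical driver: try each range from the report with REG_ICASE
 * in the C locale and print the result in the report's format. */
void report_ranges(void)
{
    static const char *const patterns[] = {
        "[A-Z^-z]", "[A-Z^_`a-z]", "[C-a]", "[C-f]",
        "[_-a]", "[<-a]", "[z-{]"
    };
    size_t i;

    setlocale(LC_ALL, "C");
    for (i = 0; i < sizeof patterns / sizeof patterns[0]; i++) {
        char msg[128];
        int rc = try_compile(patterns[i], REG_EXTENDED | REG_ICASE,
                             msg, sizeof msg);
        if (rc == 0)
            printf("%s is ok\n", patterns[i]);
        else
            printf("%s is invalid: %s (%d)\n", patterns[i], msg, rc);
    }
}
```

Calling report_ranges() from main() prints one line per pattern. Dropping REG_ICASE from the flags gives the baseline for comparison.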
Created attachment 4004: test case
Note [C-a] is invalid anyway:

  $ sed -n '/[C-a]/p' /dev/null
  sed: -e expression #1, char 7: Invalid range end

However, [c-A] is not, and it shows the bug:

  $ sed -n '/[c-A]/p' /dev/null
  $ sed -n '/[c-A]/I p'
  sed: -e expression #1, char 9: Invalid range end
In which locale? In the POSIX locale with an ASCII (or similar) encoding, [C-a] is well defined:

  $ LC_ALL=C sed -n '/[C-a]/p' /dev/null
  $ LC_ALL=en_US.UTF-8 sed -n '/[C-a]/p' /dev/null
  sed: -e expression #1, char 7: Invalid range end

And since range expressions are only well defined in the POSIX locale, the point still remains that the case-insensitive flag is messing things up:

  $ LC_ALL=C sed -n '/[C-a]/I p' /dev/null
  sed: -e expression #1, char 9: Invalid range end

Also, the resolution of this bug should consider http://sources.redhat.com/bugzilla/show_bug.cgi?id=12045, which is unrelated to the REG_ICASE flag.
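The upper-casing explanation from the original report can be checked against the observed results with a small model. This is only a sketch of the hypothesis, assuming ASCII ordering in the C locale. It is not glibc's actual code, and the function name range_ok_after_upcase() is invented for illustration.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical model of the suspected behavior: fold both range
 * endpoints to upper case, then require start <= end in ASCII order.
 * Returns 1 if the folded range would be valid, 0 otherwise. */
int range_ok_after_upcase(unsigned char lo, unsigned char hi)
{
    return toupper(lo) <= toupper(hi);
}

/* Print the model's verdict for one range, e.g. print_model('C', 'a'). */
void print_model(unsigned char lo, unsigned char hi)
{
    printf("[%c-%c] folds to [%c-%c]: %s\n", lo, hi,
           toupper(lo), toupper(hi),
           range_ok_after_upcase(lo, hi) ? "ok" : "invalid");
}
```

Under this model, C-a, ^-z, _-a and c-A come out invalid, while C-f, <-a and z-{ come out ok. That matches the results reported in this thread, but it does not prove that this is what regcomp() does internally.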
I was using LC_ALL=en_US.UTF-8 in comment #2.
*** Bug 260998 has been marked as a duplicate of this bug. ***