Combining characters such as U+0301 COMBINING ACUTE ACCENT are misclassified as punct, whereas they should be alpha. With glibc 2.37 under Debian/unstable, I get for this character:

Property alnum : no
Property alpha : no
Property cntrl : no
Property digit : no
Property graph : yes
Property lower : no
Property print : yes
Property punct : yes
Property space : no
Property upper : no
Property xdigit: no

This affects grep:
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=27681
(where it is said that the bug is in the GNU libc).

Corresponding Debian bug:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=868654
(which was reported on 2017-07-17 and hasn't had any activity yet).
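For anyone who wants to reproduce a property listing like the one above, here is a minimal sketch using wctype() and iswctype(). It is not necessarily the program used for this report; the helper names has_property and print_properties are made up for illustration, and the result depends on the locale in effect.

```c
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Class names in the same order as the listing in the report. */
static const char *const class_names[] = {
    "alnum", "alpha", "cntrl", "digit", "graph", "lower",
    "print", "punct", "space", "upper", "xdigit",
};

/* Hypothetical helper: return 1 if wc has the named property in the
   current locale, 0 if it does not, -1 if the class name is unknown. */
static int has_property(wint_t wc, const char *name)
{
    wctype_t desc = wctype(name);
    if (desc == (wctype_t) 0)
        return -1;
    return iswctype(wc, desc) != 0;
}

/* Hypothetical helper: print one "Property <name>: yes/no" line per class. */
static void print_properties(wint_t wc)
{
    size_t n = sizeof class_names / sizeof class_names[0];
    for (size_t i = 0; i < n; i++) {
        int r = has_property(wc, class_names[i]);
        printf("Property %-6s: %s\n", class_names[i],
               r < 0 ? "unknown" : r ? "yes" : "no");
    }
}
```

To test U+0301, call setlocale(LC_ALL, "") from main with a UTF-8 locale in the environment, then print_properties(0x0301). Without the setlocale() call the program runs in the "C" locale and the classification may differ.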
Isn't the larger issue here that it's reasonable to expect that [[:alpha:]] matches a single letter as perceived by the user: an entire grapheme cluster comprising the base character(s), its associated combining characters and other marks? We do not implement any of that in glibc, and there are no plans to do so.
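To illustrate what "a single letter as perceived by the user" would mean in terms of code points, here is a deliberately simplified sketch. It is not glibc functionality and not the grapheme cluster algorithm of Unicode Standard Annex #29. As a stated assumption it treats only the Combining Diacritical Marks block (U+0300..U+036F) as combining, and the names is_combining_mark and count_clusters are hypothetical.

```c
#include <stddef.h>
#include <wchar.h>

/* ASSUMPTION: only U+0300..U+036F counts as a combining mark here.
   Real grapheme cluster segmentation covers far more cases. */
static int is_combining_mark(wchar_t wc)
{
    return wc >= 0x0300 && wc <= 0x036F;
}

/* Count user-perceived units in a NUL-terminated wide string.
   Each code point that is not a combining mark starts a new unit;
   a mark at the very start (no base character) counts as its own unit. */
static size_t count_clusters(const wchar_t *s)
{
    size_t count = 0;
    for (size_t i = 0; s[i] != L'\0'; i++) {
        if (i == 0 || !is_combining_mark(s[i]))
            count++;
    }
    return count;
}
```

With this approximation the three code points U+0065, U+0301, U+0061 form two units, whereas iswalpha() and friends see three separate code points.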
(In reply to Florian Weimer from comment #1)
> Isn't the larger issue here that it's reasonable to expect that [[:alpha:]]
> matches a single letter as perceived by the user: an entire grapheme cluster
> comprising the base character(s), its associated combining characters and
> other marks.
[...]

I don't think so. The functions iswctype(), iswalpha(), etc. take a single code-point (type wint_t), and the regex(7) man page says:

    Within a bracket expression, the name of a character class enclosed
    in "[:" and ":]" stands for the list of all characters belonging to
    that class. Standard character class names are:

        alnum   digit   punct
        alpha   graph   space
        blank   lower   upper
        cntrl   print   xdigit

    These stand for the character classes defined in wctype(3). [...]

so that it is expected that [[:alpha:]] matches a single character, like the above functions.
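The per-character behaviour can be checked with the POSIX regex API. The following is a minimal sketch, with is_single_alpha being a made-up helper name; it anchors the bracket expression so that the whole input must be exactly one character of class alpha in the current locale.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if s is exactly one character matched
   by [[:alpha:]] in the current locale, 0 if it is not, and -1 if the
   pattern fails to compile. */
static int is_single_alpha(const char *s)
{
    regex_t re;
    if (regcomp(&re, "^[[:alpha:]]$", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

In a UTF-8 locale (after setlocale(LC_ALL, "")), passing the two bytes "\xCC\x81", i.e. U+0301 alone, should succeed only if the locale classifies that character as alpha, which is what this bug is about.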