Summary: | fnmatch("??") matches on one two byte valid character (as well as two any-length characters) | ||
---|---|---|---|
Product: | glibc | Reporter: | Stephane Chazelas <stephane+sourceware> |
Component: | glob | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | UNCONFIRMED --- | ||
Severity: | normal | Flags: | stephane+sourceware: security? |
Priority: | P2 | ||
Version: | 2.34 | ||
Target Milestone: | --- | ||
Host: | Target: | ||
Build: | Last reconfirmed: |
Description
Stephane Chazelas
2023-11-18 10:03:11 UTC
(In reply to Stephane Chazelas from comment #0)
> Regression introduced in 2.34 by commit
> a79328c745219dcb395070cdcd3be065a8347f24 reproduced on Ubuntu 22.04, Debian
> sid libc6:amd64 2.37-12, and current git HEAD
> (dae3cf4134d476a4b4ef86fd7012231d6436c15e) built on that sid system.
>
> find . -name '??'
[...]

To clarify, "find" is used here only to demonstrate the behaviour of the libc's fnmatch(). Here with GNU find:

$ LD_DEBUG=bindings find é -name '??' |& grep fnmatch
    323895: binding file /lib/x86_64-linux-gnu/libselinux.so.1 [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `fnmatch' [GLIBC_2.2.5]
    323895: binding file find [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `fnmatch' [GLIBC_2.2.5]
$ ltrace -e 'fnmatch' find é éé $'\U10FFFF\U10FFFF' -name '??'
find->fnmatch("foo", "foo", 0) = 0
find->fnmatch("Foo", "foo", 0) = 1
find->fnmatch("Foo", "foo", 16) = 0
find->fnmatch("??", "\303\251", 0) = 0
é
find->fnmatch("??", "\303\251\303\251", 0) = 0
éé
find->fnmatch("??", "\364\217\277\277\364\217\277\277", 0) = 0
??
+++ exited (status 0) +++

(here showing ?? matching one 2-byte character, two 2-byte characters and two 4-byte characters).

This can likely be considered a security issue, as it means patterns match things that were not intended to be matched (I'll let you guys decide on that). On the other hand, the bug works around long-standing issues whereby, for instance,

find . ! -name '*evil*' -exec ... {} +

was failing to exclude file names containing "evil" when what's on either side is not valid text in the user's locale (a common issue these days where UTF-8 is the norm).

Though of course, falling back to treating both pattern and subject as char[] arrays when the subject cannot be decoded as text, like bash does (and what might have been the intent of a79328c745219dcb395070cdcd3be065a8347f24), is incorrect (see https://lists.gnu.org/archive/html/bug-bash/2021-02/msg00054.html for more details).
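
To check this without going through find, fnmatch() can be called directly. The following is a minimal sketch and not part of the original report: the helper names matches_in_locale() and count_chars() are hypothetical, and the locale name passed in (for example "C.UTF-8") is an assumption about what is installed on the test system. count_chars() uses mbrtowc() to report how many characters the subject decodes to, which is the number of "?" a pattern should need in order to match it.

```c
#include <assert.h>
#include <fnmatch.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters in s under the current
   LC_CTYPE locale. Returns -1 if s is not valid text in that locale. */
long count_chars(const char *s)
{
    assert(s != NULL);

    mbstate_t st;
    memset(&st, 0, sizeof st);

    size_t left = strlen(s);
    long n = 0;

    while (left > 0) {
        size_t len = mbrtowc(NULL, s, left, &st);
        if (len == (size_t)-1 || len == (size_t)-2)
            return -1;          /* invalid or truncated sequence */
        if (len == 0)
            break;              /* embedded NUL; not reachable via strlen() */
        s += len;
        left -= len;
        n++;
    }
    return n;
}

/* Hypothetical helper: switch the process-wide locale to `locale`, then
   run fnmatch() with no flags.
   Returns 1 on match, 0 on no match, -1 if the locale is unavailable
   or fnmatch() reports an error. */
int matches_in_locale(const char *locale, const char *pattern,
                      const char *name)
{
    if (setlocale(LC_ALL, locale) == NULL)
        return -1;              /* locale not installed on this system */

    int r = fnmatch(pattern, name, 0);
    if (r == 0)
        return 1;
    if (r == FNM_NOMATCH)
        return 0;
    return -1;
}
```

With these helpers, matches_in_locale("C.UTF-8", "??", "\303\251") returning 1 while count_chars("\303\251") returns 1 in that same locale would correspond to the behaviour shown in the ltrace output above; a return of 0 from the former is what one would expect from a libc without the regression. Note that setlocale() changes global state, so the sketch is only suitable for a single-threaded test program.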