Bug 17070

Summary: regcomp with REG_EXTENDED uses unbounded CPU or RAM
Product: glibc Reporter: Kostya Serebryany <konstantin.s.serebryany>
Component: regexAssignee: Not yet assigned to anyone <unassigned>
Status: NEW ---    
Severity: normal CC: bugdal, drepper.fsp, fweimer
Priority: P2 Flags: fweimer: security-
Version: 2.20   
Target Milestone: ---   
See Also: https://sourceware.org/bugzilla/show_bug.cgi?id=12896
https://sourceware.org/bugzilla/show_bug.cgi?id=18013
Host: Target:
Build: Last reconfirmed:

Description Kostya Serebryany 2014-06-19 07:45:38 UTC
[not sure how useful these reports are, but filing just in case.]


#include <regex.h>                                                                                
int main(int argc, char **argv) {                                                                 
  regex_t r;                                                                                      
  regcomp(&r,                                                                                     
#if 1          
"([\\u]\\N|||){85,}[:ascii:]l[:(?!graph:]x?x)",
#else
"[(?{x<]})x{146}{,78}{,154}{,211}\\P{(?>^Latin}"
"x\\w\\p{^So}\\P{Alphabetic}[:punct:]\\P{^Mc}xxx)"
"[:alnum:]{-9,}[:blankcntrl:][:upperword:][:punct:]\\e",
#endif
          REG_EXTENDED);                                                                          
  regfree(&r);                                                                                    
}                                    

% gcc r1.c && ./a.out

The first pattern just never ends, most of the time is spent 
in deep recursive call to calc_eclosure_iter

The second case is much worse -- it quickly eats all available RAM on the machine,
doing tons of allocations here: 
#1  0x00007ffff7a9cf95 in __GI___libc_malloc (bytes=968) at malloc.c:2924
#2  0x00007ffff7af1e3b in create_token_tree 
#3  duplicate_tree 
#4  0x00007ffff7af7f6f in parse_dup_op 
#5  parse_expression
#6  0x00007ffff7af6470 in parse_branch
#7  0x00007ffff7af67be in parse_reg_exp
#8  0x00007ffff7af6cc0 in parse 
#9  re_compile_internal 


Checked with 2.15 and fresh trunk, tests were generated by regfuzz
Comment 1 Rich Felker 2014-06-19 15:05:07 UTC
I'm guessing -9 got interpreted as (size_t)-9, in which case that many states are really needed in the compiled regex. This is why POSIX allows implementations to place a limit on the number of repetitions supported in {n,m} (IIRC only up to 256 need to be supported) and why glibc should make use of this allowance.
Comment 2 Florian Weimer 2015-02-24 12:40:29 UTC
Per our Security Exceptions, this is not a security bug:

https://sourceware.org/glibc/wiki/Security%20Exceptions

Also see the relatted bugs.