Bug 17070 - regcomp with REG_EXTENDED uses unbounded CPU or RAM
Summary: regcomp with REG_EXTENDED uses unbounded CPU or RAM
Status: NEW
Alias: None
Product: glibc
Classification: Unclassified
Component: regex (show other bugs)
Version: 2.20
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-06-19 07:45 UTC by Kostya Serebryany
Modified: 2015-02-24 12:40 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Kostya Serebryany 2014-06-19 07:45:38 UTC
[not sure how useful these reports are, but filing just in case.]


#include <regex.h>                                                                                
int main(int argc, char **argv) {                                                                 
  regex_t r;                                                                                      
  regcomp(&r,                                                                                     
#if 1          
"([\\u]\\N|||){85,}[:ascii:]l[:(?!graph:]x?x)",
#else
"[(?{x<]})x{146}{,78}{,154}{,211}\\P{(?>^Latin}"
"x\\w\\p{^So}\\P{Alphabetic}[:punct:]\\P{^Mc}xxx)"
"[:alnum:]{-9,}[:blankcntrl:][:upperword:][:punct:]\\e",
#endif
          REG_EXTENDED);                                                                          
  regfree(&r);                                                                                    
}                                    

% gcc r1.c && ./a.out

The first pattern just never ends, most of the time is spent 
in deep recursive call to calc_eclosure_iter

The second case is much worse -- it quickly eats all available RAM on the machine,
doing tons of allocations here: 
#1  0x00007ffff7a9cf95 in __GI___libc_malloc (bytes=968) at malloc.c:2924
#2  0x00007ffff7af1e3b in create_token_tree 
#3  duplicate_tree 
#4  0x00007ffff7af7f6f in parse_dup_op 
#5  parse_expression
#6  0x00007ffff7af6470 in parse_branch
#7  0x00007ffff7af67be in parse_reg_exp
#8  0x00007ffff7af6cc0 in parse 
#9  re_compile_internal 


Checked with 2.15 and fresh trunk, tests were generated by regfuzz
Comment 1 Rich Felker 2014-06-19 15:05:07 UTC
I'm guessing -9 got interpreted as (size_t)-9, in which case that many states are really needed in the compiled regex. This is why POSIX allows implementations to place a limit on the number of repetitions supported in {n,m} (IIRC only up to 256 need to be supported) and why glibc should make use of this allowance.
Comment 2 Florian Weimer 2015-02-24 12:40:29 UTC
Per our Security Exceptions, this is not a security bug:

https://sourceware.org/glibc/wiki/Security%20Exceptions

Also see the relatted bugs.